linux/include/net
Jesper Dangaard Brouer 93bb0ceb75 netfilter: conntrack: remove central spinlock nf_conntrack_lock
nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more core/threads).

Perf locking congestion is clear on base kernel:

-  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 25.33% init_conntrack
      + 24.86% nf_ct_delete_from_lists
      + 24.62% __nf_conntrack_confirm
      + 24.38% destroy_conntrack
      + 0.70% tcp_packet
+   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
+   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

This patch change conntrack locking and provides a huge performance
improvement.  SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with tool trafgen):

 Base kernel:   810.405 new conntrack/sec
 After patch: 2.233.876 new conntrack/sec

Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB memory). Due to lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y

The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:41:13 +01:00
..
9p
bluetooth Bluetooth: Add quirk for disabling Delete Stored Link Key command 2014-01-04 20:10:40 +02:00
caif
irda include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
iucv
netfilter netfilter: conntrack: remove central spinlock nf_conntrack_lock 2014-03-07 11:41:13 +01:00
netns netfilter: conntrack: remove central spinlock nf_conntrack_lock 2014-03-07 11:41:13 +01:00
nfc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
phonet
sctp net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer 2014-02-17 00:16:56 -05:00
tc_act net_sched: act: hide struct tcf_common from API 2014-02-12 19:23:32 -05:00
Space.h drivers: net: Include new header file in sbni.c 2013-12-19 18:51:20 -05:00
act_api.h net_sched: act: refuse to remove bound action outside 2014-02-12 19:23:32 -05:00
addrconf.h ipv6: enable anycast addresses as source addresses for datagrams 2014-01-22 21:57:05 -08:00
af_ieee802154.h
af_rxrpc.h
af_unix.h
af_vsock.h
ah.h
arp.h arp: make arp_invalidate static 2013-12-28 17:02:46 -05:00
atmclip.h
ax25.h
ax88796.h
busy_poll.h sched, net: Fixup busy_loop_us_clock() 2014-01-13 17:39:11 +01:00
cfg80211-wext.h
cfg80211.h cfg80211: Add a function to get the number of supported channels 2014-01-09 14:24:24 +01:00
checksum.h
cipso_ipv4.h cipso: cleanup cipso_v4_translate() when !CONFIG_NETLABEL 2013-12-10 17:56:54 -05:00
cls_cgroup.h net: net_cls: move cgroupfs classid handling into core 2014-01-03 23:41:41 +01:00
codel.h net: introduce reciprocal_scale helper and convert users 2014-01-21 23:17:20 -08:00
compat.h
datalink.h net: Move prototype declaration to header file include/net/datalink.h from net/ipx/af_ipx.c 2014-02-09 17:32:50 -08:00
dcbevent.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
dcbnl.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
dn.h net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dn_dev.h dn_dev: add support for IFA_FLAGS nl attribute 2013-12-10 21:50:00 -05:00
dn_fib.h
dn_neigh.h
dn_nsp.h
dn_route.h net: Move prototype declaration to appropriate header file from decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dsa.h
dsfield.h
dst.h net: Add utility functions to clear rxhash 2013-12-17 16:36:21 -05:00
dst_ops.h
esp.h
ethoc.h net: ethoc: set up MII management bus clock 2014-02-04 20:19:51 -08:00
fib_rules.h
firewire.h
flow.h net: Remove FLOWI_FLAG_CAN_SLEEP 2013-12-06 07:24:39 +01:00
flow_keys.h
garp.h
gen_stats.h
genetlink.h genl: Add genlmsg_new_unicast() for unicast message allocation 2014-01-06 15:51:53 -08:00
gre.h gre_offload: statically build GRE offloading support 2014-01-06 20:28:34 -05:00
gro_cells.h
icmp.h
ieee80211_radiotap.h
ieee802154.h
ieee802154_netdev.h
if_inet6.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-18 00:55:41 -08:00
inet6_connection_sock.h
inet6_hashtables.h
inet_common.h
inet_connection_sock.h
inet_ecn.h
inet_frag.h
inet_hashtables.h
inet_sock.h
inet_timewait_sock.h ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT 2014-01-17 17:56:33 -08:00
inetpeer.h ipv4: remove unused function 2013-12-28 17:03:20 -05:00
ip.h ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams 2014-01-19 19:53:18 -08:00
ip6_checksum.h
ip6_fib.h ipv6: remove prune parameter for fib6_clean_all 2014-01-02 03:30:35 -05:00
ip6_route.h IPv6: add the option to use anycast addresses as source addresses in echo reply 2014-01-07 15:51:39 -05:00
ip6_tunnel.h net: unify the pcpu_tstats and br_cpu_netstats as one 2014-01-04 20:10:24 -05:00
ip_fib.h
ip_tunnels.h ipv4: fix a dst leak in tunnels 2014-01-17 18:36:39 -08:00
ip_vs.h
ipcomp.h
ipconfig.h
ipv6.h ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts 2014-01-21 16:59:19 -08:00
ipx.h net: Move prototype declaration to header file include/net/ipx.h from net/ipx/af_ipx.c 2014-02-09 17:32:50 -08:00
iw_handler.h
lapb.h
lib80211.h
llc.h llc: make lock static 2014-01-03 20:56:48 -05:00
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h net: llc: fix order of evaluation in llc_conn_ac_inc_vr_by_1 2014-01-01 22:22:43 -05:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
mac80211.h mac80211: handle MMPDUs at EOSP correctly 2014-01-10 09:50:02 +01:00
mac802154.h ieee802154: add netlink APIs for smartMAC configuration 2014-02-17 16:42:39 -05:00
mip6.h include/net/: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
mld.h
mrp.h
ndisc.h
neighbour.h neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. 2014-01-16 11:31:58 -08:00
net_namespace.h net: Move prototype declaration to header file include/net/net_namespace.h from net/ipx/af_ipx.c 2014-02-09 17:32:50 -08:00
net_ratelimit.h
netdma.h
netevent.h
netlabel.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
netlink.h
netprio_cgroup.h net: netprio: rename config to be more consistent with cgroup configs 2014-01-03 23:41:42 +01:00
netrom.h
nexthop.h
nl802154.h
p8022.h
ping.h ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams 2014-01-19 19:53:18 -08:00
pkt_cls.h net_sched: optimize tcf_match_indev() 2014-01-13 11:50:15 -08:00
pkt_sched.h pkt_sched: give visibility to mq slave qdiscs 2013-12-09 19:54:47 -05:00
protocol.h net: Add GRO support for UDP encapsulating protocols 2014-01-21 18:05:04 -08:00
psnap.h
raw.h
rawv6.h
red.h reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
regulatory.h cfg80211: make regulatory_hint() remove REGULATORY_CUSTOM_REG 2014-01-13 14:46:58 -05:00
request_sock.h
rose.h
route.h ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing 2014-01-13 11:22:54 -08:00
rtnetlink.h rtnl: make ifla_policy static 2014-02-18 18:15:42 -05:00
sch_generic.h net_sched: add struct net pointer to tcf_proto_ops->dump 2014-01-13 11:50:14 -08:00
scm.h
secure_seq.h
slhc_vj.h
snmp.h
sock.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2014-01-05 20:18:50 -05:00
stp.h
tcp.h tcp: remove unused min_cwnd member of tcp_congestion_ops 2014-02-13 18:22:34 -05:00
tcp_memcontrol.h
tcp_states.h
timewait_sock.h
transp_v6.h ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams 2014-01-19 19:53:18 -08:00
udp.h
udplite.h
vsock_addr.h
vxlan.h net: Add GRO support for vxlan traffic 2014-01-21 18:05:04 -08:00
wext.h
wimax.h
wpan-phy.h ieee802154: add netlink APIs for smartMAC configuration 2014-02-17 16:42:39 -05:00
x25.h
x25device.h
xfrm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00