linux/include/net
Eric Dumazet 5640f76858 net: use a per task frag allocator
We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-24 16:31:37 -04:00
..
9p
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-09-15 11:43:53 -04:00
caif caif-hsi: Remove use of module parameters 2012-06-25 16:44:12 -07:00
irda
iucv af_iucv: add shutdown for HS transport 2012-03-07 22:52:24 -08:00
netfilter netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
netns ipv6: add a new namespace for nf_conntrack_reasm 2012-09-19 17:23:28 -04:00
nfc netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
sctp sctp: fix a compile error in sctp.h 2012-08-15 03:43:43 -07:00
tc_act
act_api.h
addrconf.h netfilter: ip6tables: add MASQUERADE target 2012-08-30 03:00:18 +02:00
af_ieee802154.h
af_rxrpc.h
af_unix.h af_unix: speedup /proc/net/unix 2012-06-08 14:27:23 -07:00
ah.h
arp.h net: Dont use ifindices in hash fns 2012-08-09 16:18:06 -07:00
atmclip.h
ax25.h userns: Convert net/ax25 to use kuid_t where appropriate 2012-08-14 21:49:42 -07:00
ax88796.h
cfg80211-wext.h
cfg80211.h Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-09-14 13:53:49 -04:00
checksum.h net: core: add function for incremental IPv6 pseudo header checksum updates 2012-08-30 03:00:16 +02:00
cipso_ipv4.h cipso: handle CIPSO options correctly when NetLabel is disabled 2012-06-01 14:18:29 -04:00
cls_cgroup.h
codel.h codel: refine one condition to avoid a nul rec_inv_sqrt 2012-08-10 16:52:54 -07:00
compat.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
datalink.h
dcbevent.h
dcbnl.h net/dcb: Add an optional max rate attribute 2012-04-05 05:08:04 -04:00
dn.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dn_dev.h
dn_fib.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dn_neigh.h
dn_nsp.h
dn_route.h decnet: Use neighbours privately in dn_route struct. 2012-07-05 01:12:14 -07:00
dsa.h
dsfield.h
dst.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
dst_ops.h net: Fix warnings in dst_ops.h 2012-07-19 10:43:03 -07:00
esp.h
ethoc.h
fib_rules.h ipv4: Elide fib_validate_source() completely when possible. 2012-06-29 01:36:36 -07:00
flow.h ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. 2012-07-20 13:36:54 -07:00
flow_keys.h
garp.h
gen_stats.h
genetlink.h netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
gre.h
icmp.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ieee80211_radiotap.h wireless: add radiotap A-MPDU status field 2012-08-20 13:53:09 +02:00
ieee802154.h
ieee802154_netdev.h mac802154: declare reduced mlme operations 2012-05-16 15:16:56 -04:00
if_inet6.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
inet6_connection_sock.h ipv6: Add helper inet6_csk_update_pmtu(). 2012-07-16 03:44:56 -07:00
inet6_hashtables.h ipv6: Early TCP socket demux 2012-07-26 15:50:39 -07:00
inet_common.h net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) 2012-07-19 11:02:03 -07:00
inet_connection_sock.h net: ipv6: fix TCP early demux 2012-08-06 13:33:21 -07:00
inet_ecn.h
inet_frag.h ipv6: unify fragment thresh handling code 2012-09-19 17:23:28 -04:00
inet_hashtables.h ipv4: Early TCP socket demux. 2012-06-19 21:22:05 -07:00
inet_sock.h net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
inet_timewait_sock.h
inetpeer.h ipv4: Maintain redirect and PMTU info in struct rtable again. 2012-07-10 22:40:14 -07:00
ip.h ipv4: fix path MTU discovery with connection tracking 2012-08-26 19:13:55 +02:00
ip6_checksum.h
ip6_fib.h ipv6: fix handling of blackhole and prohibit routes 2012-09-05 17:49:28 -04:00
ip6_route.h ipv6: fix inet6_csk_xmit() 2012-07-18 08:59:58 -07:00
ip6_tunnel.h gre: Support GRE over IPv6 2012-08-14 14:28:32 -07:00
ip_fib.h ipv4: Cache routes in nexthop exception entries. 2012-07-31 15:02:02 -07:00
ip_vs.h ipvs: add pmtu_disc option to disable IP DF for TUN packets 2012-08-10 10:35:07 +09:00
ipcomp.h
ipconfig.h
ipip.h tunnel: implement 64 bits statistics 2012-04-14 14:47:05 -04:00
ipv6.h ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline 2012-09-19 17:23:28 -04:00
ipx.h
iw_handler.h
lapb.h lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
lib80211.h
llc.h llc: Remove stray reference to sysctl_llc_station_ack_timeout. 2012-09-17 13:13:24 -04:00
llc_c_ac.h
llc_c_ev.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
mac80211.h mac80211: add IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF 2012-08-20 13:58:23 +02:00
mac802154.h mac802154: add wpan device-class support 2012-06-26 21:06:11 -07:00
mip6.h
mld.h
ndisc.h net: Dont use ifindices in hash fns 2012-08-09 16:18:06 -07:00
neighbour.h net: output path optimizations 2012-08-07 16:24:55 -07:00
net_namespace.h ipv6: add a new namespace for nf_conntrack_reasm 2012-09-19 17:23:28 -04:00
net_ratelimit.h
netdma.h
netevent.h net: Pass neighbours and dest address into NETEVENT_REDIRECT events. 2012-07-05 02:21:55 -07:00
netlabel.h
netlink.h netlink: Rename pid to portid to avoid confusion 2012-09-10 15:30:41 -04:00
netprio_cgroup.h net: netprio_cgroup: rework update socket logic 2012-07-22 12:44:01 -07:00
netrom.h
nexthop.h
nl802154.h
p8022.h
ping.h
pkt_cls.h
pkt_sched.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
protocol.h ipv6: Early TCP socket demux 2012-07-26 15:50:39 -07:00
psnap.h
raw.h
rawv6.h ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
red.h net_sched: red: Make minor corrections to comments 2012-04-16 23:53:11 -04:00
regulatory.h cfg80211: add cellular base station regulatory hint support 2012-07-17 12:16:39 +02:00
request_sock.h tcp: TCP Fast Open Server - support TFO listeners 2012-08-31 20:02:19 -04:00
rose.h
route.h ipv4/route: arg delay is useless in rt_cache_flush() 2012-09-07 14:44:08 -04:00
rtnetlink.h rtnl: allow to specify different num for rx and tx queue count 2012-07-20 11:06:59 -07:00
sch_generic.h net sched: Pass the skb into change so it can access NETLINK_CB 2012-08-14 21:55:28 -07:00
scm.h net: Remove unnecessary NULL check in scm_destroy(). 2012-09-24 15:52:33 -04:00
secure_seq.h
slhc_vj.h
snmp.h net: avoid reloads in SNMP_UPD_PO_STATS 2012-08-06 13:40:47 -07:00
sock.h net: use a per task frag allocator 2012-09-24 16:31:37 -04:00
stp.h
tcp.h tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS 2012-09-22 15:47:10 -04:00
tcp_memcontrol.h cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg 2012-04-10 10:04:07 -07:00
tcp_states.h
timewait_sock.h [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. 2012-06-09 14:56:12 -07:00
transp_v6.h
udp.h net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6 2012-04-28 22:21:51 -04:00
udplite.h net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
wext.h
wimax.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wpan-phy.h mac802154: monitor device support 2012-05-16 15:17:08 -04:00
x25.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
x25device.h
xfrm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-09-15 11:43:53 -04:00