linux/include/uapi/linux
Willem de Bruijn 77f65ebdca packet: packet fanout rollover during socket overload
Changes:
  v3->v2: rebase (no other changes)
          passes selftest
  v2->v1: read f->num_members only once
          fix bug: test rollover mode + flag

Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.

The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.
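
As a rough illustration of how userspace might request the new policy
or flag, the sketch below packs the group id and the policy/flag bits
into the integer passed to setsockopt(PACKET_FANOUT). The helper names
fanout_arg() and join_fanout() are hypothetical and not part of this
patch; the fallback constant values are assumed to match
linux/if_packet.h and only apply when that header is missing or older.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

#if defined(__has_include)
#if __has_include(<linux/if_packet.h>)
#include <linux/if_packet.h>
#endif
#endif

/* Fallbacks for missing or older headers (assumed to match if_packet.h). */
#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif
#ifndef PACKET_FANOUT
#define PACKET_FANOUT 18
#endif
#ifndef PACKET_FANOUT_HASH
#define PACKET_FANOUT_HASH 0
#endif
#ifndef PACKET_FANOUT_CPU
#define PACKET_FANOUT_CPU 2
#endif
#ifndef PACKET_FANOUT_ROLLOVER
#define PACKET_FANOUT_ROLLOVER 3
#endif
#ifndef PACKET_FANOUT_FLAG_ROLLOVER
#define PACKET_FANOUT_FLAG_ROLLOVER 0x1000
#endif

/* Hypothetical helper: group id in the low 16 bits, policy and flags
 * in the high 16 bits of the option value. */
int fanout_arg(uint16_t group_id, uint16_t type_and_flags)
{
    return (int)(((uint32_t)type_and_flags << 16) | group_id);
}

/* Hypothetical helper: join an already opened AF_PACKET socket to a
 * fanout group. Returns the setsockopt() result (0 on success). */
int join_fanout(int fd, uint16_t group_id, uint16_t type_and_flags)
{
    int arg = fanout_arg(group_id, type_and_flags);

    assert(fd >= 0);
    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

For example, join_fanout(fd, 42, PACKET_FANOUT_ROLLOVER) would ask for
the pure rollover policy, while join_fanout(fd, 42, PACKET_FANOUT_HASH |
PACKET_FANOUT_FLAG_ROLLOVER) would keep rxhash affinity and roll over
only when the chosen socket is full. Opening the AF_PACKET socket
itself normally requires CAP_NET_RAW and is left out of the sketch.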

Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.

To avoid contention, the patch scales counters with the number of
sockets and accesses them lock-free. Values are bounds-checked to
ensure correctness.
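
The greedy selection and the bounds-checked hint described above can
be modelled in userspace roughly as follows. This is a simplified
sketch, not the kernel code: struct model_group, model_has_room() and
model_rollover() are hypothetical names, queues are reduced to plain
counters, and a single hint is kept instead of per-socket counters.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SOCKS 16  /* hypothetical upper bound for the model */

struct model_group {
    unsigned int num;                    /* sockets in the group */
    unsigned int next;                   /* hint: last socket that had room */
    unsigned int qcap;                   /* capacity of every queue */
    unsigned int qlen[MODEL_MAX_SOCKS];  /* current queue lengths */
};

/* A socket has room while its queue is below capacity. */
int model_has_room(const struct model_group *g, unsigned int i)
{
    return g->qlen[i] < g->qcap;
}

/* Enqueue one packet. Start at the last used socket and walk the
 * group at most once. Returns the chosen index, or -1 when every
 * socket is full (the packet would be dropped). */
int model_rollover(struct model_group *g)
{
    unsigned int start, i;

    assert(g != NULL);
    if (g->num == 0 || g->num > MODEL_MAX_SOCKS)
        return -1;

    /* The hint may be stale or racy, so clamp it before use. */
    start = g->next < g->num ? g->next : g->num - 1;
    i = start;
    do {
        if (model_has_room(g, i)) {
            g->next = i;  /* keep filling this socket next time */
            g->qlen[i]++;
            return (int)i;
        }
        if (++i == g->num)
            i = 0;
    } while (i != start);

    return -1;
}
```

With three sockets of capacity two, successive calls fill socket 0,
then 1, then 2, and only then report a drop, which matches the
"fill each socket before moving to the next" behaviour. Clamping the
hint rather than locking it mirrors the trade-off named in the text: a
stale value costs at most a few extra probes.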

Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limit each
thread to handling 32 Kpps. When sent a single UDP stream at 500 Kpps,
a FANOUT_CPU setup processes 32 Kpps in total without this patch and
270 Kpps with the patch. Tested with read() and with a packet
ring (V1).
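
To put these figures in context, the arithmetic below assumes that the
32 Kpps per-thread limit is a hard cap and that the 9 threads do not
interfere with each other. Both are assumptions made for illustration,
not claims from the test report.

```latex
\begin{align*}
\text{aggregate ceiling} &= 9 \times 32\ \text{Kpps} = 288\ \text{Kpps} \\
\text{utilization with the patch} &= 270 / 288 \approx 0.94 \\
\text{gain over FANOUT\_CPU alone} &= 270 / 32 \approx 8.4
\end{align*}
```

Under those assumptions, rollover brings the group to within about 6%
of what the 9 threads could handle together, whereas without it the
single stream is confined to one socket.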

It also passes the psock_fanout.c unit test added to selftests.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 17:15:04 -04:00
..
byteorder
caif
can can: gw: indicate and count deleted frames due to misconfiguration 2013-01-26 16:59:02 +01:00
dvb [media] dvb: Add DVBv5 statistics properties 2013-01-23 19:06:33 -02:00
hdlc
hsi
isdn
mmc
netfilter netfilter: xt_CT: add alias flag 2013-02-05 01:49:26 +01:00
netfilter_arp
netfilter_bridge
netfilter_ipv4
netfilter_ipv6
nfsd
raid
spi
sunrpc
tc_act
tc_ematch
usb ALSA: usb: Fix Processing Unit Descriptor parsers 2013-02-21 13:55:12 +01:00
wimax
Kbuild Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 2013-02-20 14:05:45 -05:00
a.out.h
acct.h
adb.h
adfs_fs.h
affs_hardblocks.h
agpgart.h
aio_abi.h
apm_bios.h
arcfb.h
atalk.h
atm.h
atm_eni.h
atm_he.h
atm_idt77105.h
atm_nicstar.h
atm_tcp.h
atm_zatm.h
atmapi.h
atmarp.h
atmbr2684.h
atmclip.h
atmdev.h
atmioc.h
atmlec.h
atmmpc.h
atmppp.h
atmsap.h
atmsvc.h
audit.h linux/audit.h: move ptrace.h include to kernel header 2013-01-11 14:54:56 -08:00
auto_fs.h unbreak automounter support on 64-bit kernel with 32-bit userspace (v2) 2013-02-08 20:42:18 +01:00
auto_fs4.h
auxvec.h
ax25.h
b1lli.h
baycom.h
bfs_fs.h
binfmts.h
blkpg.h
blktrace_api.h
bpqether.h
bsg.h
btrfs.h Btrfs: set/change the label of a mounted file system 2013-02-20 12:59:59 -05:00
can.h
capability.h
capi.h
cciss_defs.h
cciss_ioctl.h
cdrom.h libata: identify and init ZPODD devices 2013-01-21 15:40:35 -05:00
cgroupstats.h
chio.h
cm4000_cs.h
cn_proc.h
coda.h
coda_psdev.h
coff.h
connector.h
const.h
cramfs_fs.h
cuda.h
cyclades.h
cycx_cfm.h
dcbnl.h
dccp.h
dlm.h
dlm_device.h
dlm_netlink.h
dlm_plock.h
dlmconstants.h
dm-ioctl.h dm ioctl: allow message to return data 2013-03-01 22:45:49 +00:00
dm-log-userspace.h
dn.h
dqblk_xfs.h
edd.h
efs_fs_sb.h
elf-em.h
elf-fdpic.h
elf.h ImgTec Meta architecture changes for v3.9-rc1 2013-03-03 12:06:09 -08:00
elfcore.h
errno.h
errqueue.h
ethtool.h
eventpoll.h
fadvise.h
falloc.h
fanotify.h
fb.h
fcntl.h
fd.h
fdreg.h
fib_rules.h
fiemap.h
filter.h
firewire-cdev.h
firewire-constants.h
flat.h
fs.h block: optionally snapshot page contents to provide stable pages during write 2013-02-21 17:22:20 -08:00
fsl_hypervisor.h
fuse.h fuse: allow control of adaptive readdirplus use 2013-02-07 14:25:44 +01:00
futex.h
gameport.h
gen_stats.h
genetlink.h
gfs2_ondisk.h
gigaset_dev.h
hdlc.h
hdlcdrv.h
hdreg.h
hid.h
hiddev.h
hidraw.h
hpet.h
hw_breakpoint.h
hysdn_if.h
i2c-dev.h
i2c.h
i2o-dev.h
i8k.h
icmp.h
icmpv6.h
if.h
if_addr.h
if_addrlabel.h
if_alg.h
if_arcnet.h
if_arp.h
if_bonding.h
if_bridge.h bridge: use __u16 in if_bridge.h 2013-02-14 00:54:17 -05:00
if_cablemodem.h
if_eql.h
if_ether.h net/8021q: Implement Multiple VLAN Registration Protocol (MVRP) 2013-02-10 20:37:22 -05:00
if_fc.h
if_fddi.h
if_frad.h
if_hippi.h
if_infiniband.h
if_link.h rtnl: expose carrier value with possibility to set it 2012-12-28 15:24:18 -08:00
if_ltalk.h
if_packet.h packet: packet fanout rollover during socket overload 2013-03-19 17:15:04 -04:00
if_phonet.h
if_plip.h
if_ppp.h
if_pppol2tp.h
if_pppox.h
if_slip.h
if_team.h
if_tun.h
if_tunnel.h
if_vlan.h net/8021q: Implement Multiple VLAN Registration Protocol (MVRP) 2013-02-10 20:37:22 -05:00
if_x25.h
igmp.h
in.h
in6.h mcast: define and use MRT[6]_MAX in ip[6]_mroute_opt() 2013-01-21 13:55:14 -05:00
in_route.h
inet_diag.h
inotify.h
input.h
ioctl.h
ip.h
ip6_tunnel.h
ip_vs.h
ipc.h
ipmi.h ipmi: remove superfluous kernel/userspace explanation 2013-02-27 19:10:21 -08:00
ipmi_msgdefs.h
ipsec.h
ipv6.h ipv6: Store Router Alert option in IP6CB directly. 2013-01-13 20:17:14 -05:00
ipv6_route.h
ipx.h
irda.h
irqnr.h
isdn.h
isdn_divertif.h
isdn_ppp.h
isdnif.h
iso_fs.h
ivtv.h
ivtvfb.h
ixjuser.h
jffs2.h
joystick.h
kd.h
kdev_t.h
kernel-page-flags.h
kernel.h
kernelcapi.h
kexec.h
keyboard.h
keyctl.h
kvm.h Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2013-02-24 13:07:18 -08:00
kvm_para.h
l2tp.h
limits.h
llc.h
loop.h
lp.h
magic.h
major.h
map_to_7segment.h
matroxfb.h
mdio.h
media.h
mei.h
mempolicy.h
meye.h [media] meye: convert to the control framework 2013-02-05 18:23:47 -02:00
mii.h
minix_fs.h
mman.h
mmtimer.h
module.h
mqueue.h
mroute.h mcast: add multicast proxy support (IPv4 and IPv6) 2013-01-21 13:55:14 -05:00
mroute6.h mcast: add multicast proxy support (IPv4 and IPv6) 2013-01-21 13:55:14 -05:00
msdos_fs.h fat: mark fs as dirty on mount and clean on umount 2013-02-27 19:10:11 -08:00
msg.h ipc: introduce message queue copy feature 2013-01-04 16:11:45 -08:00
mtio.h
n_r3964.h
nbd.h nbd: support FLUSH requests 2013-02-27 19:10:22 -08:00
ncp.h
ncp_fs.h
ncp_mount.h
ncp_no.h
neighbour.h vxlan: generalize forwarding tables 2013-03-17 12:23:46 -04:00
net.h
net_dropmon.h
net_tstamp.h
netconf.h
netdevice.h
netfilter.h
netfilter_arp.h
netfilter_bridge.h
netfilter_decnet.h
netfilter_ipv4.h
netfilter_ipv6.h
netlink.h
netrom.h
nfc.h NFC: Change nfc.h license 2013-01-14 12:28:54 +01:00
nfs.h
nfs2.h
nfs3.h
nfs4.h
nfs4_mount.h
nfs_fs.h
nfs_idmap.h
nfs_mount.h
nfsacl.h
nl80211.h nl80211: renumber NL80211_FEATURE_FULL_AP_CLIENT_STATE 2013-02-15 09:41:44 +01:00
nubus.h
nvram.h
omap3isp.h
omapfb.h
oom.h
packet_diag.h
param.h
parport.h
patchkey.h
pci.h
pci_regs.h PCI: Add PCIe Link Capability link speed and width names 2012-12-26 10:39:23 -07:00
perf_event.h perf: Missing field in PERF_RECORD_SAMPLE documentation 2013-01-24 16:40:19 -03:00
personality.h
pfkeyv2.h
pg.h
phantom.h
phonet.h
pkt_cls.h
pkt_sched.h htb: add HTB_DIRECT_QLEN attribute 2013-03-06 15:40:53 -05:00
pktcdvd.h
pmu.h
poll.h
posix_types.h
ppdev.h
ppp-comp.h
ppp-ioctl.h
ppp_defs.h
pps.h
prctl.h
ptp_clock.h
ptrace.h
qnx4_fs.h
qnxtypes.h
quota.h
radeonfb.h
random.h
raw.h
rds.h
reboot.h
reiserfs_fs.h
reiserfs_xattr.h
resource.h
rfkill.h
romfs_fs.h
rose.h
route.h
rtc.h
rtnetlink.h bridge: Dump vlan information from a bridge port 2013-02-13 19:41:46 -05:00
scc.h
sched.h
screen_info.h
sdla.h
seccomp.h
securebits.h
selinux_netlink.h
sem.h
serial.h
serial_core.h serial_core: Fix type definition for PORT_BRCM_TRUMANAGE. 2013-02-04 15:05:04 -08:00
serial_reg.h
serio.h
shm.h
signal.h
signalfd.h
snmp.h tcp: TLP loss detection. 2013-03-12 08:30:34 -04:00
sock_diag.h
socket.h
sockios.h
som.h
sonet.h
sonypi.h
sound.h
soundcard.h
stat.h
stddef.h
string.h
suspend_ioctls.h
swab.h
synclink.h
sysctl.h
sysinfo.h
taskstats.h
tcp.h tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
tcp_metrics.h
telephony.h
termios.h
time.h
times.h
timex.h
tiocl.h
tipc.h
tipc_config.h
toshiba.h
tty.h
tty_flags.h
types.h
udf_fs_i.h
udp.h
uhid.h HID: uhid: use __packed__ for uhid_feature_answer_req 2013-01-03 10:38:24 +01:00
uinput.h
uio.h
ultrasound.h
un.h
unistd.h
unix_diag.h
usbdevice_fs.h
utime.h
utsname.h
uuid.h
uvcvideo.h
v4l2-common.h
v4l2-controls.h [media] meye: convert to the control framework 2013-02-05 18:23:47 -02:00
v4l2-dv-timings.h
v4l2-mediabus.h [media] V4L: DocBook: Add V4L2_MBUS_FMT_YUV10_1X30 media bus pixel code 2012-12-21 12:11:11 -02:00
v4l2-subdev.h
veth.h
vfio.h vfio-pci: Add support for VGA region access 2013-02-18 10:11:13 -07:00
vhost.h
videodev2.h [media] Move DV-class control IDs from videodev2.h to v4l2-controls.h 2013-02-05 18:00:00 -02:00
virtio_9p.h
virtio_balloon.h
virtio_blk.h
virtio_config.h
virtio_console.h
virtio_ids.h
virtio_net.h virtio-net: introduce a new control to set macaddr 2013-01-21 14:07:44 -05:00
virtio_pci.h
virtio_ring.h
virtio_rng.h
vm_sockets.h VSOCK: Split vm_sockets.h into kernel/uapi 2013-03-08 12:24:48 -05:00
vt.h
wait.h
wanrouter.h wanrouter: delete now orphaned header content, files/drivers 2013-01-31 19:56:35 -05:00
watchdog.h
wimax.h
wireless.h
x25.h
xattr.h hfsplus: add osx.* prefix for handling namespace of Mac OS X extended attributes 2013-02-27 19:10:10 -08:00
xfrm.h