Linux kernel source tree
Go to file
Eric Dumazet b650bf0977 udp: remove busylock and add per NUMA queues
busylock was protecting UDP sockets against packet floods,
but unfortunately was not protecting the host itself.

Under stress, many cpus could spin while acquiring the busylock,
and NIC had to drop packets. Or packets would be dropped
in cpu backlog if RPS/RFS were in place.

This patch replaces the busylock by intermediate
lockless queues. (One queue per NUMA node).

This means that fewer number of cpus have to acquire
the UDP receive queue lock.

Most of the cpus can either:
- immediately drop the packet.
- or queue it in their NUMA aware lockless queue.

Then one of the cpu is chosen to process this lockless queue
in a batch.

The batch only contains packets that were cooked on the same
NUMA node, thus with very limited latency impact.

Tested:

DDOS targeting a victim UDP socket, on a platform with 6 NUMA nodes
(Intel(R) Xeon(R) 6985P-C)

Before:

nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 1004179            0.0
Udp6InErrors                    3117               0.0
Udp6RcvbufErrors                3117               0.0

After:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams                 1116633            0.0
Udp6InErrors                    14197275           0.0
Udp6RcvbufErrors                14197275           0.0

We can see this host can now proces 14.2 M more packets per second
while under attack, and the victim socket can receive 11 % more
packets.

I used a small bpftrace program measuring time (in us) spent in
__udp_enqueue_schedule_skb().

Before:

@udp_enqueue_us[398]:
[0]                24901 |@@@                                                 |
[1]                63512 |@@@@@@@@@                                           |
[2, 4)            344827 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8)            244673 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
[8, 16)            54022 |@@@@@@@@                                            |
[16, 32)          222134 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   |
[32, 64)          232042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                  |
[64, 128)           4219 |                                                    |
[128, 256)           188 |                                                    |

After:

@udp_enqueue_us[398]:
[0]              5608855 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]              1111277 |@@@@@@@@@@                                          |
[2, 4)            501439 |@@@@                                                |
[4, 8)            102921 |                                                    |
[8, 16)            29895 |                                                    |
[16, 32)           43500 |                                                    |
[32, 64)           31552 |                                                    |
[64, 128)            979 |                                                    |
[128, 256)            13 |                                                    |

Note that the remaining bottleneck for this platform is in
udp_drops_inc() because we limited struct numa_drop_counters
to only two nodes so far.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250922104240.2182559-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23 16:38:39 -07:00
Documentation tcp: move tcp_clean_acked to tcp_sock_read_tx group 2025-09-22 17:55:25 -07:00
LICENSES LICENSES: Replace the obsolete address of the FSF in the GFDL-1.2 2025-07-24 11:15:39 +02:00
arch dibs: Move event handling to dibs layer 2025-09-23 11:13:22 +02:00
block vfs-6.17-rc6.fixes 2025-09-08 07:53:01 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: sha512 - Implement export_core() and import_core() 2025-09-02 19:02:39 -07:00
drivers dibs: Move event handling to dibs layer 2025-09-23 11:13:22 +02:00
fs 15 hotfixes. 11 are cc:stable and the remainder address post-6.16 issues 2025-09-17 21:34:26 -07:00
include udp: remove busylock and add per NUMA queues 2025-09-23 16:38:39 -07:00
init 20 hotfixes. 15 are cc:stable and the remainder address post-6.16 issues 2025-09-10 21:19:34 -07:00
io_uring io_uring/zcrx: fix ifq->if_rxq is -1, get dma_dev is NULL 2025-09-15 18:12:53 -07:00
ipc vfs-6.17-rc1.mmap_prepare 2025-07-28 13:43:25 -07:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-18 11:26:06 -07:00
lib hardening fixes for v6.17-rc4 2025-08-31 08:56:45 -07:00
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-18 11:26:06 -07:00
net udp: remove busylock and add per NUMA queues 2025-09-23 16:38:39 -07:00
rust Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-18 11:26:06 -07:00
samples 15 hotfixes. 11 are cc:stable and the remainder address post-6.16 issues 2025-09-17 21:34:26 -07:00
scripts Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-11 17:40:13 -07:00
security + Features 2025-08-04 08:17:28 -07:00
sound ALSA: hda/hdmi: Add pin fix for another HP EliteDesk 800 G4 model 2025-09-01 13:51:57 +02:00
tools selftests/net: Test tcp port reuse after unbinding a socket 2025-09-23 10:12:15 +02:00
usr usr/include: openrisc: don't HDRTEST bpf_perf_event.h 2025-05-12 15:03:17 +09:00
virt Merge tag 'kvm-x86-no_assignment-6.17' of https://github.com/kvm-x86/linux into HEAD 2025-07-29 08:36:42 -04:00
.clang-format Linux 6.15-rc5 2025-05-06 16:39:25 +10:00
.clippy.toml rust: clean Rust 1.88.0's warning about `clippy::disallowed_macros` configuration 2025-05-07 00:11:47 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore gitignore: allow .pylintrc to be tracked 2025-07-02 17:10:04 -06:00
.mailmap MAINTAINERS: Update Nobuhiro Iwamatsu's email address 2025-09-01 10:36:10 +02:00
.pylintrc docs: add a .pylintrc file with sys path for docs scripts 2025-04-09 12:10:33 -06:00
.rustfmt.toml
COPYING
CREDITS MAINTAINERS: retire Boris from TLS maintainers 2025-08-26 17:36:01 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS dibs: Move event handling to dibs layer 2025-09-23 11:13:22 +02:00
Makefile Linux 6.17-rc6 2025-09-14 14:21:14 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.