linux/kernel/bpf
Kumar Kartikeya Dwivedi a650d38915 bpf: Convert ringbuf map to rqspinlock
Convert the raw spinlock used by BPF ringbuf to rqspinlock. Currently,
we have an open syzbot report of a potential deadlock. In addition, the
ringbuf can fail to reserve spuriously under contention from NMI
context.

It is potentially attractive to enable unconstrained usage (incl. NMIs)
while ensuring no deadlocks manifest at runtime, perform the conversion
to rqspinlock to achieve this.

This change was benchmarked for BPF ringbuf's multi-producer contention
case on an Intel Sapphire Rapids server, with hyperthreading disabled
and performance governor turned on. 5 warm up runs were done for each
case before obtaining the results.

Before (raw_spinlock_t):

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.440 ± 0.019M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  2.706 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  3.130 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  2.472 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  2.352 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 2.813 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 1.988 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.245 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 2.148 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.190 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.490 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.180 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.201 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.226 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.164 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 1.874 ± 0.001M/s (drops 0.000 ± 0.000M/s)

After (rqspinlock_t):

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.078 ± 0.019M/s (drops 0.000 ± 0.000M/s) (-3.16%)
rb-libbpf nr_prod 2  2.801 ± 0.014M/s (drops 0.000 ± 0.000M/s) (3.51%)
rb-libbpf nr_prod 3  3.454 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.35%)
rb-libbpf nr_prod 4  2.567 ± 0.002M/s (drops 0.000 ± 0.000M/s) (3.84%)
rb-libbpf nr_prod 8  2.468 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.93%)
rb-libbpf nr_prod 12 2.510 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-10.77%)
rb-libbpf nr_prod 16 2.075 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.38%)
rb-libbpf nr_prod 20 2.640 ± 0.001M/s (drops 0.000 ± 0.000M/s) (17.59%)
rb-libbpf nr_prod 24 2.092 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-2.61%)
rb-libbpf nr_prod 28 2.426 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.78%)
rb-libbpf nr_prod 32 2.331 ± 0.004M/s (drops 0.000 ± 0.000M/s) (-6.39%)
rb-libbpf nr_prod 36 2.306 ± 0.003M/s (drops 0.000 ± 0.000M/s) (5.78%)
rb-libbpf nr_prod 40 2.178 ± 0.002M/s (drops 0.000 ± 0.000M/s) (-1.04%)
rb-libbpf nr_prod 44 2.293 ± 0.001M/s (drops 0.000 ± 0.000M/s) (3.01%)
rb-libbpf nr_prod 48 2.022 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-6.56%)
rb-libbpf nr_prod 52 1.809 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-3.47%)

There's a fair amount of noise in the benchmark, with numbers on reruns
going up and down by 10%, so all changes are in the range of this
disturbance, and we see no major regressions.

Reported-by: syzbot+850aaf14624dc0c6d366@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000004aa700061379547e@google.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250411101759.4061366-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-11 10:28:26 -07:00
..
preload bpf: preload: Add MODULE_DESCRIPTION 2025-03-15 11:48:58 -07:00
Kconfig
Makefile rqspinlock: Add entry to Makefile, MAINTAINERS 2025-03-19 08:03:05 -07:00
arena.c bpf_try_alloc_pages 2025-03-30 13:45:28 -07:00
arraymap.c bpf: Remove migrate_{disable|enable} in ->map_for_each_callback 2025-01-08 18:06:35 -08:00
bloom_filter.c
bpf_cgrp_storage.c bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage 2025-03-18 19:05:46 -07:00
bpf_inode_storage.c bpf: Disable migration when destroying inode storage 2025-01-08 18:06:36 -08:00
bpf_iter.c bpf: Make every prog keep a copy of ctx_arg_info 2025-02-17 18:47:27 -08:00
bpf_local_storage.c bpf: Remove migrate_{disable|enable} from bpf_selem_free() 2025-01-08 18:06:37 -08:00
bpf_lru_list.c
bpf_lru_list.h
bpf_lsm.c bpf: lsm: Add two more sleepable hooks 2025-02-13 19:35:31 -08:00
bpf_struct_ops.c bpf: Allow struct_ops prog to return referenced kptr 2025-02-17 18:47:27 -08:00
bpf_task_storage.c bpf: Remove migrate_{disable|enable} from bpf_task_storage_lock helpers 2025-01-08 18:06:36 -08:00
btf.c bpf_res_spin_lock 2025-03-30 13:06:27 -07:00
btf_iter.c bpf: Remove custom build rule 2024-08-30 08:55:26 -07:00
btf_relocate.c bpf: Remove custom build rule 2024-08-30 08:55:26 -07:00
cgroup.c bpf: Allow pre-ordering for bpf cgroup progs 2025-03-15 11:48:25 -07:00
cgroup_iter.c
core.c bpf: Make perf_event_read_output accessible in all program types. 2025-03-18 10:21:59 -07:00
cpumap.c bpf: cpumap: switch to napi_skb_cache_get_bulk() 2025-02-27 14:03:39 +01:00
cpumask.c bpf: fix missing kdoc string fields in cpumask.c 2025-03-15 11:48:57 -07:00
crypto.c bpf: crypto: make state and IV dynptr nullable 2024-06-13 16:33:04 -07:00
devmap.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2024-12-12 14:19:05 -08:00
disasm.c bpftool: Using the right format specifiers 2025-03-17 13:50:56 -07:00
disasm.h
dispatcher.c bpf: Add kernel symbol for struct_ops trampoline 2024-11-12 17:13:46 -08:00
hashtab.c bpf: Convert hashtab.c to rqspinlock 2025-03-19 08:03:05 -07:00
helpers.c bpf-next-6.15 2025-03-30 12:43:03 -07:00
inode.c Change inode_operations.mkdir to return struct dentry * 2025-02-27 20:00:17 +01:00
kmem_cache_iter.c bpf: Add open coded version of kmem_cache iterator 2024-11-01 11:08:32 -07:00
link_iter.c
local_storage.c bpf: Replace 8 seq_puts() calls by seq_putc() calls 2024-07-29 12:53:00 -07:00
log.c bpf: Introduce support for bpf_local_irq_{save,restore} 2024-12-04 08:38:29 -08:00
lpm_trie.c bpf: Convert lpm_trie.c to rqspinlock 2025-03-19 08:03:05 -07:00
map_in_map.c bpf: switch maps to CLASS(fd, ...) 2024-08-13 15:58:17 -07:00
map_in_map.h
map_iter.c
memalloc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2024-11-13 12:52:51 -08:00
mmap_unlock_work.h
mprog.c
net_namespace.c
offload.c net: move misc netdev_lock flavors to a separate header 2025-03-08 09:06:50 -08:00
percpu_freelist.c bpf: Convert percpu_freelist.c to rqspinlock 2025-03-19 08:03:05 -07:00
percpu_freelist.h bpf: Convert percpu_freelist.c to rqspinlock 2025-03-19 08:03:05 -07:00
prog_iter.c
queue_stack_maps.c bpf: Convert queue_stack map to rqspinlock 2025-04-10 12:51:10 -07:00
range_tree.c bpf: Disable migration before calling ops->map_free() 2025-01-08 18:06:36 -08:00
range_tree.h bpf: Introduce range_tree data structure and use it in bpf arena 2024-11-13 13:52:45 -08:00
relo_core.c bpf: Remove custom build rule 2024-08-30 08:55:26 -07:00
reuseport_array.c bpf: Use sockfd_put() helper 2024-08-30 08:57:47 -07:00
ringbuf.c bpf: Convert ringbuf map to rqspinlock 2025-04-11 10:28:26 -07:00
rqspinlock.c bpf: Use architecture provided res_smp_cond_load_acquire 2025-04-10 12:47:07 -07:00
rqspinlock.h rqspinlock: Protect waiters in queue from stalls 2025-03-19 08:03:05 -07:00
stackmap.c bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers 2024-09-11 09:58:31 -07:00
syscall.c bpf_try_alloc_pages 2025-03-30 13:45:28 -07:00
sysfs_btf.c btf: Switch vmlinux BTF attribute to sysfs_bin_attr_simple_read() 2025-01-09 10:44:06 +01:00
task_iter.c vfs-6.13.file 2024-11-18 10:30:29 -08:00
tcx.c
tnum.c
token.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
trampoline.c bpf: Add kernel symbol for struct_ops trampoline 2024-11-12 17:13:46 -08:00
verifier.c bpf_res_spin_lock 2025-03-30 13:06:27 -07:00