mirror of https://github.com/torvalds/linux.git
net: fix napi_consume_skb() with alien skbs
The TX completion path lacks NUMA awareness and, more generally,
slab cache affinity.

Modern drivers call napi_consume_skb(), hoping to cache sk_buffs
in per-cpu caches so that they can be recycled in the RX path.

Only use this cache if the skb was allocated on the cpu running the
TX completion; otherwise use skb_attempt_defer_free() so that the skb
is freed on the cpu that allocated it.

This removes contention on SLUB spinlocks and data structures.
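
The rule above can be sketched in user space. This is only a toy model, not the kernel code: `struct toy_skb`, `enum toy_free_path` and `toy_pick_free_path()` are hypothetical names, and `users > 1` stands in for what skb_shared() reports. It shows that only an unshared skb allocated on another cpu is handed back to its allocating cpu; everything else continues through the existing local path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two sk_buff properties the check reads. */
struct toy_skb {
	int alloc_cpu;	/* cpu that allocated the buffer */
	int users;	/* reference count; > 1 models a shared skb */
};

enum toy_free_path {
	TOY_LOCAL_PATH,		/* continue with the existing per-cpu path */
	TOY_DEFER_TO_OWNER,	/* hand the skb back to the allocating cpu */
};

/* Alien (allocated elsewhere) and unshared => defer to the owner cpu. */
static enum toy_free_path toy_pick_free_path(const struct toy_skb *skb,
					     int this_cpu)
{
	bool alien = skb->alloc_cpu != this_cpu;
	bool shared = skb->users > 1;

	if (alien && !shared)
		return TOY_DEFER_TO_OWNER;
	return TOY_LOCAL_PATH;
}
```

A shared skb stays on the local path in this model because another holder still owns a reference, so the buffer cannot be given away yet.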
After this patch, I get a ~50% improvement for a UDP TX workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues):
80 Mpps -> 120 Mpps.
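
As a quick check of the figures above, the arithmetic can be written out. The helper names are hypothetical, and the per-queue number assumes traffic is spread evenly over the 32 TX queues, which the measurement does not state.

```c
/* Relative gain in percent when going from `before` to `after`. */
static double relative_gain_pct(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Average rate per TX queue, assuming an even spread across queues. */
static double per_queue_mpps(double total_mpps, int queues)
{
	return total_mpps / queues;
}
```

With the numbers quoted above, relative_gain_pct(80.0, 120.0) gives 50, and per_queue_mpps() goes from 2.5 to 3.75 Mpps per queue under the even-spread assumption.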
Profiling one of the 32 cpus servicing NIC interrupts:
Before:
mpstat -P 511 1 1
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     511    0.00    0.00    0.00    0.00    0.00   98.00    0.00    0.00    0.00    2.00
31.01% ksoftirqd/511 [kernel.kallsyms] [k] queued_spin_lock_slowpath
12.45% swapper [kernel.kallsyms] [k] queued_spin_lock_slowpath
5.60% ksoftirqd/511 [kernel.kallsyms] [k] __slab_free
3.31% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
3.27% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
2.95% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_start
2.52% ksoftirqd/511 [kernel.kallsyms] [k] fq_dequeue
2.32% ksoftirqd/511 [kernel.kallsyms] [k] read_tsc
2.25% ksoftirqd/511 [kernel.kallsyms] [k] build_detached_freelist
2.15% ksoftirqd/511 [kernel.kallsyms] [k] kmem_cache_free
2.11% swapper [kernel.kallsyms] [k] __slab_free
2.06% ksoftirqd/511 [kernel.kallsyms] [k] idpf_features_check
2.01% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
1.97% ksoftirqd/511 [kernel.kallsyms] [k] skb_release_data
1.52% ksoftirqd/511 [kernel.kallsyms] [k] sock_wfree
1.34% swapper [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
1.23% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
1.15% ksoftirqd/511 [kernel.kallsyms] [k] dma_unmap_page_attrs
1.11% swapper [kernel.kallsyms] [k] idpf_tx_splitq_start
1.03% swapper [kernel.kallsyms] [k] fq_dequeue
0.94% swapper [kernel.kallsyms] [k] kmem_cache_free
0.93% swapper [kernel.kallsyms] [k] read_tsc
0.81% ksoftirqd/511 [kernel.kallsyms] [k] napi_consume_skb
0.79% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
0.77% ksoftirqd/511 [kernel.kallsyms] [k] skb_free_head
0.76% swapper [kernel.kallsyms] [k] idpf_features_check
0.72% swapper [kernel.kallsyms] [k] skb_release_data
0.69% swapper [kernel.kallsyms] [k] build_detached_freelist
0.58% ksoftirqd/511 [kernel.kallsyms] [k] skb_release_head_state
0.56% ksoftirqd/511 [kernel.kallsyms] [k] __put_partials
0.55% ksoftirqd/511 [kernel.kallsyms] [k] kmem_cache_free_bulk
0.48% swapper [kernel.kallsyms] [k] sock_wfree
After:
mpstat -P 511 1 1
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     511    0.00    0.00    0.00    0.00    0.00   51.49    0.00    0.00    0.00   48.51
19.10% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
13.86% swapper [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
10.80% swapper [kernel.kallsyms] [k] skb_attempt_defer_free
10.57% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
7.18% swapper [kernel.kallsyms] [k] queued_spin_lock_slowpath
6.69% swapper [kernel.kallsyms] [k] sock_wfree
5.55% swapper [kernel.kallsyms] [k] dma_unmap_page_attrs
3.10% swapper [kernel.kallsyms] [k] fq_dequeue
3.00% swapper [kernel.kallsyms] [k] skb_release_head_state
2.73% swapper [kernel.kallsyms] [k] read_tsc
2.48% swapper [kernel.kallsyms] [k] idpf_tx_splitq_start
1.20% swapper [kernel.kallsyms] [k] idpf_features_check
1.13% swapper [kernel.kallsyms] [k] napi_consume_skb
0.93% swapper [kernel.kallsyms] [k] idpf_vport_splitq_napi_poll
0.64% swapper [kernel.kallsyms] [k] native_send_call_func_single_ipi
0.60% swapper [kernel.kallsyms] [k] acpi_processor_ffh_cstate_enter
0.53% swapper [kernel.kallsyms] [k] io_idle
0.43% swapper [kernel.kallsyms] [k] netif_skb_features
0.41% swapper [kernel.kallsyms] [k] __direct_call_cpuidle_state_enter2
0.40% swapper [kernel.kallsyms] [k] native_irq_return_iret
0.40% swapper [kernel.kallsyms] [k] idpf_tx_buf_hw_update
0.36% swapper [kernel.kallsyms] [k] sched_clock_noinstr
0.34% swapper [kernel.kallsyms] [k] handle_softirqs
0.32% swapper [kernel.kallsyms] [k] net_rx_action
0.32% swapper [kernel.kallsyms] [k] dql_completed
0.32% swapper [kernel.kallsyms] [k] validate_xmit_skb
0.31% swapper [kernel.kallsyms] [k] skb_network_protocol
0.29% swapper [kernel.kallsyms] [k] skb_csum_hwoffload_help
0.29% swapper [kernel.kallsyms] [k] x2apic_send_IPI
0.28% swapper [kernel.kallsyms] [k] ktime_get
0.24% swapper [kernel.kallsyms] [k] __qdisc_run
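
The two profiles can be condensed with a little arithmetic. The helpers below are hypothetical, and the per-Mpps figure rests on an assumption not stated above: that cpu 511 services the same fraction of the total traffic in both runs. Profile shares are fractions of each run's own samples, so they indicate where time goes rather than absolute cost.

```c
/* Sum of the profile shares (percent) reported for one symbol. */
static double sum_shares(const double *shares, int n)
{
	double total = 0.0;

	for (int i = 0; i < n; i++)
		total += shares[i];
	return total;
}

/* Softirq cpu share (percent) per Mpps of aggregate throughput. */
static double soft_pct_per_mpps(double soft_pct, double total_mpps)
{
	return soft_pct / total_mpps;
}
```

Summing the ksoftirqd/511 and swapper entries for queued_spin_lock_slowpath gives about 43.46% of samples before, against 7.18% after. Under the stated assumption, %soft per Mpps drops from about 1.23 (98.00 / 80) to about 0.43 (51.49 / 120).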
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251106202935.1776179-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
commit e20dfbad8a
parent 1fcf572211
@@ -1476,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 
 	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
 
+	if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+		skb_release_head_state(skb);
+		return skb_attempt_defer_free(skb);
+	}
+
 	if (!skb_unref(skb))
 		return;
 