net: fix napi_consume_skb() with alien skbs

The TX completion path lacks NUMA awareness and, more generally,
slab cache affinity.

Modern drivers use napi_consume_skb(), hoping to cache sk_buffs
in per-cpu caches so that they can be recycled in the RX path.

Only use the per-cpu cache if the skb was allocated on the current cpu.
Otherwise, provided the skb is not shared, use skb_attempt_defer_free()
so that the skb is freed on the cpu that allocated it.

This removes contention on SLUB spinlocks and data structures.
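
The decision described above can be sketched in userspace C. This is only an
illustration of the check, not the kernel code: struct toy_skb, enum free_path
and pick_free_path() are hypothetical names, and the users field stands in for
what skb_shared() tests in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only what the check reads. */
struct toy_skb {
	int alloc_cpu;	/* cpu that allocated the buffer */
	int users;	/* reference count; more than 1 means shared */
};

/* Hypothetical labels for the two outcomes of the check. */
enum free_path {
	FREE_KEEP_LOCAL,		/* continue on the existing local path */
	FREE_DEFER_TO_ALLOC_CPU,	/* hand the skb back to its allocating cpu */
};

/*
 * Mirror of the condition added by this patch: an skb is "alien" when it
 * was allocated on another cpu, and it is deferred only if it is not shared.
 */
static inline enum free_path pick_free_path(const struct toy_skb *skb,
					    int this_cpu)
{
	bool shared = skb->users > 1;

	if (skb->alloc_cpu != this_cpu && !shared)
		return FREE_DEFER_TO_ALLOC_CPU;

	return FREE_KEEP_LOCAL;
}
```

With this sketch, an unshared skb allocated on cpu 3 and completed on cpu 7 is
deferred to cpu 3, while the same skb completed on cpu 3, or a shared skb on
any cpu, stays on the local path.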

After this patch, I get ~50% improvement for a UDP tx workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).

80 Mpps -> 120 Mpps.

Profiling one of the 32 cpus servicing NIC interrupts:

Before:

mpstat -P 511 1 1

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     511    0.00    0.00    0.00    0.00    0.00   98.00    0.00    0.00    0.00    2.00

    31.01%  ksoftirqd/511    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
    12.45%  swapper          [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     5.60%  ksoftirqd/511    [kernel.kallsyms]  [k] __slab_free
     3.31%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
     3.27%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
     2.95%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_start
     2.52%  ksoftirqd/511    [kernel.kallsyms]  [k] fq_dequeue
     2.32%  ksoftirqd/511    [kernel.kallsyms]  [k] read_tsc
     2.25%  ksoftirqd/511    [kernel.kallsyms]  [k] build_detached_freelist
     2.15%  ksoftirqd/511    [kernel.kallsyms]  [k] kmem_cache_free
     2.11%  swapper          [kernel.kallsyms]  [k] __slab_free
     2.06%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_features_check
     2.01%  ksoftirqd/511    [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
     1.97%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_release_data
     1.52%  ksoftirqd/511    [kernel.kallsyms]  [k] sock_wfree
     1.34%  swapper          [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
     1.23%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
     1.15%  ksoftirqd/511    [kernel.kallsyms]  [k] dma_unmap_page_attrs
     1.11%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_start
     1.03%  swapper          [kernel.kallsyms]  [k] fq_dequeue
     0.94%  swapper          [kernel.kallsyms]  [k] kmem_cache_free
     0.93%  swapper          [kernel.kallsyms]  [k] read_tsc
     0.81%  ksoftirqd/511    [kernel.kallsyms]  [k] napi_consume_skb
     0.79%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
     0.77%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_free_head
     0.76%  swapper          [kernel.kallsyms]  [k] idpf_features_check
     0.72%  swapper          [kernel.kallsyms]  [k] skb_release_data
     0.69%  swapper          [kernel.kallsyms]  [k] build_detached_freelist
     0.58%  ksoftirqd/511    [kernel.kallsyms]  [k] skb_release_head_state
     0.56%  ksoftirqd/511    [kernel.kallsyms]  [k] __put_partials
     0.55%  ksoftirqd/511    [kernel.kallsyms]  [k] kmem_cache_free_bulk
     0.48%  swapper          [kernel.kallsyms]  [k] sock_wfree

After:

mpstat -P 511 1 1

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     511    0.00    0.00    0.00    0.00    0.00   51.49    0.00    0.00    0.00   48.51

    19.10%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_hdr
    13.86%  swapper          [kernel.kallsyms]  [k] idpf_tx_clean_buf_ring
    10.80%  swapper          [kernel.kallsyms]  [k] skb_attempt_defer_free
    10.57%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_clean_all
     7.18%  swapper          [kernel.kallsyms]  [k] queued_spin_lock_slowpath
     6.69%  swapper          [kernel.kallsyms]  [k] sock_wfree
     5.55%  swapper          [kernel.kallsyms]  [k] dma_unmap_page_attrs
     3.10%  swapper          [kernel.kallsyms]  [k] fq_dequeue
     3.00%  swapper          [kernel.kallsyms]  [k] skb_release_head_state
     2.73%  swapper          [kernel.kallsyms]  [k] read_tsc
     2.48%  swapper          [kernel.kallsyms]  [k] idpf_tx_splitq_start
     1.20%  swapper          [kernel.kallsyms]  [k] idpf_features_check
     1.13%  swapper          [kernel.kallsyms]  [k] napi_consume_skb
     0.93%  swapper          [kernel.kallsyms]  [k] idpf_vport_splitq_napi_poll
     0.64%  swapper          [kernel.kallsyms]  [k] native_send_call_func_single_ipi
     0.60%  swapper          [kernel.kallsyms]  [k] acpi_processor_ffh_cstate_enter
     0.53%  swapper          [kernel.kallsyms]  [k] io_idle
     0.43%  swapper          [kernel.kallsyms]  [k] netif_skb_features
     0.41%  swapper          [kernel.kallsyms]  [k] __direct_call_cpuidle_state_enter2
     0.40%  swapper          [kernel.kallsyms]  [k] native_irq_return_iret
     0.40%  swapper          [kernel.kallsyms]  [k] idpf_tx_buf_hw_update
     0.36%  swapper          [kernel.kallsyms]  [k] sched_clock_noinstr
     0.34%  swapper          [kernel.kallsyms]  [k] handle_softirqs
     0.32%  swapper          [kernel.kallsyms]  [k] net_rx_action
     0.32%  swapper          [kernel.kallsyms]  [k] dql_completed
     0.32%  swapper          [kernel.kallsyms]  [k] validate_xmit_skb
     0.31%  swapper          [kernel.kallsyms]  [k] skb_network_protocol
     0.29%  swapper          [kernel.kallsyms]  [k] skb_csum_hwoffload_help
     0.29%  swapper          [kernel.kallsyms]  [k] x2apic_send_IPI
     0.28%  swapper          [kernel.kallsyms]  [k] ktime_get
     0.24%  swapper          [kernel.kallsyms]  [k] __qdisc_run

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20251106202935.1776179-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet 2025-11-06 20:29:34 +00:00 committed by Jakub Kicinski
parent 1fcf572211
commit e20dfbad8a
1 changed file with 5 additions and 0 deletions

@@ -1476,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+	if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+		skb_release_head_state(skb);
+		return skb_attempt_defer_free(skb);
+	}
 	if (!skb_unref(skb))
 		return;