linux/net/core
Xin Long a12c76a033 net: sched: refine software bypass handling in tc_run
This patch addresses issues with filter counting in block (tcf_block),
particularly for software bypass scenarios, by introducing a more
accurate mechanism using useswcnt.

Previously, filtercnt and skipswcnt were introduced by:

  Commit 2081fd3445 ("net: sched: cls_api: add filter counter") and
  Commit f631ef39d8 ("net: sched: cls_api: add skip_sw counter")

  filtercnt tracked all tp (tcf_proto) objects added to a block, and
  skipswcnt counted tp objects with the skipsw attribute set.

The problem is that a single tp can contain multiple filters, some with
skipsw and others without. The current implementation fails in the
following case:

  When the first filter in a tp has skipsw, both skipswcnt and filtercnt
  are incremented, then adding a second filter without skipsw to the same
  tp does not modify these counters because tp->counted is already set.

  As a result, software is bypassed whenever skipswcnt equals filtercnt,
  even when the block includes filters without skipsw. Consequently,
  filters without skipsw are inadvertently bypassed.

To address this, the patch introduces useswcnt in block to explicitly count
tp objects containing at least one filter without skipsw. Key changes
include:

  Whenever a filter without skipsw is added, its tp is marked with usesw
  and counted in useswcnt. tc_run() now uses useswcnt to determine software
  bypass, eliminating reliance on filtercnt and skipswcnt.

  This refined approach prevents software bypass for blocks containing
  mixed filters, ensuring correct behavior in tc_run().
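
The counting difference described above can be sketched as a small userspace
model. This is not the kernel code: the struct and function names
(block_model, tp_model, old_add_filter, new_add_filter, mixed_tp_bypass) are
hypothetical, and only the counter and flag names (filtercnt, skipswcnt,
useswcnt, counted, usesw) are taken from the description. It assumes the old
bypass check was "skipswcnt equals filtercnt" and the new one is "useswcnt is
zero", as the text states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-block counters named in the commit message. */
struct block_model {
	int filtercnt;  /* old scheme: tps counted in the block */
	int skipswcnt;  /* old scheme: tps counted as skip_sw */
	int useswcnt;   /* new scheme: tps with at least one non-skip_sw filter */
};

/* Hypothetical model of the per-tp flags named in the commit message. */
struct tp_model {
	bool counted;   /* old scheme: tp already contributed to the counters */
	bool usesw;     /* new scheme: tp already contributed to useswcnt */
};

/* Old scheme: only the first filter added to a tp updates the counters. */
static void old_add_filter(struct block_model *b, struct tp_model *tp,
			   bool skip_sw)
{
	if (tp->counted)
		return;
	tp->counted = true;
	b->filtercnt++;
	if (skip_sw)
		b->skipswcnt++;
}

/* New scheme: the first filter without skip_sw marks the tp and counts it. */
static void new_add_filter(struct block_model *b, struct tp_model *tp,
			   bool skip_sw)
{
	if (skip_sw || tp->usesw)
		return;
	tp->usesw = true;
	b->useswcnt++;
}

/*
 * Add a skip_sw filter and then a non-skip_sw filter to the same tp, and
 * return whether the modeled block would bypass software afterwards.
 */
bool mixed_tp_bypass(bool use_new)
{
	struct block_model b = {0};
	struct tp_model tp = {0};

	if (use_new) {
		new_add_filter(&b, &tp, true);
		new_add_filter(&b, &tp, false);
		return b.useswcnt == 0;
	}

	old_add_filter(&b, &tp, true);
	old_add_filter(&b, &tp, false);
	/* The tp was counted once, by its first (skip_sw) filter only. */
	assert(b.filtercnt == 1 && b.skipswcnt == 1);
	return b.skipswcnt == b.filtercnt;
}
```

In this model, mixed_tp_bypass(false) returns true (the second filter is
wrongly bypassed), while mixed_tp_bypass(true) returns false.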

Additionally, atomic operations on useswcnt ensure thread safety, and
tp->lock guards access to tp->usesw and tp->counted, so the broader lock
down_write(&block->cb_lock) is no longer required in tc_new_tfilter().
This resolves a performance regression caused by the filter counting
mechanism during parallel filter insertions.

  The improvement can be demonstrated using the following script:

  # cat insert_tc_rules.sh

    tc qdisc add dev ens1f0np0 ingress
    for i in $(seq 16); do
        taskset -c $i tc -b rules_$i.txt &
    done
    wait

  Each of the rules_$i.txt files above contains 100000 tc filter rules
  for ens1f0np0, a NIC using the mlx5 driver.

  Without this patch:

  # time sh insert_tc_rules.sh

    real    0m50.780s
    user    0m23.556s
    sys     4m13.032s

  With this patch:

  # time sh insert_tc_rules.sh

    real    0m17.718s
    user    0m7.807s
    sys     3m45.050s

Fixes: 047f340b36 ("net: sched: make skip_sw actually skip software")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-20 09:21:27 +00:00
Makefile net: Implement fault injection forcing skb reallocation 2024-11-12 12:05:33 +01:00
bpf_sk_storage.c bpf: Add "bool swap_uptrs" arg to bpf_local_storage_update() and bpf_selem_alloc() 2024-10-24 10:25:59 -07:00
datagram.c net: add support for skbs with unreadable frags 2024-09-11 20:44:31 -07:00
dev.c net: sched: refine software bypass handling in tc_run 2025-01-20 09:21:27 +00:00
dev.h net: make netdev netlink ops hold netdev_lock() 2025-01-15 19:13:34 -08:00
dev_addr_lists.c net: ti: icssg-prueth: Add Multicast Filtering support for VLAN in MAC mode 2025-01-14 12:17:27 +01:00
dev_addr_lists_test.c net: dev_addr_lists: move locking out of init/exit in kunit 2024-04-15 10:26:35 +01:00
dev_ioctl.c dev: Hold rtnl_net_lock() for dev_ifsioc(). 2025-01-16 17:20:50 -08:00
devmem.c net: devmem: add ring parameter filtering 2025-01-15 14:42:11 -08:00
devmem.h tcp: RX path for devmem TCP 2024-09-11 20:44:32 -07:00
drop_monitor.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
dst.c net: do not delay dst_entries_add() in dst_release() 2024-10-10 11:28:17 +02:00
dst_cache.c net: dst_cache: add two DEBUG_NET warnings 2024-06-03 18:50:09 -07:00
failover.c
fib_notifier.c net: do not acquire rtnl in fib_seq_sum() 2024-10-11 15:35:05 -07:00
fib_rules.c net: fib_rules: Enable flow label selector usage 2024-12-19 16:02:22 +01:00
filter.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-01-16 10:34:59 -08:00
flow_dissector.c net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE 2024-07-18 10:52:17 +02:00
flow_offload.c
gen_estimator.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
gen_stats.c
gro.c net: Add netif_get_gro_max_size helper for GRO 2024-10-01 10:48:51 +02:00
gro_cells.c
gso.c
hotdata.c net: move sysctl_mem_pcpu_rsv to net_hotdata 2024-04-30 18:46:52 -07:00
hwbm.c
ieee8021q_helpers.c net: add IEEE 802.1q specific helpers 2024-05-08 10:35:09 +01:00
link_watch.c ipvlan: Fix use-after-free in ipvlan_get_iflink(). 2025-01-07 17:50:49 -08:00
lwt_bpf.c bpf: lwtunnel: Prepare bpf_lwt_xmit_reroute() to future .flowi4_tos conversion. 2024-11-14 19:07:49 -08:00
lwtunnel.c
mp_dmabuf_devmem.h memory-provider: dmabuf devmem memory provider 2024-09-11 20:44:31 -07:00
neighbour.c net/neighbor: clear error in case strict check is not set 2024-11-18 18:42:21 -08:00
net-procfs.c net: make softnet_data.dropped an atomic_t 2024-04-01 11:28:32 +01:00
net-sysfs.c net: protect NAPI config fields with netdev_lock() 2025-01-15 19:13:34 -08:00
net-sysfs.h
net-traces.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
net_namespace.c net: expedite synchronize_net() for cleanup_net() 2025-01-15 19:17:03 -08:00
net_test.c pfcp: always set pfcp metadata 2024-04-01 10:49:28 +01:00
netclassid_cgroup.c
netdev-genl-gen.c netdev: avoid CFI problems with sock priv helpers 2025-01-16 13:15:40 +01:00
netdev-genl-gen.h netdev-genl: Support setting per-NAPI config values 2024-10-14 17:54:29 -07:00
netdev-genl.c netdev-genl: remove rtnl_lock protection from NAPI ops 2025-01-15 19:13:35 -08:00
netdev_rx_queue.c netdev: define NETDEV_INTERNAL 2025-01-09 15:33:08 +01:00
netevent.c
netmem_priv.h page_pool: devmem support 2024-09-11 20:44:31 -07:00
netpoll.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-01-16 10:34:59 -08:00
netprio_cgroup.c
of_net.c
page_pool.c selftests: drv-net-hw: inject pp_alloc_fail errors in the right place 2025-01-16 17:18:53 -08:00
page_pool_priv.h memory-provider: dmabuf devmem memory provider 2024-09-11 20:44:31 -07:00
page_pool_user.c netdev: add dmabuf introspection 2024-09-11 20:44:32 -07:00
pktgen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-01-16 10:34:59 -08:00
ptp_classifier.c
request_sock.c
rtnetlink.c rtnetlink: Add rtnl_net_lock_killable(). 2025-01-07 13:45:53 +01:00
rtnl_net_debug.c dev: Hold rtnl_net_lock() for dev_ifsioc(). 2025-01-16 17:20:50 -08:00
scm.c af_unix: Add dead flag to struct scm_fp_list. 2024-05-10 18:52:45 -07:00
secure_seq.c
selftests.c
skb_fault_injection.c net: Implement fault injection forcing skb reallocation 2024-11-12 12:05:33 +01:00
skbuff.c bpf, xdp: constify some bpf_prog * function arguments 2024-12-05 18:41:06 -08:00
skmsg.c skmsg: Return copied bytes in sk_msg_memcopy_from_iter 2024-12-20 22:53:36 +01:00
sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-01-03 16:29:29 -08:00
sock_destructor.h
sock_diag.c net: use unrcu_pointer() helper 2024-06-06 11:52:52 +02:00
sock_map.c bpf, sockmap: Fix race between element replace and close() 2024-12-10 17:38:05 +01:00
sock_reuseport.c net: core: annotate socks of struct sock_reuseport with __counted_by 2024-08-02 17:16:59 -07:00
stream.c
sysctl_net_core.c net: sysctl: allow dump_cpumask to handle higher numbers of CPUs 2024-10-23 10:28:26 +02:00
timestamping.c net: Add the possibility to support a selected hwtstamp in netdevice 2024-12-16 12:51:40 +00:00
tso.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
utils.c net: Correct spelling in net/core 2024-08-26 09:37:23 -07:00
xdp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-01-16 10:34:59 -08:00