linux/net
Eric Dumazet 16c610162d net: call cond_resched() less often in __release_sock()
While stress testing TCP, I saw unexpected retransmits and SACK packets
when a single CPU receives data from multiple high-throughput flows.

super_netperf 4 -H srv -T,10 -l 3000 &

Tcpdump extract:

 00:00:00.000007 IP6 clnt > srv: Flags [.], seq 26062848:26124288, ack 1, win 66, options [nop,nop,TS val 651460834 ecr 3100749131], length 61440
 00:00:00.000006 IP6 clnt > srv: Flags [.], seq 26124288:26185728, ack 1, win 66, options [nop,nop,TS val 651460834 ecr 3100749131], length 61440
 00:00:00.000005 IP6 clnt > srv: Flags [P.], seq 26185728:26243072, ack 1, win 66, options [nop,nop,TS val 651460834 ecr 3100749131], length 57344
 00:00:00.000006 IP6 clnt > srv: Flags [.], seq 26243072:26304512, ack 1, win 66, options [nop,nop,TS val 651460844 ecr 3100749141], length 61440
 00:00:00.000005 IP6 clnt > srv: Flags [.], seq 26304512:26365952, ack 1, win 66, options [nop,nop,TS val 651460844 ecr 3100749141], length 61440
 00:00:00.000007 IP6 clnt > srv: Flags [P.], seq 26365952:26423296, ack 1, win 66, options [nop,nop,TS val 651460844 ecr 3100749141], length 57344
 00:00:00.000006 IP6 clnt > srv: Flags [.], seq 26423296:26484736, ack 1, win 66, options [nop,nop,TS val 651460853 ecr 3100749150], length 61440
 00:00:00.000005 IP6 clnt > srv: Flags [.], seq 26484736:26546176, ack 1, win 66, options [nop,nop,TS val 651460853 ecr 3100749150], length 61440
 00:00:00.000005 IP6 clnt > srv: Flags [P.], seq 26546176:26603520, ack 1, win 66, options [nop,nop,TS val 651460853 ecr 3100749150], length 57344
 00:00:00.003932 IP6 clnt > srv: Flags [P.], seq 26603520:26619904, ack 1, win 66, options [nop,nop,TS val 651464844 ecr 3100753141], length 16384
 00:00:00.006602 IP6 clnt > srv: Flags [.], seq 24862720:24866816, ack 1, win 66, options [nop,nop,TS val 651471419 ecr 3100759716], length 4096
 00:00:00.013000 IP6 clnt > srv: Flags [.], seq 24862720:24866816, ack 1, win 66, options [nop,nop,TS val 651484421 ecr 3100772718], length 4096
 00:00:00.000416 IP6 srv > clnt: Flags [.], ack 26619904, win 1393, options [nop,nop,TS val 3100773185 ecr 651484421,nop,nop,sack 1 {24862720:24866816}], length 0

After analysis, it appears this is caused by the cond_resched()
call in __release_sock().

When the current thread yields while still holding the TCP socket lock,
it might regain the CPU only after a very long time.

Meanwhile, the peer's TLP/RTO fires (multiple times) and packets are
retransmitted while the initial copy is still waiting in the socket
backlog or receive queue.

In this patch, I call cond_resched() only once every 16 packets.
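
As an illustration only, the "once every 16 packets" idea can be sketched in
plain userspace C. This is not the kernel patch: struct pkt, maybe_yield(),
process_packet(), drain_backlog() and YIELD_INTERVAL are hypothetical names
invented here, with maybe_yield() standing in for cond_resched() and simply
counting how often the loop offered to give up the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet queued on a socket backlog. */
struct pkt {
	struct pkt *next;
};

/* Number of times the loop offered to yield (stand-in for cond_resched()). */
static unsigned int yield_count;

static void maybe_yield(void)
{
	yield_count++;
}

/* Hypothetical per-packet handler; a real stack would process the packet. */
static void process_packet(struct pkt *p)
{
	(void)p;
}

/* Assumed interval, taken from the commit message: once every 16 packets. */
#define YIELD_INTERVAL 16

/*
 * Drain a singly linked backlog, offering to yield only after every
 * YIELD_INTERVAL packets rather than after each one.
 * Returns the number of packets processed.
 */
static unsigned int drain_backlog(struct pkt *head)
{
	unsigned int nb = 0;

	while (head) {
		struct pkt *next = head->next;

		head->next = NULL;
		process_packet(head);

		if (++nb % YIELD_INTERVAL == 0)
			maybe_yield();

		head = next;
	}
	return nb;
}

/* Usage sketch: 40 queued packets give two yield points (after 16 and 32). */
static void example_usage(void)
{
	struct pkt pkts[40];
	int i;

	for (i = 0; i < 40; i++)
		pkts[i].next = (i + 1 < 40) ? &pkts[i + 1] : NULL;

	yield_count = 0;
	assert(drain_backlog(pkts) == 40);
	assert(yield_count == 2);
}
```

With a yield after every packet, the same 40-packet backlog would give the
scheduler 40 chances to preempt the thread while it still owns the socket;
batching reduces that to two.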

The modern TCP stack now spends less time per packet in the backlog,
especially because ACKs are no longer sent (commit 133c4c0d37
"tcp: defer regular ACK while processing socket backlog").

Before:

clnt:/# nstat -n;sleep 10;nstat|egrep "TcpOutSegs|TcpRetransSegs|TCPFastRetrans|TCPTimeouts|Probes|TCPSpuriousRTOs|DSACK"
TcpOutSegs                      19046186           0.0
TcpRetransSegs                  1471               0.0
TcpExtTCPTimeouts               1397               0.0
TcpExtTCPLossProbes             1356               0.0
TcpExtTCPDSACKRecv              1352               0.0
TcpExtTCPSpuriousRTOs           114                0.0
TcpExtTCPDSACKRecvSegs          1352               0.0

After:

clnt:/# nstat -n;sleep 10;nstat|egrep "TcpOutSegs|TcpRetransSegs|TCPFastRetrans|TCPTimeouts|Probes|TCPSpuriousRTOs|DSACK"
TcpOutSegs                      19218936           0.0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250903174811.1930820-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-04 19:16:51 -07:00
6lowpan net: replace ND_PRINTK with dynamic debug 2025-07-10 15:27:32 -07:00
9p
802
8021q net: s/dev_close_many/netif_close_many/ 2025-07-18 17:27:47 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
atm net: atm: fix memory leak in atm_register_sysfs when device_register fail 2025-09-04 09:53:44 +02:00
ax25 ax25: properly unshare skbs in ax25_kiss_rcv() 2025-09-03 17:06:30 -07:00
batman-adv batman-adv: fix OOB read/write in network-coding decode 2025-08-31 17:01:35 +02:00
bluetooth Bluetooth: Fix use-after-free in l2cap_sock_cleanup_listen() 2025-08-29 14:51:06 -04:00
bpf bpf: Add attach_type field to bpf_link 2025-07-11 10:51:55 -07:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-04 13:33:00 -07:00
caif caif: Replace memset(0) + strscpy() with strscpy_pad() 2025-08-12 14:08:56 -07:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-12 10:09:10 -07:00
ceph libceph: Rename hmac_sha256() to ceph_hmac_sha256() 2025-07-04 10:18:52 -07:00
core net: call cond_resched() less often in __release_sock() 2025-09-04 19:16:51 -07:00
dcb
devlink devlink: Make health reporter burst period configurable 2025-08-26 17:24:16 -07:00
dns_resolver
dsa net: s/dev_close_many/netif_close_many/ 2025-07-18 17:27:47 -07:00
ethernet
ethtool net: ethtool: support including Flow Label in the flow hash for RSS 2025-08-14 11:40:13 +02:00
handshake net/handshake: Add new parameter 'HANDSHAKE_A_ACCEPT_KEYRING' 2025-07-08 15:31:44 +02:00
hsr net, hsr: reject HSR frame if skb can't hold tag 2025-08-20 19:31:25 -07:00
ieee802154
ife
ipv4 tcp: use tcp_eat_recv_skb in __tcp_close() 2025-09-04 19:13:41 -07:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-04 13:33:00 -07:00
iucv net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers 2025-08-28 13:14:50 +02:00
kcm net: kcm: Fix race condition in kcm_unattach() 2025-08-13 18:18:33 -07:00
key Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
l2tp l2tp: do not use sock_hold() in pppol2tp_session_get_sock() 2025-08-27 17:16:13 -07:00
l3mdev
lapb
llc net: make sk->sk_rcvtimeo lockless 2025-06-23 17:05:12 -07:00
mac80211 wifi: mac80211: do not permit 40 MHz EHT operation on 5/6 GHz 2025-08-28 13:39:16 +02:00
mac802154
mctp mctp: return -ENOPROTOOPT for unknown getsockopt options 2025-09-03 17:01:52 -07:00
mpls net: s/dev_get_flags/netif_get_flags/ 2025-07-18 17:27:47 -07:00
mptcp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-04 13:33:00 -07:00
ncsi net: ncsi: Fix buffer overflow in fetching version id 2025-06-12 18:21:59 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-04 13:33:00 -07:00
netlabel
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-04 13:33:00 -07:00
netrom
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-06-19 13:00:24 -07:00
nsh
openvswitch net: openvswitch: Use for_each_cpu() where appropriate 2025-08-20 19:47:22 -07:00
packet net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers 2025-08-28 13:14:50 +02:00
phonet net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers 2025-08-28 13:14:50 +02:00
psample
qrtr
rds rds: Fix endianness annotations for RDS extension headers 2025-08-22 16:44:39 -07:00
rfkill
rose net: rose: fix a typo in rose_clear_routes() 2025-08-27 17:27:52 -07:00
rxrpc rxrpc: Fix to use conn aborts for conn-wide failures 2025-07-17 07:50:48 -07:00
sched net_sched: act: remove tcfa_qstats 2025-09-02 15:52:24 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-08-29 11:48:01 -07:00
shaper
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-04 13:33:00 -07:00
strparser net: make sk->sk_rcvtimeo lockless 2025-06-23 17:05:12 -07:00
sunrpc nfsd-6.17 fixes: 2025-08-11 07:38:55 -07:00
switchdev
tipc net: add sk_drops_read(), sk_drops_inc() and sk_drops_reset() helpers 2025-08-28 13:14:50 +02:00
tls tls: fix handling of zero-length records on the rx_list 2025-08-21 07:52:30 -07:00
unix Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-08-29 11:48:01 -07:00
wireless wifi: cfg80211: sme: cap SSID length in __cfg80211_connect_result() 2025-09-03 09:37:55 +02:00
x25 net/x25: Remove unused x25_terminate_link() 2025-07-14 17:19:13 -07:00
xdp net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockopt 2025-07-10 14:48:29 +02:00
xfrm ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
Kconfig net: Kconfig: add endif/endmenu comments 2025-07-22 18:17:23 -07:00
Kconfig.debug
Makefile
compat.c
devres.c
socket.c net: annotate races around sk->sk_uid 2025-06-23 17:04:03 -07:00
sysctl_net.c