linux/net/ipv4
Jiayuan Chen 929e30f931 bpf, sockmap: Fix FIONREAD for sockmap
A socket using sockmap has its own independent receive queue: ingress_msg.
This queue may contain data from its own protocol stack or from other
sockets.

Therefore, for sockmap, relying solely on copied_seq and rcv_nxt to
calculate FIONREAD is not enough.

This patch adds a new msg_tot_len field in the psock structure to record
the data length in ingress_msg. Additionally, we implement new ioctl
interfaces for TCP and UDP to intercept FIONREAD operations.

Note that we intentionally do not include sk_receive_queue data in the
FIONREAD result. Data in sk_receive_queue has not yet been processed by
the BPF verdict program, and may be redirected to other sockets or
dropped. Including it would create semantic ambiguity since this data
may never be readable by the user.

Unix and VSOCK sockets have similar issues, but fixing them is outside
the scope of this patch as it would require more intrusive changes.

Previous work by John Fastabend made some efforts towards FIONREAD support:
commit e5c6de5fa0 ("bpf, sockmap: Incorrectly handling copied_seq")
Although the current patch is based on the previous work by John Fastabend,
it is acceptable for our Fixes tag to point to the same commit.

                                                      FD1:read()
                                                      --  FD1->copied_seq++
                                                          |  [read data]
                                                          |
                                   [enqueue data]         v
                  [sockmap]     -> ingress to self ->  ingress_msg queue
FD1 native stack  ------>                                 ^
-- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
                                       |                  |
                                       |             ingress to FD1
                                       v                  ^
                                      ...                 |  [sockmap]
                                                     FD2 native stack

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20260124113314.113584-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-27 09:11:30 -08:00
..
netfilter netfilter: nf_reject: don't reply to icmp error messages 2025-09-11 15:40:55 +02:00
Kconfig tcp: Convert tcp-md5 to use MD5 library instead of crypto_ahash 2025-10-17 17:14:54 -07:00
Makefile
af_inet.c net: factor-out _sk_charge() helper 2025-11-24 19:49:40 -08:00
ah4.c
arp.c arp: do not assume dev_hard_header() does not change skb->head 2026-01-08 09:04:24 -08:00
bpf_tcp_ca.c tcp: Pass flags to __tcp_send_ack 2025-03-17 13:56:38 +00:00
cipso_ipv4.c ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() 2025-09-11 12:22:38 +02:00
datagram.c net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
devinet.c ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init() 2025-09-03 16:58:44 -07:00
esp4.c tcp: Don't pass hashinfo to socket lookup helpers. 2025-08-25 17:53:35 -07:00
esp4_offload.c xfrm: Determine inner GSO type from packet inner protocol 2025-10-30 11:52:31 +01:00
fib_frontend.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
fib_semantics.c net: fib: restore ECMP balance from loopback 2025-12-30 11:07:38 +01:00
fib_trie.c ipv4: Fix reference count leak when using error routes with nexthop objects 2025-12-30 10:39:22 +01:00
fou_bpf.c
fou_core.c net: gro: remove is_ipv6 from napi_gro_cb 2025-09-25 12:42:49 +02:00
fou_nl.c tools: ynl-gen: add regeneration comment 2025-11-25 19:20:42 -08:00
fou_nl.h tools: ynl-gen: add regeneration comment 2025-11-25 19:20:42 -08:00
gre_demux.c net: ip_gre: Fix spelling mistake "demultiplexor" -> "demultiplexer" 2025-04-24 18:20:40 -07:00
gre_offload.c
icmp.c ipv4: icmp: Add RFC 5837 support 2025-10-29 18:28:29 -07:00
igmp.c ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu] 2025-07-02 14:32:30 -07:00
igmp_internal.h netlink: support dumping IPv4 multicast addresses 2025-02-11 11:26:53 +01:00
inet_connection_sock.c tcp: remove icsk->icsk_retransmit_timer 2025-11-25 19:28:29 -08:00
inet_diag.c tcp: introduce icsk->icsk_keepalive_timer 2025-11-25 19:28:29 -08:00
inet_fragment.c inet: frags: drop fraglist conntrack references 2026-01-04 10:58:04 -08:00
inet_hashtables.c inet: Avoid ehash lookup race in inet_ehash_insert() 2025-10-17 16:08:43 -07:00
inet_timewait_sock.c inet: Avoid ehash lookup race in inet_twsk_hashdance_schedule() 2025-10-17 16:08:43 -07:00
inetpeer.c inetpeer: use EXPORT_IPV6_MOD[_GPL]() 2025-02-14 13:09:39 -08:00
ip_forward.c
ip_fragment.c inet: frags: flush pending skbs in fqdir_pre_exit() 2025-12-10 01:15:27 -08:00
ip_gre.c erspan: Initialize options_len before referencing options. 2025-12-23 09:21:00 +01:00
ip_input.c net: ipv4: Remove extern udp_v4_early_demux()/tcp_v4_early_demux() in .c files 2025-10-29 17:05:30 -07:00
ip_options.c net: Switch to skb_dstref_steal/skb_dstref_restore for ip_route_input callers 2025-08-19 17:54:35 -07:00
ip_output.c net: psp: don't assume reply skbs will have a socket 2025-10-03 10:23:50 -07:00
ip_sockglue.c
ip_tunnel.c net/ip6_tunnel: Prevent perpetual tunnel growth 2025-10-13 17:43:46 -07:00
ip_tunnel_core.c tunnels: reset the GSO metadata before reusing the skb 2025-09-09 13:03:33 +02:00
ip_vti.c ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu] 2025-07-02 14:32:30 -07:00
ipcomp.c xfrm: delete x->tunnel as we delete x 2025-07-08 13:28:27 +02:00
ipconfig.c net: ipconfig: Replace strncpy with strscpy in ic_proto_name 2025-11-28 20:19:16 -08:00
ipip.c netfilter: flowtable: Add IPIP rx sw acceleration 2025-11-28 00:00:38 +00:00
ipmr.c ipv4: start using dst_dev_rcu() 2025-08-29 19:36:32 -07:00
ipmr_base.c ipmr: do not call mr_mfc_uses_dev() for unres entries 2025-01-23 07:08:13 -08:00
metrics.c
netfilter.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
netlink.c
nexthop.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-25 11:00:59 -07:00
ping.c inet: ping: Fix icmp out counting 2026-01-04 09:48:53 -08:00
proc.c ipv4: snmp: do not use SNMP_MIB_SENTINEL anymore 2025-09-08 18:06:20 -07:00
protocol.c
raw.c net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
raw_diag.c inet_diag: change inet_diag_bc_sk() first argument 2025-08-29 19:29:24 -07:00
route.c ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe 2025-11-12 06:46:36 -08:00
syncookies.c tcp: accecn: AccECN negotiation 2025-09-18 08:47:51 +02:00
sysctl_net_ipv4.c tcp: add net.ipv4.tcp_rcvbuf_low_rtt 2025-11-20 17:44:23 -08:00
tcp.c net: do not write to msg_get_inq in callee 2026-01-08 08:45:13 -08:00
tcp_ao.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-18 11:26:06 -07:00
tcp_bbr.c
tcp_bic.c
tcp_bpf.c bpf, sockmap: Fix FIONREAD for sockmap 2026-01-27 09:11:30 -08:00
tcp_cdg.c tcp: cdg: remove redundant __GFP_NOWARN 2025-08-12 14:05:43 -07:00
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c tcp: helpers for ECN mode handling 2025-03-17 13:54:11 +00:00
tcp_dctcp.h tcp: Pass flags to __tcp_send_ack 2025-03-17 13:56:38 +00:00
tcp_diag.c inet_diag: change inet_diag_bc_sk() first argument 2025-08-29 19:29:24 -07:00
tcp_fastopen.c tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check() 2025-08-29 19:36:32 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: add net.ipv4.tcp_rcvbuf_low_rtt 2025-11-20 17:44:23 -08:00
tcp_ipv4.c tcp: introduce icsk->icsk_keepalive_timer 2025-11-25 19:28:29 -08:00
tcp_lp.c net: tcp_lp: fix kernel-doc warnings and update outdated reference links 2025-10-28 17:52:44 -07:00
tcp_metrics.c Networking changes for 6.18. 2025-10-02 15:17:01 -07:00
tcp_minisocks.c tcp: Don't reinitialise tw->tw_transparent in tcp_time_wait(). 2025-11-18 18:00:38 -08:00
tcp_nv.c
tcp_offload.c tcp: gro: inline tcp_gro_pull_header() 2025-11-14 18:00:08 -08:00
tcp_output.c net/smc: bpf: Introduce generic hook for handshake flow 2025-11-10 11:19:41 -08:00
tcp_plb.c
tcp_rate.c
tcp_recovery.c tcp: update the outdated ref draft-ietf-tcpm-rack 2025-07-08 09:01:52 -07:00
tcp_scalable.c
tcp_sigpool.c
tcp_timer.c tcp: remove icsk->icsk_retransmit_timer 2025-11-25 19:28:29 -08:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c udp: call skb_orphan() before skb_attempt_defer_free() 2026-01-06 17:05:17 -08:00
udp_bpf.c bpf, sockmap: Fix FIONREAD for sockmap 2026-01-27 09:11:30 -08:00
udp_diag.c inet_diag: change inet_diag_bc_sk() first argument 2025-08-29 19:29:24 -07:00
udp_impl.h udp: move udp_memory_allocated into net_aligned_data 2025-07-02 14:22:02 -07:00
udp_offload.c net: gro: remove is_ipv6 from napi_gro_cb 2025-09-25 12:42:49 +02:00
udp_tunnel_core.c net: Convert proto_ops connect() callbacks to use sockaddr_unsized 2025-11-04 19:10:32 -08:00
udp_tunnel_nic.c udp_tunnel: use netdev_warn() instead of netdev_WARN() 2025-09-11 19:09:48 -07:00
udp_tunnel_stub.c
udplite.c udp: move udp_memory_allocated into net_aligned_data 2025-07-02 14:22:02 -07:00
xfrm4_input.c xfrm: Set transport header to fix UDP GRO handling 2025-07-02 09:19:56 +02:00
xfrm4_output.c ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu] 2025-07-02 14:32:30 -07:00
xfrm4_policy.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c