linux/net
Jiayuan Chen 8259eb0e06 bpf, sockmap: Avoid using sk_socket after free when sending
sk->sk_socket is neither locked nor referenced in the backlog thread, so
the call to skb_send_sock() races with the release of sk_socket. All
socket types (tcp/udp/unix/vsock) are affected.

Race conditions:
'''
CPU0                               CPU1

backlog::skb_send_sock
  sendmsg_unlocked
    sock_sendmsg
      sock_sendmsg_nosec
                                   close(fd):
                                     ...
                                     ops->release() -> sock_map_close()
                                     sk_socket->ops = NULL
                                     free(socket)
      sock->ops->sendmsg
            ^
            panic here
'''

The psock reference count drops to 0 after sock_map_close() has executed.
'''
void sock_map_close()
{
    ...
    if (likely(psock)) {
        ...
        // !! here we remove psock and the ref of psock becomes 0
        sock_map_remove_links(sk, psock);
        psock = sk_psock_get(sk);
        if (unlikely(!psock))
            goto no_psock; <=== Control jumps here via goto
        ...
        cancel_delayed_work_sync(&psock->work); <=== not executed
        sk_psock_put(sk, psock);
        ...
    }
    ...
}
'''

Because sock_map_close() already waits for the workqueue to finish when
the psock is held, we simply take a psock reference in the backlog
thread to avoid the race.

With this patch, if the backlog thread is running, sock_map_close() will
wait for the backlog thread to complete and cancel all pending work.

If no backlog thread is running, any pending work that has not started
by then will bail out when it calls sk_psock_get(), because the psock
reference count has already dropped to zero, and sk_psock_drop() will
cancel all jobs via cancel_delayed_work_sync().

In summary, we require synchronization to coordinate the backlog thread
and the close() thread.
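
The reference-count rule described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel change itself: struct model_psock, model_psock_get(),
model_psock_put() and model_backlog() are hypothetical names, and C11
atomics stand in for the kernel's refcount helpers. It shows a worker that
takes a reference only while the count is non-zero, and backs off once the
closing side has dropped the last reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a psock; not the kernel struct. */
struct model_psock {
    atomic_int refcnt;  /* 0 means teardown has already begun */
    int sends;          /* number of simulated sends */
};

/* Take a reference only if the count is still non-zero. This models the
 * behaviour the commit message relies on for sk_psock_get(). */
static bool model_psock_get(struct model_psock *p)
{
    int old = atomic_load(&p->refcnt);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool model_psock_put(struct model_psock *p)
{
    int old = atomic_fetch_sub(&p->refcnt, 1);

    assert(old > 0);
    return old == 1;
}

/* Model of the backlog worker: pin the psock, "send", unpin.
 * Returns false when the psock is already being torn down. */
static bool model_backlog(struct model_psock *p)
{
    if (!model_psock_get(p))
        return false;   /* close() won the race: do not touch the socket */
    p->sends++;         /* stands in for skb_send_sock() */
    model_psock_put(p);
    return true;
}
```

While the worker holds its reference the count cannot reach zero, so in
this model the closing side still sees a live psock and takes the branch
that waits for the work to finish. Once the last reference is gone,
model_backlog() returns without sending.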

The panic I caught:
'''
Workqueue: events sk_psock_backlog
RIP: 0010:sock_sendmsg+0x21d/0x440
RAX: 0000000000000000 RBX: ffffc9000521fad8 RCX: 0000000000000001
...
Call Trace:
 <TASK>
 ? die_addr+0x40/0xa0
 ? exc_general_protection+0x14c/0x230
 ? asm_exc_general_protection+0x26/0x30
 ? sock_sendmsg+0x21d/0x440
 ? sock_sendmsg+0x3e0/0x440
 ? __pfx_sock_sendmsg+0x10/0x10
 __skb_send_sock+0x543/0xb70
 sk_psock_backlog+0x247/0xb80
...
'''

Fixes: 4b4647add7 ("sock_map: avoid race between sock_map_close and sk_psock_put")
Reported-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250516141713.291150-1-jiayuan.chen@linux.dev
2025-05-22 16:16:37 -07:00
6lowpan
9p 9p: Use hashtable.h for hash_errmap 2025-03-23 06:20:48 +09:00
802
8021q net: vlan: don't propagate flags on open 2025-03-20 09:57:37 +01:00
appletalk treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
atm treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
ax25 treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
batman-adv batman-adv: Fix double-hold of meshif when getting enabled 2025-04-15 17:56:47 -07:00
bluetooth Bluetooth: l2cap: Process valid commands in too long frame 2025-04-16 16:50:25 -04:00
bpf selftests/bpf: Add test to access const void pointer argument in tracing program 2025-04-23 11:26:22 -07:00
bridge net: bridge: switchdev: do not notify new brentries as changed 2025-04-16 18:11:39 -07:00
caif rtnetlink: Pack newlink() params into struct 2025-02-21 15:28:02 -08:00
can can: fix missing decrement of j1939_proto.inuse_idx 2025-04-15 12:18:07 +02:00
ceph A small CephFS encryption-related fix and a dead code cleanup. 2025-04-25 15:51:28 -07:00
core bpf, sockmap: Avoid using sk_socket after free when sending 2025-05-22 16:16:37 -07:00
dcb
dccp tcp/dccp: remove icsk->icsk_ack.timeout 2025-03-25 10:34:33 -07:00
devlink devlink: fix xa_alloc_cyclic() error handling 2025-03-19 09:57:36 +00:00
dns_resolver
dsa net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() fails 2025-04-16 18:14:44 -07:00
ethernet
ethtool ethtool: cmis_cdb: use correct rpl size in ethtool_cmis_module_poll() 2025-04-11 18:41:19 -07:00
handshake
hsr net: hold instance lock during NETDEV_CHANGE 2025-04-07 11:13:39 -07:00
ieee802154 inet: frags: save a pair of atomic operations in reassembly 2025-03-18 13:18:36 +01:00
ife
ipv4 treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
ipv6 ipv6: add exception routes to GC list in rt6_insert_exception 2025-04-10 20:09:05 -07:00
iucv s390: Convert MACHINE_IS_[LPAR|VM|KVM], etc, machine_is_[lpar|vm|kvm]() 2025-03-04 17:18:07 +01:00
kcm
key
l2tp net: move misc netdev_lock flavors to a separate header 2025-03-08 09:06:50 -08:00
l3mdev net: fib_rules: Fix iif / oif matching on L3 master device 2025-04-15 17:54:56 -07:00
lapb treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
llc treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
mac80211 Just a handful of fixes, notably 2025-04-11 16:38:04 -07:00
mac802154 mac802154: Switch to use hrtimer_setup() 2025-02-18 10:35:44 +01:00
mctp net: mctp: Set SOCK_RCU_FREE 2025-04-11 18:42:34 -07:00
mpls percpu: use TYPEOF_UNQUAL() in variable declarations 2025-03-16 22:05:53 -07:00
mptcp mptcp: pm: Defer freeing of MPTCP userspace path manager entries 2025-04-23 16:27:58 -07:00
ncsi treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
netfilter netfilter: conntrack: fix erronous removal of offload bit 2025-04-17 11:14:22 +02:00
netlabel
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-02-27 10:20:58 -08:00
netrom treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
nfc treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
nsh
openvswitch net: openvswitch: fix nested key length validation in the set() action 2025-04-14 16:15:38 -07:00
packet treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
phonet
psample
qrtr
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-02-27 10:20:58 -08:00
rfkill
rose treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
rxrpc treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
sched net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too 2025-04-23 17:16:50 -07:00
sctp Including fixes from netfilter. 2025-04-10 08:52:18 -07:00
shaper
smc smc: Fix lockdep false-positive for IPPROTO_SMC. 2025-04-11 14:14:26 -07:00
strparser
sunrpc nfsd-6.15 fixes: 2025-04-26 10:43:03 -07:00
switchdev net: switchdev: Convert blocking notification chain to a raw one 2025-03-11 11:30:28 +01:00
tipc tipc: fix NULL pointer dereference in tipc_mon_reinit_self() 2025-04-22 18:43:57 -07:00
tls ktls, sockmap: Fix missing uncharge operation 2025-05-09 18:09:59 -07:00
unix unix: fix up for "apparmor: add fine grained af_unix mediation" 2025-03-26 09:31:18 -07:00
vmw_vsock vsock: avoid timeout during connect() if the socket is closing 2025-04-02 17:19:30 -07:00
wireless treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
x25 treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
xdp xsk: Fix __xsk_generic_xmit() error code when cq is full 2025-04-02 21:55:43 -07:00
xfrm treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
Kconfig
Kconfig.debug
Makefile
compat.c
devres.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-03-26 09:32:10 -07:00
sysctl_net.c