linux/drivers/net/ethernet
Duoming Zhou f8b4687151 octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, because synctstamp_work is cyclic, the likelihood of
triggering the bug is non-negligible.

A typical race condition is illustrated below:

CPU 0 (cleanup)           | CPU 1 (delayed work callback)
otx2_remove()             |
  otx2_ptp_destroy()      | otx2_sync_tstamp()
    cancel_delayed_work() |
    kfree(ptp)            |
                          |   ptp = container_of(...); //UAF
                          |   ptp-> //UAF

This is confirmed by a KASAN report:

BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0x55/0x70
 print_report+0xcf/0x610
 ? __run_timer_base.part.0+0x7d7/0x8c0
 kasan_report+0xb8/0xf0
 ? __run_timer_base.part.0+0x7d7/0x8c0
 __run_timer_base.part.0+0x7d7/0x8c0
 ? __pfx___run_timer_base.part.0+0x10/0x10
 ? __pfx_read_tsc+0x10/0x10
 ? ktime_get+0x60/0x140
 ? lapic_next_event+0x11/0x20
 ? clockevents_program_event+0x1d4/0x2a0
 run_timer_softirq+0xd1/0x190
 handle_softirqs+0x16a/0x550
 irq_exit_rcu+0xaf/0xe0
 sysvec_apic_timer_interrupt+0x70/0x80
 </IRQ>
...
Allocated by task 1:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x7f/0x90
 otx2_ptp_init+0xb1/0x860
 otx2_probe+0x4eb/0xc30
 local_pci_probe+0xdc/0x190
 pci_device_probe+0x2fe/0x470
 really_probe+0x1ca/0x5c0
 __driver_probe_device+0x248/0x310
 driver_probe_device+0x44/0x120
 __driver_attach+0xd2/0x310
 bus_for_each_dev+0xed/0x170
 bus_add_driver+0x208/0x500
 driver_register+0x132/0x460
 do_one_initcall+0x89/0x300
 kernel_init_freeable+0x40d/0x720
 kernel_init+0x1a/0x150
 ret_from_fork+0x10c/0x1a0
 ret_from_fork_asm+0x1a/0x30

Freed by task 136:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3a/0x60
 __kasan_slab_free+0x3f/0x50
 kfree+0x137/0x370
 otx2_ptp_destroy+0x38/0x80
 otx2_remove+0x10d/0x4c0
 pci_device_remove+0xa6/0x1d0
 device_release_driver_internal+0xf8/0x210
 pci_stop_bus_device+0x105/0x150
 pci_stop_and_remove_bus_device_locked+0x15/0x30
 remove_store+0xcc/0xe0
 kernfs_fop_write_iter+0x2c3/0x440
 vfs_write+0x871/0xd70
 ksys_write+0xee/0x1c0
 do_syscall_64+0xac/0x280
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is canceled, and any running instance has
finished, before otx2_ptp is deallocated.
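
The ordering that the fix enforces can be sketched in userspace. This is only an analogy, not the kernel workqueue API or the driver code: a pthread stands in for synctstamp_work, and every name here (fake_ptp, fake_sync_tstamp, fake_cancel_work_sync, fake_ptp_destroy) is hypothetical. The point it illustrates is that the teardown path waits for the cyclic callback to finish before freeing the object the callback dereferences.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct otx2_ptp: the object the work touches. */
struct fake_ptp {
	pthread_t worker;   /* plays the role of synctstamp_work */
	atomic_bool stop;   /* cancellation request */
	atomic_int runs;    /* number of callback iterations started */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Cyclic callback: dereferences ptp on every iteration until told to stop. */
static void *fake_sync_tstamp(void *arg)
{
	struct fake_ptp *ptp = arg;

	while (!atomic_load(&ptp->stop)) {
		atomic_fetch_add(&ptp->runs, 1);
		sleep_ms(1);
	}
	return NULL;
}

/* Allocate the object and start the cyclic work. Returns NULL on failure. */
static struct fake_ptp *fake_ptp_init(void)
{
	struct fake_ptp *ptp = calloc(1, sizeof(*ptp));

	if (!ptp)
		return NULL;
	atomic_init(&ptp->stop, false);
	atomic_init(&ptp->runs, 0);
	if (pthread_create(&ptp->worker, NULL, fake_sync_tstamp, ptp)) {
		free(ptp);
		return NULL;
	}
	return ptp;
}

/*
 * Loose analogue of cancel_delayed_work_sync(): request cancellation AND
 * wait until the callback has returned. Setting the flag without the join
 * would mirror the non-sync variant and leave the race window open.
 */
static void fake_cancel_work_sync(struct fake_ptp *ptp)
{
	int err;

	atomic_store(&ptp->stop, true);
	err = pthread_join(ptp->worker, NULL);
	assert(err == 0);
}

/* Teardown: free only after the worker can no longer touch ptp. */
static int fake_ptp_destroy(struct fake_ptp *ptp)
{
	int runs;

	fake_cancel_work_sync(ptp);
	runs = atomic_load(&ptp->runs);
	free(ptp);
	return runs;
}
```

A caller would pair fake_ptp_init() with fake_ptp_destroy(); the returned count is only there so the sketch can be checked. Dropping the pthread_join() call reproduces the shape of the race in the diagram above: free() can run while the worker is still inside its loop.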

This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.

Fixes: 2958d17a89 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18 07:47:18 -07:00
3com
8390
actions
adaptec
adi
aeroflex
agere et131x: Add missing check after DMA map 2025-07-17 19:02:55 -07:00
airoha net: airoha: ppe: Do not invalid PPE entries in case of SW hash collision 2025-08-21 11:25:11 +02:00
alacritech
allwinner
alteon
altera
amazon net: Fix typos 2025-07-25 10:29:07 -07:00
amd A treewide cleanup of struct cycle_counter const annotations: 2025-07-29 14:02:53 -07:00
apm
apple
aquantia
arc
asix
atheros net: Use netif_threaded_enable instead of netif_set_threaded in drivers 2025-07-24 18:34:55 -07:00
broadcom cnic: Fix use-after-free bugs in cnic_delete_task 2025-09-18 07:47:18 -07:00
brocade
cadence net: macb: Fix tx_ptr_lock locking 2025-09-01 13:11:10 -07:00
calxeda
cavium net: liquidio: fix overflow in octeon_init_instr_queue() 2025-09-18 07:47:17 -07:00
chelsio net: Fix typos 2025-07-25 10:29:07 -07:00
cirrus
cisco
cortina
davicom
dec net: Fix typos 2025-07-25 10:29:07 -07:00
dlink eth: sundance: fix endian issues 2025-09-02 15:49:41 -07:00
emulex benet: fix BUG when creating VFs 2025-08-04 17:17:31 -07:00
engleder
ezchip
faraday net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect 2025-08-05 16:00:53 -07:00
freescale dpaa2-switch: fix buffer pool seeding for control traffic 2025-09-11 18:51:25 -07:00
fujitsu
fungible
google gve: prevent ethtool ops after shutdown 2025-08-19 18:04:07 -07:00
hisilicon net: hibmcge: fix the np_link_fail error reporting issue 2025-08-08 11:48:49 -07:00
huawei net: Fix typos 2025-07-25 10:29:07 -07:00
i825xx
ibm ibmveth: Add multi buffers rx replenishment hcall support 2025-07-22 15:08:23 +02:00
intel igc: don't fail igc_probe() on LED setup error 2025-09-16 14:01:53 -07:00
litex
marvell octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp() 2025-09-18 07:47:18 -07:00
mediatek net: ethernet: mtk_eth_soc: fix tx vlan tag for llc packets 2025-09-02 16:27:30 -07:00
mellanox Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set" 2025-09-18 07:47:17 -07:00
meta fbnic: Move phylink resume out of service_task and into open/close 2025-08-27 18:57:08 -07:00
micrel net: Fix typos 2025-07-25 10:29:07 -07:00
microchip microchip: lan865x: Fix LAN8651 autoloading 2025-08-29 19:42:07 -07:00
microsoft Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-07-24 11:10:46 -07:00
moxa
mscc
myricom
natsemi net: natsemi: fix `rx_dropped` double accounting on `netif_rx()` failure 2025-09-15 19:06:25 -07:00
neterion net: Fix typos 2025-07-25 10:29:07 -07:00
netronome Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
ni
nvidia
nxp
oki-semi
packetengines
pasemi
pensando Networking changes for 6.17. 2025-07-30 08:58:55 -07:00
qlogic qed: Don't collect too many protection override GRC elements 2025-09-14 14:25:03 -07:00
qualcomm net: Fix typos 2025-07-25 10:29:07 -07:00
rdc
realtek rtase: Fix Rx descriptor CRC error bit definition 2025-08-14 17:53:12 -07:00
renesas net: Use netif_threaded_enable instead of netif_set_threaded in drivers 2025-07-24 18:34:55 -07:00
rocker
samsung
seeq
sfc sfc: unfix not-a-typo in comment 2025-08-01 14:16:33 -07:00
sgi
silan
sis
smsc net: Fix typos 2025-07-25 10:29:07 -07:00
socionext
stmicro net: stmmac: Set CIC bit only for TX queues with COE 2025-08-26 18:12:42 -07:00
sun net: Fix typos 2025-07-25 10:29:07 -07:00
sunplus
synopsys
tehuti net: Fix typos 2025-07-25 10:29:07 -07:00
ti hsr: hold rcu and dev lock for hsr_get_port_ndev 2025-09-11 11:49:19 +02:00
toshiba
tundra
vertexcom
via
wangxun net: libwx: fix to enable RSS 2025-09-04 13:34:01 -07:00
wiznet
xilinx net: xilinx: axienet: Add error handling for RX metadata pointer retrieval 2025-09-04 07:13:08 -07:00
xircom xirc2ps_cs: fix register access when enabling FullDuplex 2025-08-29 19:05:11 -07:00
xscale
Kconfig
Makefile
dnet.c
dnet.h
ec_bhf.c
ethoc.c
fealnx.c
jme.c
jme.h
korina.c
lantiq_etop.c
lantiq_xrx200.c
oa_tc6.c net: ethernet: oa_tc6: Handle failure of spi_setup 2025-08-29 19:42:07 -07:00