Commit Graph

82135 Commits

Author SHA1 Message Date
Linus Torvalds d303caf5ca bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjykZUACgkQ6rmadz2v
 bTppcA/+LKOzgJVJd9q3tbPeXb+xJOqiCGPDuv3tfmglHxD+rR5n2lsoTkryL4PZ
 k5yJhA95Z5RVeV3cevl8n4HlTJipm795k+fKz1YRbe9gB49w2SqqDf5s7LYuABOm
 YqdUSCWaLyoBNd+qi9SuyOVXSg0mSbJk0bEsXsgTp/5rUt9v6cz+36BE1Pkcp3Aa
 d6y2I2MBRGCADtEmsXh7+DI0aUvgMi2zlX8veQdfAJZjsQFwbr1ZxSxHYqtS6gux
 wueEiihzipakhONACJskQbRv80NwT5/VmrAI/ZRVzIhsywQhDGdXtQVNs6700/oq
 QmIZtgAL17Y0SAyhzQsQuGhJGKdWKhf3hzDKEDPslmyJ6OCpMAs/ttPHTJ6s/MmG
 6arSwZD/GAIoWhvWYP/zxTdmjqKX13uradvaNTv55hOhCLTTnZQKRSLk1NabHl7e
 V0f7NlZaVPnLerW/90+pn5pZFSrhk0Nno9to+yXaQ9TYJlK4e6dcB9y+rButQNrr
 7bTRyQ55fQlyS+NnaD85wL41IIX4WmJ3ATdrKZIMvGMJaZxjzXRvW4AjGHJ6hbbt
 GATdtISkYqZ4AdlSf2Vj9BysZkf7tS83SyRlM6WDm3IaAS3v5E/e1Ky2Kn6gWu70
 MNcSW/O0DSqXRkcDkY/tSOlYJBZJYo6ZuWhNdQAERA3OxldBSFM=
 =p8bI
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
   imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)

 - Make selftests/bpf arg_parsing.c more robust to errors (Andrii
   Nakryiko)

 - Fix redefinition of 'off' as different kind of symbol when I40E
   driver is builtin (Brahmajit Das)

 - Do not disable preemption in bpf_test_run (Sahil Chandna)

 - Fix memory leak in __lookup_instance error path (Shardul Bankar)

 - Ensure test data is flushed to disk before reading it (Xing Guo)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix redefinition of 'off' as different kind of symbol
  bpf: Do not disable preemption in bpf_test_run().
  bpf: Fix memory leak in __lookup_instance error path
  selftests: arg_parsing: Ensure data is flushed to disk before reading.
  bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
  selftests/bpf: make arg_parsing.c more robust to crashes
  bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
2025-10-18 08:00:43 -10:00
Sahil Chandna 7c33e97a6e bpf: Do not disable preemption in bpf_test_run().
The timer mode is initialized to NO_PREEMPT mode by default,
this disables preemption and force execution in atomic context
causing issue on PREEMPT_RT configurations when invoking
spin_lock_bh(), leading to the following warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6107, name: syz.0.17
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
Preemption disabled at:
[<ffffffff891fce58>] bpf_test_timer_enter+0xf8/0x140 net/bpf/test_run.c:42

Fix this, by removing NO_PREEMPT/NO_MIGRATE mode check.
Also, the test timer context no longer needs explicit calls to
migrate_disable()/migrate_enable() with rcu_read_lock()/rcu_read_unlock().
Use helpers rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate()
instead.

Reported-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1f1fbecb9413cdbfbef8
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Suggested-by: Menglong Dong <menglong.dong@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com
Co-developed-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Sahil Chandna <chandna.sahil@gmail.com>
Link: https://lore.kernel.org/r/20251014185635.10300-1-chandna.sahil@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-17 11:29:35 -07:00
Linus Torvalds 634ec1fc79 Including fixes from CAN
Current release - regressions:
 
   - udp: do not use skb_release_head_state() before skb_attempt_defer_free()
 
   - gro_cells: use nested-BH locking for gro_cell
 
   - dpll: zl3073x: increase maximum size of flash utility
 
 Previous releases - regressions:
 
   - core: fix lockdep splat on device unregister
 
   - tcp: fix tcp_tso_should_defer() vs large RTT
 
   - tls:
     - don't rely on tx_work during send()
     - wait for pending async decryptions if tls_strp_msg_hold fails
 
   - can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
 
   - eth: lan78xx: fix lost EEPROM write timeout in lan78xx_write_raw_eeprom
 
 Previous releases - always broken:
 
   - ip6_tunnel: prevent perpetual tunnel growth
 
   - dpll: zl3073x: handle missing or corrupted flash configuration
 
   - can: m_can: fix pm_runtime and CAN state handling
 
   - eth:ixgbe: fix too early devlink_free() in ixgbe_remove()
 
   - eth: ixgbevf: fix mailbox API compatibility
 
   - eth: gve: Check valid ts bit on RX descriptor before hw timestamping
 
   - eth: idpf: cleanup remaining SKBs in PTP flows
 
   - eth: r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjxBUASHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkTfwP/0I5aKZHbZ9I3gHL6Dw9TtcYPsP8cZDK
 Iu7nOtgyD8AKSnX+3ynABg4K3WbE6HFi1BQzv63uO9G2UYLZSMdU16uytUKbstR6
 HWw+r2N0NMb+qacb++nbMugmYczSWUYKRyrpWHrNtqbKi3aQRJt5lQ81AGwgMITX
 mx8VILjMTXIt/K055LVnF9z2c4ltFmLTGiODKqHjQQirNbBiRJFNYn4qtw3BkbdS
 OwxNtTqOozQppRf+1l4MRHHXNbwGe7W9hOnALbiHpwtXxvG/+NVRu8IkNcD8vMD9
 I7aThpkOt7B9xcWqLb2pAAk976MuKQr6BUGKQrP26vZb9W0NIJbew1Y8d74Zld5d
 Z9bD5Uqr85Ge4Xp3/mWnaQZWk2OTsVdOOxkUoPWilFByULGfJORMvu6+y6/RWpZt
 dezE7WL9o+K1AJLBWnZ3RTWI8vmSEDup4WzPSP1v7paAFcvb9Ub4pBghX9+RGCfE
 ZmCecQPZfVzTe08PvAysc9VR7o3jGx5FpCLhinLxaHaSrV4YvALyV8iYagrmtQWY
 iWVkMYEIxh5B6KLmIVVu5cQJJjuRLm6k+nCTemBJaAItu2YLdy4ZeLeFWUhVgmxM
 SIsueBmTGorRqCbSed7L17yMApsMaBzE775+qOFoK7tLoKVTm22AMA3DzQtKGeIv
 OFlNreoOWzMX
 =jVlx
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from CAN

  Current release - regressions:

    - udp: do not use skb_release_head_state() before
      skb_attempt_defer_free()

    - gro_cells: use nested-BH locking for gro_cell

    - dpll: zl3073x: increase maximum size of flash utility

  Previous releases - regressions:

    - core: fix lockdep splat on device unregister

    - tcp: fix tcp_tso_should_defer() vs large RTT

    - tls:
        - don't rely on tx_work during send()
        - wait for pending async decryptions if tls_strp_msg_hold fails

    - can: j1939: add missing calls in NETDEV_UNREGISTER notification
      handler

    - eth: lan78xx: fix lost EEPROM write timeout in
      lan78xx_write_raw_eeprom

  Previous releases - always broken:

    - ip6_tunnel: prevent perpetual tunnel growth

    - dpll: zl3073x: handle missing or corrupted flash configuration

    - can: m_can: fix pm_runtime and CAN state handling

    - eth:
        - ixgbe: fix too early devlink_free() in ixgbe_remove()
        - ixgbevf: fix mailbox API compatibility
        - gve: Check valid ts bit on RX descriptor before hw timestamping
        - idpf: cleanup remaining SKBs in PTP flows
        - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"

* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  udp: do not use skb_release_head_state() before skb_attempt_defer_free()
  net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
  netdevsim: set the carrier when the device goes up
  selftests: tls: add test for short splice due to full skmsg
  selftests: net: tls: add tests for cmsg vs MSG_MORE
  tls: don't rely on tx_work during send()
  tls: wait for pending async decryptions if tls_strp_msg_hold fails
  tls: always set record_type in tls_process_cmsg
  tls: wait for async encrypt in case of error during latter iterations of sendmsg
  tls: trim encrypted message to match the plaintext on short splice
  tg3: prevent use of uninitialized remote_adv and local_adv variables
  MAINTAINERS: new entry for IPv6 IOAM
  gve: Check valid ts bit on RX descriptor before hw timestamping
  net: core: fix lockdep splat on device unregister
  MAINTAINERS: add myself as maintainer for b53
  selftests: net: check jq command is supported
  net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
  tcp: fix tcp_tso_should_defer() vs large RTT
  r8152: add error handling in rtl8152_driver_init
  usbnet: Fix using smp_processor_id() in preemptible code warnings
  ...
2025-10-16 09:41:21 -07:00
Eric Dumazet 6de1dec1c1 udp: do not use skb_release_head_state() before skb_attempt_defer_free()
Michal reported and bisected an issue after recent adoption
of skb_attempt_defer_free() in UDP.

The issue here is that skb_release_head_state() is called twice per skb,
one time from skb_consume_udp(), then a second time from skb_defer_free_flush()
and napi_consume_skb().

As Sabrina suggested, remove skb_release_head_state() call from
skb_consume_udp().

Add a DEBUG_NET_WARN_ON_ONCE(skb_nfct(skb)) in skb_attempt_defer_free()

Many thanks to Michal, Sabrina, Paolo and Florian for their help.

Fixes: 6471658dc6 ("udp: use skb_attempt_defer_free()")
Reported-and-bisected-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/netdev/gpjh4lrotyephiqpuldtxxizrsg6job7cvhiqrw72saz2ubs3h@g6fgbvexgl3r/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251015052715.4140493-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-16 16:03:07 +02:00
Jakub Kicinski 5e655aadda linux-can-fixes-for-6.18-20251014
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEn/sM2K9nqF/8FWzzDHRl3/mQkZwFAmjuAZATHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAMdGXf+ZCRnPeECACtwzFozRla2Y+WTR7+BZiBlRtcWZ8d
 aNHrvtjefaX4TkgEVgC9Qt+VI7Wzv2TDlVWIWZnm3lotufmQsbRdDdEyfeRHPR7m
 9yZcqRvLGQ17LDnC13W66YaZXhhz263rvCTwzcLyuB7tnO+zyYfakTZfR+xtfJ3x
 LBT6yNVlujy+/I4NCRNwlLzJc5fdGTKaSbt8ECf7qygcbQfZ/AcLUV9/AIjweVP1
 Tcp2qumrM+HIU0lPQzfyiEGJn/weLRfajVbzcsv7NNc9vtlP2Ayi0LnLoXl98Pfk
 00nJVXmV6rAJCm/LqDKoWvP98jEL4B41+06RRwsm5sJQ7caRS+knuVVn
 =Wml2
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2025-10-14

The first 2 paches are by Celeste Liu and target the gS_usb driver.
The first patch remove the limitation to 3 CAN interface per USB
device. The second patch adds the missing population of
net_device->dev_port.

The next 4 patches are by me and fix the m_can driver. They add a
missing pm_runtime_disable(), fix the CAN state transition back to
Error Active and fix the state after ifup and suspend/resume.

Another patch by me targets the m_can driver, too and replaces Dong
Aisheng's old email address.

The next 2 patches are by Vincent Mailhol and update the CAN
networking Documentation.

Tetsuo Handa contributes the last patch that add missing cleanup calls
in the NETDEV_UNREGISTER notification handler.

* tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
  can: add Transmitter Delay Compensation (TDC) documentation
  can: remove false statement about 1:1 mapping between DLC and length
  can: m_can: replace Dong Aisheng's old email address
  can: m_can: fix CAN state in system PM
  can: m_can: m_can_chip_config(): bring up interface in correct state
  can: m_can: m_can_handle_state_errors(): fix CAN state transition to Error Active
  can: m_can: m_can_plat_remove(): add missing pm_runtime_disable()
  can: gs_usb: gs_make_candev(): populate net_device->dev_port
  can: gs_usb: increase max interface to U8_MAX
====================

Link: https://patch.msgid.link/20251014122140.990472-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:56:20 -07:00
Sabrina Dubroca 7f846c65ca tls: don't rely on tx_work during send()
With async crypto, we rely on tx_work to actually transmit records
once encryption completes. But while send() is running, both the
tx_lock and socket lock are held, so tx_work_handler cannot process
the queue of encrypted records, and simply reschedules itself. During
a large send(), this could last a long time, and use a lot of memory.

Transmit any pending encrypted records before restarting the main
loop of tls_sw_sendmsg_locked.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/8396631478f70454b44afb98352237d33f48d34d.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:45 -07:00
Sabrina Dubroca b8a6ff84ab tls: wait for pending async decryptions if tls_strp_msg_hold fails
Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).

In this case, wait for all pending decryption requests.

Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/b9fe61dcc07dab15da9b35cf4c7d86382a98caf2.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:45 -07:00
Sabrina Dubroca b6fe4c29bb tls: always set record_type in tls_process_cmsg
When userspace wants to send a non-DATA record (via the
TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a
previous MSG_MORE send() as a separate DATA record. If that DATA record
is encrypted asynchronously, tls_handle_open_record will return
-EINPROGRESS. This is currently treated as an error by
tls_process_cmsg, and it will skip setting record_type to the correct
value, but the caller (tls_sw_sendmsg_locked) handles that return
value correctly and proceeds with sending the new message with an
incorrect record_type (DATA instead of whatever was requested in the
cmsg).

Always set record_type before handling the open record. If
tls_handle_open_record returns an error, record_type will be
ignored. If it succeeds, whether with synchronous crypto (returning 0)
or asynchronous (returning -EINPROGRESS), the caller will proceed
correctly.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/0457252e578a10a94e40c72ba6288b3a64f31662.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:45 -07:00
Sabrina Dubroca b014a4e066 tls: wait for async encrypt in case of error during latter iterations of sendmsg
If we hit an error during the main loop of tls_sw_sendmsg_locked (eg
failed allocation), we jump to send_end and immediately
return. Previous iterations may have queued async encryption requests
that are still pending. We should wait for those before returning, as
we could otherwise be reading from memory that userspace believes
we're not using anymore, which would be a sort of use-after-free.

This is similar to what tls_sw_recvmsg already does: failures during
the main loop jump to the "wait for async" code, not straight to the
unlock/return.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/c793efe9673b87f808d84fdefc0f732217030c52.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:45 -07:00
Sabrina Dubroca ce5af41e32 tls: trim encrypted message to match the plaintext on short splice
During tls_sw_sendmsg_locked, we pre-allocate the encrypted message
for the size we're expecting to send during the current iteration, but
we may end up sending less, for example when splicing: if we're
getting the data from small fragments of memory, we may fill up all
the slots in the skmsg with less data than expected.

In this case, we need to trim the encrypted message to only the length
we actually need, to avoid pushing uninitialized bytes down the
underlying TCP socket.

Fixes: fe1e81d4f7 ("tls/sw: Support MSG_SPLICE_PAGES")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/66a0ae99c9efc15f88e9e56c1f58f902f442ce86.1760432043.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15 17:41:45 -07:00
Florian Westphal 7f0fddd817 net: core: fix lockdep splat on device unregister
Since blamed commit, unregister_netdevice_many_notify() takes the netdev
mutex if the device needs it.

If the device list is too long, this will lock more device mutexes than
lockdep can handle:

unshare -n \
 bash -c 'for i in $(seq 1 100);do ip link add foo$i type dummy;done'

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by kworker/u16:1/69:
 #0: ..148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work
 #1: ..d40 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work
 #2: ..bd0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net
 #3: ..aa8 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch
 #4: ..cb0 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: unregister_netdevice_many_notify
[..]

Add a helper to close and then unlock a list of net_devices.
Devices that are not up have to be skipped - netif_close_many always
removes them from the list without any other actions taken, so they'd
remain in locked state.

Close devices whenever we've used up half of the tracking slots or we
processed entire list without hitting the limit.

Fixes: 7e4d784f58 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20251013185052.14021-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-14 19:27:20 -07:00
Shardul Bankar 7f9ee5fc97 bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
Fix a memory leak in bpf_prog_test_run_xdp() where the context buffer
allocated by bpf_ctx_init() is not freed when the function returns early
due to a data size check.

On the failing path:
  ctx = bpf_ctx_init(...);
  if (kattr->test.data_size_in - meta_sz < ETH_HLEN)
      return -EINVAL;

The early return bypasses the cleanup label that kfree()s ctx, leading to a
leak detectable by kmemleak under fuzzing. Change the return to jump to the
existing free_ctx label.

Fixes: fe9544ed1a ("bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN")
Reported-by: BPF Runtime Fuzzer (BRF)
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20251014120037.1981316-1-shardulsb08@gmail.com
2025-10-14 12:07:30 -07:00
Eric Dumazet 295ce1eb36 tcp: fix tcp_tso_should_defer() vs large RTT
Neal reported that using neper tcp_stream with TCP_TX_DELAY
set to 50ms would often lead to flows stuck in a small cwnd mode,
regardless of the congestion control.

While tcp_stream sets TCP_TX_DELAY too late after the connect(),
it highlighted two kernel bugs.

The following heuristic in tcp_tso_should_defer() seems wrong
for large RTT:

delta = tp->tcp_clock_cache - head->tstamp;
/* If next ACK is likely to come too late (half srtt), do not defer */
if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
      goto send_now;

If next ACK is expected to come in more than 1 ms, we should
not defer because we prefer a smooth ACK clocking.

While blamed commit was a step in the good direction, it was not
generic enough.

Another patch fixing TCP_TX_DELAY for established flows
will be proposed when net-next reopens.

Fixes: 50c8339e92 ("tcp: tso: restore IW10 after TSO autosizing")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251011115742.1245771-1-edumazet@google.com
[pabeni@redhat.com: fixed whitespace issue]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-14 12:21:48 +02:00
Dmitry Safonov 21f4d45eba net/ip6_tunnel: Prevent perpetual tunnel growth
Similarly to ipv4 tunnel, ipv6 version updates dev->needed_headroom, too.
While ipv4 tunnel headroom adjustment growth was limited in
commit 5ae1e9922b ("net: ip_tunnel: prevent perpetual headroom growth"),
ipv6 tunnel yet increases the headroom without any ceiling.

Reflect ipv4 tunnel headroom adjustment limit on ipv6 version.

Credits to Francesco Ruggeri, who was originally debugging this issue
and wrote local Arista-specific patch and a reproducer.

Fixes: 8eb30be035 ("ipv6: Create ip6_tnl_xmit")
Cc: Florian Westphal <fw@strlen.de>
Cc: Francesco Ruggeri <fruggeri05@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20251009-ip6_tunnel-headroom-v2-1-8e4dbd8f7e35@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-13 17:43:46 -07:00
Sebastian Andrzej Siewior 25718fdcbd net: gro_cells: Use nested-BH locking for gro_cell
The gro_cell data structure is per-CPU variable and relies on disabled
BH for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.

Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.

Reported-by: syzbot+8715dd783e9b0bef43b1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68c6c3b1.050a0220.2ff435.0382.GAE@google.com/
Fixes: 3253cb49cb ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251009094338.j1jyKfjR@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-13 17:33:32 -07:00
Tetsuo Handa 93a27b5891 can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
Currently NETDEV_UNREGISTER event handler is not calling
j1939_cancel_active_session() and j1939_sk_queue_drop_all().
This will result in these calls being skipped when j1939_sk_release() is
called. And I guess that the reason syzbot is still reporting

  unregister_netdevice: waiting for vcan0 to become free. Usage count = 2

is caused by lack of these calls.

Calling j1939_cancel_active_session(priv, sk) from j1939_sk_release() can
be covered by calling j1939_cancel_active_session(priv, NULL) from
j1939_netdev_notify().

Calling j1939_sk_queue_drop_all() from j1939_sk_release() can be covered
by calling j1939_sk_netdev_event_netdown() from j1939_netdev_notify().

Therefore, we can reuse j1939_cancel_active_session(priv, NULL) and
j1939_sk_netdev_event_netdown(priv) for NETDEV_UNREGISTER event handler.

Fixes: 7fcbe5b2c6 ("can: j1939: implement NETDEV_UNREGISTER notification handler")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/3ad3c7f8-5a74-4b07-a193-cb0725823558@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-10-13 21:26:31 +02:00
Linus Torvalds fbde105f13 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjpp+sACgkQ6rmadz2v
 bTo+9Q//bUzEc2C64NbG0DTCcxnkDEadzBLIB0BAwnAkuRjR8HJiPGoCdBJhUqzM
 /hNfIHTtDdyspU2qZbM+r4nVJ6zRAwIHrT2d/knERxXtRQozaWvUlRhVmf5tdWYm
 DkbThS9sAfAOs21YjV7OWPrf7bC7T9syQTAfN0CE8cujZY7OnqCyzNwfb8iIusyo
 +Ctm0/qUDVtd6SjdPAQjzp82fHIIwnMFZtWJiZml5LklL1Mx5cuVrT/sWgr5KATW
 vZ9rUfgaiJkAsSX0sSlLnAI76+kJRB+IkmK1TRdWFlwW6dTsa/7MkDeXXPN1dEDi
 o5ZqhcvaY0eAMbU4iX72Juf6gVFF6AgVwsrHmM79ICjg5umCLN/90QqYPc0ChRxl
 EYuSWVQ6/cgV3W6l+KU53cwmRjjdSzyJQFei03COZ0iKF6xic0cynS3BKMQL6HkU
 3BfTj19h+dxt7qywRaJFsrWK4t/uBX6N75XlVa9od/sk91tR/ibtJ6hcyuJGATr5
 nkfMkyN155upAffUnkhv37TXtMXyX8/kd7BddCet31JJXyJuJZ0vYuOcur6awGyN
 aB0T1ueG15sTfGf0zpxVNWhVqswHI/1Suk8EXwbDeHRcsmtrp8XWYawf5StIMwW0
 Uy8GjS5KVl5bcrfDbcvj79jajpVjwnvFR1Sir9C4aROm5OpH3Hs=
 =77JH
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Finish constification of 1st parameter of bpf_d_path() (Rong Tao)

 - Harden userspace-supplied xdp_desc validation (Alexander Lobakin)

 - Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
   Borkmann)

 - Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)

 - Use correct context to unpin bpf hash map with special types (KaFai
   Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add test for unpinning htab with internal timer struct
  bpf: Avoid RCU context warning when unpinning htab with internal structs
  xsk: Harden userspace-supplied xdp_desc validation
  bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
  libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
  bpf: Finish constification of 1st parameter of bpf_d_path()
2025-10-11 10:31:38 -07:00
Linus Torvalds 8bd9238e51 Some messenger improvements from Eric and Max, a patch to address the
issue (also affected userspace) of incorrect permissions being granted
 to users who have access to multiple different CephFS instances within
 the same cluster from Kotresh and a bunch of assorted CephFS fixes from
 Slava.
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmjpShsTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzizoaB/C7qTw5Olh8NDpX8I+ljEO50XNduurf
 fp2eNn0ub5brhcvh8iACSPKE2oer/bDvv3b2SN9310GmBX3f7H2Ht5TeH2tGBRN0
 clg+C2DmY/2watHovo+ua7YAd+HiPH2XMbpeU38Pu1nEdmiU6cQ0YaOn8n2p+c1E
 bID0dMHWb4HTmFRURqWqKPDkM1fLHRxIVgyOMaov5vs0T7XdglwPja3S2W6epvqF
 hKSMSvO/j9qYlOsBM6G6IuHDMJomzBqOQKqsQqC4XZN6uXeaKPTLYRnzxKfJUEWj
 P5JTaum7NGGtfIs0L9wr6zpou/GY2zTFiyXhLZsLJMn894bBO5nArg==
 =wGIB
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:

 - some messenger improvements (Eric and Max)

 - address an issue (also affected userspace) of incorrect permissions
   being granted to users who have access to multiple different CephFS
   instances within the same cluster (Kotresh)

 - a bunch of assorted CephFS fixes (Slava)

* tag 'ceph-for-6.18-rc1' of https://github.com/ceph/ceph-client:
  ceph: add bug tracking system info to MAINTAINERS
  ceph: fix multifs mds auth caps issue
  ceph: cleanup in ceph_alloc_readdir_reply_buffer()
  ceph: fix potential NULL dereference issue in ceph_fill_trace()
  libceph: add empty check to ceph_con_get_out_msg()
  libceph: pass the message pointer instead of loading con->out_msg
  libceph: make ceph_con_get_out_msg() return the message pointer
  ceph: fix potential race condition on operations with CEPH_I_ODIRECT flag
  ceph: refactor wake_up_bit() pattern of calling
  ceph: fix potential race condition in ceph_ioctl_lazyio()
  ceph: fix overflowed constant issue in ceph_do_objects_copy()
  ceph: fix wrong sizeof argument issue in register_session()
  ceph: add checking of wait_for_completion_killable() return value
  ceph: make ceph_start_io_*() killable
  libceph: Use HMAC-SHA256 library instead of crypto_shash
2025-10-10 11:30:19 -07:00
Alexander Lobakin 07ca98f906 xsk: Harden userspace-supplied xdp_desc validation
Turned out certain clearly invalid values passed in xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead
to UBs or just invalid frames to be queued for xmit.

desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
can cause positive integer overflow and wraparound, the same way low
enough desc->addr with a non-zero pool->tx_metadata_len can cause
negative integer overflow. Both scenarios can then pass the
validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.

Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).

bloat-o-meter reports a little growth of the code size:

add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function                                     old     new   delta
xskq_cons_peek_desc                          299     330     +31
xsk_tx_peek_release_desc_batch               973    1002     +29
xsk_generic_xmit                            3148    3132     -16

but hopefully this doesn't hurt the performance much.

Fixes: 341ac980ea ("xsk: Support tx_metadata_len")
Cc: stable@vger.kernel.org # 6.8+
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20251008165659.4141318-1-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:07:48 -07:00
Linus Torvalds 80b7065ec1 Bunch of unrelated fixes
- polling fix for trans fd that ought to have been fixed otherwise back
 in March, but apparently came back somewhere else...
 - USB transport buffer overflow fix
 - Some dentry lifetime rework to handle metadata update for currently
 opened files in uncached mode, or inode type change in cached mode
 - a double-put on invalid flush found by syzbot
 - and finally /sys/fs/9p/caches not advancing buffer and overwriting
 itself for large contents
 
 Thanks to everyone involved!
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmjns5cACgkQq06b7GqY
 5nDGLg/+Ltsiwj51CDwyJxFPX4ki1jM3j2iA8BNpdjjzCev6Jp1vuWBPiV+sTtgM
 zKmIw9joyvzaIXKzA3h7tMe7h7y8H4p7KZyD3YSDSNW5XIb5UYXeQwYVw39OxENk
 nVQkQ41H+bPepmz3R0t/F4la9myVLBtfdcPWJN+JfkOLgdeemCGf+TShJ6F+9Qaq
 PTd3UKfdv8JRlKQmeLH7pPlBoWYRoC2Tq3pZbyC3D1A50bWkTfxusW0k3MUzRyHO
 t/jNbhHt0ai+densQ6O5M+ALMI7KIIzR+obVaBHfzSUcwFeGhmW2x84Mo0qX6YMF
 0B/fY3mCXi8QO6my1hfwmbRtY5mEV967wM/uF7E/SuvyNVizOMBZn8FgWfVPVszr
 ix1iNBgQJdoOEMbHvDfCAtGNorFrMiybyLvoabL7YCwFo68LdYqj2br3/feajoYM
 5g46nUcidH4fVBLtiBGE8w0ko8aa3zLB4iu/fdATuEZ1r+gPt40SWo1mKahoCgY3
 BOtYxEZTnd4l4GPJDQIaYgVckgSiex0xcE9pRJ5LtBwdyw9g4jwfb39WHx4MT2sE
 u5xWDbadU/ZM6ppLJ/iBkotvU9D7ohKTviN1IfCwjqL7IMBNz5c+nBwLQZWK4YTR
 2ySKpv1VOMROglt1KvnHfdpmdJ+CjAJLLNE+XSz9AHXinK5v8aM=
 =bUN1
 -----END PGP SIGNATURE-----

Merge tag '9p-for-6.18-rc1' of https://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
 "A bunch of unrelated fixes:

   - polling fix for trans fd that ought to have been fixed otherwise
     back in March, but apparently came back somewhere else...

   - USB transport buffer overflow fix

   - Some dentry lifetime rework to handle metadata update for currently
     opened files in uncached mode, or inode type change in cached mode

   - a double-put on invalid flush found by syzbot

   - and finally /sys/fs/9p/caches not advancing buffer and overwriting
     itself for large contents

  Thanks to everyone involved!"

* tag '9p-for-6.18-rc1' of https://github.com/martinetd/linux:
  9p: sysfs_init: don't hardcode error to ENOMEM
  9p: fix /sys/fs/9p/caches overwriting itself
  9p: clean up comment typos
  9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN
  net/9p: fix double req put in p9_fd_cancelled
  net/9p: Fix buffer overflow in USB transport layer
  fs/9p: Add p9_debug(VFS) in d_revalidate
  fs/9p: Invalidate dentry if inode type change detected in cached mode
  fs/9p: Refresh metadata in d_revalidate for uncached mode too
2025-10-09 11:56:59 -07:00
Linus Torvalds 18a7e218cf Including fixes from netfilter.
Current release - regressions:
 
   - mlx5: fix pre-2.40 binutils assembler error
 
 Current release - new code bugs:
 
   - net: psp: don't assume reply skbs will have a socket
 
   - eth: fbnic: fix missing programming of the default descriptor
 
 Previous releases - regressions:
 
   - page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
 
   - tcp:
     - take care of zero tp->window_clamp in tcp_set_rcvlowat()
     - don't call reqsk_fastopen_remove() in tcp_conn_request().
 
   - eth: ice: release xa entry on adapter allocation failure
 
   - eth: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
 
 Previous releases - always broken:
 
   - netfilter: validate objref and objrefmap expressions
 
   - sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()
 
   - eth: mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
 
   - eth: mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
 
   - eth: ocelot: fix use-after-free caused by cyclic delayed work
 
 Misc:
 
   -  add support for MediaTek PCIe 5G HP DRMR-H01
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjnlD0SHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkQnYP/iQAwtLSyE2lqdP+OsfnwhdpeeMpy2B/
 rfN7hk4OS1Rs1WBEf+Y45Lm5yC2poRleEf3QrixcqAHrtr4SED1JgZynjzjtYJ6c
 mmCdzZPZMfBEVG7rcOusSMI+ebWCA1yvSzjqw6CctwzW6Sac4S3KRyYzUNuTJ+De
 3APf6NptD324veJuyP01opafaExBpu21DfaEXeu4kP030S74J5TMfjt7dNvVpYDt
 qwb7s6syJ9O+U3INmi6SVxJY9711Cj8tjUXzz94FWc/kC28D3UnIKUO1tL/qNthe
 do0OhNa1YlsmoGJuRthxQa9ubyScnP6hKcy8VlanvuDya0JVyzdod/RDO6A2L9bp
 bdJIM2wnLd4Sqe7QcGPBFj0KNahgqG1lzj0uHKhrWBetp+5wWySuk9Mohg8yZaGP
 w53X0+WhAeferpcdMPFbwBL9pcgmHo2FxDMaHKcIrIAFb+TIzSZA6CXWB2RFRSgH
 smgQ9EYUbfUtP6nZCs1lfOux7CdN94tuKA9jnUXFrlIda44akxVe/wVi9bRazrVx
 TcpF0BEeQTyM4wCoVsk/JzTNeK2VZtj8i1B7TrZvR68UkwGrXm8DB4xG42LxdRaw
 PEtu8mbUGiWtkojmUAVpYc/TfMdQOVyBUijd2uAe1pXV4yPDz5cWNkEPX8vvAiRu
 rIe8v1VAz+gg
 =LlXo
 -----END PGP SIGNATURE-----

Merge tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull  networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mlx5: fix pre-2.40 binutils assembler error

  Current release - new code bugs:

   - net: psp: don't assume reply skbs will have a socket

   - eth: fbnic: fix missing programming of the default descriptor

  Previous releases - regressions:

   - page_pool: fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches

   - tcp:
       - take care of zero tp->window_clamp in tcp_set_rcvlowat()
       - don't call reqsk_fastopen_remove() in tcp_conn_request()

   - eth:
       - ice: release xa entry on adapter allocation failure
       - usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock

  Previous releases - always broken:

   - netfilter: validate objref and objrefmap expressions

   - sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()

   - eth:
       - mlx4: prevent potential use after free in mlx4_en_do_uc_filter()
       - mlx5: prevent tunnel mode conflicts between FDB and NIC IPsec tables
       - ocelot: fix use-after-free caused by cyclic delayed work

  Misc:

   -  add support for MediaTek PCIe 5G HP DRMR-H01"

* tag 'net-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  net: airoha: Fix loopback mode configuration for GDM2 port
  selftests: drv-net: pp_alloc_fail: add necessary optoins to config
  selftests: drv-net: pp_alloc_fail: lower traffic expectations
  selftests: drv-net: fix linter warnings in pp_alloc_fail
  eth: fbnic: fix reporting of alloc_failed qstats
  selftests: drv-net: xdp: add test for interface level qstats
  selftests: drv-net: xdp: rename netnl to ethnl
  eth: fbnic: fix saving stats from XDP_TX rings on close
  eth: fbnic: fix accounting of XDP packets
  eth: fbnic: fix missing programming of the default descriptor
  selftests: netfilter: query conntrack state to check for port clash resolution
  selftests: netfilter: nft_fib.sh: fix spurious test failures
  bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
  netfilter: nft_objref: validate objref and objrefmap expressions
  net: pse-pd: tps23881: Fix current measurement scaling
  net/mlx5: fix pre-2.40 binutils assembler error
  net/mlx5e: Do not fail PSP init on missing caps
  net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed
  net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tables
  net: usb: asix: hold PM usage ref to avoid PM/MDIO + RTNL deadlock
  ...
2025-10-09 11:13:08 -07:00
Max Kellermann 6140f1d43b libceph: add empty check to ceph_con_get_out_msg()
This moves the list_empty() checks from the two callers (v1 and v2)
into the base messenger.c library.  Now the v1/v2 specializations do
not need to know about con->out_queue; that implementation detail is
now hidden behind the ceph_con_get_out_msg() function.

[ idryomov: instead of changing prepare_write_message() to return
  a bool, move ceph_con_get_out_msg() call out to arrive to the same
  pattern as in messenger_v2.c ]

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:47 +02:00
Max Kellermann 7399212dcf libceph: pass the message pointer instead of loading con->out_msg
This pointer is in a register anyway, so let's use that instead of
reloading from memory everywhere.

[ idryomov: formatting ]

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:46 +02:00
Max Kellermann 59699a5a71 libceph: make ceph_con_get_out_msg() return the message pointer
The caller in messenger_v1.c loads it anyway, so let's keep the
pointer in the register instead of reloading it from memory.  This
eliminates a tiny bit of unnecessary overhead.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:46 +02:00
Eric Biggers 27c0a7b05d libceph: Use HMAC-SHA256 library instead of crypto_shash
Use the HMAC-SHA256 library functions instead of crypto_shash.  This is
simpler and faster.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-10-08 23:30:45 +02:00
Eric Woudstra bbf0c98b3a bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
net/bridge/br_private.h:1627 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
7 locks held by socat/410:
 #0: ffff88800d7a9c90 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_stream_connect+0x43/0xa0
 #1: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x62/0x1830
 [..]
 #6: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: nf_hook.constprop.0+0x8a/0x440

Call Trace:
 lockdep_rcu_suspicious.cold+0x4f/0xb1
 br_vlan_fill_forward_path_pvid+0x32c/0x410 [bridge]
 br_fill_forward_path+0x7a/0x4d0 [bridge]

Use to correct helper, non _rcu variant requires RTNL mutex.

Fixes: bcf2766b13 ("net: bridge: resolve forwarding path for VLAN tag actions in bridge devices")
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-10-08 13:17:31 +02:00
Fernando Fernandez Mancera f359b809d5 netfilter: nft_objref: validate objref and objrefmap expressions
Referencing a synproxy stateful object from OUTPUT hook causes kernel
crash due to infinite recursive calls:

BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12)
[...]
Call Trace:
 __find_rr_leaf+0x99/0x230
 fib6_table_lookup+0x13b/0x2d0
 ip6_pol_route+0xa4/0x400
 fib6_rule_lookup+0x156/0x240
 ip6_route_output_flags+0xc6/0x150
 __nf_ip6_route+0x23/0x50
 synproxy_send_tcp_ipv6+0x106/0x200
 synproxy_send_client_synack_ipv6+0x1aa/0x1f0
 nft_synproxy_do_eval+0x263/0x310
 nft_do_chain+0x5a8/0x5f0 [nf_tables
 nft_do_chain_inet+0x98/0x110
 nf_hook_slow+0x43/0xc0
 __ip6_local_out+0xf0/0x170
 ip6_local_out+0x17/0x70
 synproxy_send_tcp_ipv6+0x1a2/0x200
 synproxy_send_client_synack_ipv6+0x1aa/0x1f0
[...]

Implement objref and objrefmap expression validate functions.

Currently, only NFT_OBJECT_SYNPROXY object type requires validation.
This will also handle a jump to a chain using a synproxy object from the
OUTPUT hook.

Now when trying to reference a synproxy object in the OUTPUT hook, nft
will produce the following error:

synproxy_crash.nft: Error: Could not process rule: Operation not supported
  synproxy name mysynproxy
  ^^^^^^^^^^^^^^^^^^^^^^^^

Fixes: ee394f96ad ("netfilter: nft_synproxy: add synproxy stateful object support")
Reported-by: Georg Pfuetzenreuter <georg.pfuetzenreuter@suse.com>
Closes: https://bugzilla.suse.com/1250237
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2025-10-08 13:17:25 +02:00
Linus Torvalds 2215336295 hyperv-next for v6.18
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmjkpakTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXip5B/48MvTFJ1qwRGPVzevZQ8Z4SDogEREp
 69VS/xRf1YCIzyXyanwqf1dXLq8NAqicSp6ewpJAmNA55/9O0cwT2EtohjeGCu61
 krPIvS3KT7xI0uSEniBdhBtALYBscnQ0e3cAbLNzL7bwA6Q6OmvoIawpBADgE/cW
 aZNCK9jy+WUqtXc6lNtkJtST0HWGDn0h04o2hjqIkZ+7ewjuEEJBUUB/JZwJ41Od
 UxbID0PAcn9O4n/u/Y/GH65MX+ddrdCgPHEGCLAGAKT24lou3NzVv445OuCw0c4W
 ilALIRb9iea56ZLVBW5O82+7g9Ag41LGq+841MNlZjeRNONGykaUpTWZ
 =OR26
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Unify guest entry code for KVM and MSHV (Sean Christopherson)

 - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
   (Nam Cao)

 - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
   (Mukesh Rathor)

 - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)

 - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
   Kumar T S M)

 - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
   Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)

* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyperv: Remove the spurious null directive line
  MAINTAINERS: Mark hyperv_fb driver Obsolete
  fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
  Drivers: hv: Make CONFIG_HYPERV bool
  Drivers: hv: Add CONFIG_HYPERV_VMBUS option
  Drivers: hv: vmbus: Fix typos in vmbus_drv.c
  Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
  Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
  x86/hyperv: Switch to msi_create_parent_irq_domain()
  mshv: Use common "entry virt" APIs to do work in root before running guest
  entry: Rename "kvm" entry code assets to "virt" to genericize APIs
  entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
  mshv: Handle NEED_RESCHED_LAZY before transferring to guest
  x86/hyperv: Add kexec/kdump support on Azure CVMs
  Drivers: hv: Simplify data structures for VMBus channel close message
  Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
  mshv: Add support for a new parent partition configuration
  clocksource: hyper-v: Skip unnecessary checks for the root partition
  hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-07 08:40:15 -07:00
Daniel Borkmann 23f3770e1a bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
Cilium has a BPF egress gateway feature which forces outgoing K8s Pod
traffic to pass through dedicated egress gateways which then SNAT the
traffic in order to interact with stable IPs outside the cluster.

The traffic is directed to the gateway via vxlan tunnel in collect md
mode. A recent BPF change utilized the bpf_redirect_neigh() helper to
forward packets after the arrival and decap on vxlan, which turned out
over time that the kmalloc-256 slab usage in kernel was ever-increasing.

The issue was that vxlan allocates the metadata_dst object and attaches
it through a fake dst entry to the skb. The latter was never released
though given bpf_redirect_neigh() was merely setting the new dst entry
via skb_dst_set() without dropping an existing one first.

Fixes: b4ab314149 ("bpf: Add redirect_neigh helper as redirect drop-in")
Reported-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Reported-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jordan Rife <jrife@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jordan Rife <jrife@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20251003073418.291171-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-06 21:20:10 -07:00
Linus Torvalds 81538c8e42 NFSD 6.18 Release Notes
Mike Snitzer has prototyped a mechanism for disabling I/O caching in
 NFSD. This is introduced in v6.18 as an experimental feature. This
 enables scaling NFSD in /both/ directions:
 
 - NFS service can be supported on systems with small memory
   footprints, such as low-cost cloud instances
 - Large NFS workloads will be less likely to force the eviction of
   server-local activity, helping it avoid thrashing
 
 Jeff Layton contributed a number of fixes to the new attribute
 delegation implementation (based on a pending Internet RFC) that we
 hope will make attribute delegation reliable enough to enable by
 default, as it is on the Linux NFS client.
 
 The remaining patches in this pull request are clean-ups and minor
 optimizations. Many thanks to the contributors, reviewers, testers,
 and bug reporters who participated during the v6.18 NFSD development
 cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmjjxZMACgkQM2qzM29m
 f5dhcA//f6ha4lQ2qq3HfJCa4jLe55LEEYiPPgsBLyuAJtQcEQEVJtb1+0qmdeoE
 4ROW/ooE6TmgRX3hJxpUqMeJPj99lC/Y29gsDfWvl9+EM1Dii4JborEBXFwKXn5b
 KQhM/UK0H8nsZSDys5nr45eai0LQhjL6kbG2k2OoRxDjj1VpOdqVMYWKBVFDwYLc
 LDqv/zh4mB30faMATfLkLpGQb2J9sWd6+Ks45m0H6+KkEEjINlHB5WTX601Omyw5
 7YgIiR9/OPLiEcY0rK1iDjiY5g3sFza/W+hqzBADi9i5BDdadYUhUvQWgR+Y4SDD
 04UltS5eVRtLM74S3JErWqQ4yuS5/aiUcinNeHV6wyfunuipusG2Hc5ZehcJAH78
 Pt8D9KcBCsCMv2sBJl3vH7++rpBeiP9WHONrydK25Omk5as7eOGFjxE++KQ69BOB
 +AtHIKEtw1VqX8IpLaG1EL3Ux22z4/PRw2sVx1Oql3mgOCXIsBTr7KPVhjwfBoQO
 qlGpnj+xGV1b082BBMfTp1TLtRdD2PKOMaUQeF8XhyEXw92iGQfi0fCrAsbpFKfn
 D3+BK3mR+tOcaEhpHhqU/XwiYfSTU0NYrimmjEvGbFhsT0Tz+RSoNQUBth2tJaqZ
 hMEBmVujtu5LWnD0+dUNuylL10I4XDge+NVEmJa9pg2nFmjcojw=
 =/nLS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Mike Snitzer has prototyped a mechanism for disabling I/O caching in
  NFSD. This is introduced in v6.18 as an experimental feature. This
  enables scaling NFSD in /both/ directions:

   - NFS service can be supported on systems with small memory
     footprints, such as low-cost cloud instances

   - Large NFS workloads will be less likely to force the eviction of
     server-local activity, helping it avoid thrashing

  Jeff Layton contributed a number of fixes to the new attribute
  delegation implementation (based on a pending Internet RFC) that we
  hope will make attribute delegation reliable enough to enable by
  default, as it is on the Linux NFS client.

  The remaining patches in this pull request are clean-ups and minor
  optimizations. Many thanks to the contributors, reviewers, testers,
  and bug reporters who participated during the v6.18 NFSD development
  cycle"

* tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  nfsd: discard nfserr_dropit
  SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
  NFSD: Add io_cache_{read,write} controls to debugfs
  NFSD: Do the grace period check in ->proc_layoutget
  nfsd: delete unnecessary NULL check in __fh_verify()
  NFSD: Allow layoutcommit during grace period
  NFSD: Disallow layoutget during grace period
  sunrpc: fix "occurence"->"occurrence"
  nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in
  nfsd: nfserr_jukebox in nlm_fopen should lead to a retry
  NFSD: Reduce DRC bucket size
  NFSD: Delay adding new entries to LRU
  SUNRPC: Move the svc_rpcb_cleanup() call sites
  NFS: Remove rpcbind cleanup for NFSv4.0 callback
  nfsd: unregister with rpcbind when deleting a transport
  NFSD: Drop redundant conversion to bool
  sunrpc: eliminate return pointer in svc_tcp_sendmsg()
  sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length
  nfsd: decouple the xprtsec policy check from check_nfsd_access()
  NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()
  ...
2025-10-06 13:22:21 -07:00
Eric Dumazet 21b29e74ff tcp: take care of zero tp->window_clamp in tcp_set_rcvlowat()
Some applications (like selftests/net/tcp_mmap.c) call SO_RCVLOWAT
on their listener, before accept().

This has an unfortunate effect on wscale selection in
tcp_select_initial_window() during 3WHS.

For instance, tcp_mmap was negotiating wscale 4, regardless
of tcp_rmem[2] and sysctl_rmem_max.

Do not change tp->window_clamp if it is zero
or bigger than our computed value.

Zero value is special, it allows tcp_select_initial_window()
to enable autotuning.

Note that SO_RCVLOWAT use on listener is probably not wise,
because tp->scaling_ratio has a default value, possibly wrong.

Fixes: d1361840f8 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251003184119.2526655-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 13:08:48 -07:00
Toke Høiland-Jørgensen 95920c2ed0 page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
boot on his 32-bit parisc machine. The cause of this is the mask is set
too wide, so the page_pool_page_is_pp() incurs false positives which
crashes the machine.

Just disabling the check in page_pool_is_pp() will lead to the page_pool
code itself malfunctioning; so instead of doing this, this patch changes
the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
pointers for page_pool-tagged pages.

The fix relies on the kernel pointers that alias with the pp_magic field
always being above PAGE_OFFSET. With this assumption, we can use the
lowest bit of the value of PAGE_OFFSET as the upper bound of the
PP_DMA_INDEX_MASK, which should avoid the false positives.

Because we cannot rely on PAGE_OFFSET always being a compile-time
constant, nor on it always being >0, we fall back to disabling the
dma_index storage when there are not enough bits available. This leaves
us in the situation we were in before the patch in the Fixes tag, but
only on a subset of architecture configurations. This seems to be the
best we can do until the transition to page types in complete for
page_pool pages.

v2:
- Make sure there's at least 8 bits available and that the PAGE_OFFSET
  bit calculation doesn't wrap

Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/
Fixes: ee62ce7a1d ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
Cc: stable@vger.kernel.org # 6.15+
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Tested-by: Helge Deller <deller@gmx.de>
Link: https://patch.msgid.link/20250930114331.675412-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 12:14:04 -07:00
Kuniyuki Iwashima 2e7cbbbe3d tcp: Don't call reqsk_fastopen_remove() in tcp_conn_request().
syzbot reported the splat below in tcp_conn_request(). [0]

If a listener is close()d while a TFO socket is being processed in
tcp_conn_request(), inet_csk_reqsk_queue_add() does not set reqsk->sk
and calls inet_child_forget(), which calls tcp_disconnect() for the
TFO socket.

After the cited commit, tcp_disconnect() calls reqsk_fastopen_remove(),
where reqsk_put() is called due to !reqsk->sk.

Then, reqsk_fastopen_remove() in tcp_conn_request() decrements the
last req->rsk_refcnt and frees reqsk, and __reqsk_free() at the
drop_and_free label causes the refcount underflow for the listener
and double-free of the reqsk.

Let's remove reqsk_fastopen_remove() in tcp_conn_request().

Note that other callers make sure tp->fastopen_rsk is not NULL.

[0]:
refcount_t: underflow; use-after-free.
WARNING: CPU: 12 PID: 5563 at lib/refcount.c:28 refcount_warn_saturate (lib/refcount.c:28)
Modules linked in:
CPU: 12 UID: 0 PID: 5563 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:refcount_warn_saturate (lib/refcount.c:28)
Code: ab e8 8e b4 98 ff 0f 0b c3 cc cc cc cc cc 80 3d a4 e4 d6 01 00 75 9c c6 05 9b e4 d6 01 01 48 c7 c7 e8 df fb ab e8 6a b4 98 ff <0f> 0b e9 03 5b 76 00 cc 80 3d 7d e4 d6 01 00 0f 85 74 ff ff ff c6
RSP: 0018:ffffa79fc0304a98 EFLAGS: 00010246
RAX: d83af4db1c6b3900 RBX: ffff9f65c7a69020 RCX: d83af4db1c6b3900
RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: ffffffffac78a280
RBP: 000000009d781b60 R08: 0000000000007fff R09: ffffffffac6ca280
R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9f65c7b4f100
R13: ffff9f65c7d23c00 R14: ffff9f65c7d26000 R15: ffff9f65c7a64ef8
FS:  00007f9f962176c0(0000) GS:ffff9f65fcf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000180 CR3: 000000000dbbe006 CR4: 0000000000372ef0
Call Trace:
 <IRQ>
 tcp_conn_request (./include/linux/refcount.h:400 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/net/sock.h:1965 ./include/net/request_sock.h:131 net/ipv4/tcp_input.c:7301)
 tcp_rcv_state_process (net/ipv4/tcp_input.c:6708)
 tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1670)
 tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1906)
 ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438)
 ip6_input (net/ipv6/ip6_input.c:500)
 ipv6_rcv (net/ipv6/ip6_input.c:311)
 __netif_receive_skb (net/core/dev.c:6104)
 process_backlog (net/core/dev.c:6456)
 __napi_poll (net/core/dev.c:7506)
 net_rx_action (net/core/dev.c:7569 net/core/dev.c:7696)
 handle_softirqs (kernel/softirq.c:579)
 do_softirq (kernel/softirq.c:480)
 </IRQ>

Fixes: 45c8a6cc2b ("tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251001233755.1340927-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 11:11:20 -07:00
Alexandr Sapozhnikov 2f3119686e net/sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce()
If new_asoc->peer.adaptation_ind=0 and sctp_ulpevent_make_authkey=0
and sctp_ulpevent_make_authkey() returns 0, then the variable
ai_ev remains zero and the zero will be dereferenced
in the sctp_ulpevent_free() function.

Signed-off-by: Alexandr Sapozhnikov <alsp705@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Fixes: 30f6ebf65b ("sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT")
Link: https://patch.msgid.link/20251002091448.11-1-alsp705@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06 11:07:20 -07:00
Linus Torvalds 070a542f08 NFS Client Updates for Linux 6.18
New Features:
  * Add a Kconfig option to redirect dfprintk() to the trace buffer
  * Enable use of the RWF_DONTCACHE flag on the NFS client
  * Add striped layout handling to pNFS flexfiles
  * Add proper localio handling for READ and WRITE O_DIRECT
 
 Bugfixes:
  * Handle NFS4ERR_GRACE errors during delegation recall
  * Fix NFSv4.1 backchannel max_resp_sz verification check
  * Fix mount hang after CREATE_SESSION failure
  * Fix d_parent->d_inode locking in nfs4_setup_readdir()
 
 Other Cleanups and Improvements:
  * Improvements to write handling tracepoints
  * Fix a few trivial spelling mistakes
  * Cleanups to the rpcbind cleanup call sites
  * Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page
  * Remove unused NFS_WBACK_BUSY() macro
  * Remove __GFP_NOWARN flags
  * Unexport rpc_malloc() and rpc_free()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmjgOEkACgkQ18tUv7Cl
 QOuX7RAA33AUq4NBxzDOgz4u4eNU/a8z2AazRgAtfmPVLTitrx/kqcfVEtHAdHFi
 cWkN2SO+TVzxIGOrudqNyjV2cjfUJV4ZJBkNY6lJvxPNAH27Dk9P2iMF12QYtHOq
 qyrwqoUQcyBkmtpgFUyHzydA4J17JDl5A7I/tOkro3ZfV4gmYAUwVdS+VtJoosLp
 7FnXv+W5FBWkfKrIT+vPyiBqxl0gZXmzUkJK2lG9m9NvE2Jk2MbPFyhdUEA5JybJ
 akNLdBnFwNWw2rLulSqs68ZbCGz6NY634q1Z+ZsRJ907ZdBqJ7zIBFv4yc/bMpZm
 Q9kh1M0OyvK0MlLRFe3efOLxRoN9nJPd+kuaw9eYw5V57Jrwj6QGV4nud2C8nzs8
 iB+LuJli+FRCeD84SY8NnMFKpXphHCeMXcBMRMsLTOSotJZFithO95+w1pKlK64A
 lxY1JXOQYelwJZxfhGPovwac4t1arpDjsumRlTmq12KaQnM3Z1gR2PUgeLxEPHQM
 f6gEiN9KDOhW/gZrFQxNs2hVAH68RDKpWxeR2XeVJlJYf37Hgh8bNGEiURi3G57n
 ED7tFbK9lzHVFR07hiP3Cvzop4z2mzadgHo+1vzdXmZZQA4gc4MSFfszFLCnQopw
 LEb7RqpVVXtQb+f7A+LuD+a2rLEnW+gTf6iLqCR8hAB5k1xmcYQ=
 =8wnU
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add a Kconfig option to redirect dfprintk() to the trace buffer
   - Enable use of the RWF_DONTCACHE flag on the NFS client
   - Add striped layout handling to pNFS flexfiles
   - Add proper localio handling for READ and WRITE O_DIRECT

  Bugfixes:
   - Handle NFS4ERR_GRACE errors during delegation recall
   - Fix NFSv4.1 backchannel max_resp_sz verification check
   - Fix mount hang after CREATE_SESSION failure
   - Fix d_parent->d_inode locking in nfs4_setup_readdir()

  Other Cleanups and Improvements:
   - Improvements to write handling tracepoints
   - Fix a few trivial spelling mistakes
   - Cleanups to the rpcbind cleanup call sites
   - Convert the SUNRPC xdr_buf to use a scratch folio instead of
     scratch page
   - Remove unused NFS_WBACK_BUSY() macro
   - Remove __GFP_NOWARN flags
   - Unexport rpc_malloc() and rpc_free()"

* tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits)
  NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
  nfs/localio: add tracepoints for misaligned DIO READ and WRITE support
  nfs/localio: add proper O_DIRECT support for READ and WRITE
  nfs/localio: refactor iocb initialization
  nfs/localio: refactor iocb and iov_iter_bvec initialization
  nfs/localio: avoid issuing misaligned IO using O_DIRECT
  nfs/localio: make trace_nfs_local_open_fh more useful
  NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
  sunrpc: unexport rpc_malloc() and rpc_free()
  NFSv4/flexfiles: Add support for striped layouts
  NFSv4/flexfiles: Update layout stats & error paths for striped layouts
  NFSv4/flexfiles: Write path updates for striped layouts
  NFSv4/flexfiles: Commit path updates for striped layouts
  NFSv4/flexfiles: Read path updates for striped layouts
  NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
  NFSv4/flexfiles: Add data structure support for striped layouts
  NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
  NFSv4/flexfiles: Remove cred local variable dependency
  nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing
  NFS: Enable use of the RWF_DONTCACHE flag on the NFS client
  ...
2025-10-03 14:20:40 -07:00
Bhanu Seshu Kumar Valluri 1b54b0756f net: doc: Fix typos in docs
Fix typos in doc comments.

Signed-off-by: Bhanu Seshu Kumar Valluri <bhanuseshukumar@gmail.com>
Link: https://patch.msgid.link/20251001105715.50462-1-bhanuseshukumar@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-03 10:24:14 -07:00
Jakub Kicinski 7a0f94361f net: psp: don't assume reply skbs will have a socket
Rx path may be passing around unreferenced sockets, which means
that skb_set_owner_edemux() may not set skb->sk and PSP will crash:

  KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
  RIP: 0010:psp_reply_set_decrypted (./include/net/psp/functions.h:132 net/psp/psp_sock.c:287)
    tcp_v6_send_response.constprop.0 (net/ipv6/tcp_ipv6.c:979)
    tcp_v6_send_reset (net/ipv6/tcp_ipv6.c:1140 (discriminator 1))
    tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1683)
    tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1912)

Fixes: 659a2899a5 ("tcp: add datapath logic for PSP with inline key exchange")
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251001022426.2592750-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-03 10:23:50 -07:00
Linus Torvalds 07fdad3a93 Networking changes for 6.18.
Core & protocols
 ----------------
 
  - Improve drop account scalability on NUMA hosts for RAW and UDP sockets
    and the backlog, almost doubling the Pps capacity under DoS.
 
  - Optimize the UDP RX performance under stress, reducing contention,
    revisiting the binary layout of the involved data structs and
    implementing NUMA-aware locking. This improves UDP RX performance by
    an additional 50%, even more under extreme conditions.
 
  - Add support for PSP encryption of TCP connections; this mechanism has
    some similarities with IPsec and TLS, but offers superior HW offloads
    capabilities.
 
  - Ongoing work to support Accurate ECN for TCP. AccECN allows more than
    one congestion notification signal per RTT and is a building block for
    Low Latency, Low Loss, and Scalable Throughput (L4S).
 
  - Reorganize the TCP socket binary layout for data locality, reducing
    the number of touched cachelines in the fastpath.
 
  - Refactor skb deferral free to better scale on large multi-NUMA hosts,
    this improves TCP and UDP RX performances significantly on such HW.
 
  - Increase the default socket memory buffer limits from 256K to 4M to
    better fit modern link speeds.
 
  - Improve handling of setups with a large number of nexthop, making dump
    operating scaling linearly and avoiding unneeded synchronize_rcu() on
    delete.
 
  - Improve bridge handling of VLAN FDB, storing a single entry per bridge
    instead of one entry per port; this makes the dump order of magnitude
    faster on large switches.
 
  - Restore IP ID correctly for encapsulated packets at GSO segmentation
    time, allowing GRO to merge packets in more scenarios.
 
  - Improve netfilter matching performance on large sets.
 
  - Improve MPTCP receive path performance by leveraging recently
    introduced core infrastructure (skb deferral free) and adopting recent
    TCP autotuning changes.
 
  - Allow bridges to redirect to a backup port when the bridge port is
    administratively down.
 
  - Introduce MPTCP 'laminar' endpoint that con be used only once per
    connection and simplify common MPTCP setups.
 
  - Add RCU safety to dst->dev, closing a lot of possible races.
 
  - A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing
    code duplication.
 
  - Supports pulling data from an skb frag into the linear area of an XDP
    buffer.
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Generate netlink documentation from YAML using an integrated
    YAML parser.
 
 Driver API
 ----------
 
  - Support using IPv6 Flow Label in Rx hash computation and RSS queue
    selection.
 
  - Introduce API for fetching the DMA device for a given queue, allowing
    TCP zerocopy RX on more H/W setups.
 
  - Make XDP helpers compatible with unreadable memory, allowing more
    easily building DevMem-enabled drivers with a unified XDP/skbs
    datapath.
 
  - Add a new dedicated ethtool callback enabling drivers to provide the
    number of RX rings directly, improving efficiency and clarity in RX
    ring queries and RSS configuration.
 
  - Introduce a burst period for the health reporter, allowing better
    handling of multiple errors due to the same root cause.
 
  - Support for DPLL phase offset exponential moving average, controlling
    the average smoothing factor.
 
 Device drivers
 --------------
 
  - Add a new Huawei driver for 3rd gen NIC (hinic3).
 
  - Add a new SpacemiT driver for K1 ethernet MAC.
 
  - Add a generic abstraction for shared memory communication devices
    (dibps)
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - Use multiple per-queue doorbell, to avoid MMIO contention issues
      - support adjacent functions, allowing them to delegate their
        SR-IOV VFs to sibling PFs
      - support RSS for IPSec offload
      - support exposing raw cycle counters in PTP and mlx5
      - support for disabling host PFs.
    - Intel (100G, ice, idpf):
      - ice: support for SRIOV VFs over an Active-Active link aggregate
      - ice: support for firmware logging via debugfs
      - ice: support for Earliest TxTime First (ETF) hardware offload
      - idpf: support basic XDP functionalities and XSk
    - Broadcom (bnxt):
      - support Hyper-V VF ID
      - dynamic SRIOV resource allocations for RoCE
    - Meta (fbnic):
      - support queue API, zero-copy Rx and Tx
      - support basic XDP functionalities
      - devlink health support for FW crashes and OTP mem corruptions
      - expand hardware stats coverage to FEC, PHY, and Pause
    - Wangxun:
      - support ethtool coalesce options
      - support for multiple RSS contexts
 
  - Ethernet virtual:
    - Macsec:
      - replace custom netlink attribute checks with policy-level checks
    - Bonding:
      - support aggregator selection based on port priority
    - Microsoft vNIC:
      - use page pool fragments for RX buffers instead of full pages to
        improve memory efficiency
 
  - Ethernet NICs consumer, and embedded:
    - Qualcomm: support Ethernet function for IPQ9574 SoC
    - Airoha: implement wlan offloading via NPU
    - Freescale
      - enetc: add NETC timer PTP driver and add PTP support
      - fec: enable the Jumbo frame support for i.MX8QM
    - Renesas (R-Car S4): support HW offloading for layer 2 switching
      - support for RZ/{T2H, N2H} SoCs
    - Cadence (macb): support TAPRIO traffic scheduling
    - TI:
      - support for Gigabit ICSS ethernet SoC (icssm-prueth)
    - Synopsys (stmmac): a lot of cleanups
 
  - Ethernet PHYs:
    - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
      driver
    - Support bcm63268 GPHY power control
    - Support for Micrel lan8842 PHY and PTP
    - Support for Aquantia AQR412 and AQR115
 
  - CAN:
    - a large CAN-XL preparation work
    - reorganize raw_sock and uniqframe struct to minimize memory usage
    - rcar_canfd: update the CAN-FD handling
 
  - WiFi:
    - extended Neighbor Awareness Networking (NAN) support
    - S1G channel representation cleanup
    - improve S1G support
 
  - WiFi drivers:
    - Intel (iwlwifi):
      - major refactor and cleanup
    - Broadcom (brcm80211):
      - support for AP isolation
    - RealTek (rtw88/89) rtw88/89:
      - preparation work for RTL8922DE support
    - MediaTek (mt76):
      - HW restart improvements
      - MLO support
    - Qualcomm/Atheros (ath10k_
      - GTK rekey fixes
 
  - Bluetooth drivers:
    - btusb: support for several new IDs for MT7925
    - btintel: support for BlazarIW core
    - btintel_pcie: support for _suspend() / _resume()
    - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjdBdsSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkWnAP/2Jz0ExnYMDxLxkkKqMte+3JNeF/KNjB
 zVIslYN/5ekkd1TMXyDLFlWHqUOUMSF9+cK5ZRMHG6/lzOueSzFuuuDFsWs6Kf2f
 q7Q9MzlXhR9++YCsdES1uS5x3PPjMInzo2ZivMFa6fUDLLFSzeAOKL9k+RS0EggU
 VlXv2Wkm73R0O6KAssgDsHke9cnNz+F0DzhQ0S3qkyZF9tS5NrDeUl7fZ47XZgwb
 ZuXdEzKmTTepo2XvZGxJoF2D7nekWFlHhLjEPpDJtET19nwhukCry41/FplJwlKR
 dWsYkqLkrOEQKFQ+q++/5c3BgFOG+IrQLbP5bGQF1tZRMl4Dqp9PDxK5fKJfccbS
 0VY3Y2qWOSYejT2Ji2kEHR5E5rPyFm3Y5A4Rnpz0yEHj14vL2v4zf7CZRkCyyDfD
 doIZXFGkM0+N7QeXLEN833cje9zjaXuP9GAE+2bb+wAWAZAqof4KX8JgHh+y5Rwm
 pvUtvFxmEtntlMwNBap8aT3FquGtfZncU8pzAE5kvWjuMvyF0NVWiE5rB2kSQM/X
 NLmUdvDyjwwJqthXAJG0fl+a0mNJ/kOAqSOKJDJrfKDGWa+ovwY0iY06SpK0wIbO
 Wz7tpMk5MSlYXW8xWMlbyhvvU6T9xuoQ2KV4QTdMxc6Ir3sNX6YkQr+gjQjxB0gx
 ST2QF6lZeWFh
 =w2Kz
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core & protocols:

   - Improve drop account scalability on NUMA hosts for RAW and UDP
     sockets and the backlog, almost doubling the Pps capacity under DoS

   - Optimize the UDP RX performance under stress, reducing contention,
     revisiting the binary layout of the involved data structs and
     implementing NUMA-aware locking. This improves UDP RX performance
     by an additional 50%, even more under extreme conditions

   - Add support for PSP encryption of TCP connections; this mechanism
     has some similarities with IPsec and TLS, but offers superior HW
     offloads capabilities

   - Ongoing work to support Accurate ECN for TCP. AccECN allows more
     than one congestion notification signal per RTT and is a building
     block for Low Latency, Low Loss, and Scalable Throughput (L4S)

   - Reorganize the TCP socket binary layout for data locality, reducing
     the number of touched cachelines in the fastpath

   - Refactor skb deferral free to better scale on large multi-NUMA
     hosts, this improves TCP and UDP RX performances significantly on
     such HW

   - Increase the default socket memory buffer limits from 256K to 4M to
     better fit modern link speeds

   - Improve handling of setups with a large number of nexthop, making
     dump operating scaling linearly and avoiding unneeded
     synchronize_rcu() on delete

   - Improve bridge handling of VLAN FDB, storing a single entry per
     bridge instead of one entry per port; this makes the dump order of
     magnitude faster on large switches

   - Restore IP ID correctly for encapsulated packets at GSO
     segmentation time, allowing GRO to merge packets in more scenarios

   - Improve netfilter matching performance on large sets

   - Improve MPTCP receive path performance by leveraging recently
     introduced core infrastructure (skb deferral free) and adopting
     recent TCP autotuning changes

   - Allow bridges to redirect to a backup port when the bridge port is
     administratively down

   - Introduce MPTCP 'laminar' endpoint that con be used only once per
     connection and simplify common MPTCP setups

   - Add RCU safety to dst->dev, closing a lot of possible races

   - A significant crypto library API for SCTP, MPTCP and IPv6 SR,
     reducing code duplication

   - Supports pulling data from an skb frag into the linear area of an
     XDP buffer

  Things we sprinkled into general kernel code:

   - Generate netlink documentation from YAML using an integrated YAML
     parser

  Driver API:

   - Support using IPv6 Flow Label in Rx hash computation and RSS queue
     selection

   - Introduce API for fetching the DMA device for a given queue,
     allowing TCP zerocopy RX on more H/W setups

   - Make XDP helpers compatible with unreadable memory, allowing more
     easily building DevMem-enabled drivers with a unified XDP/skbs
     datapath

   - Add a new dedicated ethtool callback enabling drivers to provide
     the number of RX rings directly, improving efficiency and clarity
     in RX ring queries and RSS configuration

   - Introduce a burst period for the health reporter, allowing better
     handling of multiple errors due to the same root cause

   - Support for DPLL phase offset exponential moving average,
     controlling the average smoothing factor

  Device drivers:

   - Add a new Huawei driver for 3rd gen NIC (hinic3)

   - Add a new SpacemiT driver for K1 ethernet MAC

   - Add a generic abstraction for shared memory communication
     devices (dibps)

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - Use multiple per-queue doorbell, to avoid MMIO contention
           issues
         - support adjacent functions, allowing them to delegate their
           SR-IOV VFs to sibling PFs
         - support RSS for IPSec offload
         - support exposing raw cycle counters in PTP and mlx5
         - support for disabling host PFs.
      - Intel (100G, ice, idpf):
         - ice: support for SRIOV VFs over an Active-Active link
           aggregate
         - ice: support for firmware logging via debugfs
         - ice: support for Earliest TxTime First (ETF) hardware offload
         - idpf: support basic XDP functionalities and XSk
      - Broadcom (bnxt):
         - support Hyper-V VF ID
         - dynamic SRIOV resource allocations for RoCE
      - Meta (fbnic):
         - support queue API, zero-copy Rx and Tx
         - support basic XDP functionalities
         - devlink health support for FW crashes and OTP mem corruptions
         - expand hardware stats coverage to FEC, PHY, and Pause
      - Wangxun:
         - support ethtool coalesce options
         - support for multiple RSS contexts

   - Ethernet virtual:
      - Macsec:
         - replace custom netlink attribute checks with policy-level
           checks
      - Bonding:
         - support aggregator selection based on port priority
      - Microsoft vNIC:
         - use page pool fragments for RX buffers instead of full pages
           to improve memory efficiency

   - Ethernet NICs consumer, and embedded:
      - Qualcomm: support Ethernet function for IPQ9574 SoC
      - Airoha: implement wlan offloading via NPU
      - Freescale
         - enetc: add NETC timer PTP driver and add PTP support
         - fec: enable the Jumbo frame support for i.MX8QM
      - Renesas (R-Car S4):
         - support HW offloading for layer 2 switching
         - support for RZ/{T2H, N2H} SoCs
      - Cadence (macb): support TAPRIO traffic scheduling
      - TI:
         - support for Gigabit ICSS ethernet SoC (icssm-prueth)
      - Synopsys (stmmac): a lot of cleanups

   - Ethernet PHYs:
      - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
        driver
      - Support bcm63268 GPHY power control
      - Support for Micrel lan8842 PHY and PTP
      - Support for Aquantia AQR412 and AQR115

   - CAN:
      - a large CAN-XL preparation work
      - reorganize raw_sock and uniqframe struct to minimize memory
        usage
      - rcar_canfd: update the CAN-FD handling

   - WiFi:
      - extended Neighbor Awareness Networking (NAN) support
      - S1G channel representation cleanup
      - improve S1G support

   - WiFi drivers:
      - Intel (iwlwifi):
         - major refactor and cleanup
      - Broadcom (brcm80211):
         - support for AP isolation
      - RealTek (rtw88/89) rtw88/89:
         - preparation work for RTL8922DE support
      - MediaTek (mt76):
         - HW restart improvements
         - MLO support
      - Qualcomm/Atheros (ath10k):
         - GTK rekey fixes

   - Bluetooth drivers:
      - btusb: support for several new IDs for MT7925
      - btintel: support for BlazarIW core
      - btintel_pcie: support for _suspend() / _resume()
      - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"

* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
  net: stmmac: Add support for Allwinner A523 GMAC200
  dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
  Revert "Documentation: net: add flow control guide and document ethtool API"
  octeontx2-pf: fix bitmap leak
  octeontx2-vf: fix bitmap leak
  net/mlx5e: Use extack in set rxfh callback
  net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
  net/mlx5e: Introduce mlx5e_rss_init_params
  net/mlx5e: Remove unused mdev param from RSS indir init
  net/mlx5: Improve QoS error messages with actual depth values
  net/mlx5e: Prevent entering switchdev mode with inconsistent netns
  net/mlx5: HWS, Generalize complex matchers
  net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
  selftests/net: add tcp_port_share to .gitignore
  Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
  net: add NUMA awareness to skb_attempt_defer_free()
  net: use llist for sd->defer_list
  net: make softnet_data.defer_count an atomic
  selftests: drv-net: psp: add tests for destroying devices
  selftests: drv-net: psp: add test for auto-adjusting TCP MSS
  ...
2025-10-02 15:17:01 -07:00
Eric Biggers d8e97cc476 SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it.  This
unblocks the eventual removal of the selection of CRYPTO from NFSD_V4,
which will no longer be needed by nfsd itself due to switching to the
crypto library functions.  But NFSD_V4 selects RPCSEC_GSS_KRB5, which
still needs CRYPTO.  It makes more sense for RPCSEC_GSS_KRB5 to select
CRYPTO itself, like most other kconfig options that need CRYPTO do.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-01 15:54:01 -04:00
Paolo Abeni f1455695d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).

Conflicts:

tools/testing/selftests/drivers/net/bonding/Makefile
  87951b5664 selftests: bonding: add test for passive LACP mode
  c2377f1763 selftests: bonding: add test for LACP actor port priority

Adjacent changes:

drivers/net/ethernet/cadence/macb.h
  fca3dc859b net: macb: remove illusion about TBQPH/RBQPH being per-queue
  89934dbf16 net: macb: Add TAPRIO traffic scheduling support

drivers/net/ethernet/cadence/macb_main.c
  fca3dc859b net: macb: remove illusion about TBQPH/RBQPH being per-queue
  89934dbf16 net: macb: Add TAPRIO traffic scheduling support

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-01 10:14:49 +02:00
Paolo Abeni 1a98f5699b Revert "Documentation: net: add flow control guide and document ethtool API"
This reverts commit 7bd80ed89d.

I should not have merged it to begin with due to pending review and
changes to be addressed.

Link: https://patch.msgid.link/c6f3af12df9b7998920a02027fc8893ce82afc4c.1759239721.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-01 09:48:21 +02:00
Linus Torvalds ae28ed4578 bpf-next-6.18
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v
 bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/
 UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX
 +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9
 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT
 tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG
 TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH
 Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB
 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy
 n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p
 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd
 c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg=
 =LeQi
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
   (Amery Hung)

   Applied as a stable branch in bpf-next and net-next trees.

 - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)

   Also a stable branch in bpf-next and net-next trees.

 - Enforce expected_attach_type for tailcall compatibility (Daniel
   Borkmann)

 - Replace path-sensitive with path-insensitive live stack analysis in
   the verifier (Eduard Zingerman)

   This is a significant change in the verification logic. More details,
   motivation, long term plans are in the cover letter/merge commit.

 - Support signed BPF programs (KP Singh)

   This is another major feature that took years to materialize.

   Algorithm details are in the cover letter/marge commit

 - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)

 - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)

 - Fix USDT SIB argument handling in libbpf (Jiawei Zhao)

 - Allow uprobe-bpf program to change context registers (Jiri Olsa)

 - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
   Puranjay Mohan)

 - Allow access to union arguments in tracing programs (Leon Hwang)

 - Optimize rcu_read_lock() + migrate_disable() combination where it's
   used in BPF subsystem (Menglong Dong)

 - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
   execution of BPF callback in the context of a specific task using the
   kernel’s task_work infrastructure (Mykyta Yatsenko)

 - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
   Dwivedi)

 - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)

 - Improve the precision of tnum multiplier verifier operation
   (Nandakumar Edamana)

 - Use tnums to improve is_branch_taken() logic (Paul Chaignon)

 - Add support for atomic operations in arena in riscv JIT (Pu Lehui)

 - Report arena faults to BPF error stream (Puranjay Mohan)

 - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
   Monnet)

 - Add bpf_strcasecmp() kfunc (Rong Tao)

 - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
   Chen)

* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...
2025-09-30 17:58:11 -07:00
Mukesh Rathor 94b04355e6 Drivers: hv: Add CONFIG_HYPERV_VMBUS option
At present VMBus driver is hinged off of CONFIG_HYPERV which entails
lot of builtin code and encompasses too much. It's not always clear
what depends on builtin hv code and what depends on VMBus. Setting
CONFIG_HYPERV as a module and fudging the Makefile to switch to builtin
adds even more confusion. VMBus is an independent module and should have
its own config option. Also, there are scenarios like baremetal dom0/root
where support is built in with CONFIG_HYPERV but without VMBus. Lastly,
there are more features coming down that use CONFIG_HYPERV and add more
dependencies on it.

So, create a fine grained HYPERV_VMBUS option and update Kconfigs for
dependency on VMBus.

Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>	# drivers/pci
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-10-01 00:00:42 +00:00
Jeff Layton ffe381923d sunrpc: unexport rpc_malloc() and rpc_free()
These are not used outside of sunrpc code.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30 16:04:03 -04:00
Linus Torvalds 56a0810d8c audit/stable-6.18 PR 20250926
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmjWq5oUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPTjRAAwYapnw+ZGdFtTGIDT63HtlKjCGHg
 DRR8J1RYWhxQL78dInjl7hlGPd4ULdpdF6zsh27X/8OsdFotw4NhDyPJwS1qWZv9
 uBJMy/s1Qi1V/rrtDygLGgkQ9ICfl/hgVh3L+g9AXU8H9IapMULp33z+2ueFU4rA
 PXgXppgNQTOhIQml0tagY7iPlLaaI1uPv/Dbvt792CSrKZReC+uiDSQKD6SUy5oJ
 NBRs0emdCqbllo8Eo7wTGdfzUttsPWYHe7X9BGCMK2bHp0BpMnFBDtuipUAgjNE8
 O16EkAtBMpEBW9VEFvDYW1jMFO7ccD8b09CbqPLdE7E0GeigTiODg+FdncKEpZn0
 Dl4xPbIoPBHVrDHKFK3HcuEdUs0FZH3NpTLFRg0/nWbg3CfSOFq1ZKhSbwLTZ48V
 2Iq22G0hIIl3yTEePSoR8xCSQkWf6hA1SVvzBqw5Xn1tnkdIUuM+KzeZUPKxCOiH
 r5b3ufrN5YMAcmc59q393sNuSMd7s97fohhK8/HouB93EcVNM2UjLEKVJnhMhYRE
 N21O17jwQG9F+OYTnmtMzuUF6yxwSAmkzQOg6F+lalJ8MECnNrZOEeyuA3d5ISi5
 4ZrXHWw90qaDy9lCV1o0UwWt9na+WxeMCJNpI07h5V1k3x7BULiI6WeP7J1qnY9r
 YlLv/6Hgx29dtqE=
 =iQal
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:

 - Proper audit support for multiple LSMs

   As the audit subsystem predated the work to enable multiple LSMs,
   some additional work was needed to support logging the different LSM
   labels for the subjects/tasks and objects on the system. Casey's
   patches add new auxillary records for subjects and objects that
   convey the additional labels.

 - Ensure fanotify audit events are always generated

   Generally speaking security relevant subsystems always generate audit
   events, unless explicitly ignored. However, up to this point fanotify
   events had been ignored by default, but starting with this pull
   request fanotify follows convention and generates audit events by
   default.

 - Replace an instance of strcpy() with strscpy()

 - Minor indentation, style, and comment fixes

* tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix skb leak when audit rate limit is exceeded
  audit: init ab->skb_list earlier in audit_buffer_alloc()
  audit: add record for multiple object contexts
  audit: add record for multiple task security contexts
  lsm: security_lsmblob_to_secctx module selection
  audit: create audit_stamp structure
  audit: add a missing tab
  audit: record fanotify event regardless of presence of rules
  audit: fix typo in auditfilter.c comment
  audit: Replace deprecated strcpy() with strscpy()
  audit: fix indentation in audit_log_exit()
2025-09-30 08:22:16 -07:00
Eric Dumazet 5628f3fe3b net: add NUMA awareness to skb_attempt_defer_free()
Instead of sharing sd->defer_list & sd->defer_count with
many cpus, add one pair for each NUMA node.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-4-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:45:53 +02:00
Eric Dumazet 844c9db7f7 net: use llist for sd->defer_list
Get rid of sd->defer_lock and adopt llist operations.

We optimize skb_attempt_defer_free() for the common case,
where the packet is queued. Otherwise sd->defer_count
is increasing, until skb_defer_free_flush() clears it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:45:53 +02:00
Eric Dumazet 9c94ae6bb0 net: make softnet_data.defer_count an atomic
This is preparation work to remove the softnet_data.defer_lock,
as it is contended on hosts with large number of cores.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:45:52 +02:00
Jakub Kicinski f857478d62 netdevsim: a basic test PSP implementation
Provide a PSP implementation for netdevsim.

Use psp_dev_encapsulate() and psp_dev_rcv() to do actual encapsulation
and decapsulation on skbs, but perform no encryption or decryption. In
order to make encryption with a bad key result in a drop on the peer's
rx side, we stash our psd's generation number in the first byte of each
key before handing to the peer.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-2-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 15:17:21 +02:00
Dragos Tatulea a1b501a8c6 page_pool: Clamp pool size to max 16K pages
page_pool_init() returns E2BIG when the page_pool size goes above 32K
pages. As some drivers are configuring the page_pool size according to
the MTU and ring size, there are cases where this limit is exceeded and
the queue creation fails.

The page_pool size doesn't have to cover a full queue, especially for
larger ring size. So clamp the size instead of returning an error. Do
this in the core to avoid having each driver do the clamping.

The current limit was deemed to high [1] so it was reduced to 16K to avoid
page waste.

[1] https://lore.kernel.org/all/1758532715-820422-3-git-send-email-tariqt@nvidia.com/

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250926131605.2276734-2-dtatulea@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30 12:16:23 +02:00