Commit Graph

1399042 Commits

Author SHA1 Message Date
Raju Rangoju 9c11b6b1ab amd-xgbe: add ethtool jumbo frame selftest
Adds support for jumbo frame selftest. Works only for
mtu size greater than 1500.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20251031111555.774425-5-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-06 13:38:11 +01:00
Raju Rangoju d7735c6bb2 amd-xgbe: add ethtool split header selftest
Adds support for ethtool split header selftest. Performs
UDP and TCP check to ensure split header selft test works
for both packet types.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251031111555.774425-4-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-06 13:38:11 +01:00
Raju Rangoju 42b06fcc87 amd-xgbe: add ethtool phy loopback selftest
Add support for PHY loopback testing via ethtool self-test.
The test uses phy_loopback() which enables PHY-level loopback
through the PHY driver's set_loopback callback if provided,
else uses the genphy_loopback().

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251031111555.774425-3-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-06 13:38:11 +01:00
Raju Rangoju 862a64c83f amd-xgbe: introduce support ethtool selftest
Add support for ethtool selftest for MAC loopback. This includes the
sanity check and helps in finding the misconfiguration of HW. Uses the
existing selftest infrastructure to create test packets.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251031111555.774425-2-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-06 13:38:11 +01:00
Raju Rangoju 6b47af35a6 net: selftests: export packet creation helpers for driver use
Export the network selftest packet creation infrastructure to allow
network drivers to reuse the existing selftest framework instead of
duplicating packet creation code.

Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251031111811.775434-1-Raju.Rangoju@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-06 13:38:11 +01:00
Peter Zijlstra 4cb5ac2626 futex: Optimize per-cpu reference counting
Shrikanth noted that the per-cpu reference counter was still some 10%
slower than the old immutable option (which removes the reference
counting entirely).

Further optimize the per-cpu reference counter by:

 - switching from RCU to preempt;
 - using __this_cpu_*() since we now have preempt disabled;
 - switching from smp_load_acquire() to READ_ONCE().

This is all safe because disabling preemption inhibits the RCU grace
period exactly like rcu_read_lock().

Having preemption disabled allows using __this_cpu_*() provided the
only access to the variable is in task context -- which is the case
here.

Furthermore, since we know changing fph->state to FR_ATOMIC demands a
full RCU grace period we can rely on the implied smp_mb() from that to
replace the acquire barrier().

This is very similar to the percpu_down_read_internal() fast-path.

The reason this is significant for PowerPC is that it uses the generic
this_cpu_*() implementation which relies on local_irq_disable() (the
x86 implementation relies on it being a single memop instruction to be
IRQ-safe). Switching to preempt_disable() and __this_cpu*() avoids
this IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE
barrier, not having to use explicit barriers safes a bunch.

Combined this reduces the performance gap by half, down to some 5%.

Fixes: 760e6f7bef ("futex: Remove support for IMMUTABLE")
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251106092929.GR4067720@noisy.programming.kicks-ass.net
2025-11-06 12:30:54 +01:00
Aaron Lu 956dfda6a7 sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining
When a cfs_rq is to be throttled, its limbo list should be empty and
that's why there is a warn in tg_throttle_down() for non empty
cfs_rq->throttled_limbo_list.

When running a test with the following hierarchy:

          root
        /      \
        A*     ...
     /  |  \   ...
        B
       /  \
      C*

where both A and C have quota settings, that warn on non empty limbo list
is triggered for a cfs_rq of C, let's call it cfs_rq_c(and ignore the cpu
part of the cfs_rq for the sake of simpler representation).

Debug showed it happened like this:
Task group C is created and quota is set, so in tg_set_cfs_bandwidth(),
cfs_rq_c is initialized with runtime_enabled set, runtime_remaining
equals to 0 and *unthrottled*. Before any tasks are enqueued to cfs_rq_c,
*multiple* throttled tasks can migrate to cfs_rq_c (e.g., due to task
group changes). When enqueue_task_fair(cfs_rq_c, throttled_task) is
called and cfs_rq_c is in a throttled hierarchy (e.g., A is throttled),
these throttled tasks are directly placed into cfs_rq_c's limbo list by
enqueue_throttled_task().

Later, when A is unthrottled, tg_unthrottle_up(cfs_rq_c) enqueues these
tasks. The first enqueue triggers check_enqueue_throttle(), and with zero
runtime_remaining, cfs_rq_c can be throttled in throttle_cfs_rq() if it
can't get more runtime and enters tg_throttle_down(), where the warning
is hit due to remaining tasks in the limbo list.

I think it's a chaos to trigger throttle on unthrottle path, the status
of a being unthrottled cfs_rq can be in a mixed state in the end, so fix
this by granting 1ns to cfs_rq in tg_set_cfs_bandwidth(). This ensures
cfs_rq_c has a positive runtime_remaining when initialized as unthrottled
and cannot enter tg_unthrottle_up() with zero runtime_remaining.

Also, update outdated comments in tg_throttle_down() since
unthrottle_cfs_rq() is no longer called with zero runtime_remaining.
While at it, remove a redundant assignment to se in tg_throttle_down().

Fixes: e1fad12dcb ("sched/fair: Switch to task based throttle model")
Reviewed-By: Benjamin Segall <bsegall@google.com>
Suggested-by: Benjamin Segall <bsegall@google.com>
Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Hao Jia <jiahao1@lixiang.com>
Link: https://patch.msgid.link/20251030032755.560-1-ziqianlu@bytedance.com
2025-11-06 12:30:52 +01:00
Christoph Hellwig d8a823c6f0 xfs: free xfs_busy_extents structure when no RT extents are queued
kmemleak occasionally reports leaking xfs_busy_extents structure
from xfs_scrub calls after running xfs/528 (but attributed to following
tests), which seems to be caused by not freeing the xfs_busy_extents
structure when tr.queued is 0 and xfs_trim_rtgroup_extents breaks out
of the main loop.  Free the structure in this case.

Fixes: a3315d1130 ("xfs: use rtgroup busy extent list for FITRIM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-06 08:59:19 +01:00
Vlastimil Babka c379b745e1 slab: prevent infinite loop in kmalloc_nolock() with debugging
In review of a followup work, Harry noticed a potential infinite loop.
Upon closed inspection, it already exists for kmalloc_nolock() on a
cache with debugging enabled, since commit af92793e52 ("slab:
Introduce kmalloc_nolock() and kfree_nolock().")

When alloc_single_from_new_slab() fails to trylock node list_lock, we
keep retrying to get partial slab or allocate a new slab. If we indeed
interrupted somebody holding the list_lock, the trylock fill fail
deterministically and we end up allocating and defer-freeing slabs
indefinitely with no progress.

To fix it, fail the allocation if spinning is not allowed. This is
acceptable in the restricted context of kmalloc_nolock(), especially
with debugging enabled.

Reported-by: Harry Yoo <harry.yoo@oracle.com>
Closes: https://lore.kernel.org/all/aQLqZjjq1SPD3Fml@hyeyoo/
Fixes: af92793e52 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251103-fix-nolock-loop-v1-1-6e2b3e82b9da@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-06 08:13:12 +01:00
Shangjuan Wei 0567c84d68 dt-bindings: ethernet: eswin: fix yaml schema issues
eswin,hsp-sp-csr attribute is one phandle with multiple arguments,
so the syntax should be in the form of:
 items:
   - items:
       - description: ...
       - description: ...
       - description: ...
       - description: ...

To align with the description of the 'eswin-sp-csr'
attribute in the mmc,usb modules, the description
of the 'eswin,hsp-sp-csr' attribute has been modified.

Fixes: 888bd0eca9 ("dt-bindings: ethernet: eswin: Document for EIC7700 SoC")
Reported-by: Rob Herring (Arm) <robh@kernel.org>
Closes: https://lore.kernel.org/all/176096011380.22917.1988679321096076522.robh@kernel.org/
Signed-off-by: Shangjuan Wei <weishangjuan@eswincomputing.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251104073305.299-1-weishangjuan@eswincomputing.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 20:00:29 -08:00
Jakub Kicinski 9158447f09 Merge branch 'net-stmmac-socfpga-add-agilex5-platform-support-and-enhancements'
Rohan G Thomas says:

====================
net: stmmac: socfpga: Add Agilex5 platform support and enhancements

This patch series adds support for the Agilex5 EMAC platform to the
dwmac-socfpga driver.

The series includes:
   - Platform configuration for Agilex5 EMAC
   - Enabling Time-Based Scheduling (TBS) for Tx queues 6 and 7
   - Enabling TCP Segmentation Offload(TSO)
   - Adding hardware-supported cross timestamping using the SMTG IP,
     allowing precise synchronization between MAC and system time via
     PTP_SYS_OFFSET_PRECISE.
====================

Link: https://patch.msgid.link/20251101-agilex5_ext-v2-0-a6b51b4dca4d@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:35:16 -08:00
Rohan G Thomas fd8c4f6454 net: stmmac: socfpga: Add hardware supported cross-timestamp
Cross timestamping is supported on Agilex5 platform with Synchronized
Multidrop Timestamp Gathering(SMTG) IP. The hardware cross-timestamp
result is made available the applications through the ioctl call
PTP_SYS_OFFSET_PRECISE, which inturn calls stmmac_getcrosststamp().

Device time is stored in the MAC Auxiliary register. The 64-bit System
time (ARM_ARCH_COUNTER) is stored in SMTG IP. SMTG IP is an MDIO device
with 0xC - 0xF MDIO register space holds 64-bit system time.

This commit is similar to following commit for Intel platforms:
Commit 341f67e424 ("net: stmmac: Add hardware supported cross-timestamp")

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Link: https://patch.msgid.link/20251101-agilex5_ext-v2-4-a6b51b4dca4d@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:35:14 -08:00
Rohan G Thomas e28988aef7 net: stmmac: socfpga: Enable TSO for Agilex5 platform
Agilex5 supports TCP Segmentation Offload(TSO). This commit enables
TSO for Agilex5 socfpga platforms.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Link: https://patch.msgid.link/20251101-agilex5_ext-v2-3-a6b51b4dca4d@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:35:14 -08:00
Rohan G Thomas 4c00476d44 net: stmmac: socfpga: Enable TBS support for Agilex5
Agilex5 supports Time-Based Scheduling(TBS) for Tx queue 6 and Tx
queue 7. This commit enables TBS support for these queues.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Link: https://patch.msgid.link/20251101-agilex5_ext-v2-2-a6b51b4dca4d@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:35:14 -08:00
Rohan G Thomas 93d46ea3e9 net: stmmac: socfpga: Agilex5 EMAC platform configuration
Agilex5 HPS EMAC uses the dwxgmac-3.10a IP, unlike previous socfpga
platforms which use dwmac1000 IP. Due to differences in platform
configuration, Agilex5 requires a distinct setup.

Introduce a setup_plat_dat() callback in socfpga_dwmac_ops to handle
platform-specific setup. This callback is invoked before
stmmac_dvr_probe() to ensure the platform data is correctly
configured. Also, implemented separate setup_plat_dat() callback for
current socfpga platforms and Agilex5.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20251101-agilex5_ext-v2-1-a6b51b4dca4d@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:35:14 -08:00
Jakub Kicinski 9b73cdad58 More changes from drivers are coming in, notably:
- ath10k: factory test support
  - ath11k: TX power insertion support
  - ath12k: BSS color change support
  - iwlwifi: new sniffer API support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmkLbdQACgkQ10qiO8sP
 aABJ2g/6A6nRAIyYZLGdSRCdZ/j6zZ+OhSvBvw1C4Rp9eWfPFwzX7qOfmFb9j00t
 5/l4Oby0Z6G8Ftv7/GRpss5IsfPYFDszaDyJmEtqWgmXFH3D3AUA5IQH6ImbfVHy
 Bnae+F+AHxp9vyUMwqToMJAfjJMufOsJFzmEEkHj6tlrs89ABe7hdzK557SZnLka
 6+p3bIT7knSBzfRKEsdWtKyNZW2r2s7sPpT4Yi6b4IS35v59fdugI9VDjDgwrStF
 ao227VlRLUEYGGyeVEcvq3NQFpVBaLX2dXeJD8kQ3j/If/W/XKI8xpVf453hxBk/
 Lxg/74cgew8wWf8otzSXQCaldLC6U6XyOZ+/j3phtKVzeL0TeUM8+0Mdg8BGQD9c
 2Ov36cXSze9UcB/izEanfExKLiFvh+QbVpFVt1lfwWgt2KFzdTSeG75n36ajWf3X
 JRKJHZincmCgT8KB4yIzz6/CH3pZaJGZ7MovT58xAM5k+sAFaZNNAnieeA6ilc7Q
 mAhHML7w1ABgDbrdUmhtPdbBlUoWB7eVOO5U71saEPirwAF3gZmG/ZZm5lPciZrS
 Q9zDgROvZ9w0J1+r/mS4dFfmqdsol7J8kDGsr8wFvWO/f/BUt0KhXP4J+AGaMqsm
 gon2wco7QRQz/7Qj7Pu2EGI+9y+2kwO2zfu46glH0K6GqPl8FJk=
 =1BIh
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
More changes from drivers are coming in, notably:

 - ath10k: factory test support
 - ath11k: TX power insertion support
 - ath12k: BSS color change support
 - iwlwifi: new sniffer API support

* tag 'wireless-next-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (63 commits)
  wifi: ath10k: use = {} to initialize bmi_target_info instead of memset
  wifi: ath10k: use = {} to initialize pm_qos_request instead of memset
  wifi: ath12k: unassign arvif on scan vdev create failure
  wifi: ath12k: enforce vdev limit in ath12k_mac_vdev_create()
  wifi: ath12k: Set EHT fixed rates for associated STAs
  wifi: ath12k: add EHT rates to ath12k_mac_op_set_bitrate_mask()
  wifi: ath12k: Add EHT fixed GI/LTF
  wifi: ath12k: Add EHT MCS/NSS rates to Peer Assoc
  wifi: ath12k: add EHT rate handling to existing set rate functions
  wifi: ath12k: generalize GI and LTF fixed rate functions
  wifi: ath12k: fix error handling in creating hardware group
  wifi: ath12k: fix reusing m3 memory
  wifi: ath12k: fix potential memory leak in ath12k_wow_arp_ns_offload()
  wifi: iwlwifi: mld: add null check for kzalloc() in iwl_mld_send_proto_offload()
  wifi: iwlwifi: mld: check for NULL pointer after kmalloc
  wifi: iwlwifi: cfg: fix a few device names
  wifi: iwlwifi: mld: Move EMLSR prints to IWL_DL_EHT
  wifi: iwlwifi: disable EHT if the device doesn't allow it
  wifi: iwlwifi: bump core version for BZ/SC/DR
  wifi: iwlwifi: mld: use FW_CHECK on bad ROC notification
  ...
====================

Link: https://patch.msgid.link/20251105153537.54096-38-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:09:23 -08:00
Jakub Kicinski 7d1988a943 Just two small fixes:
- ath12k: revert a change that caused performance regressions
  - hwsim: don't ignore netns on netlink socket matching
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmkLbLsACgkQ10qiO8sP
 aAAShQ//cCY3DzzictRKxhlgV+O4PQFzp39ItfwWqEKYW4egqUajDMOB0rneMA4h
 ZGsI3fy/5jrblcPSVSGljwkjPaYW9gbsxz0PnGKKAtlKIHSgAEoEb+EQcZFJy//u
 JCwZAN9EQuqiTDu82/NIqk+nFcSoNEtY56gkkzmcTlXjE4fswEFadayqOAhLBTWp
 gv0iC656r5IgZNbiXoR1Ja7qu6nubcZWkeOcbgqDJT29vZDFd315DDzx1kScmUM/
 KPRpv0rTMUULM5V9JxQGOCtwEQ8DZwaE75SL+/uiDvKCTJFxVoWXMuTeQqnwqOt2
 WDHCU9+oi43JKuaqT+aF1JdDAJZboizDbNqqOxtxuGBvUH84JiuhOCAF2ItBwczu
 IMsgJ9XGRRiSLNQ0qWjPW4tGXzGBieY6ec8RKhtldjLQsIZVRNamaJzIjZKFmmP+
 tYwICE26keQwgrM2AjDkXGUrtJPP7LzUUqnP6DXrb5M5nLd6zbI7AyU+2CSMXN20
 pYFZZA4j6hcFayDg5yQUoVDUab/2sTevclyuNhLsdFT2ymiyMTLjtIMczKVg+DIl
 m7oTRIMCS+7k25nG/B0+LqV3qv1Y06Ct3U/T/3S77HyvoPM1054dpDwLWHIyvpOG
 xw7crsLYRbUYcBr4ie5DOq7blzmvpdjgCMrFR5rDn2/4lhjIWI4=
 =zUoa
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Just two small fixes:

 - ath12k: revert a change that caused performance regressions
 - hwsim: don't ignore netns on netlink socket matching

* tag 'wireless-2025-11-05' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
  Revert "wifi: ath12k: Fix missing station power save configuration"
====================

Link: https://patch.msgid.link/20251105152827.53254-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:04:55 -08:00
Haotian Zhang 4d6ec3a793 net: wan: framer: pef2256: Switch to devm_mfd_add_devices()
The driver calls mfd_add_devices() but fails to call mfd_remove_devices()
in error paths after successful MFD device registration and in the remove
function. This leads to resource leaks where MFD child devices are not
properly unregistered.

Replace mfd_add_devices with devm_mfd_add_devices to automatically
manage the device resources.

Fixes: c96e976d9a ("net: wan: framer: Add support for the Lantiq PEF2256 framer")
Suggested-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://patch.msgid.link/20251105034716.662-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 18:02:34 -08:00
Dan Carpenter c79a022524 net: dsa: microchip: Fix a link check in ksz9477_pcs_read()
The BMSR_LSTATUS define is 0x4 but the "p->phydev.link" variable
is a 1 bit bitfield in a u32.  Since 4 doesn't fit in 0-1 range
it means that ".link" is always set to false.  Add a !! to fix
this.

[Jakub: According to Maxime the phydev struct isn't really
used and we should consider removing it completely. So not
treating this as a fix.]

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aQSz_euUg0Ja8ZaH@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:58:51 -08:00
Jiawen Wu a04ea57aae net: libwx: fix device bus LAN ID
The device bus LAN ID was obtained from PCI_FUNC(), but when a PF
port is passthrough to a virtual machine, the function number may not
match the actual port index on the device. This could cause the driver
to perform operations such as LAN reset on the wrong port.

Fix this by reading the LAN ID from port status register.

Fixes: a34b3e6ed8 ("net: txgbe: Store PCI info")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/B60A670C1F52CB8E+20251104062321.40059-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:52:13 -08:00
Jakub Kicinski b1d9154878 Merge branch 'net-mlx5e-shampo-fixes-for-64kb-page-size'
Tariq Toukan says:

====================
net/mlx5e: SHAMPO fixes for 64KB page size

This series by Dragos contains fixes for HW-GRO issues found on systems
with 64KB page size.
====================

Link: https://patch.msgid.link/1762238915-1027590-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:48:41 -08:00
Dragos Tatulea d8a7ed9586 net/mlx5e: SHAMPO, Fix header formulas for higher MTUs and 64K pages
The MLX5E_SHAMPO_WQ_HEADER_PER_PAGE and
MLX5E_SHAMPO_LOG_MAX_HEADER_ENTRY_SIZE macros are used directly in
several places under the assumption that there will always be more
headers per WQE than headers per page. However, this assumption doesn't
hold for 64K page sizes and higher MTUs (> 4K). This can be first
observed during header page allocation: ksm_entries will become 0 during
alignment to MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.

This patch introduces 2 additional members to the mlx5e_shampo_hd struct
which are meant to be used instead of the macrose mentioned above.
When the number of headers per WQE goes below
MLX5E_SHAMPO_WQ_HEADER_PER_PAGE, clamp the number of headers per
page and expand the header size accordingly so that the headers
for one WQE cover a full page.

All the formulas are adapted to use these two new members.

Fixes: 945ca432bf ("net/mlx5e: SHAMPO, Drop info array")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:48:37 -08:00
Dragos Tatulea bacd8d8018 net/mlx5e: SHAMPO, Fix skb size check for 64K pages
mlx5e_hw_gro_skb_has_enough_space() uses a formula to check if there is
enough space in the skb frags to store more data. This formula is
incorrect for 64K page sizes and it triggers early GRO session
termination because the first fragment will blow up beyond
GRO_LEGACY_MAX_SIZE.

This patch adds a special case for page sizes >= GRO_LEGACY_MAX_SIZE
(64K) which uses the skb->len instead. Within this context,
the check is safe from fragment overflow because the hardware
will continuously fill the data up to the reservation size of 64K
and the driver will coalesce all data from the same page to the same
fragment. This means that the data will span one fragment or at most
two for such a large page size.

It is expected that the if statement will be optimized out as the
check is done with constants.

Fixes: 92552d3abd ("net/mlx5e: HW_GRO cqe handler implementation")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:48:36 -08:00
Dragos Tatulea 665a7e13c2 net/mlx5e: SHAMPO, Fix header mapping for 64K pages
HW-GRO is broken on mlx5 for 64K page sizes. The patch in the fixes tag
didn't take into account larger page sizes when doing an align down
of max_ksm_entries. For 64K page size, max_ksm_entries is 0 which will skip
mapping header pages via WQE UMR. This breaks header-data split
and will result in the following syndrome:

mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x4c9, ci 0x0, qn 0x1133, opcode 0xe, syndrome 0x4, vendor syndrome 0x32
00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
00000030: 00 00 3b c7 93 01 32 04 00 00 00 00 00 00 bf e0
mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1133

Furthermore, the function that fills in WQE UMRs for the headers
(mlx5e_build_shampo_hd_umr()) only supports mapping page sizes that
fit in a single UMR WQE.

This patch goes back to the old non-aligned max_ksm_entries value and it
changes mlx5e_build_shampo_hd_umr() to support mapping a large page over
multiple UMR WQEs.

This means that mlx5e_build_shampo_hd_umr() can now leave a page only
partially mapped. The caller, mlx5e_alloc_rx_hd_mpwqe(), ensures that
there are enough UMR WQEs to cover complete pages by working on
ksm_entries that are multiples of MLX5E_SHAMPO_WQ_HEADER_PER_PAGE.

Fixes: 8a0ee54027 ("net/mlx5e: SHAMPO, Simplify UMR allocation for headers")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762238915-1027590-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:48:36 -08:00
Meghana Malladi ae4789affd net: ti: icssg-prueth: Fix fdb hash size configuration
The ICSSG driver does the initial FDB configuration which
includes setting the control registers. Other run time
management like learning is managed by the PRU's. The default
FDB hash size used by the firmware is 512 slots, which is
currently missing in the current driver. Update the driver
FDB config to include FDB hash size as well.

Please refer trm [1] 6.4.14.12.17 section on how the FDB config
register gets configured. From the table 6-1404, there is a reset
field for FDB_HAS_SIZE which is 4, meaning 1024 slots. Currently
the driver is not updating this reset value from 4(1024 slots) to
3(512 slots). This patch fixes this by updating the reset value
to 512 slots.

[1]: https://www.ti.com/lit/pdf/spruim2
Fixes: abd5576b9c ("net: ti: icssg-prueth: Add support for ICSSG switch firmware")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251104104415.3110537-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:43:08 -08:00
Gal Pressman d1c94bc5b9 net/mlx5e: Fix return value in case of module EEPROM read error
mlx5e_get_module_eeprom_by_page() has weird error handling.

First, it is treating -EINVAL as a special case, but it is unclear why.

Second, it tries to fail "gracefully" by returning the number of bytes
read even in case of an error. This results in wrongly returning
success (0 return value) if the error occurs before any bytes were
read.

Simplify the error handling by returning an error when such occurs. This
also aligns with the error handling we have in mlx5e_get_module_eeprom()
for the old API.

This fixes the following case where the query fails, but userspace
ethtool wrongly treats it as success and dumps an output:

  # ethtool -m eth2
  netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5
  netlink warning: mlx5_core: Query module eeprom by page failed, read 0 bytes, err -5
  Offset		Values
  ------		------
  0x0000:		00 00 00 00 05 00 04 00 00 00 00 00 05 00 05 00
  0x0010:		00 00 00 00 05 00 06 00 50 00 00 00 67 65 20 66
  0x0020:		61 69 6c 65 64 2c 20 72 65 61 64 20 30 20 62 79
  0x0030:		74 65 73 2c 20 65 72 72 20 2d 35 00 14 00 03 00
  0x0040:		08 00 01 00 03 00 00 00 08 00 02 00 1a 00 00 00
  0x0050:		14 00 04 00 08 00 01 00 04 00 00 00 08 00 02 00
  0x0060:		0e 00 00 00 14 00 05 00 08 00 01 00 05 00 00 00
  0x0070:		08 00 02 00 1a 00 00 00 14 00 06 00 08 00 01 00

Fixes: e109d2b204 ("net/mlx5: Implement get_module_eeprom_by_page()")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1762265736-1028868-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:42:37 -08:00
Sebastian Andrzej Siewior d917c217b6 net: gro_cells: Reduce lock scope in gro_cell_poll
One GRO-cell device's NAPI callback can nest into the GRO-cell of
another device if the underlying device is also using GRO-cell.
This is the case for IPsec over vxlan.
These two GRO-cells are separate devices. From lockdep's point of view
it is the same because each device is sharing the same lock class and so
it reports a possible deadlock assuming one device is nesting into
itself.

Hold the bh_lock only while accessing gro_cell::napi_skbs in
gro_cell_poll(). This reduces the locking scope and avoids acquiring the
same lock class multiple times.

Fixes: 25718fdcbd ("net: gro_cells: Use nested-BH locking for gro_cell")
Reported-by: Gal Pressman <gal@nvidia.com>
Closes: https://lore.kernel.org/all/66664116-edb8-48dc-ad72-d5223696dd19@nvidia.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20251104153435.ty88xDQt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:41:29 -08:00
Tim Hostetler dfb073d32c ptp: Return -EINVAL on ptp_clock_register if required ops are NULL
ptp_clock should never be registered unless it stubs one of gettimex64()
or gettime64() and settime64(). WARN_ON_ONCE and error out if either set
of function pointers is null.

For consistency, n_alarm validation is also folded into the
WARN_ON_ONCE.

Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Tim Hostetler <thostet@google.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://patch.msgid.link/20251104225915.2040080-1-thostet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:39:17 -08:00
Michal Swiatkowski b1d16f7c00 libie: depend on DEBUG_FS when building LIBIE_FWLOG
LIBIE_FWLOG is unusable without DEBUG_FS. Mark it in Kconfig.

Fix build error on ixgbe when DEBUG_FS is not set. To not add another
layer of #if IS_ENABLED(LIBIE_FWLOG) in ixgbe fwlog code define debugfs
dentry even when DEBUG_FS isn't enabled. In this case the dummy
functions of LIBIE_FWLOG will be used, so not initialized dentry isn't a
problem.

Fixes: 641585bc97 ("ixgbe: fwlog support for e610")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/f594c621-f9e1-49f2-af31-23fbcb176058@roeck-us.net/
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251104172333.752445-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-05 17:38:03 -08:00
James Jones 664ce10246 drm/nouveau: Advertise correct modifiers on GB20x
8 and 16 bit formats use a different layout on
GB20x than they did on prior chips. Add the
corresponding DRM format modifiers to the list of
modifiers supported by the display engine on such
chips, and filter the supported modifiers for each
format based on its bytes per pixel in
nv50_plane_format_mod_supported().

Note this logic will need to be updated when GB10
support is added, since it is a GB20x chip that
uses the pre-GB20x sector layout for all formats.

Fixes: 6cc6e08d45 ("drm/nouveau/kms: add support for GB20x")
Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030181153.1208-3-jajones@nvidia.com
2025-11-06 11:02:08 +10:00
James Jones 1cf52a0d4b drm: define NVIDIA DRM format modifiers for GB20x
The layout of bits within the individual tiles
(referred to as sectors in the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)
changed for 8 and 16-bit surfaces starting in
Blackwell 2 GPUs (With the exception of GB10).
To denote the difference, extend the sector field
in the parametric format modifier definition used
to generate modifier values for NVIDIA hardware.

Without this change, it would be impossible to
differentiate the two layouts based on modifiers,
and as a result software could attempt to share
surfaces directly between pre-GB20x and GB20x
cards, resulting in corruption when the surface
was accessed on one of the GPUs after being
populated with content by the other.

Of note: This change causes the
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to
evaluate its "s" parameter twice, with the side
effects that entails. I surveyed all usage of the
modifier in the kernel and Mesa code, and that
does not appear to be problematic in any current
usage, but I thought it was worth calling out.

Fixes: 6cc6e08d45 ("drm/nouveau/kms: add support for GB20x")
Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251030181153.1208-2-jajones@nvidia.com
2025-11-06 11:01:45 +10:00
Timur Tabi ebe7556050 drm/nouveau: set DMA mask before creating the flush page
Set the DMA mask before calling nvkm_device_ctor(), so that when the
flush page is created in nvkm_fb_ctor(), the allocation will not fail
if the page is outside of DMA address space, which can easily happen if
IOMMU is disable.  In such situations, you will get an error like this:

nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0).

Commit 38f5359354 ("rm/nouveau/pci: set streaming DMA mask early")
set the mask after calling nvkm_device_ctor(), but back then there was
no flush page being created, which might explain why the mask wasn't
set earlier.

Flush page allocation was added in commit 5728d06419 ("drm/nouveau/fb:
handle sysmem flush page from common code").  nvkm_fb_ctor() calls
alloc_page(), which can allocate a page anywhere in system memory, but
then calls dma_map_page() on that page.  But since the DMA mask is still
set to 32, the map can fail if the page is allocated above 4GB.  This is
easy to reproduce on systems with a lot of memory and IOMMU disabled.

An alternative approach would be to force the allocation of the flush
page to low memory, by specifying __GFP_DMA32.  However, this would
always allocate the page in low memory, even though the hardware can
access high memory.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20251014174512.3172102-1-ttabi@nvidia.com
2025-11-06 10:26:51 +10:00
Linus Torvalds dc77806cf3 Rust fixes for v6.18
Toolchain and infrastructure:
 
   - Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are
     enabled due to extra checking performed by the compiler and an
     upstream issue already fixed for Rust 1.93.0.
 
   - Fix future Rust 1.93.0 builds by supporting the stabilized name for
     the 'no-jump-tables' flag.
 
   - Fix a couple private/broken intra-doc links uncovered by the future
     move of pin-init to 'syn'.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmkLYvMACgkQGXyLc2ht
 IW2I9g/+IIP7rJEQug5EpyTuxO2GX1gKOf3lbV5YxpUi+BXIFL5ZY7Nlgi3EssSd
 Cj3LaoEiDzFeYewCnjhk3JAsuzmr/bEjN5xWiDT3Rk/yhBIX+oRBJjW+yze+gfoe
 27gqS20W08WSfIj2n91EDwbdrhiz0Lp87cMlcesdDVZKnY215Whf8zsYNGIGutSe
 ISdDPsHdoCe2RHoxLcP0Yo6OD/bRK7hTQxEp6rPh787+SK4znjdJWmOVoi4D/HdI
 fMHG3T8HfDIOTQJITDTlvne9CRvq6+JaUDQBPFDZBFwxA7S83SdiDDIqWb4N7zlh
 ETySFSblkrJdCdT2LelRu/JPv25h1vUvhQObNBY77enBeOxJdeJNWk7iY2p6GRCD
 4+qunebeRsLuAQqzAAhk5gLz5Jb7mPcGbhJzAebOlp/bixwS3YQb9710FvWwmjQY
 dPxbkxAQj1dAHgkVLDk91mzwU3R2m3OcWMrwC6rU2O2Evraq58B7N2xDOSSi1uzK
 e77oBk8CEuFLw/EIZq32dWhRf+9F8x1CrG4K6bTXTax1ZgHp/UZ4/yNrd5+7z2wZ
 5gVoCqswTqQg4YOPhwk+tACFDS0vDYMnsdT7BNNkRAUibBC7q52jbPpBsMBaJ3ji
 3VwS93idCkEGYYjNlPRKxtlL3nIFbbN4PbsvfjIKSauv2MyxV7Y=
 =0bvf
 -----END PGP SIGNATURE-----

Merge tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:

 - Fix/workaround a couple Rust 1.91.0 build issues when sanitizers are
   enabled due to extra checking performed by the compiler and an
   upstream issue already fixed for Rust 1.93.0

 - Fix future Rust 1.93.0 builds by supporting the stabilized name for
   the 'no-jump-tables' flag

 - Fix a couple private/broken intra-doc links uncovered by the future
   move of pin-init to 'syn'

* tag 'rust-fixes-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0
  rust: kbuild: workaround `rustdoc` doctests modifier bug
  rust: kbuild: treat `build_error` and `rustdoc` as kernel objects
  rust: condvar: fix broken intra-doc link
  rust: devres: fix private intra-doc link
2025-11-05 11:15:36 -08:00
Jason Gunthorpe afb47765f9 iommufd: Make vfio_compat's unmap succeed if the range is already empty
iommufd returns ENOENT when attempting to unmap a range that is already
empty, while vfio type1 returns success. Fix vfio_compat to match.

Fixes: d624d6652a ("iommufd: vfio container FD ioctl compatibility")
Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Reported-by: Alex Mastro <amastro@fb.com>
Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05 15:11:26 -04:00
Linus Torvalds 5624d4c378 platform-drivers-x86 for v6.18-3
Fixes and New Hotkey Support
 
 - input + dell-wmi-base: Electronic privacy screen on/off hotkey support
 
 - int3472: Fix unregister double free
 
 - wireless-hotkey: Fix Kconfig typo
 
 The following is an automated shortlog grouped by driver:
 
 dell-wmi-base:
  -  Handle electronic privacy screen on/off events
 
 Input:
  -  Add keycodes for electronic privacy screen on/off hotkeys
 
 int3472:
  -  Fix double free of GPIO device during unregister
 
 MAINTAINERS:
  -  Update int3472 maintainers
 
 x86: Kconfig:
  -  fix minor typo in help for WIRELESS_HOTKEY
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaQso/QAKCRBZrE9hU+XO
 Me2tAQCoij2NER2aThaFPzTjBfvIKF4DbpsSo9V0I2r+gR6xzAD/UWmliCDGQ0dV
 NS28/L982I716VK2Mv5SvdG9BKxAlwM=
 =Nnez
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes and New Hotkey Support:

   - input + dell-wmi-base: Electronic privacy screen on/off hotkey
     support

   - int3472: Fix unregister double free

   - wireless-hotkey: Fix Kconfig typo"

* tag 'platform-drivers-x86-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform: x86: Kconfig: fix minor typo in help for WIRELESS_HOTKEY
  platform/x86: dell-wmi-base: Handle electronic privacy screen on/off events
  Input: Add keycodes for electronic privacy screen on/off hotkeys
  MAINTAINERS: Update int3472 maintainers
  platform/x86: int3472: Fix double free of GPIO device during unregister
2025-11-05 11:08:10 -08:00
Zilin Guan c367af440e btrfs: release root after error in data_reloc_print_warning_inode()
data_reloc_print_warning_inode() calls btrfs_get_fs_root() to obtain
local_root, but fails to release its reference when paths_from_inode()
returns an error. This causes a potential memory leak.

Add a missing btrfs_put_root() call in the error path to properly
decrease the reference count of local_root.

Fixes: b9a9a85059 ("btrfs: output affected files when relocation fails")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:01:12 +01:00
Zilin Guan 5fea61aa1c btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe()
scrub_raid56_parity_stripe() allocates a bio with bio_alloc(), but
fails to release it on some error paths, leading to a potential
memory leak.

Add the missing bio_put() calls to properly drop the bio reference
in those error cases.

Fixes: 1009254bf2 ("btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:01:12 +01:00
Filipe Manana bfe3d755ef btrfs: do not update last_log_commit when logging inode due to a new name
When logging that a new name exists, we skip updating the inode's
last_log_commit field to prevent a later explicit fsync against the inode
from doing nothing (as updating last_log_commit makes btrfs_inode_in_log()
return true). We are detecting, at btrfs_log_inode(), that logging a new
name is happening by checking the logging mode is not LOG_INODE_EXISTS,
but that is not enough because we may log parent directories when logging
a new name of a file in LOG_INODE_ALL mode - we need to check that the
logging_new_name field of the log context too.

An example scenario where this results in an explicit fsync against a
directory not persisting changes to the directory is the following:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  $ touch /mnt/foo

  $ sync

  $ mkdir /mnt/dir

  # Write some data to our file and fsync it.
  $ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/foo

  # Add a new link to our file. Since the file was logged before, we
  # update it in the log tree by calling btrfs_log_new_name().
  $ ln /mnt/foo /mnt/dir/bar

  # fsync the root directory - we expect it to persist the dentry for
  # the new directory "dir".
  $ xfs_io -c "fsync" /mnt

  <power fail>

After mounting the fs the entry for directory "dir" does not exists,
despite the explicit fsync on the root directory.

Here's why this happens:

1) When we fsync the file we log the inode, so that it's present in the
   log tree;

2) When adding the new link we enter btrfs_log_new_name(), and since the
   inode is in the log tree we proceed to updating the inode in the log
   tree;

3) We first set the inode's last_unlink_trans to the current transaction
   (early in btrfs_log_new_name());

4) We then eventually enter btrfs_log_inode_parent(), and after logging
   the file's inode, we call btrfs_log_all_parents() because the inode's
   last_unlink_trans matches the current transaction's ID (updated in the
   previous step);

5) So btrfs_log_all_parents() logs the root directory by calling
   btrfs_log_inode() for the root's inode with a log mode of LOG_INODE_ALL
   so that new dentries are logged;

6) At btrfs_log_inode(), because the log mode is LOG_INODE_ALL, we
   update root inode's last_log_commit to the last transaction that
   changed the inode (->last_sub_trans field of the inode), which
   corresponds to the current transaction's ID;

7) Then later when user space explicitly calls fsync against the root
   directory, we enter btrfs_sync_file(), which calls skip_inode_logging()
   and that returns true, since its call to btrfs_inode_in_log() returns
   true and there are no ordered extents (it's a directory, never has
   ordered extents). This results in btrfs_sync_file() returning without
   syncing the log or committing the current transaction, so all the
   updates we did when logging the new name, including logging the root
   directory,  are not persisted.

So fix this by but updating the inode's last_log_commit if we are sure
we are not logging a new name (if ctx->logging_new_name is false).

A test case for fstests will follow soon.

Reported-by: Vyacheslav Kovalevsky <slava.kovalevskiy.2014@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/03c5d7ec-5b3d-49d1-95bc-8970a7f82d87@gmail.com/
Fixes: 130341be7f ("btrfs: always update the logged transaction when logging new names")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:01:01 +01:00
Naohiro Aota 6a1ab50135 btrfs: zoned: fix stripe width calculation
The stripe offset calculation in the zoned code for raid0 and raid10
wrongly uses map->stripe_size to calculate it. In fact, map->stripe_size is
the size of the device extent composing the block group, which always is
the zone_size on the zoned setup.

Fix it by using BTRFS_STRIPE_LEN and BTRFS_STRIPE_LEN_SHIFT. Also, optimize
the calculation a bit by doing the common calculation only once.

Fixes: c0d90a79e8 ("btrfs: zoned: fix alloc_offset calculation for partly conventional block groups")
CC: stable@vger.kernel.org # 6.17+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:00:08 +01:00
Naohiro Aota 94f54924b9 btrfs: zoned: fix conventional zone capacity calculation
When a block group contains both conventional zone and sequential zone, the
capacity of the block group is wrongly set to the block group's full
length. The capacity should be calculated in btrfs_load_block_group_* using
the last allocation offset.

Fixes: 568220fa96 ("btrfs: zoned: support RAID0/1/10 on top of raid stripe tree")
CC: stable@vger.kernel.org # v6.12+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-05 20:00:06 +01:00
Pavel Begunkov 1fd5367391 io_uring: fix types for region size calulation
->nr_pages is int, it needs type extension before calculating the region
size.

Fixes: a90558b36c ("io_uring/memmap: helper for pinning region pages")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: style fixup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-05 11:45:07 -07:00
Christoph Hellwig 21ab5179aa xfs: fix zone selection in xfs_select_open_zone_mru
xfs_select_open_zone_mru needs to pass XFS_ZONE_ALLOC_OK to
xfs_try_use_zone because we only want to tightly pack into zones of the
same or a compatible temperature instead of any available zone.

This got broken in commit 0301dae732 ("xfs: refactor hint based zone
allocation"), which failed to update this particular caller when
switching to an enum.  xfs/638 sometimes, but not reliably fails due to
this change.

Fixes: 0301dae732 ("xfs: refactor hint based zone allocation")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:54:38 +01:00
Christoph Hellwig f5714a3c1a xfs: fix a rtgroup leak when xfs_init_zone fails
Drop the rtgrop reference when xfs_init_zone fails for a conventional
device.

Fixes: 4e4d520755 ("xfs: add the zoned space allocator")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:53:49 +01:00
Darrick J. Wong 8d7bba1e83 xfs: fix various problems in xfs_atomic_write_cow_iomap_begin
I think there are several things wrong with this function:

A) xfs_bmapi_write can return a much larger unwritten mapping than what
   the caller asked for.  We convert part of that range to written, but
   return the entire written mapping to iomap even though that's
   inaccurate.

B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an
   unwritten mapping could be *smaller* than the write range (or even
   the hole range).  In this case, we convert too much file range to
   written state because we then return a smaller mapping to iomap.

C) It doesn't handle delalloc mappings.  This I covered in the patch
   that I already sent to the list.

D) Reassigning count_fsb to handle the hole means that if the second
   cmap lookup attempt succeeds (due to racing with someone else) we
   trim the mapping more than is strictly necessary.  The changing
   meaning of count_fsb makes this harder to notice.

E) The tracepoint is kinda wrong because @length is mutated.  That makes
   it harder to chase the data flows through this function because you
   can't just grep on the pos/bytecount strings.

F) We don't actually check that the br_state = XFS_EXT_NORM assignment
   is accurate, i.e that the cow fork actually contains a written
   mapping for the range we're interested in

G) Somewhat inadequate documentation of why we need to xfs_trim_extent
   so aggressively in this function.

H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped
   the write range to s_maxbytes.

Fix these issues, and then the atomic writes regressions in generic/760,
generic/617, generic/091, generic/263, and generic/521 all go away for
me.

Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:52:49 +01:00
Darrick J. Wong 8d54eacd82 xfs: fix delalloc write failures in software-provided atomic writes
With the 20 Oct 2025 release of fstests, generic/521 fails for me on
regular (aka non-block-atomic-writes) storage:

QA output created by 521
dowrite: write: Input/output error
LOG DUMP (8553 total operations):
1(  1 mod 256): SKIPPED (no operation)
2(  2 mod 256): WRITE    0x7e000 thru 0x8dfff	(0x10000 bytes) HOLE
3(  3 mod 256): READ     0x69000 thru 0x79fff	(0x11000 bytes)
4(  4 mod 256): FALLOC   0x53c38 thru 0x5e853	(0xac1b bytes) INTERIOR
5(  5 mod 256): COPY 0x55000 thru 0x59fff	(0x5000 bytes) to 0x25000 thru 0x29fff
6(  6 mod 256): WRITE    0x74000 thru 0x88fff	(0x15000 bytes)
7(  7 mod 256): ZERO     0xedb1 thru 0x11693	(0x28e3 bytes)

with a warning in dmesg from iomap about XFS trying to give it a
delalloc mapping for a directio write.  Fix the software atomic write
iomap_begin code to convert the reservation into a written mapping.
This doesn't fix the data corruption problems reported by generic/760,
but it's a start.

Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-05 16:52:49 +01:00
Johannes Berg 2f6adeaf92 ath.git patches for v6.19
Highlights for some specific drivers include:
 
 ath10k:
 Add support for Factory Test TLV commands
 
 ath11k:
 Add support for Tx Power insertion
 
 ath12k:
 Add support for BSS color change
 
 And of course there is the usual set of cleanups and bug fixes across
 the entire family of "ath" drivers.
 
 We do expect to have one more pull request before the v6.19 merge
 window to pull in the refactored ath12k driver from the ath12k-ng
 branch.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ/mtSHzPUi16IfDEksFbugiYzLewUCaQjHOAAKCRAsFbugiYzL
 e8/7AQCUUwU91FS/AfOCOZKeIp8aRAumtlAp20qYLEAKhJh0PQD+M+JW8VDeZxMO
 uyj+YJZqcKgRz0GIJNgWPvi0cepWOgs=
 =bBd1
 -----END PGP SIGNATURE-----

Merge tag 'ath-next-20251103' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath into wireless-next

Jeff Johnson says:
==================
ath.git patches for v6.19

Highlights for some specific drivers include:

ath10k:
Add support for Factory Test TLV commands

ath11k:
Add support for Tx Power insertion

ath12k:
Add support for BSS color change

And of course there is the usual set of cleanups and bug fixes across
the entire family of "ath" drivers.

We do expect to have one more pull request before the v6.19 merge
window to pull in the refactored ath12k driver from the ath12k-ng
branch.
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-05 16:29:11 +01:00
Johannes Berg 4c740c4d8b ath.git update for v6.18-rc5
Revert an ath12k change which resulted in a significance performance
 impact on WCN7850.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ/mtSHzPUi16IfDEksFbugiYzLewUCaQjAlgAKCRAsFbugiYzL
 e1g/AP0VjhhuC5zzElFTNf+5HTgNnXAvs6ghg2BUPkJug+X6xAD9GnKdlrqoo4qw
 iGex7lkBnIzPn+fTlF0xPHFDvF0cgww=
 =b8DL
 -----END PGP SIGNATURE-----

Merge tag 'ath-current-20251103' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

Jeff Johnson says:
==================
ath.git update for v6.18-rc5

Revert an ath12k change which resulted in a significance performance
impact on WCN7850.
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-05 16:18:48 +01:00
Martin Willi c74619e760 wifi: mac80211_hwsim: Limit destroy_on_close radio removal to netgroup
hwsim radios marked destroy_on_close are removed when the Netlink socket
that created them is closed. As the portid is not unique across network
namespaces, closing a socket in one namespace may remove radios in another
if it has the destroy_on_close flag set.

Instead of matching the network namespace, match the netgroup of the radio
to limit radio removal to those that have been created by the closing
Netlink socket. The netgroup of a radio identifies the network namespace
it was created in, and matching on it removes a destroy_on_close radio
even if it has been moved to another namespace.

Fixes: 100cb9ff40 ("mac80211_hwsim: Allow managing radios from non-initial namespaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Link: https://patch.msgid.link/20251103082436.30483-1-martin@strongswan.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-05 16:18:16 +01:00
James Clark a50f7456f8 dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope
Clang doesn't like that (1ULL<<(64)) overflows when initializing a
global scope variable, even if that part of the ternary isn't used when
n = 64. The same initialization can be done without warnings in function
scopes, and GCC doesn't mind either way.

The build failure that highlighted this was already fixed in a different
way [1], which also has detailed links to the Clang issues. However it's
not going to be long before the same thing happens again, so it's better
to fix the root cause.

Fix it by using GENMASK_ULL() which does exactly the same thing, is much
more readable anyway, and doesn't have a shift that overflows.

[1]: https://lore.kernel.org/all/20250918-mmp-pdma-simplify-dma-addressing-v1-1-5c2be2b85696@riscstar.com/

Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20251030-james-fix-dma_bit_mask-v1-1-ad1ce7cfab6e@linaro.org
2025-11-05 13:43:41 +01:00
Pierre-Eric Pelloux-Prayer 487df8b698 drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cb
The Mesa issue referenced below pointed out a possible deadlock:

[ 1231.611031]  Possible interrupt unsafe locking scenario:

[ 1231.611033]        CPU0                    CPU1
[ 1231.611034]        ----                    ----
[ 1231.611035]   lock(&xa->xa_lock#17);
[ 1231.611038]                                local_irq_disable();
[ 1231.611039]                                lock(&fence->lock);
[ 1231.611041]                                lock(&xa->xa_lock#17);
[ 1231.611044]   <Interrupt>
[ 1231.611045]     lock(&fence->lock);
[ 1231.611047]
                *** DEADLOCK ***

In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that don't disable interrupts (eg:
drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()).

CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling
callback so in an interrupt context. It will deadlock when trying to
grab the xa_lock which is already held by CPU0.

Replacing all xa_* usage by their xa_*_irq counterparts would fix
this issue, but Christian pointed out another issue: dma_fence_signal
takes fence.lock and so does dma_fence_add_callback.

  dma_fence_signal() // locks f1.lock
  -> drm_sched_entity_kill_jobs_cb()
  -> foreach dependencies
     -> dma_fence_add_callback() // locks f2.lock

This will deadlock if f1 and f2 share the same spinlock.

To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work().

Cc: stable@vger.kernel.org # v6.2+
Fixes: 2fdb8a8f07 ("drm/scheduler: rework entity flush, kill and fini")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
[phasta: commit message nits]
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
2025-11-05 12:29:52 +01:00