linux

Commit Graph

Author	SHA1	Message	Date
Matthieu Baerts (NGI0)	5c59df126b	selftests: mptcp: join: validate extra bind cases By design, an MPTCP connection will not accept extra subflows where no MPTCP listening sockets can accept such requests. In other words, it means that if the 'server' listens on a specific address / device, it cannot accept MP_JOIN sent to a different address / device. Except if there is another MPTCP listening socket accepting them. This is what the new tests are validating: - Forcing a bind on the main v4/v6 address, and checking that MP_JOIN to announced addresses are not accepted. - Also forcing a bind on the main v4/v6 address, but before, another listening socket is created to accept additional subflows. Note that 'mptcpize run nc -l' -- or something else only doing: socket(MPTCP), bind(<IP>), listen(0) -- would be enough, but here mptcp_connect is reused not to depend on another tool just for that. - Same as the previous one, but using v6 link-local addresses: this is a bit particular because it is required to specify the outgoing network interface when connecting to a link-local address announced by the other peer. When using the routing rules, this doesn't work (the outgoing interface is not known) ; but it does work with a 'laminar' endpoint having a specified interface. Note that extra small modifications are needed for these tests to work: - mptcp_connect's check_getpeername_connect() check should strip the specified interface when comparing addresses. - With IPv6 link-local addresses, it is required to wait for them to be ready (no longer in 'tentative' mode) before using them, otherwise the bind() will not be allowed. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/591 Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-4-b4166772d6bb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:15:07 -08:00
Matthieu Baerts (NGI0)	4a6220a453	selftests: mptcp: join: do_transfer: reduce code dup The same extra long commands are present twice, with small differences: the variable for the stdin file is different. Use new dedicated variables in one command to avoid this code duplication. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-3-b4166772d6bb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:15:06 -08:00
Matthieu Baerts (NGI0)	e461e8a799	mptcp: pm: in kernel: only use fullmesh endp if any Our documentation is saying that the in-kernel PM is only using fullmesh endpoints to establish subflows to announced addresses when at least one endpoint has a fullmesh flag. But this was not totally correct: only fullmesh endpoints were used if at least one endpoint from the same address family as the received ADD_ADDR has the fullmesh flag. This is confusing, and it seems clearer not to have differences depending on the address family. So, now, when at least one MPTCP endpoint has a fullmesh flag, the local addresses are picked from all fullmesh endpoints, which might be 0 if there are no endpoints for the correct address family. One selftest needs to be adapted for this behaviour change. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-2-b4166772d6bb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:15:06 -08:00
Matthieu Baerts (NGI0)	f88191c7f3	mptcp: pm: in-kernel: record fullmesh endp nb Instead of iterating over all endpoints, under RCU read lock, just to check if one of them as the fullmesh flag, we can keep a counter of fullmesh endpoint, similar to what is done with the other flags. This counter is now checked, before iterating over all endpoints. Similar to the other counters, this new one is also exposed. A userspace app can then know when it is being used in a fullmesh mode, with potentially (too) many subflows. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251101-net-next-mptcp-fm-endp-nb-bind-v1-1-b4166772d6bb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:15:06 -08:00
Michael S. Tsirkin	c3838262b8	virtio_net: fix alignment for virtio_net_hdr_v1_hash Changing alignment of header would mean it's no longer safe to cast a 2 byte aligned pointer between formats. Use two 16 bit fields to make it 2 byte aligned as previously. This fixes the performance regression since commit ("virtio_net: enable gso over UDP tunnel support.") as it uses virtio_net_hdr_v1_hash_tunnel which embeds virtio_net_hdr_v1_hash. Pktgen in guest + XDP_DROP on TAP + vhost_net shows the TX PPS is recovered from 2.4Mpps to 4.45Mpps. Fixes: `56a06bd40f` ("virtio_net: enable gso over UDP tunnel support.") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251031060551.126-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:14:07 -08:00
Jakub Kicinski	b117befe8a	Merge branch 'net-mlx5e-reduce-interface-downtime-on-configuration-change' Tariq Toukan says: ==================== net/mlx5e: Reduce interface downtime on configuration change This series significantly reduces the interface downtime while swapping channels during a configuration change, on capable devices. Here we remove an old requirement on operations ordering that became obsolete on recent capable devices. This helps cutting the downtime by a factor of magnitude, ~80% in our example. Perf numbers: Measured the number of dropped packets in a simple ping flood test, during a configuration change operation, that switches the number of channels from 247 to 248. Before: 71 packets lost After: 15 packets lost, ~80% saving. ==================== Link: https://patch.msgid.link/1761831159-1013140-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:54 -08:00
Tariq Toukan	3b88a535a8	net/mlx5e: Defer channels closure to reduce interface down time Cap bit tis_tir_td_order=1 indicates that an old firmware requirement / limitation no longer exists. When unset, the latency of several firmware commands significantly increases with the presence of high number of co-existing channels (both old and new sets). Hence, we used to close unneeded old channels before invoking those firmware commands. Today, on capable devices, this is no longer the case. Minimize the interface down time by deferring the old channels closure, after the activation of the new ones. Perf numbers: Measured the number of dropped packets in a simple ping flood test, during a configuration change operation, that switches the number of channels from 247 to 248. Before: 71 packets lost After: 15 packets lost, ~80% saving. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-8-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:36 -08:00
Tariq Toukan	911e3a37b0	net/mlx5e: Pass old channels as argument to mlx5e_switch_priv_channels Let the caller function mlx5e_safe_switch_params() maintain a copy of the old channels, and pass it to mlx5e_switch_priv_channels(). This is in preparation for the next patch. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:36 -08:00
Tariq Toukan	477c352add	net/mlx5e: Do not re-apply TIR loopback configuration if not necessary On old firmware, (tis_tir_td_order=0), TIR of a transport domain should either be created after all SQs of the same domain, or TIR.self_lb_en should be reapplied using MODIFY_TIR, for self loopback filtering to function correctly. This is not necessary anymnore on new FW (tis_tir_td_order=1), thus there's no need for calling modify_tir operations after creating a new set of SQs to maintain the self loopback prevention functional. Skip these operations. This saves O(max_num_channels) MODIFY_TIR firmware commands in operations like interface up or channels configuration change. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:36 -08:00
Tariq Toukan	a4c81e72f1	net/mlx5: IPoIB, set self loopback prevention in TIR init In IPoIB, the self loopback prevention configuration apply in activation stage has two roles: fulfill a firmware requirement for old firmware (tis_tir_td_order=0), and update the proper configuration as it was not set in init. Here we set the proper configuration in init, to allow skipping the modify_tirs commands on new firmware in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:36 -08:00
Tariq Toukan	99b002018f	net/mlx5e: Allow setting self loopback prevention bits on TIR init Until now, IPoIB was creating TIRs without setting self loopback prevention, then modifying them in activation stage. This is a preparation patch, that will be used by IPoIB to init TIRs properly without the need for following calls of modify_tir. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:36 -08:00
Tariq Toukan	5c51a86122	net/mlx5e: Use TIR API in mlx5e_modify_tirs_lb() Extend the TIR API and use it in mlx5e_modify_tirs_lb() instead of the explicit modify_tir code. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:35 -08:00
Tariq Toukan	091400a5d4	net/mlx5e: Enhance function structures for self loopback prevention application The re-application of self loopback prevention attributes in TIRs is necessary in old firmwares (where tis_tir_td_order cap is cleared) after recreation of SQs. However, this is not needed in new firmware with tis_tir_td_order=1. As a preparation patch, enhance the function structures to differentiate between an explicit loopback prevention configuration apply, and the re-apply operation required by old firmware. Loopback selftests should now call mlx5e_modify_tirs_lb() directly, as their use case is not related to the firmware limitation. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1761831159-1013140-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:04:35 -08:00
Chu Guangqing	52665fcc22	xen/netfront: Comment Correction: Fix Spelling Error and Description of Queue Quantity Rules The original comments contained spelling errors and incomplete logical descriptions, which could easily lead to misunderstandings of the code logic. The specific modifications are as follows: Correct the spelling error by changing "inut max" to "but not exceed the maximum limit"; Add the note "If the user has not specified a value, the default maximum limit is 8" to clarify the default value logic; Improve the coherence of the statement to make the queue quantity rules clearer. After the modification, the comments can accurately reflect the code behavior of "taking the smaller value between the number of CPUs and the default maximum limit of 8 for the number of queues", enhancing code maintainability. Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://patch.msgid.link/20251103032212.2462-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:01:01 -08:00
Chu Guangqing	96c68954cd	net: sungem_phy: Fix a typo error in sungem_phy Fix a spelling mistakes for regularly Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251103054443.2878-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:00:55 -08:00
Chu Guangqing	9781642e58	veth: Fix a typo error in veth Fix a spellling error for resources Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251103055351.3150-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:00:50 -08:00
Chu Guangqing	2428803d5e	gtp: Fix a typo error for size Fix the spelling error of "size". Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251103060504.3524-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:00:43 -08:00
Chu Guangqing	f4b2786fb1	virtio_net: Fix a typo error in virtio_net Fix the spelling error of "separate". Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251103074305.4727-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:00:37 -08:00
Yongpeng Yang	1e39da974c	fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT When simulating an nvme device on qemu with both logical_block_size and physical_block_size set to 8 KiB, an error trace appears during partition table reading at boot time. The issue is caused by inode->i_blkbits being larger than PAGE_SHIFT, which leads to a left shift of -1 and triggering a UBSAN warning. [ 2.697306] ------------[ cut here ]------------ [ 2.697309] UBSAN: shift-out-of-bounds in fs/crypto/inline_crypt.c:336:37 [ 2.697311] shift exponent -1 is negative [ 2.697315] CPU: 3 UID: 0 PID: 274 Comm: (udev-worker) Not tainted 6.18.0-rc2+ #34 PREEMPT(voluntary) [ 2.697317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 2.697320] Call Trace: [ 2.697324] <TASK> [ 2.697325] dump_stack_lvl+0x76/0xa0 [ 2.697340] dump_stack+0x10/0x20 [ 2.697342] __ubsan_handle_shift_out_of_bounds+0x1e3/0x390 [ 2.697351] bh_get_inode_and_lblk_num.cold+0x12/0x94 [ 2.697359] fscrypt_set_bio_crypt_ctx_bh+0x44/0x90 [ 2.697365] submit_bh_wbc+0xb6/0x190 [ 2.697370] block_read_full_folio+0x194/0x270 [ 2.697371] ? __pfx_blkdev_get_block+0x10/0x10 [ 2.697375] ? __pfx_blkdev_read_folio+0x10/0x10 [ 2.697377] blkdev_read_folio+0x18/0x30 [ 2.697379] filemap_read_folio+0x40/0xe0 [ 2.697382] filemap_get_pages+0x5ef/0x7a0 [ 2.697385] ? mmap_region+0x63/0xd0 [ 2.697389] filemap_read+0x11d/0x520 [ 2.697392] blkdev_read_iter+0x7c/0x180 [ 2.697393] vfs_read+0x261/0x390 [ 2.697397] ksys_read+0x71/0xf0 [ 2.697398] __x64_sys_read+0x19/0x30 [ 2.697399] x64_sys_call+0x1e88/0x26a0 [ 2.697405] do_syscall_64+0x80/0x670 [ 2.697410] ? __x64_sys_newfstat+0x15/0x20 [ 2.697414] ? x64_sys_call+0x204a/0x26a0 [ 2.697415] ? do_syscall_64+0xb8/0x670 [ 2.697417] ? irqentry_exit_to_user_mode+0x2e/0x2a0 [ 2.697420] ? irqentry_exit+0x43/0x50 [ 2.697421] ? exc_page_fault+0x90/0x1b0 [ 2.697422] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 2.697425] RIP: 0033:0x75054cba4a06 [ 2.697426] Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 [ 2.697427] RSP: 002b:00007fff973723a0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 [ 2.697430] RAX: ffffffffffffffda RBX: 00005ea9a2c02760 RCX: 000075054cba4a06 [ 2.697432] RDX: 0000000000002000 RSI: 000075054c190000 RDI: 000000000000001b [ 2.697433] RBP: 00007fff973723c0 R08: 0000000000000000 R09: 0000000000000000 [ 2.697434] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 2.697434] R13: 00005ea9a2c027c0 R14: 00005ea9a2be5608 R15: 00005ea9a2be55f0 [ 2.697436] </TASK> [ 2.697436] ---[ end trace ]--- This situation can happen for block devices because when CONFIG_TRANSPARENT_HUGEPAGE is enabled, the maximum logical_block_size is 64 KiB. set_init_blocksize() then sets the block device inode->i_blkbits to 13, which is within this limit. File I/O does not trigger this problem because for filesystems that do not support the FS_LBS feature, sb_set_blocksize() prevents sb->s_blocksize_bits from being larger than PAGE_SHIFT. During inode allocation, alloc_inode()->inode_init_always() assigns inode->i_blkbits from sb->s_blocksize_bits. Currently, only xfs_fs_type has the FS_LBS flag, and since xfs I/O paths do not reach submit_bh_wbc(), it does not hit the left-shift underflow issue. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Fixes: `47dd675323` ("block/bdev: lift block size restrictions to 64k") Cc: stable@vger.kernel.org [EB: use folio_pos() and consolidate the two shifts by i_blkbits] Link: https://lore.kernel.org/r/20251105003642.42796-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-11-04 16:37:38 -08:00
Jakub Kicinski	31113a452a	Merge branch 'net-stmmac-multi-interface-stmmac' Russell King says: ==================== net: stmmac: multi-interface stmmac This series adds a callback for platform glue to configure the stmmac core interface mode depending on the PHY interface mode that is being used. This is currently only called just before the dwmac core is reset since these signals are latched on reset. Included in this series are changes to s32 to move its PHY_INTF_SEL_x definitions out of the way of the dwmac core's signals which has more entitlement to use this name. We convert dwmac-imx as an example. Including other platform glue would make this series excessively large, but once this core code is merged, the individual platform glue updates can be posted one after another as they will be independent of each other. It is hoped that this callback can be used in future to reconfigure the dwmac core when the interface mode changes to support PHYs that change their interface mode, but we're nowhere near being able to do that yet. ==================== Link: https://patch.msgid.link/aQiWzyrXU_2hGJ4j@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:26 -08:00
Russell King (Oracle)	eaca1a4dc5	net: stmmac: imx: use ->set_phy_intf_sel() Rather than placing the phy_intf_sel() setup in the ->init() method, move it to the new ->set_phy_intf_sel() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt5C-0000000ChpR-2kAB@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:26 -08:00
Russell King (Oracle)	38cd4e84b3	net: stmmac: imx: cleanup arguments for set_intf_mode() method Pass the imx_priv_data instead of the plat_stmmacenet_data into the set_intf_mode() SoC specific methods. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt57-0000000ChpL-25kS@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:26 -08:00
Russell King (Oracle)	35103babce	net: stmmac: imx: simplify set_intf_mode() implementations Simplify the set_intf_mode() implementations, testing the phy_intf_sel value rather than the PHY interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt52-0000000ChpG-1bsd@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	c012710c14	net: stmmac: imx: use stmmac_get_phy_intf_sel() i.MX implementations other than IMX8DXL involve setting the dwmac core phy_intf_sel input. Use stmmac_get_phy_intf_sel() to decode the PHY interface mode to the phy_intf_sel value, validating the result, and passing it into the implementation specific .set_intf_mode() method rather than each .set_intf_mode() method doing this. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4x-0000000ChpA-1Edr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	d73c1dccfb	net: stmmac: imx: use FIELD_PREP()/FIELD_GET() for PHY_INTF_SEL_x Use FIELD_PREP()/FIELD_GET() in the functions to construct the PHY interface selection bitfield or to extract its value. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4s-0000000Chp4-0kwf@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	8233cc4397	net: stmmac: imx: convert to PHY_INTF_SEL_xxx Convert dwmac-imx to use the PHY_INTF_SEL_xxx definitions rather than constants via: - ensuring that the prefix for the MASK and value definitions is the same. - using FIELD_PREP() to shift the PHY_INTF_SEL_xxx definition to the appropriate bitfield. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4n-0000000Choy-0IeG@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	1b6aa81c85	net: stmmac: add support for configuring the phy_intf_sel inputs When dwmac is synthesised with support for multiple PHY interfaces, the core provides phy_intf_sel inputs, sampled on reset, to configure the PHY facing interface. Use stmmac_get_phy_intf_sel() in core code to determine the dwmac phy_intf_sel input value, and provide a new platform method called with this value just before we issue a soft reset to the dwmac core. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4h-0000000Chos-3wxX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	b459790d3f	net: stmmac: add stmmac_get_phy_intf_sel() Provide a function to translate the PHY interface mode to the phy_intf_sel pin configuration for dwmac1000 and dwmac4 cores that support multiple interfaces. We currently handle MII, GMII, RGMII, SGMII, RMII and REVMII, but not TBI, RTBI nor SMII as drivers do not appear to use these three and the driver doesn't currently support these. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4c-0000000Choe-3SII@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	4a4692e909	net: stmmac: add phy_intf_sel and ACTPHYIF definitions Add definitions for the active PHY interface found in DMA hardware feature register 0, and also used to configure the core in multi- interface designs via phy_intf_sel. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1vFt4X-0000000ChoY-30p9@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	553f23d195	net: stmmac: s32: move PHY_INTF_SEL_x definitions out of the way S32's PHY_INTF_SEL_x definitions conflict with those for the dwmac cores as they use a different bitmapping. Add a S32 prefix so that they are unique. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Jan Petrous (OSS) <jan.petrous@oss.nxp.com> Link: https://patch.msgid.link/E1vFt4S-0000000ChoS-2Ahi@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:25 -08:00
Russell King (Oracle)	dec568a36f	net: stmmac: imx: use phylink's interface mode for set_clk_tx_rate() imx_dwmac_set_clk_tx_rate() is passed the interface mode from phylink which will be the same as plat_dat->phy_interface. Use the passed-in interface mode rather than plat_dat->phy_interface. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4N-0000000ChoM-1llp@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:21:17 -08:00
Eric Dumazet	46173144e0	net: mark deliver_skb() as unlikely and not inlined deliver_skb() should not be inlined as is it not called in the fast path. Add unlikely() clauses giving hints to the compiler about this fact. Before this patch: size net/core/dev.o text data bss dec hex filename 121794 13330 176 135300 21084 net/core/dev.o __netif_receive_skb_core() size on x86_64 : 4080 bytes. After: size net/core/dev.o text data bss dec hex filenamee 120330 13338 176 133844 20ad4 net/core/dev.o __netif_receive_skb_core() size on x86_64 : 2781 bytes. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251103165256.1712169-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:08:25 -08:00
Adrian Moreno	105bae3218	rtnetlink: honor RTEXT_FILTER_SKIP_STATS in IFLA_STATS Gathering interface statistics can be a relatively expensive operation on certain systems as it requires iterating over all the cpus. RTEXT_FILTER_SKIP_STATS was first introduced [1] to skip AF_INET6 statistics from interface dumps and it was then extended [2] to also exclude IFLA_VF_INFO. The semantics of the flag does not seem to be limited to AF_INET or VF statistics and having a way to query the interface status (e.g: carrier, address) without retrieving its statistics seems reasonable. So this patch extends the use RTEXT_FILTER_SKIP_STATS to also affect IFLA_STATS. [1] https://lore.kernel.org/all/20150911204848.GC9687@oracle.com/ [2] https://lore.kernel.org/all/20230611105108.122586-1-gal@nvidia.com/ Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20251103154006.1189707-1-amorenoz@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 16:07:37 -08:00
Shuhao Fu	b540de9e3b	smb: client: fix refcount leak in smb2_set_path_attr Fix refcount leak in `smb2_set_path_attr` when path conversion fails. Function `cifs_get_writable_path` returns `cfile` with its reference counter `cfile->count` increased on success. Function `smb2_compound_op` would decrease the reference counter for `cfile`, as stated in its comment. By calling `smb2_rename_path`, the reference counter of `cfile` would leak if `cifs_convert_path_to_utf16` fails in `smb2_set_path_attr`. Fixes: `8de9e86c67` ("cifs: create a helper to find a writeable handle by path name") Acked-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Signed-off-by: Steve French <stfrench@microsoft.com>	2025-11-04 16:03:56 -06:00
Linus Torvalds	17d85f33a8	RDMA v6.18 first rc pull request Following fixes: - Memory leak in bnxt GSI qp path - Failure in irdma registering large MRs - Failure to clean out the right CQ table entry in irdma - Invalid vf_id in some cases - Incorrect error unwind in EFA CQ create - hns doesn't use the optimal cq/qp relationships for it's HW banks - hns reports the wrong SGE size to userspace for its QPs - Corruption of the hns work queue entries in some cases -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCaQoEZAAKCRCFwuHvBreF YbdjAP9MfjyZvmOu5H7yqwdIgCNeduANVWbEzSXUEU6j5LRCywD+O5UnkPbHQ9ko k+jo07V6Ra/FuTVmr1Wf/Nfa9JmPqwc= =dnEg -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: - Memory leak in bnxt GSI qp path - Failure in irdma registering large MRs - Failure to clean out the right CQ table entry in irdma - Invalid vf_id in some cases - Incorrect error unwind in EFA CQ create - hns doesn't use the optimal cq/qp relationships for it's HW banks - hns reports the wrong SGE size to userspace for its QPs - Corruption of the hns work queue entries in some cases * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: MAINTAINERS: Update irdma maintainers RDMA/irdma: Fix vf_id size to u16 to avoid overflow RDMA/hns: Remove an extra blank line RDMA/hns: Fix wrong WQE data when QP wraps around RDMA/hns: Fix the modification of max_send_sge RDMA/hns: Fix recv CQ and QP cache affinity RDMA/uverbs: Fix umem release in UVERBS_METHOD_CQ_CREATE RDMA/irdma: Set irdma_cq cq_num field during CQ create RDMA/irdma: Fix SD index calculation RDMA/bnxt_re: Fix a potential memory leak in destroy_gsi_sqp	2025-11-05 04:08:55 +09:00
Rong Zhang	6dd97ceb64	drm/amd/display: Fix NULL deref in debugfs odm_combine_segments When a connector is connected but inactive (e.g., disabled by desktop environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading odm_combine_segments causes kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> seq_read_iter+0x125/0x490 ? __alloc_frozen_pages_noprof+0x18f/0x350 seq_read+0x12c/0x170 full_proxy_read+0x51/0x80 vfs_read+0xbc/0x390 ? __handle_mm_fault+0xa46/0xef0 ? do_syscall_64+0x71/0x900 ksys_read+0x73/0xf0 do_syscall_64+0x71/0x900 ? count_memcg_events+0xc2/0x190 ? handle_mm_fault+0x1d7/0x2d0 ? do_user_addr_fault+0x21a/0x690 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f44d4031687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00> RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000 </TASK> Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x> snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn> platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp> CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Fix this by checking pipe_ctx->stream_res.tg before dereferencing. Fixes: `07926ba8a4` ("drm/amd/display: Add debugfs interface for ODM combine info") Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Mario Limoncello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f19bbecd34`) Cc: stable@vger.kernel.org	2025-11-04 13:40:42 -05:00
Philip Yang	597eb70f7f	drm/amdkfd: Don't clear PT after process killed If process is killed. the vm entity is stopped, submit pt update job will trigger the error message "ERROR Trying to push to a killed entity", job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `10c382ec6c`) Cc: stable@vger.kernel.org	2025-11-04 13:40:42 -05:00
Alex Deucher	7c5609b72b	drm/amdgpu/smu: Handle S0ix for vangogh Fix the flows for S0ix. There is no need to stop rlc or reintialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fd39b5a583`) Cc: <stable@vger.kernel.org> # c81f5cebe849: drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() Cc: <stable@vger.kernel.org>	2025-11-04 13:39:27 -05:00
Alex Deucher	c81f5cebe8	drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `960e30a61e`)	2025-11-04 13:28:20 -05:00
Alex Hung	fdc93beead	drm/amd/display: Fix black screen with HDMI outputs [Why & How] This fixes the black screen issue on certain APUs with HDMI, accompanied by the following messages: amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info frame on connector DP-1: -22 amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm] Cannot find any crtc or sizes Fixes: `489f0f600c` ("drm/amd/display: Fix DVI-D/HDMI adapters") Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `678c901443`)	2025-11-04 13:24:40 -05:00
Mario Limonciello (AMD)	3362692fea	drm/amd/display: Don't stretch non-native images by default in eDP commit `978fa2f6d0` ("drm/amd/display: Use scaling for non-native resolutions on eDP") started using the GPU scaler hardware to scale when a non-native resolution was picked on eDP. This scaling was done to fill the screen instead of maintain aspect ratio. The idea was supposed to be that if a different scaling behavior is preferred then the compositor would request it. The not following aspect ratio behavior however isn't desirable, so adjust it to follow aspect ratio and still try to fill screen. Note: This will lead to black bars in some cases for non-native resolutions. Compositors can request the previous behavior if desired. Fixes: `978fa2f6d0` ("drm/amd/display: Use scaling for non-native resolutions on eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `825df7ff4b`)	2025-11-04 13:23:24 -05:00
Yang Wang	37e3567dee	drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init() Use the correct label to complete all cleanup work. Fixes: `4d154b1ca5` ("drm/amd/pm: Add support for DPM policies") Fixes: `25e82f2e2c` ("drm/amd/pm: Add temperature metrics sysfs entry") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4c4c138a1c`)	2025-11-04 13:18:05 -05:00
Alex Deucher	90b75e12a6	drm/amdgpu: set default gfx reset masks for gfx6-8 These were not set so soft recovery was inadvertantly disabled. Fixes: `6ac55eab4f` ("drm/amdgpu: move reset support type checks into the caller") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1972763505`)	2025-11-04 13:15:43 -05:00
Miguel Ojeda	789521b471	rust: kbuild: support `-Cjump-tables=n` for Rust 1.93.0 Rust 1.93.0 (expected 2026-01-22) is stabilizing `-Zno-jump-tables` [1][2] as `-Cjump-tables=n` [3]. Without this change, one would eventually see: RUSTC L rust/core.o error: unknown unstable option: `no-jump-tables` Thus support the upcoming version. Link: https://github.com/rust-lang/rust/issues/116592 [1] Link: https://github.com/rust-lang/rust/pull/105812 [2] Link: https://github.com/rust-lang/rust/pull/145974 [3] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Trevor Gross <tmgross@umich.edu> Acked-by: Nicolas Schier <nsc@kernel.org> Link: https://patch.msgid.link/20251101094011.1024534-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>	2025-11-04 19:11:39 +01:00
Eric Biggers	44e8241c51	lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIAN On big endian arm kernels, the arm optimized Curve25519 code produces incorrect outputs and fails the Curve25519 test. This has been true ever since this code was added. It seems that hardly anyone (or even no one?) actually uses big endian arm kernels. But as long as they're ostensibly supported, we should disable this code on them so that it's not accidentally used. Note: for future-proofing, use !CPU_BIG_ENDIAN instead of CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could get removed in the future if big endian support gets dropped. Fixes: `d8f1308a02` ("crypto: arm/curve25519 - wire up NEON implementation") Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-11-04 09:36:22 -08:00
Nathan Chancellor	2b81082ad3	lib/crypto: curve25519-hacl64: Fix older clang KASAN workaround for GCC Commit `2f13daee2a` ("lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older") inadvertently disabled KASAN in curve25519-hacl64.o for GCC unconditionally because clang-min-version will always evaluate to nothing for GCC. Add a check for CONFIG_CC_IS_CLANG to avoid applying the workaround for GCC, which is only needed for clang-17 and older. Cc: stable@vger.kernel.org Fixes: `2f13daee2a` ("lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251103-curve25519-hacl64-fix-kasan-workaround-v2-1-ab581cbd8035@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-11-04 09:35:58 -08:00
Sean Christopherson	ae431059e7	KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying When unbinding a memslot from a guest_memfd instance, remove the bindings even if the guest_memfd file is dying, i.e. even if its file refcount has gone to zero. If the memslot is freed before the file is fully released, nullifying the memslot side of the binding in kvm_gmem_release() will write to freed memory, as detected by syzbot+KASAN: ================================================================== BUG: KASAN: slab-use-after-free in kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353 Write of size 8 at addr ffff88807befa508 by task syz.0.17/6022 CPU: 0 UID: 0 PID: 6022 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xca/0x240 mm/kasan/report.c:482 kasan_report+0x118/0x150 mm/kasan/report.c:595 kvm_gmem_release+0x176/0x440 virt/kvm/guest_memfd.c:353 __fput+0x44c/0xa70 fs/file_table.c:468 task_work_run+0x1d4/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xe9/0x130 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fbeeff8efc9 </TASK> Allocated by task 6023: kasan_save_stack mm/kasan/common.c:56 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:77 poison_kmalloc_redzone mm/kasan/common.c:397 [inline] __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:414 kasan_kmalloc include/linux/kasan.h:262 [inline] __kmalloc_cache_noprof+0x3e2/0x700 mm/slub.c:5758 kmalloc_noprof include/linux/slab.h:957 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] kvm_set_memory_region+0x747/0xb90 virt/kvm/kvm_main.c:2104 kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154 kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 6023: kasan_save_stack mm/kasan/common.c:56 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:77 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584 poison_slab_object mm/kasan/common.c:252 [inline] __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284 kasan_slab_free include/linux/kasan.h:234 [inline] slab_free_hook mm/slub.c:2533 [inline] slab_free mm/slub.c:6622 [inline] kfree+0x19a/0x6d0 mm/slub.c:6829 kvm_set_memory_region+0x9c4/0xb90 virt/kvm/kvm_main.c:2130 kvm_vm_ioctl_set_memory_region+0x6f/0xd0 virt/kvm/kvm_main.c:2154 kvm_vm_ioctl+0x957/0xc60 virt/kvm/kvm_main.c:5201 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Deliberately don't acquire filemap invalid lock when the file is dying as the lifecycle of f_mapping is outside the purview of KVM. Dereferencing the mapping is probably fine, but there's no need to invalidate anything as memslot deletion is responsible for zapping SPTEs, and the only code that can access the dying file is kvm_gmem_release(), whose core code is mutually exclusive with unbinding. Note, the mutual exclusivity is also what makes it safe to access the bindings on a dying gmem instance. Unbinding either runs with slots_lock held, or after the last reference to the owning "struct kvm" is put, and kvm_gmem_release() nullifies the slot pointer under slots_lock, and puts its reference to the VM after that is done. Reported-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68fa7a22.a70a0220.3bf6c6.008b.GAE@google.com Tested-by: syzbot+2479e53d0db9b32ae2aa@syzkaller.appspotmail.com Fixes: `a7800aa80e` ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Cc: stable@vger.kernel.org Cc: Hillf Danton <hdanton@sina.com> Reviewed-By: Vishal Annapurve <vannapurve@google.com> Link: https://patch.msgid.link/20251104011205.3853541-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-04 09:16:53 -08:00
Maxim Levitsky	fd92bd3b44	KVM: SVM: switch to raw spinlock for svm->ir_list_lock Use a raw spinlock for vcpu_svm.ir_list_lock as the lock can be taken during schedule() via kvm_sched_out() => __avic_vcpu_put(), and "normal" spinlocks are sleepable locks when PREEMPT_RT=y. This fixes the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.12.0-146.1640_2124176644.el10.x86_64+debug #1 Not tainted ----------------------------- qemu-kvm/38299 is trying to lock: ff11000239725600 (&svm->ir_list_lock){....}-{3:3}, at: __avic_vcpu_put+0xfd/0x300 [kvm_amd] other info that might help us debug this: context-{5:5} 2 locks held by qemu-kvm/38299: #0: ff11000239723ba8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x240/0xe00 [kvm] #1: ff11000b906056d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2e/0x130 stack backtrace: CPU: 1 UID: 0 PID: 38299 Comm: qemu-kvm Kdump: loaded Not tainted 6.12.0-146.1640_2124176644.el10.x86_64+debug #1 PREEMPT(voluntary) Hardware name: AMD Corporation QUARTZ/QUARTZ, BIOS RQZ100AB 09/14/2023 Call Trace: <TASK> dump_stack_lvl+0x6f/0xb0 __lock_acquire+0x921/0xb80 lock_acquire.part.0+0xbe/0x270 _raw_spin_lock_irqsave+0x46/0x90 __avic_vcpu_put+0xfd/0x300 [kvm_amd] svm_vcpu_put+0xfa/0x130 [kvm_amd] kvm_arch_vcpu_put+0x48c/0x790 [kvm] kvm_sched_out+0x161/0x1c0 [kvm] prepare_task_switch+0x36b/0xf60 __schedule+0x4f7/0x1890 schedule+0xd4/0x260 xfer_to_guest_mode_handle_work+0x54/0xc0 vcpu_run+0x69a/0xa70 [kvm] kvm_arch_vcpu_ioctl_run+0xdc0/0x17e0 [kvm] kvm_vcpu_ioctl+0x39f/0xe00 [kvm] Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://patch.msgid.link/20251030194130.307900-1-mlevitsk@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-04 09:14:28 -08:00
Sean Christopherson	aaac099459	KVM: SVM: Make avic_ga_log_notifier() local to avic.c Make amd_iommu_register_ga_log_notifier() a local symbol now that it's defined and used purely within avic.c. No functional change intended. Fixes: `4bdec12aa8` ("KVM: SVM: Detect X2APIC virtualization (x2AVIC) support") Link: https://patch.msgid.link/20251016190643.80529-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-04 09:14:28 -08:00
Sean Christopherson	adc6ae9729	KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit Unregister the GALog notifier (used to get notified of wake events for blocking vCPUs) on kvm-amd.ko exit so that a KVM or IOMMU driver bug that results in a spurious GALog event "only" results in a spurious IRQ, and doesn't trigger a use-after-free due to executing unloaded module code. Fixes: `5881f73757` ("svm: Introduce AMD IOMMU avic_ga_log_notifier") Reported-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Closes: https://lore.kernel.org/all/20250918130320.GA119526@k08j02272.eu95sqa Link: https://patch.msgid.link/20251016190643.80529-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-11-04 09:14:27 -08:00

... 13 14 15 16 17 ...

1399042 Commits All Branches Search

1399042 Commits

All Branches