linux

Commit Graph

Author	SHA1	Message	Date
Oleg Nesterov	8661bb9c71	selftests/pidfd: fixes syscall number defines I had to spend some (a lot;) time to understand why pidfd_info_test (and more) fails with my patch under qemu on my machine ;) Until I applied the patch below. I think it is a bad idea to do the things like #ifndef __NR_clone3 #define __NR_clone3 -1 #endif because this can hide a problem. My working laptop runs Fedora-23 which doesn't have __NR_clone3/etc in /usr/include/. So "make" happily succeeds, but everything fails and it is not clear why. Link: https://lore.kernel.org/r/20250323174518.GB834@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-25 14:59:05 +01:00
Filipe Xavier	474eecc882	selftests: livepatch: test if ftrace can trace a livepatched function This new test makes sure that ftrace can trace a function that was introduced by a livepatch. Signed-off-by: Filipe Xavier <felipeaggger@gmail.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250324-ftrace-sftest-livepatch-v3-2-d9d7cc386c75@gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-03-25 14:52:32 +01:00
Filipe Xavier	2ca7cd8020	selftests: livepatch: add new ftrace helpers functions Add new ftrace helpers functions cleanup_tracing, trace_function and check_traced_functions. Signed-off-by: Filipe Xavier <felipeaggger@gmail.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250324-ftrace-sftest-livepatch-v3-1-d9d7cc386c75@gmail.com Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-03-25 14:46:02 +01:00
Yi Liu	d57a1fb342	iommufd/selftest: Add coverage for iommufd pasid attach/detach This tests iommufd pasid attach/replace/detach. Link: https://patch.msgid.link/r/20250321171940.7213-19-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-25 10:18:31 -03:00
Dmitry Safonov	edbac739e4	selftests/net: Drop timeout argument from test_client_verify() It's always TEST_TIMEOUT_SEC, with an unjustified exception in rst test, that is more paranoia-long timeout rather than based on requirements. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-7-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:30 -07:00
Dmitry Safonov	1e1738faa2	selftests/net: Delete timeout from test_connect_socket() Unused: it's always either the default timeout or asynchronous connect(). Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-6-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:30 -07:00
Dmitry Safonov	266ed1ace8	selftests/net: Print the testing side in unsigned-md5 As both client and server print the same test name on failure or pass, add "[server]" so that it's more obvious from a log which side printed "ok" or "not ok". Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-5-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:30 -07:00
Dmitry Safonov	3f36781e57	selftests/net: Add mixed select()+polling mode to TCP-AO tests Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one, TEST_RETRANSMIT_SEC is used for operations that are expected to succeed in order for a test to pass. It is usually not consumed and exists only to avoid indefinite test run if the operation didn't complete. The second one, TEST_RETRANSMIT_SEC exists for the tests that checking operations, that are expected to fail/timeout. It is shorter as it is fully consumed, with an expectation that if operation didn't succeed during that period, it will timeout. And the related test that expects the timeout is passing. The actual operation failure is then cross-verified by other means like counters checks. The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact initial TCP timeout. So, in case the initial segment gets lost (quite unlikely on local veth interface between two net namespaces, yet happens in slow VMs), the retransmission never happens and as a result, the test is not actually testing the functionality. Which in the end fails counters checks. As I want tcp_ao selftests to be fast and finishing in a reasonable amount of time on manual run, I didn't consider increasing TEST_RETRANSMIT_SEC. Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever to make the initial TCP timeout shorter. But as it's not a socket bpf attached thing, but sock_ops (attaches to cgroups), the selftests would have to use libbpf, which I wanted to avoid if not absolutely required. Instead, use a mixed select() and counters polling mode with the longer TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It actually not only allows losing segments and succeeding after the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes the tests expecting timeout/failure pass faster. The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny "wrong snd id", which checks for no key on SYN-ACK for which there is no counter in the kernel (see tcp_make_synack()). Yet it can be speed up by poking skpair from the trace event (see trace_tcp_ao_synack_no_key). Fixes: `ed9d09b309` ("selftests/net: Add a test for TCP-AO keys matching") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/ Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-4-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:30 -07:00
Dmitry Safonov	5a0a3193f6	selftests/net: Fetch and check TCP-MD5 counters There are related TCP-MD5 <=> TCP and TCP-MD5 <=> TCP-AO tests that can benefit from checking the related counters, not only from validating operations timeouts. It also prepares the code for introduction of mixed select()+poll mode, see the follow-up patches. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-3-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:30 -07:00
Dmitry Safonov	1fe4221093	selftests/net: Provide tcp-ao counters comparison helper Rename __test_tcp_ao_counters_cmp() into test_assert_counters_ao() and test_tcp_ao_key_counters_cmp() into test_assert_counters_key() as they are asserts, rather than just compare functions. Provide test_cmp_counters() helper, that's going to be used to compare ao_info and netns counters as a stop condition for polling the sockets. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-2-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:29 -07:00
Dmitry Safonov	65ffdf31be	selftests/net: Print TCP flags in more common format Before: ># 13145[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: !FS!R!P!., keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1 After: ># 13487[lib/ftrace-tcp.c:427] trace event filter tcp_ao_key_not_found [2001:db8:1::1:-1 => 2001:db8:254::1:7010, L3index 0, flags: S, keyid: 100, rnext: 100, maclen: -1, sne: -1] = 1 For the history, I think the initial format was to emphasize the absence of flags as well as their presence (!R meant no RST flag). But looking again, it's just unreadable and hard to understand. Make it the standard/expected one. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250319-tcp-ao-selftests-polling-v2-1-da48040153d1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 06:10:29 -07:00
Song Liu	59481b8bd0	selftest/livepatch: Only run test-kprobe with CONFIG_KPROBES_ON_FTRACE CONFIG_KPROBES_ON_FTRACE is required for test-kprobe. Skip test-kprobe when CONFIG_KPROBES_ON_FTRACE is not set. Since some kernel may not have /proc/config.gz, grep for kprobe_ftrace_ops from /proc/kallsyms to check whether CONFIG_KPROBES_ON_FTRACE is enabled. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/r/20250318181518.1055532-1-song@kernel.org [pmladek@suse.com: Call grep with -q option.] Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2025-03-25 13:36:40 +01:00
Linus Torvalds	a49a879f0a	Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han, Mirsad Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfeo6ERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gl9w//ZRxGhf3eXYVyPW35aoqaRe7W3fBJhaPn eLP0JuuglZO3JSTJfmbTmSXrNN4kkzI3nnuTnchK0iYZ+2Tn4E0iaz4tm5sjL5T1 KW9ajv7QQxMOwXS69hrklY/eGxVimfs4mIyhxEhxLjPvzbGJOVS6WFgvaYH5klQQ 7sPYXzrNGZJhGCmRqCelHSPhl7b6YVS/yWtMe+BpPBn1AqIQN+O+S/Kbu7dQokmq 7MNy4eUe7GYgnSB7Qhq/jNSqjmGWsVwhMiuoDxx7GvShM73+kYatQZT0H+qzTxzp 90rIPcPTNCOlKJO3xSWqXStcMH00MbplOhKdEU6SzeM+xC2aD2t4GoIG0EFxIQdS X0wh40Psm5GehThhpmMBJdmM4Le4TTSkHBhedpQ+sp+6BIX4A9hoeCoybIlcngaJ W89YKHqC5ruPSRDOzyaTMSypaEGH5VHP6AMj0CmkuyHRbxADsMazifpOVOw9AvVk IR0USCzyenjLMiugRrJgRpy8R2Q1jp35bP0e7Y4QvmydEGC1Wl5mW5Y8B9Zdlk0C iuxoqwntem7PAkEA2JejW6rzuRICxXloKv+xjtQjKzXpK+pwhhSp830zT8gaWZ1Z hdQRz7gSlJg3JzWinX8+XNusdHw9TxCUqwgLw8486nzNp01GU62gby5DWgECEdae 2IyRRSd0FyA= =GsK4 -----END PGP SIGNATURE----- Merge tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Miscellaneous x86 cleanups by Arnd Bergmann, Charles Han, Mirsad Todorovac, Randy Dunlap, Thorsten Blum and Zhang Kunbo" * tag 'x86-cleanups-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/coco: Replace 'static const cc_mask' with the newly introduced cc_get_mask() function x86/delay: Fix inconsistent whitespace selftests/x86/syscall: Fix coccinelle WARNING recommending the use of ARRAY_SIZE() x86/platform: Fix missing declaration of 'x86_apple_machine' x86/irq: Fix missing declaration of 'io_apic_irqs' x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()	2025-03-24 22:39:53 -07:00
Linus Torvalds	71b639af06	x86/fpu updates for v6.15: - Improve crypto performance by making kernel-mode FPU reliably usable in softirqs ((Eric Biggers) - Fully optimize out WARN_ON_FPU() (Eric Biggers) - Initial steps to support Support Intel APX (Advanced Performance Extensions) (Chang S. Bae) - Fix KASAN for arch_dup_task_struct() (Benjamin Berg) - Refine and simplify the FPU magic number check during signal return (Chang S. Bae) - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav Spassov) - selftests/x86/xstate: Introduce common code for testing extended states (Chang S. Bae) - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros Bizjak) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfepZQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1grLw/+Lt71PQWu/uVt3/dU0VkpppTqRmUTujuR XVwFlBVxMFw9A8wi56bLD3WT28mKWr0SepynBt1Wr7/pZCnaW4SH/91ERChRqTzU ifawmqQqhrVhFlXziDOKp9lcDy9f7NjJVKBfEBa1VBQGC0Q8FzeasOhCwTJw8qXa Lj2z8zenhDq6NBkk22VlScc1CvsUBiXppvm8uk3md7HKaDqdzmCakm/a0FgaEppt K6P/u1iaVvGXme2CHSskCshkYoEFIF6LRNYnkVloXsVV4AeWeLaJ54xW+syPd/4H EX6oMLifdXCmxmsmi3LPRUjBgdSMsDaAkLsPNoX1w4uWjsihqyUl4wTEpobMIuu1 PWOrCxQKxtaGoECJ0nsE4uR7MZEQ0sYCGKv4JSiiXeUVf/EDRuUBfvBCoGlDWXYA GZcoMmH+BtcYuQdzbrc/vWkfo3bpTxL3x5zsgtFkQL3xyzhugRPNgeOn9yuSZ9x6 AD8QS0G6uRcc9a5ZeeTEcYxoIE/+vvfnh1wfMjisehkVn179ixfZdgcSGrZJUNyH a7LmUmwLvLYNO5DUUGs1upN8YHP7jiohNc/r2ZC1IX5LHbuK1gyXA4xo6hAZ8r/7 XZ2FfRp0dr3PvuWIwq2v6T5JHvALADsKCAjqvNdwHLkxF87ygixT+B87wTA3H6ov LSY10A/eRx4= =U7PR -----END PGP SIGNATURE----- Merge tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/fpu updates from Ingo Molnar: - Improve crypto performance by making kernel-mode FPU reliably usable in softirqs ((Eric Biggers) - Fully optimize out WARN_ON_FPU() (Eric Biggers) - Initial steps to support Support Intel APX (Advanced Performance Extensions) (Chang S. Bae) - Fix KASAN for arch_dup_task_struct() (Benjamin Berg) - Refine and simplify the FPU magic number check during signal return (Chang S. Bae) - Fix inconsistencies in guest FPU xfeatures (Chao Gao, Stanislav Spassov) - selftests/x86/xstate: Introduce common code for testing extended states (Chang S. Bae) - Misc fixes and cleanups (Borislav Petkov, Colin Ian King, Uros Bizjak) * tag 'x86-fpu-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu/xstate: Fix inconsistencies in guest FPU xfeatures x86/fpu: Clarify the "xa" symbolic name used in the XSTATE* macros x86/fpu: Use XSAVE{,OPT,C,S} and XRSTOR{,S} mnemonics in xstate.h x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs x86/fpu/xstate: Simplify print_xstate_features() x86/fpu: Refine and simplify the magic number check during signal return selftests/x86/xstate: Fix spelling mistake "hader" -> "header" x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct() vmlinux.lds.h: Remove entry to place init_task onto init_stack selftests/x86/avx: Add AVX tests selftests/x86/xstate: Clarify supported xstates selftests/x86/xstate: Consolidate test invocations into a single entry selftests/x86/xstate: Introduce signal ABI test selftests/x86/xstate: Refactor ptrace ABI test selftests/x86/xstate: Refactor context switching test selftests/x86/xstate: Enumerate and name xstate components selftests/x86/xstate: Refactor XSAVE helpers for general use selftests/x86: Consolidate redundant signal helper functions x86/fpu: Fix guest FPU state buffer allocation size x86/fpu: Fully optimize out WARN_ON_FPU()	2025-03-24 22:27:18 -07:00
Linus Torvalds	e34c38057a	[ Merge note: this pull request depends on you having merged two locking commits in the locking tree, part of the locking-core-2025-03-22 pull request. ] x86 CPU features support: - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li) - x86 CPUID parsing updates and fixes (Ahmed S. Darwish) - Introduce the 'setcpuid=' boot parameter (Brendan Jackman) - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan Jackman) - Utilize CPU-type for CPU matching (Pawan Gupta) - Warn about unmet CPU feature dependencies (Sohil Mehta) - Prepare for new Intel Family numbers (Sohil Mehta) Percpu code: - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst) - Convert the stackprotector canary to a regular percpu variable (Brian Gerst) - Add a percpu subsection for cache hot data (Brian Gerst) - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak) - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak) MM: - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel) - Rework ROX cache to avoid writable copy (Mike Rapoport) - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport) - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov) - Robustify page table initialization (Kirill A. Shutemov) - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn) - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox) KASLR: - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh) CPU bugs: - Implement FineIBT-BHI mitigation (Peter Zijlstra) - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta) - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta) - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta) System calls: - Break up entry/common.c (Brian Gerst) - Move sysctls into arch/x86 (Joel Granados) Intel LAM support updates: (Maciej Wieczor-Retman) - selftests/lam: Move cpu_has_la57() to use cpuinfo flag - selftests/lam: Skip test if LAM is disabled - selftests/lam: Test get_user() LAM pointer handling AMD SMN access updates: - Add SMN offsets to exclusive region access (Mario Limonciello) - Add support for debugfs access to SMN registers (Mario Limonciello) - Have HSMP use SMN through AMD_NODE (Yazen Ghannam) Power management updates: (Patryk Wlazlyn) - Allow calling mwait_play_dead with an arbitrary hint - ACPI/processor_idle: Add FFH state handling - intel_idle: Provide the default enter_dead() handler - Eliminate mwait_play_dead_cpuid_hint() Bootup: Build system: - Raise the minimum GCC version to 8.1 (Brian Gerst) - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor) Kconfig: (Arnd Bergmann) - Add cmpxchg8b support back to Geode CPUs - Drop 32-bit "bigsmp" machine support - Rework CONFIG_GENERIC_CPU compiler flags - Drop configuration options for early 64-bit CPUs - Remove CONFIG_HIGHMEM64G support - Drop CONFIG_SWIOTLB for PAE - Drop support for CONFIG_HIGHPTE - Document CONFIG_X86_INTEL_MID as 64-bit-only - Remove old STA2x11 support - Only allow CONFIG_EISA for 32-bit Headers: - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth) Assembly code & machine code patching: - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf) - x86/alternatives: Simplify callthunk patching (Peter Zijlstra) - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf) - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf) - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra) - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak) - Use named operands in inline asm (Uros Bizjak) - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak) Earlyprintk: - Harden early_serial (Peter Zijlstra) NMI handler: - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long) Miscellaneous fixes and cleanups: - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/ DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M tJj1oc2TIZw= =Dcb3 -----END PGP SIGNATURE----- Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "x86 CPU features support: - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li) - x86 CPUID parsing updates and fixes (Ahmed S. Darwish) - Introduce the 'setcpuid=' boot parameter (Brendan Jackman) - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan Jackman) - Utilize CPU-type for CPU matching (Pawan Gupta) - Warn about unmet CPU feature dependencies (Sohil Mehta) - Prepare for new Intel Family numbers (Sohil Mehta) Percpu code: - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst) - Convert the stackprotector canary to a regular percpu variable (Brian Gerst) - Add a percpu subsection for cache hot data (Brian Gerst) - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak) - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak) MM: - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel) - Rework ROX cache to avoid writable copy (Mike Rapoport) - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport) - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov) - Robustify page table initialization (Kirill A. Shutemov) - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn) - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox) KASLR: - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh) CPU bugs: - Implement FineIBT-BHI mitigation (Peter Zijlstra) - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta) - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta) - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta) System calls: - Break up entry/common.c (Brian Gerst) - Move sysctls into arch/x86 (Joel Granados) Intel LAM support updates: (Maciej Wieczor-Retman) - selftests/lam: Move cpu_has_la57() to use cpuinfo flag - selftests/lam: Skip test if LAM is disabled - selftests/lam: Test get_user() LAM pointer handling AMD SMN access updates: - Add SMN offsets to exclusive region access (Mario Limonciello) - Add support for debugfs access to SMN registers (Mario Limonciello) - Have HSMP use SMN through AMD_NODE (Yazen Ghannam) Power management updates: (Patryk Wlazlyn) - Allow calling mwait_play_dead with an arbitrary hint - ACPI/processor_idle: Add FFH state handling - intel_idle: Provide the default enter_dead() handler - Eliminate mwait_play_dead_cpuid_hint() Build system: - Raise the minimum GCC version to 8.1 (Brian Gerst) - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor) Kconfig: (Arnd Bergmann) - Add cmpxchg8b support back to Geode CPUs - Drop 32-bit "bigsmp" machine support - Rework CONFIG_GENERIC_CPU compiler flags - Drop configuration options for early 64-bit CPUs - Remove CONFIG_HIGHMEM64G support - Drop CONFIG_SWIOTLB for PAE - Drop support for CONFIG_HIGHPTE - Document CONFIG_X86_INTEL_MID as 64-bit-only - Remove old STA2x11 support - Only allow CONFIG_EISA for 32-bit Headers: - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth) Assembly code & machine code patching: - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf) - x86/alternatives: Simplify callthunk patching (Peter Zijlstra) - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf) - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf) - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra) - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak) - Use named operands in inline asm (Uros Bizjak) - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak) Earlyprintk: - Harden early_serial (Peter Zijlstra) NMI handler: - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long) Miscellaneous fixes and cleanups: - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye" * tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits) zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault x86/asm: Make asm export of __ref_stack_chk_guard unconditional x86/mm: Only do broadcast flush from reclaim if pages were unmapped perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones perf/x86/intel, x86/cpu: Simplify Intel PMU initialization x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions x86/asm: Use asm_inline() instead of asm() in clwb() x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h> x86/hweight: Use asm_inline() instead of asm() x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm() x86/hweight: Use named operands in inline asm() x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP x86/head/64: Avoid Clang < 17 stack protector in startup code x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro x86/cpu/intel: Limit the non-architectural constant_tsc model checks x86/mm/pat: Replace Intel x86_model checks with VFM ones x86/cpu/intel: Fix fast string initialization for extended Families ...	2025-03-24 22:06:11 -07:00
Linus Torvalds	32b22538be	Scheduler updates for v6.15: [ Merge note, these two commits are identical: - `f3fa0e40df` ("sched/clock: Don't define sched_clock_irqtime as static key") - `b9f2b29b94` ("sched: Don't define sched_clock_irqtime as static key") The first one is a cherry-picked version of the second, and the first one is already upstream. ] Core & fair scheduler changes: - Cancel the slice protection of the idle entity (Zihan Zhou) - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou) - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding) - Refactor can_migrate_task() to elimate looping (I Hsin Cheng) - Add unlikey branch hints to several system calls (Colin Ian King) - Optimize current_clr_polling() on certain architectures (Yujun Dong) Deadline scheduler: (Juri Lelli) - Remove redundant dl_clear_root_domain call - Move dl_rebuild_rd_accounting to cpuset.h Uclamp: - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan) - Optimize sched_uclamp_used static key enabling (Xuewen Yan) Scheduler topology support: (Juri Lelli) - Ignore special tasks when rebuilding domains - Add wrappers for sched_domains_mutex - Generalize unique visiting of root domains - Rebuild root domain accounting after every update - Remove partition_and_rebuild_sched_domains - Stop exposing partition_sched_domains_locked RSEQ: (Michael Jeanson) - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y - Fix segfault on registration when rseq_cs is non-zero - selftests: Add rseq syscall errors test - selftests: Ensure the rseq ABI TLS is actually 1024 bytes Membarriers: - Fix redundant load of membarrier_state (Nysal Jan K.A.) Scheduler debugging: - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior) - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar) Fixes and cleanups: - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli) - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfejsoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ivkhAAwBF2tYRBS1oIHcC/OKK3JJoHVDp2LFbU 9sm5S3ZlGD/Ns2fbpY+9A8UFgUFfjYiTSV7hvf2B9Vge0XSxTmMNFu/MdxLBbo9r w6GSeNcNDQKpjEGLkrmPFsa2fiYI4dmH0IzDbS9V2cNPk470QBKjAKXNPaSER691 n2wLnQq+m5o4gXnPjnSz6RrrzisRnm2GOWnDV/iqR47pZFNlX2wWlo3s5r7//Hw0 a+QfEfpgKehhy/VSDXmSAgpqnNffjc78yBV6LNoVUddwahnOWiQMS3XViOqgy5VO jUGBrzW+sKkdRMBppxwJ/0XWgHGC27amIgnU0ZE5u+eiUEu8H9qWl1cRCFxyeB0O 8+WNfwmkH+FPWUdsn84kdePhSsZy6HfM6h44Xe0hx1V7tQXEXfbPzK3TnQg8Ktt1 Ky6ctbZt4cGpqGQuIqvba21A/racrD/DgvB7mHeZksnqZoKTDwxhT/nlQGpuwPoy SJYd1ynFVJvfC69SwMdwnaglimvEZx1GfT0o5XtCMslY5NkWCou5u+e65WX7ccU5 94wBCwI1/+KiFMJZp6TlPw07Q/Hsj9dDryxLc3OunU3zMVt3++1GZnBS/eb5FN1A 5TlAEpxgH9c5Q4/XJFCvx21DrTVHSuIrR6naK91bgqHo0cEfaHGtO/ejPtJGSfxe YIFnnu1dhRw= =SuQE -----END PGP SIGNATURE----- Merge tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Cancel the slice protection of the idle entity (Zihan Zhou) - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou) - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding) - Refactor can_migrate_task() to elimate looping (I Hsin Cheng) - Add unlikey branch hints to several system calls (Colin Ian King) - Optimize current_clr_polling() on certain architectures (Yujun Dong) Deadline scheduler: (Juri Lelli) - Remove redundant dl_clear_root_domain call - Move dl_rebuild_rd_accounting to cpuset.h Uclamp: - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan) - Optimize sched_uclamp_used static key enabling (Xuewen Yan) Scheduler topology support: (Juri Lelli) - Ignore special tasks when rebuilding domains - Add wrappers for sched_domains_mutex - Generalize unique visiting of root domains - Rebuild root domain accounting after every update - Remove partition_and_rebuild_sched_domains - Stop exposing partition_sched_domains_locked RSEQ: (Michael Jeanson) - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y - Fix segfault on registration when rseq_cs is non-zero - selftests: Add rseq syscall errors test - selftests: Ensure the rseq ABI TLS is actually 1024 bytes Membarriers: - Fix redundant load of membarrier_state (Nysal Jan K.A.) Scheduler debugging: - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior) - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar) Fixes and cleanups: - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli) - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior)" * tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling() sched/debug: Remove CONFIG_SCHED_DEBUG sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional sched/debug: Make 'const_debug' tunables unconditional __read_mostly sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() rseq/selftests: Fix namespace collision with rseq UAPI header include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h sched/topology: Stop exposing partition_sched_domains_locked cgroup/cpuset: Remove partition_and_rebuild_sched_domains sched/topology: Remove redundant dl_clear_root_domain call sched/deadline: Rebuild root domain accounting after every update sched/deadline: Generalize unique visiting of root domains sched/topology: Wrappers for sched_domains_mutex sched/deadline: Ignore special tasks when rebuilding domains tracing: Use preempt_model_str() xtensa: Rely on generic printing of preemption model x86: Rely on generic printing of preemption model s390: Rely on generic printing of preemption model ...	2025-03-24 21:28:12 -07:00
Linus Torvalds	3ba7dfb8da	RCU pull request for v6.15 This pull request contains the following branches: docs.2025.02.04a: - Add broken-timing possibility to stallwarn.rst. - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr(). - Document self-propagating callbacks. - Point call_srcu() to call_rcu() for detailed memory ordering. - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header. - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text. - Remove references to old grace-period-wait primitives. srcu.2025.02.05a: - Introduce srcu_read_{un,}lock_fast(), which is similar to srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the cost of calling synchronize_rcu() in synchronize_srcu(). Moreover, by returning the percpu offset of the counter at srcu_read_lock_fast() time, srcu_read_unlock_fast() can save extra pointer dereferencing, which makes it faster than srcu_read_{un,}lock_lite(). srcu_read_{un,}lock_fast() are intended to replace rcu_read_{un,}lock_trace() if possible. torture.2025.02.05a: - Add get_torture_init_jiffies() to return the start time of the test. - Add a test_boost_holdoff module parameter to allow delaying boosting tests when building rcutorture as built-in. - Add grace period sequence number logging at the beginning and end of failure/close-call results. - Switch to hexadecimal for the expedited grace period sequence number in the rcu_exp_grace_period trace point. - Make cur_ops->format_gp_seqs take buffer length. - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool. - Complain when invalid SRCU reader_flavor is specified. - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU uses atomics even when percpu ops are NMI safe, and use the Kconfig for SRCU lockdep testing. misc.2025.03.04a: - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing. - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes(). - Fix get_state_synchronize_rcu_full() GP-start detection. - Move RCU Tasks self-tests to core_initcall(). - Print segment lengths in show_rcu_nocb_gp_state(). - Make RCU watch ct_kernel_exit_state() warning. - Flush console log from kernel_power_off(). - rcutorture: Allow a negative value for nfakewriters. - rcu: Update TREE05.boot to test normal synchronize_rcu(). - rcu: Use _full() API to debug synchronize_rcu(). lazypreempt.2025.03.04a: Make RCU handle PREEMPT_LAZY better: - Fix header guard for rcu_all_qs(). - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY. - Update __cond_resched comment about RCU quiescent states. - Handle unstable rdp in rcu_read_unlock_strict(). - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y. - osnoise: Provide quiescent states. - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y combination. - Limit PREEMPT_RCU configurations. - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmfeBLQACgkQSXnow7UH +rh11Qf/Rt6IZJ/YT/V9Sd+8hMx4O0BMh779pr9cD6mbAG+FDk2Yeva1m8vIdFOb qId6oc8K/ef2JfFjSn0oHMzQP2D3XUyiJWPNbBDHv/D8Os8GZgjzu8dkxVkSbdbY OxtvIflbcqFN1JDJfGKZnTEW0/YxGqfnS9b6R7iyyA7SOGQ/WubGOE5qNCqPufc9 zJiP+qTUFYQzCIiPlEJul39o9KboPogbt3QAAQjWmi3utd77ehJnm/15FvAjyau4 uhC2cnGfMY535rQaiaQeBQ/IHIowKripCq0JQFvcUNdyArZM3HOI2x79+2II6ft7 mjHskNODOIJHfW2o1RzQ0yRYAywFIg== =J+mH -----END PGP SIGNATURE----- Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Boqun Feng: "Documentation: - Add broken-timing possibility to stallwarn.rst - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr() - Document self-propagating callbacks - Point call_srcu() to call_rcu() for detailed memory ordering - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text - Remove references to old grace-period-wait primitives srcu: - Introduce srcu_read_{un,}lock_fast(), which is similar to srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the cost of calling synchronize_rcu() in synchronize_srcu() Moreover, by returning the percpu offset of the counter at srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid extra pointer dereferencing, which makes it faster than srcu_read_{un,}lock_lite() srcu_read_{un,}lock_fast() are intended to replace rcu_read_{un,}lock_trace() if possible RCU torture: - Add get_torture_init_jiffies() to return the start time of the test - Add a test_boost_holdoff module parameter to allow delaying boosting tests when building rcutorture as built-in - Add grace period sequence number logging at the beginning and end of failure/close-call results - Switch to hexadecimal for the expedited grace period sequence number in the rcu_exp_grace_period trace point - Make cur_ops->format_gp_seqs take buffer length - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool - Complain when invalid SRCU reader_flavor is specified - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU uses atomics even when percpu ops are NMI safe, and use the Kconfig for SRCU lockdep testing Misc: - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes() - Fix get_state_synchronize_rcu_full() GP-start detection - Move RCU Tasks self-tests to core_initcall() - Print segment lengths in show_rcu_nocb_gp_state() - Make RCU watch ct_kernel_exit_state() warning - Flush console log from kernel_power_off() - rcutorture: Allow a negative value for nfakewriters - rcu: Update TREE05.boot to test normal synchronize_rcu() - rcu: Use _full() API to debug synchronize_rcu() Make RCU handle PREEMPT_LAZY better: - Fix header guard for rcu_all_qs() - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY - Update __cond_resched comment about RCU quiescent states - Handle unstable rdp in rcu_read_unlock_strict() - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y - osnoise: Provide quiescent states - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y combination - Limit PREEMPT_RCU configurations - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y" * tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits) rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y rcu: limit PREEMPT_RCU configurations rcutorture: Update ->extendables check for lazy preemption rcutorture: Update rcutorture_one_extend_check() for lazy preemption osnoise: provide quiescent states rcu: Use _full() API to debug synchronize_rcu() rcu: Update TREE05.boot to test normal synchronize_rcu() rcutorture: Allow a negative value for nfakewriters Flush console log from kernel_power_off() context_tracking: Make RCU watch ct_kernel_exit_state() warning rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state() rcu-tasks: Move RCU Tasks self-tests to core_initcall() rcu: Fix get_state_synchronize_rcu_full() GP-start detection torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe() srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing rcutorture: Complain when invalid SRCU reader_flavor is specified rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool rcutorture: Make cur_ops->format_gp_seqs take buffer length rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output ...	2025-03-24 19:41:37 -07:00
Linus Torvalds	418becac37	nolibc changes for 6.15 Changes ------- * 32bit s390 support * opendir() and friends * openat() support * sscanf() support * various cleanups -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTg4lxklFHAidmUs57B+h1jyw5bOAUCZ8w24QAKCRDB+h1jyw5b OIRTAQD1KvUV0qYOKU3xqOiUxDe84DyFv4mgu1iZoqmd0po/CQEAs8HU4XWp4Ls0 tPkcLqlhiVPgwhtipjklW8uy6JcFzAI= =+Zma -----END PGP SIGNATURE----- Merge tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - 32bit s390 support - opendir() and friends - openat() support - sscanf() support - various cleanups [ Paul has just forwarded the pull request from Thomas Weißschuh, so the tag signature is from Thomas, not Paul - Linus ] * tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (26 commits) tools/nolibc: don't use asm/ UAPI headers selftests/nolibc: stop testing constructor order selftests/nolibc: use O_RDONLY flag instead of 0 tools/nolibc: drop outdated example from overview comment tools/nolibc: process open() vararg as mode_t tools/nolibc: always use openat(2) instead of open(2) tools/nolibc: add support for openat(2) selftests/nolibc: add armthumb configuration selftests/nolibc: explicitly enable ARM mode Revert "selftests: kselftest: Fix build failure with NOLIBC" tools/nolibc: add support for [v]sscanf() tools/nolibc: add support for 32-bit s390 selftests/nolibc: rename s390 to s390x selftests/nolibc: only run constructor tests on nolibc selftests/nolibc: split up architecture list in run-tests.sh tools/nolibc: add support for directory access tools/nolibc: add support for sys_llseek() selftests/nolibc: always keep test kernel configuration up to date selftests/nolibc: execute defconfig before other targets selftests/nolibc: drop call to mrproper target ...	2025-03-24 17:59:29 -07:00
Linus Torvalds	bcb044256d	sched_ext: Changes for v6.15 - Add mechanism to count and report internal events. This significantly improves visibility on subtle corner conditions. - The default idle CPU selection logic is revamped and improved in multiple ways including being made topology aware. - sched_ext was disabling ttwu_queue for simplicity, which can be costly when hardware topology is more complex. Implement SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively enable ttwu_queue. - tools/sched_ext updates to improve compatibility among others. - Other misc updates and fixes. - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite fixes and resolve conflicts. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ999Sg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGf/KAQCoMTVOBpQT9gCaCKDOmrVJTwi6boEoV5WnGZzw PDr0vwEAq36iz4no6Y5THcN/DCx+52IiS0zuhPy3rBZVo11TMgU= =iQ+A -----END PGP SIGNATURE----- Merge tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Add mechanism to count and report internal events. This significantly improves visibility on subtle corner conditions. - The default idle CPU selection logic is revamped and improved in multiple ways including being made topology aware. - sched_ext was disabling ttwu_queue for simplicity, which can be costly when hardware topology is more complex. Implement SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively enable ttwu_queue. - tools/sched_ext updates to improve compatibility among others. - Other misc updates and fixes. - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite fixes and resolve conflicts. * tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits) sched_ext: idle: Refactor scx_select_cpu_dfl() sched_ext: idle: Honor idle flags in the built-in idle selection policy sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local() sched_ext: Add trace point to track sched_ext core events sched_ext: Change the event type from u64 to s64 sched_ext: Documentation: add task lifecycle summary tools/sched_ext: Provide a compatible helper for scx_bpf_events() selftests/sched_ext: Add NUMA-aware scheduler test tools/sched_ext: Provide consistent access to scx flags sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior sched_ext: idle: Introduce scx_bpf_nr_node_ids() sched_ext: idle: Introduce node-aware idle cpu kfunc helpers sched_ext: idle: Per-node idle cpumasks sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE sched_ext: idle: Make idle static keys private sched/topology: Introduce for_each_node_numadist() iterator mm/numa: Introduce nearest_node_nodemask() nodemask: numa: reorganize inclusion path nodemask: add nodes_copy() tools/sched_ext: Sync with scx repo ...	2025-03-24 17:23:48 -07:00
Linus Torvalds	11c2b2e332	seccomp updates for v6.15-rc1 - avoid the lock trip seccomp_filter_release in common case (Mateusz Guzik) - remove unused 'sd' argument through-out (Oleg Nesterov) - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hJKgAKCRA2KwveOeQk u3UDAQCUPSJ+iNGEsh9ErHJZWmg+LQJ999fOVrWWz+onjXFoQgD/cEHyLtSnz1Oa RbdGEqkBkRTXgOdkcIW5pJ4lBtfIpQQ= =z/8H -----END PGP SIGNATURE----- Merge tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - avoid the lock trip seccomp_filter_release in common case (Mateusz Guzik) - remove unused 'sd' argument through-out (Oleg Nesterov) - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64 * tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: avoid the lock trip seccomp_filter_release in common case seccomp: remove the 'sd' argument from __seccomp_filter() seccomp: remove the 'sd' argument from __secure_computing() seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER seccomp/mips: change syscall_trace_enter() to use secure_computing() selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64	2025-03-24 15:34:38 -07:00
Linus Torvalds	06961fbbbd	move-lib-kunit for v6.15-rc1 - move lib/ selftests into lib/tests/ (Kees Cook, Gabriela Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir Duberstein) - lib/math: Add int_log test suite (Bruno Sobreira França) - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin) - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego Vieira) - unicode: refactor selftests into KUnit (Gabriela Bittencourt) - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein) - printf: convert self-test to KUnit (Tamir Duberstein) - scanf: convert self-test to KUnit (Tamir Duberstein) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hCvgAKCRA2KwveOeQk u1nzAP9F/vQZUPkU9ADuqdcbyyEXlTzNk8R5rC2e1w+uKzJx+QD9EAbeCv9ZLdC0 KQF0pWVYCJtiSMEhkiMS/bMmpRCgwQ8= =VYZG -----END PGP SIGNATURE----- Merge tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull lib kunit selftest move from Kees Cook: "This is a one-off tree to coordinate the move of selftests out of lib/ and into lib/tests/. A separate tree was used for this to keep the paths sane with all the work in the same place. - move lib/ selftests into lib/tests/ (Kees Cook, Gabriela Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir Duberstein) - lib/math: Add int_log test suite (Bruno Sobreira França) - lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin) - lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego Vieira) - unicode: refactor selftests into KUnit (Gabriela Bittencourt) - lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein) - printf: convert self-test to KUnit (Tamir Duberstein) - scanf: convert self-test to KUnit (Tamir Duberstein)" * tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (21 commits) scanf: break kunit into test cases scanf: convert self-test to KUnit scanf: remove redundant debug logs scanf: implicate test line in failure messages printf: implicate test line in failure messages printf: break kunit into test cases printf: convert self-test to KUnit kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR() kunit/fortify: Expand testing of __compiletime_strlen() kunit/stackinit: Use fill byte different from Clang i386 pattern kunit/overflow: Fix DEFINE_FLEX tests for counted_by selftests: remove reference to prime_numbers.sh MAINTAINERS: adjust entries in FORTIFY_SOURCE and KERNEL HARDENING lib/prime_numbers: convert self-test to KUnit lib/math: Add Kunit test suite for gcd() unicode: kunit: change tests filename and path unicode: kunit: refactor selftest to kunit tests lib/tests/kfifo_kunit.c: add tests for the kfifo structure lib: Move KUnit tests into tests/ subdirectory lib/math: Add int_log test suite ...	2025-03-24 15:15:11 -07:00
Amit Cohen	36ed81bcad	selftests: vxlan_bridge: Test flood with unresolved FDB entry Extend flood test to configure FDB entry with unresolved destination IP, check that packets are not sent twice. Without the previous patch which handles such scenario in mlxsw, the tests fail: $ TESTS='test_flood' ./vxlan_bridge_1d.sh Running tests with UDP port 4789 TEST: VXLAN: flood [ OK ] TEST: VXLAN: flood, unresolved FDB entry [FAIL] vx2 ns2: Expected to capture 10 packets, got 20. $ TESTS='test_flood' ./vxlan_bridge_1q.sh INFO: Running tests with UDP port 4789 TEST: VXLAN: flood vlan 10 [ OK ] TEST: VXLAN: flood vlan 20 [ OK ] TEST: VXLAN: flood vlan 10, unresolved FDB entry [FAIL] vx10 ns2: Expected to capture 10 packets, got 20. TEST: VXLAN: flood vlan 20, unresolved FDB entry [FAIL] vx20 ns2: Expected to capture 10 packets, got 20. With the previous patch, the tests pass. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-24 15:09:31 -07:00
Gal Pressman	c61209eeb0	selftests: drv-net: rss_ctx: Don't assume indirection table is present The test_rss_context_dump() test assumes the indirection table is always supported, which is not true for all drivers, e.g., virtio_net when VIRTIO_NET_F_RSS is disabled. Skip the check if 'indir' is not present. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-24 12:22:37 -07:00
Peter Seiderer	3099f9e156	selftest: net: update proc_net_pktgen (add more imix_weights test cases) Add more imix_weights test cases (for incomplete input). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-24 12:01:38 -07:00
Linus Torvalds	130e696aa6	vfs-6.15-rc1.mount.namespace -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90r2wAKCRCRxhvAZXjc ouC6AQCk3MoqskN0WeNcaZT23dB7dHbEhf/7YXOFC9MFRMKXqQD9Fbn95+GuIe3U nBVPbVyQfDtfXE08ml6gbDJrCsbkkQI= =Xm1C -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount namespace updates from Christian Brauner: "This expands the ability of anonymous mount namespaces: - Creating detached mounts from detached mounts Currently, detached mounts can only be created from attached mounts. This limitaton prevents various use-cases. For example, the ability to mount a subdirectory without ever having to make the whole filesystem visible first. The current permission modelis: (1) Check that the caller is privileged over the owning user namespace of it's current mount namespace. (2) Check that the caller is located in the mount namespace of the mount it wants to create a detached copy of. While it is not strictly necessary to do it this way it is consistently applied in the new mount api. This model will also be used when allowing the creation of detached mount from another detached mount. The (1) requirement can simply be met by performing the same check as for the non-detached case, i.e., verify that the caller is privileged over its current mount namespace. To meet the (2) requirement it must be possible to infer the origin mount namespace that the anonymous mount namespace of the detached mount was created from. The origin mount namespace of an anonymous mount is the mount namespace that the mounts that were copied into the anonymous mount namespace originate from. In order to check the origin mount namespace of an anonymous mount namespace the sequence number of the original mount namespace is recorded in the anonymous mount namespace. With this in place it is possible to perform an equivalent check (2') to (2). The origin mount namespace of the anonymous mount namespace must be the same as the caller's mount namespace. To establish this the sequence number of the caller's mount namespace and the origin sequence number of the anonymous mount namespace are compared. The caller is always located in a non-anonymous mount namespace since anonymous mount namespaces cannot be setns()ed into. The caller's mount namespace will thus always have a valid sequence number. The owning namespace of any mount namespace, anonymous or non-anonymous, can never change. A mount attached to a non-anonymous mount namespace can never change mount namespace. If the sequence number of the non-anonymous mount namespace and the origin sequence number of the anonymous mount namespace match, the owning namespaces must match as well. Hence, the capability check on the owning namespace of the caller's mount namespace ensures that the caller has the ability to copy the mount tree. - Allow mount detached mounts on detached mounts Currently, detached mounts can only be mounted onto attached mounts. This limitation makes it impossible to assemble a new private rootfs and move it into place. Instead, a detached tree must be created, attached, then mounted open and then either moved or detached again. Lift this restriction. In order to allow mounting detached mounts onto other detached mounts the same permission model used for creating detached mounts from detached mounts can be used (cf. above). Allowing to mount detached mounts onto detached mounts leaves three cases to consider: (1) The source mount is an attached mount and the target mount is a detached mount. This would be equivalent to moving a mount between different mount namespaces. A caller could move an attached mount to a detached mount. The detached mount can now be freely attached to any mount namespace. This changes the current delegatioh model significantly for no good reason. So this will fail. (2) Anonymous mount namespaces are always attached fully, i.e., it is not possible to only attach a subtree of an anoymous mount namespace. This simplifies the implementation and reasoning. Consequently, if the anonymous mount namespace of the source detached mount and the target detached mount are the identical the mount request will fail. (3) The source mount's anonymous mount namespace is different from the target mount's anonymous mount namespace. In this case the source anonymous mount namespace of the source mount tree must be freed after its mounts have been moved to the target anonymous mount namespace. The source anonymous mount namespace must be empty afterwards. By allowing to mount detached mounts onto detached mounts a caller may do the following: fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE) fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE) fd_tree1 and fd_tree2 refer to two different detached mount trees that belong to two different anonymous mount namespace. It is important to note that fd_tree1 and fd_tree2 both refer to the root of their respective anonymous mount namespaces. By allowing to mount detached mounts onto detached mounts the caller may now do: move_mount(fd_tree1, "", fd_tree2, "", MOVE_MOUNT_F_EMPTY_PATH \| MOVE_MOUNT_T_EMPTY_PATH) This will cause the detached mount referred to by fd_tree1 to be mounted on top of the detached mount referred to by fd_tree2. Thus, the detached mount fd_tree1 is moved from its separate anonymous mount namespace into fd_tree2's anonymous mount namespace. It also means that while fd_tree2 continues to refer to the root of its respective anonymous mount namespace fd_tree1 doesn't anymore. This has the consequence that only fd_tree2 can be moved to another anonymous or non-anonymous mount namespace. Moving fd_tree1 will now fail as fd_tree1 doesn't refer to the root of an anoymous mount namespace anymore. Now fd_tree1 and fd_tree2 refer to separate detached mount trees referring to the same anonymous mount namespace. This is conceptually fine. The new mount api does allow for this to happen already via: mount -t tmpfs tmpfs /mnt mkdir -p /mnt/A mount -t tmpfs tmpfs /mnt/A fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE \| AT_RECURSIVE) fd_tree4 = open_tree(-EBADF, "/mnt/A", 0) Both fd_tree3 and fd_tree4 refer to two different detached mount trees but both detached mount trees refer to the same anonymous mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3 may be moved another mount namespace as fd_tree3 refers to the root of the anonymous mount namespace just while fd_tree4 doesn't. However, there's an important difference between the fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example. Closing fd_tree4 and releasing the respective struct file will have no further effect on fd_tree3's detached mount tree. However, closing fd_tree3 will cause the mount tree and the respective anonymous mount namespace to be destroyed causing the detached mount tree of fd_tree4 to be invalid for further mounting. By allowing to mount detached mounts on detached mounts as in the fd_tree1/fd_tree2 example both struct files will affect each other. Both fd_tree1 and fd_tree2 refer to struct files that have FMODE_NEED_UNMOUNT set. To handle this we use the fact that @fd_tree1 will have a parent mount once it has been attached to @fd_tree2. When dissolve_on_fput() is called the mount that has been passed in will refer to the root of the anonymous mount namespace. If it doesn't it would mean that mounts are leaked. So before allowing to mount detached mounts onto detached mounts this would be a bug. Now that detached mounts can be mounted onto detached mounts it just means that the mount has been attached to another anonymous mount namespace and thus dissolve_on_fput() must not unmount the mount tree or free the anonymous mount namespace as the file referring to the root of the namespace hasn't been closed yet. If it had been closed yet it would be obvious because the mount namespace would be NULL, i.e., the @fd_tree1 would have already been unmounted. If @fd_tree1 hasn't been unmounted yet and has a parent mount it is safe to skip any cleanup as closing @fd_tree2 will take care of all cleanup operations. - Allow mount propagation for detached mount trees In commit `ee2e3f5062` ("mount: fix mounting of detached mounts onto targets that reside on shared mounts") I fixed a bug where propagating the source mount tree of an anonymous mount namespace into a target mount tree of a non-anonymous mount namespace could be used to trigger an integer overflow in the non-anonymous mount namespace causing any new mounts to fail. The cause of this was that the propagation algorithm was unable to recognize mounts from the source mount tree that were already propagated into the target mount tree and then reappeared as propagation targets when walking the destination propagation mount tree. When fixing this I disabled mount propagation into anonymous mount namespaces. Make it possible for anonymous mount namespace to receive mount propagation events correctly. This is now also a correctness issue now that we allow mounting detached mount trees onto detached mount trees. Mark the source anonymous mount namespace with MNTNS_PROPAGATING indicating that all mounts belonging to this mount namespace are currently in the process of being propagated and make the propagation algorithm discard those if they appear as propagation targets" * tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits) selftests: test subdirectory mounting selftests: add test for detached mount tree propagation fs: namespace: fix uninitialized variable use mount: handle mount propagation for detached mount trees fs: allow creating detached mounts from fsmount() file descriptors selftests: seventh test for mounting detached mounts onto detached mounts selftests: sixth test for mounting detached mounts onto detached mounts selftests: fifth test for mounting detached mounts onto detached mounts selftests: fourth test for mounting detached mounts onto detached mounts selftests: third test for mounting detached mounts onto detached mounts selftests: second test for mounting detached mounts onto detached mounts selftests: first test for mounting detached mounts onto detached mounts fs: mount detached mounts onto detached mounts fs: support getname_maybe_null() in move_mount() selftests: create detached mounts from detached mounts fs: create detached mounts from detached mounts fs: add may_copy_tree() fs: add fastpath for dissolve_on_fput() fs: add assert for move_mount() fs: add mnt_ns_empty() helper ...	2025-03-24 11:41:41 -07:00
Linus Torvalds	74adf9e353	vfs-6.15-rc1.nsfs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rXwAKCRCRxhvAZXjc ogrYAP4kWLzxD2IbBGSs5kBkKdc9qNGMtjrOn5InHm263vTpPwD/VYcOmyc3gScO e8hTBES3mYlzBpselh99HnGx5geMtAE= =+I5+ -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs nsfs updates from Christian Brauner: "This contains non-urgent fixes for nsfs to validate ioctls before performing any relevant operations. We alredy did this for a few other filesystems last cycle" * tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/nsfs: add ioctl validation tests nsfs: validate ioctls	2025-03-24 11:38:12 -07:00
Linus Torvalds	804382d59b	vfs-6.15-rc1.overlayfs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rUAAKCRCRxhvAZXjc opI3AP9ws4S/JXOjxNKoTYmNM2nZ8+r1v8tUxbLIiqdvzx9PygD/V1ZjXtn6lwZr OK8d5Y8UnlPZTlBF8D61op3AjnXYzws= =KV4p -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs overlayfs updates from Christian Brauner: "Currently overlayfs uses the mounter's credentials for its override_creds() calls. That provides a consistent permission model. This patches allows a caller to instruct overlayfs to use its credentials instead. The caller must be located in the same user namespace hierarchy as the user namespace the overlayfs instance will be mounted in. This provides a consistent and simple security model. With this it is possible to e.g., mount an overlayfs instance where the mounter must have CAP_SYS_ADMIN but the credentials used for override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage of custom fs{g,u}id different from the callers and other tweaks" * tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/ovl: add third selftest for "override_creds" selftests/ovl: add second selftest for "override_creds" selftests/filesystems: add utils.{c,h} selftests/ovl: add first selftest for "override_creds" ovl: allow to specify override credentials	2025-03-24 10:37:40 -07:00
Linus Torvalds	df00ded23a	vfs-6.15-rc1.pidfs -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90pqgAKCRCRxhvAZXjc oqVsAP9Aq/fMCI14HeXehPCezKQZPu1HTrPPo2clLHXoSnafawEAsA3YfWTT4Heb iexzqvAEUOMYOVN66QEc+6AAwtMLrwc= =0eYo -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs pidfs updates from Christian Brauner: - Allow retrieving exit information after a process has been reaped through pidfds via the new PIDFD_INTO_EXIT extension for the PIDFD_GET_INFO ioctl. Various tools need access to information about a process/task even after it has already been reaped. Pidfd polling allows waiting on either task exit or for a task to have been reaped. The contract for PIDFD_INFO_EXIT is simply that EPOLLHUP must be observed before exit information can be retrieved, i.e., exit information is only provided once the task has been reaped and then can be retrieved as long as the pidfd is open. - Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to forgo allocating a file descriptor for their own process. This is useful in scenarios where users want to act on their own process through pidfds and is akin to AT_FDCWD. - Improve premature thread-group leader and subthread exec behavior when polling on pidfds: (1) During a multi-threaded exec by a subthread, i.e., non-thread-group leader thread, all other threads in the thread-group including the thread-group leader are killed and the struct pid of the thread-group leader will be taken over by the subthread that called exec. IOW, two tasks change their TIDs. (2) A premature thread-group leader exit means that the thread-group leader exited before all of the other subthreads in the thread-group have exited. Both cases lead to inconsistencies for pidfd polling with PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the current thread-group leader may or may not see an exit notification on the file descriptor depending on when poll is performed. If the poll is performed before the exec of the subthread has concluded an exit notification is generated for the old thread-group leader. If the poll is performed after the exec of the subthread has concluded no exit notification is generated for the old thread-group leader. The correct behavior is to simply not generate an exit notification on the struct pid of a subhthread exec because the struct pid is taken over by the subthread and thus remains alive. But this is difficult to handle because a thread-group may exit premature as mentioned in (2). In that case an exit notification is reliably generated but the subthreads may continue to run for an indeterminate amount of time and thus also may exec at some point. After this pull no exit notifications will be generated for a PIDFD_THREAD pidfd for a thread-group leader until all subthreads have been reaped. If a subthread should exec before no exit notification will be generated until that task exits or it creates subthreads and repeates the cycle. This means an exit notification indicates the ability for the father to reap the child. * tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) selftests/pidfd: third test for multi-threaded exec polling selftests/pidfd: second test for multi-threaded exec polling selftests/pidfd: first test for multi-threaded exec polling pidfs: improve multi-threaded exec and premature thread-group leader exit polling pidfs: ensure that PIDFS_INFO_EXIT is available selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest selftests/pidfd: add third PIDFD_INFO_EXIT selftest selftests/pidfd: add second PIDFD_INFO_EXIT selftest selftests/pidfd: add first PIDFD_INFO_EXIT selftest selftests/pidfd: expand common pidfd header pidfs/selftests: ensure correct headers for ioctl handling selftests/pidfd: fix header inclusion pidfs: allow to retrieve exit information pidfs: record exit code and cgroupid at exit pidfs: use private inode slab cache pidfs: move setting flags into pidfs_alloc_file() pidfd: rely on automatic cleanup in __pidfd_prepare() ...	2025-03-24 10:16:37 -07:00
Linus Torvalds	fd101da676	vfs-6.15-rc1.mount -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90qAwAKCRCRxhvAZXjc on7lAP0akpIsJMWREg9tLwTNTySI1b82uKec0EAgM6T7n/PYhAD/T4zoY8UYU0Pr qCxwTXHUVT6bkNhjREBkfqq9OkPP8w8= =GxeN -----END PGP SIGNATURE----- Merge tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Mount notifications The day has come where we finally provide a new api to listen for mount topology changes outside of /proc/<pid>/mountinfo. A mount namespace file descriptor can be supplied and registered with fanotify to listen for mount topology changes. Currently notifications for mount, umount and moving mounts are generated. The generated notification record contains the unique mount id of the mount. The listmount() and statmount() api can be used to query detailed information about the mount using the received unique mount id. This allows userspace to figure out exactly how the mount topology changed without having to generating diffs of /proc/<pid>/mountinfo in userspace. - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount api - Support detached mounts in overlayfs Since last cycle we support specifying overlayfs layers via file descriptors. However, we don't allow detached mounts which means userspace cannot user file descriptors received via open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to attach them to a mount namespace via move_mount() first. This is cumbersome and means they have to undo mounts via umount(). Allow them to directly use detached mounts. - Allow to retrieve idmappings with statmount Currently it isn't possible to figure out what idmapping has been attached to an idmapped mount. Add an extension to statmount() which allows to read the idmapping from the mount. - Allow creating idmapped mounts from mounts that are already idmapped So far it isn't possible to allow the creation of idmapped mounts from already idmapped mounts as this has significant lifetime implications. Make the creation of idmapped mounts atomic by allow to pass struct mount_attr together with the open_tree_attr() system call allowing to solve these issues without complicating VFS lookup in any way. The system call has in general the benefit that creating a detached mount and applying mount attributes to it becomes an atomic operation for userspace. - Add a way to query statmount() for supported options Allow userspace to query which mount information can be retrieved through statmount(). - Allow superblock owners to force unmount * tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits) umount: Allow superblock owners to force umount selftests: add tests for mount notification selinux: add FILE__WATCH_MOUNTNS samples/vfs: fix printf format string for size_t fs: allow changing idmappings fs: add kflags member to struct mount_kattr fs: add open_tree_attr() fs: add copy_mount_setattr() helper fs: add vfs_open_tree() helper statmount: add a new supported_mask field samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP selftests: add tests for using detached mount with overlayfs samples/vfs: check whether flag was raised statmount: allow to retrieve idmappings uidgid: add map_id_range_up() fs: allow detached mounts in clone_private_mount() selftests/overlayfs: test specifying layers as O_PATH file descriptors fs: support O_PATH fds with FSCONFIG_SET_FD vfs: add notifications for mount attach and detach fanotify: notify on mount attach and detach ...	2025-03-24 09:34:10 -07:00
Ming Lei	0f3ebf2d4b	selftests: ublk: add stripe target Add ublk stripe target which can take 1~4 underlying backing files or block device, with stripe size 4k ~ 512K. Add two basic tests(write verify & mkfs/mount/umount) over ublk/stripe. This target is helpful to cover multiple IOs aiming at same fixed/registered IO kernel buffer. It is also capable of verifying vectored registered (kernel)buffers in future for zero copy, so far it isn't supported yet. Todo: support vectored registered kernel buffer for ublk/zc. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	263846eb43	selftests: ublk: simplify loop io completion Use the added target io handling helpers for simplifying loop io completion. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	8cb9b971e2	selftests: ublk: enable zero copy for null target Enable zero copy for null target so that we can evaluate performance from zero copy or not. Also this should be the simplest ublk zero copy implementation, which can be served as zc example. Add test for covering 'add -t null -z'. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	8842b72a82	selftests: ublk: prepare for supporting stripe target - pass 'truct dev_ctx *ctx' to target init function - add 'private_data' to 'struct ublk_dev' for storing target specific data - add 'private_data' to 'struct ublk_io' for storing per-IO data - add 'tgt_ios' to 'struct ublk_io' for counting how many io_uring ios for handling the current io command - add helper ublk_get_io() for supporting stripe target - add two helpers for simplifying target io handling Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	10d962dae2	selftests: ublk: move common code into common.c Move two functions for initializing & de-initializing backing file into common.c. Also move one common helper into kublk.h. Prepare for supporting ublk-stripe. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	9413c0ca8e	selftests: ublk: increase max buffer size to 1MB Increase max buffer size to 1MB, and 64KB is too small to evaluate performance with builtin ublk server implementation. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	f2639ed11e	selftests: ublk: add single sqe allocator helper Unify the sqe allocator helper, and we will use it for supporting more cases, such as ublk stripe, in which variable sqe allocation is required. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Ming Lei	723977cab4	selftests: ublk: add generic_01 for verifying sequential IO order block layer, ublk and io_uring might re-order IO in the past - plug - queue ublk io command via task work Add one test for verifying if sequential WRITE IO is dispatched in order. - null target is taken, so we can just observe io order from `tracepoint:block:block_rq_complete` which represents the dispatch order - WRITE IO is taken because READ may come from system-wide utility Cc: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250322093218.431419-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-22 08:35:08 -06:00
Kohei Enju	5f3077d7fc	selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid syzbot reported out-of-bounds read in check_atomic_load/store() when the register number is invalid in this context: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c To avoid the issue from now on, let's add tests where the register number is invalid for load-acquire/store-release. After discussion with Eduard, I decided to use R15 as invalid register because the actual slab-out-of-bounds read issue occurs when the register number is R12 or larger. Signed-off-by: Kohei Enju <enjuk@amazon.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250322045340.18010-6-enjuk@amazon.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-22 06:19:09 -07:00
Kohei Enju	c03bb2fa32	bpf: Fix out-of-bounds read in check_atomic_load/store() syzbot reported the following splat [0]. In check_atomic_load/store(), register validity is not checked before atomic_ptr_type_ok(). This causes the out-of-bounds read in is_ctx_reg() called from atomic_ptr_type_ok() when the register number is MAX_BPF_REG or greater. Call check_load_mem()/check_store_reg() before atomic_ptr_type_ok() to avoid the OOB read. However, some tests introduced by commit `ff3afe5da9` ("selftests/bpf: Add selftests for load-acquire and store-release instructions") assume calling atomic_ptr_type_ok() before checking register validity. Therefore the swapping of order unintentionally changes verifier messages of these tests. For example in the test load_acquire_from_pkt_pointer(), expected message is 'BPF_ATOMIC loads from R2 pkt is not allowed' although actual messages are different. validate_msgs:FAIL:754 expect_msg VERIFIER LOG: ============= Global function load_acquire_from_pkt_pointer() doesn't return scalar. Only those are supported. 0: R1=ctx() R10=fp0 ; asm volatile ( @ verifier_load_acquire.c:140 0: (61) r2 = (u32 )(r1 +0) ; R1=ctx() R2_w=pkt(r=0) 1: (d3) r0 = load_acquire((u8 *)(r2 +0)) invalid access to packet, off=0 size=1, R2(id=0,off=0,r=0) R2 offset is outside of the packet processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 ============= EXPECTED SUBSTR: 'BPF_ATOMIC loads from R2 pkt is not allowed' #505/19 verifier_load_acquire/load-acquire from pkt pointer:FAIL This is because instructions in the test don't pass check_load_mem() and therefore don't enter the atomic_ptr_type_ok() path. In this case, we have to modify instructions so that they pass the check_load_mem() and trigger atomic_ptr_type_ok(). Similarly for store-release tests, we need to modify instructions so that they pass check_store_reg(). Like load_acquire_from_pkt_pointer(), modify instructions in: load_acquire_from_sock_pointer() store_release_to_ctx_pointer() store_release_to_pkt_pointer() Also in store_release_to_sock_pointer(), check_store_reg() returns error early and atomic_ptr_type_ok() is not triggered, since write to sock pointer is not possible in general. We might be able to remove the test, but for now let's leave it and just change the expected message. [0] BUG: KASAN: slab-out-of-bounds in is_ctx_reg kernel/bpf/verifier.c:6185 [inline] BUG: KASAN: slab-out-of-bounds in atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223 Read of size 4 at addr ffff888141b0d690 by task syz-executor143/5842 CPU: 1 UID: 0 PID: 5842 Comm: syz-executor143 Not tainted 6.14.0-rc3-syzkaller-gf28214603dc6 #0 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0x16e/0x5b0 mm/kasan/report.c:521 kasan_report+0x143/0x180 mm/kasan/report.c:634 is_ctx_reg kernel/bpf/verifier.c:6185 [inline] atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223 check_atomic_store kernel/bpf/verifier.c:7804 [inline] check_atomic kernel/bpf/verifier.c:7841 [inline] do_check+0x89dd/0xedd0 kernel/bpf/verifier.c:19334 do_check_common+0x1678/0x2080 kernel/bpf/verifier.c:22600 do_check_main kernel/bpf/verifier.c:22691 [inline] bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821 Reported-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c Tested-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com Fixes: e24bbad29a8d ("bpf: Introduce load-acquire and store-release instructions") Fixes: `ff3afe5da9` ("selftests/bpf: Add selftests for load-acquire and store-release instructions") Signed-off-by: Kohei Enju <enjuk@amazon.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250322045340.18010-5-enjuk@amazon.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-22 06:15:57 -07:00
Ryan Roberts	a2c6f9c3ca	selftests/mm: speed up split_huge_page_test create_pagecache_thp_and_fd() was previously writing a file sized at twice the PMD size by making a per-byte write syscall. This was quite slow when the PMD size is 4M, but completely intolerable for 32M (PMD size for arm64's 16K page size), and 512M (PMD size for arm64's 64K page size). The byte pattern has a 256 byte period, so let's create a 1K buffer and fill it with exactly 4 periods. Then we can write the buffer as many times as is required to fill the file. This makes things much more tolerable. The test now passes for 16K page size. It still fails for 64K page size because MAX_PAGECACHE_ORDER is too small for 512M folio size (I think). Link: https://lkml.kernel.org/r/20250318174343.243631-3-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Rafael Aquini <raquini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-21 22:03:16 -07:00
Ryan Roberts	735b3f7e77	selftests/mm: uffd-unit-tests support for hugepages > 2M uffd-unit-tests uses a memory area with a fixed 32M size. Then it calculates the number of pages by dividing by page_size, which itself is either the base page size or the PMD huge page size depending on the test config. For the latter, we end up with nr_pages=1 for arm64 16K base pages, and nr_pages=0 for 64K base pages. This doesn't end well. So let's make the 32M size a floor and also ensure that we have at least 2 pages given the PMD size. With this change, the tests pass on arm64 64K base page size configuration. Link: https://lkml.kernel.org/r/20250318174343.243631-2-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Rafael Aquini <raquini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-21 22:03:15 -07:00
Brendan Jackman	d8a866c766	selftests/mm: add commentary about 9pfs bugs As discussed here: https://lore.kernel.org/lkml/Z9RRkL1hom48z3Tt@google.com/ This code could benefit from some more commentary. To avoid needing to comment the same thing in multiple places (I guess more of these SKIPs will need to be added over time, for now I am only like 20% of the way through Project Run run_vmtests.sh Successfully), add a dummy "skip tests for this specific reason" function that basically just serves as a hook to hang comments on. Link: https://lkml.kernel.org/r/20250317-9pfs-comments-v1-1-9ac96043e146@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-21 22:03:14 -07:00
Ming Lei	ffde32a49a	selftests: ublk: fix starting ublk device Firstly ublk char device node may not be created by udev yet, so wait a while until it can be opened or timeout. Secondly delete created ublk device in case of start failure, otherwise the device becomes zombie. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250321135324.259677-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-21 14:09:24 -06:00
John Stultz	e40d3709c0	selftests/timers: Improve skew_consistency by testing with other clockids Lei Chen reported a bug with CLOCK_MONOTONIC_COARSE having inconsistencies when NTP is adjusting the clock frequency. This has gone seemingly undetected for ~15 years, illustrating a clear gap in our testing. The skew_consistency test is intended to catch this sort of problem, but was focused on only evaluating CLOCK_MONOTONIC, and thus missed the problem on CLOCK_MONOTONIC_COARSE. So adjust the test to run with all clockids for 60 seconds each instead of 10 minutes with just CLOCK_MONOTONIC. Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250320200306.1712599-2-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/	2025-03-21 19:16:18 +01:00
Breno Leitao	4b73dc83ed	selftests: netconsole: Add tests for 'release' feature in sysdata Expands the self-tests to include the 'release' feature in sysdata. Verifies that enabling the 'release' feature appends the correct data and ensures that disabling it functions as expected. When enabled, the message should have an item similar to in the userdata: `release=$(uname -r)` Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-21 18:59:25 +01:00
Mickaël Salaün	15383a0d63	landlock: Add the errata interface Some fixes may require user space to check if they are applied on the running kernel before using a specific feature. For instance, this applies when a restriction was previously too restrictive and is now getting relaxed (e.g. for compatibility reasons). However, non-visible changes for legitimate use (e.g. security fixes) do not require an erratum. Because fixes are backported down to a specific Landlock ABI, we need a way to avoid cherry-pick conflicts. The solution is to only update a file related to the lower ABI impacted by this issue. All the ABI files are then used to create a bitmask of fixes. The new errata interface is similar to the one used to get the supported Landlock ABI version, but it returns a bitmask instead because the order of fixes may not match the order of versions, and not all fixes may apply to all versions. The actual errata will come with dedicated commits. The description is not actually used in the code but serves as documentation. Create the landlock_abi_version symbol and use its value to check errata consistency. Update test_base's create_ruleset_checks_ordering tests and add errata tests. This commit is backportable down to the first version of Landlock. Fixes: `3532b0b435` ("landlock: Enable user space to infer supported features") Cc: Günther Noack <gnoack@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>	2025-03-21 12:12:19 +01:00
Eric Biggers	ca17aa6640	crypto: lib/chacha - remove unused arch-specific init support All implementations of chacha_init_arch() just call chacha_init_generic(), so it is pointless. Just delete it, and replace chacha_init() with what was previously chacha_init_generic(). Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2025-03-21 17:39:06 +08:00
Ming Lei	96af5af47b	selftests: ublk: fix write cache implementation For loop target, write cache isn't enabled, and each write isn't be marked as DSYNC too. Fix it by enabling write cache, meantime fix FLUSH implementation by not taking LBA range into account, and there isn't such info for FLUSH command. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250321004758.152572-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-20 20:01:03 -06:00
Ming Lei	beb31982ad	selftests: ublk: add variable for user to not show test result Some user decides test result by exit code only, and wouldn't like to be bothered by the test result. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250320013743.4167489-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-20 17:18:55 -06:00
Ming Lei	fe2230d921	selftests: ublk: don't show `modprobe` failure ublk_drv may be built-in, so don't show modprobe failure, and we do check `/dev/ublk-control` for skipping test if ublk_drv isn't enabled. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250320013743.4167489-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-20 17:18:55 -06:00
Ming Lei	8764c1a72b	selftests: ublk: add one dependency header Add one dependency helper which can include new uapi definition which isn't synced from kernel. This way also helps a lot for downstream test deployment. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250320013743.4167489-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-20 17:18:55 -06:00
Paolo Abeni	6f13bec53a	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ9NWrwAKCRAraS+Z+3y5 E2VIAP4iIdMobFDEm04JuXyrOK6HasnkOFkuEgLNNhGRbCCTTwD/X/2A8baSXnjg eqYN2DJfrs7OZ1ZUku/ic1VGtJg7Sgk= =jTyW -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-03-13 The following pull-request contains BPF updates for your net-next tree. We've added 4 non-merge commits during the last 3 day(s) which contain a total of 2 files changed, 35 insertions(+), 12 deletions(-). The main changes are: 1) bpf_getsockopt support for TCP_BPF_RTO_MIN and TCP_BPF_DELACK_MAX, from Jason Xing bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN tcp: bpf: Support bpf_getsockopt for TCP_BPF_DELACK_MAX tcp: bpf: Support bpf_getsockopt for TCP_BPF_RTO_MIN tcp: bpf: Introduce bpf_sol_tcp_getsockopt to support TCP_BPF flags ==================== Link: https://patch.msgid.link/20250313221620.2512684-1-martin.lau@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-20 21:48:14 +01:00
Paolo Abeni	f491593394	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.14-rc8). Conflict: tools/testing/selftests/net/Makefile `03544faad7` ("selftest: net: add proc_net_pktgen") `3ed61b8938` ("selftests: net: test for lwtunnel dst ref loops") tools/testing/selftests/net/config: `85cb3711ac` ("selftests: net: Add test cases for link and peer netns") `3ed61b8938` ("selftests: net: test for lwtunnel dst ref loops") Adjacent commits: tools/testing/selftests/net/Makefile `c935af429e` ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY") `355d940f4d` ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-20 21:38:01 +01:00
Björn Töpel	e16e64f9e0	selftests/bpf: Sanitize pointer prior fclose() There are scenarios where env.{sub,}test_state->stdout_saved, can be NULL, e.g. sometimes when the watchdog timeout kicks in, or if the open_memstream syscall is not available. Avoid crashing test_progs by adding an explicit NULL check prior the fclose() call. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250318081648.122523-1-bjorn@kernel.org	2025-03-20 10:35:07 -07:00
Paolo Bonzini	0afd104fb3	KVM/arm64 updates for 6.15 - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events -----BEGIN PGP SIGNATURE----- iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCZ9s9gBccb2xpdmVyLnVw dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFp6LAQCOQ1Fidp8RT1NdhLLAhW5D4gLe MNT619R4qfqu64ZpeQEAidHMAYaGRk5KDNBq6Jn+awcJnwCcMnh2ok0vTOjz3gY= =RC6A -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.15 - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events	2025-03-20 12:54:12 -04:00
Paolo Bonzini	c0f99fb4e5	KVM/riscv changes for 6.15 - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmfbsiQACgkQrUjsVaLH LAfGAxAAlODgZMDt+f/YlOOmQPsLohGrWsZjvheRfrf3X5PFNuXd1frpZSbF1PCs lk4WZrekJo2VT2tWMEBjFrH6A9slHUzjZs+xIxig1N/Gmxy1clKp/svJVG20WwQI a5/5jPH3TNDL3KUphwBF0QIi/Tlp6+SZ5RAag6QsBXKnFy5+o9lioR2T6tFzMoxy ry8VD8e395bQXn5tYkk4vsr9vBRzpgq7ZBOMg6zyNTL86UBOVq3QXPNzSE5kenXW 4zqDmuxbk+GgpUKLXA9aXwLvasGiOr+59uToZd2lHf5ynSdSprQyW11I5ZOFs2DX 800mLeYtyHc4nGTi6OO3VN9toMhmgijtFLNpBDg4LRvwMfB+sp62Ixsmjs3CBCd+ GqcbJaMtdgM1SpZ3tlzEQoWpIBlNYs7BuG/6Jen9xKHHvOg7Ty9UimJeMikV+FUm srq0+GlgjaniZ/d8arIIm7LVqZLAc6OSzn+A8IcE0KQ9fKBgB0CZrANW+N+b+7ub Lkq9hA8tM5Ti0bNszqPyJu3bmhWAOEOwF0CjYL79vEhIYsegCKKtruKOppdM+sl2 dzk+DVV9Aar+bDu/v/bYV5TfIq39U86Tqxe+OU7EPzch5NdyDvKcpZ9/JIZnf/j5 tviM3awZfFmgTeBUTAYvaR34KExBOLQLkGb9BqAydYjg+pl9jVE= =HP3N -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.15-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.15 - Disable the kernel perf counter during configure - KVM selftests improvements for PMU - Fix warning at the time of KVM module removal	2025-03-20 12:53:34 -04:00
Linus Torvalds	5fc3193608	Including fixes from can, bluetooth and ipsec. This contains a last minute revert of a recent GRE patch, mostly to allow me stating there are no known regressions outstanding. Current release - regressions: - revert "gre: Fix IPv6 link-local address generation." - eth: ti: am65-cpsw: fix NAPI registration sequence Previous releases - regressions: - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). - mptcp: fix data stream corruption in the address announcement - bluetooth: fix connection regression between LE and non-LE adapters - can: - flexcan: only change CAN state when link up in system PM - ucan: fix out of bound read in strscpy() source Previous releases - always broken: - lwtunnel: fix reentry loops - ipv6: fix TCP GSO segmentation with NAT - xfrm: force software GSO only in tunnel mode - eth: ti: icssg-prueth: add lock to stats Misc: - add Andrea Mayer as a maintainer of SRv6 Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmfcOhESHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkJUQP/2jpOcA+ogkCfRhcJHHaeX9gCX+buSY4 RZnMUkSUhW5dcL9mTybx3tzH/0oNlGfLtA016SR+Grq0mIBIYsRWhTW6C8mOgvsZ PrHvE9VyATBSxUM9o9bL5WMg/M9TvX7foR63+zGPN2lEk6mmLK/hIBFNjkv9R8Vk /VR0ZRKHMsJARsyRve1Sf8DZXndAfROnWhCCmWWxKpnCd4biBL/6n6p00vfxpAls /Xnm1PC0NMuRz0hlIr8UN9DkkF1v+LEhp1EFbg5e7i8cJkAJyXcvZBb4rJ8f2Ty7 4qXK53i+kp+NryNAZ6cNu6OtkD+DtdDUNJ28ElkdnKOc8H787YGFvCTBDPiqe5yu CVA6tT1hsuKjEnXdX3545+tTW48XS+6J60ZeVnslfC3fakG03ckOZboofT+LQV1y wcv4+73dDrIbSo3X7DBdksFzQi/ICb1VG/GOdGlg8vlokKnH0di/veoBtbyAR7dJ 2Fv3rpE6e/JQQmXYffto0qrNXOYrEx7Zqmo2QbDS6dTkQ1FiDsxRYegRmVxHLHX9 0fd48FNFIqEJSM5RwLOQW9X83aObT5p2OiHiA+WrnmkiSiMwK4EdusBm4WtFZ7qc bqaUHbnAmg1g7UiPn7cGE3navoasH8xmkS77ZsbyW618ej/hRFGDwWMQSuEMyeLz PKrT/OMXIHu9 =aYLc -----END PGP SIGNATURE----- Merge tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, bluetooth and ipsec. This contains a last minute revert of a recent GRE patch, mostly to allow me stating there are no known regressions outstanding. Current release - regressions: - revert "gre: Fix IPv6 link-local address generation." - eth: ti: am65-cpsw: fix NAPI registration sequence Previous releases - regressions: - ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). - mptcp: fix data stream corruption in the address announcement - bluetooth: fix connection regression between LE and non-LE adapters - can: - flexcan: only change CAN state when link up in system PM - ucan: fix out of bound read in strscpy() source Previous releases - always broken: - lwtunnel: fix reentry loops - ipv6: fix TCP GSO segmentation with NAT - xfrm: force software GSO only in tunnel mode - eth: ti: icssg-prueth: add lock to stats Misc: - add Andrea Mayer as a maintainer of SRv6" * tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6 Revert "gre: Fix IPv6 link-local address generation." Revert "selftests: Add IPv6 link-local address generation tests for GRE devices." net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES tools headers: Sync uapi/asm-generic/socket.h with the kernel sources mptcp: Fix data stream corruption in the address announcement selftests: net: test for lwtunnel dst ref loops net: ipv6: ioam6: fix lwtunnel_output() loop net: lwtunnel: fix recursion loops net: ti: icssg-prueth: Add lock to stats net: atm: fix use after free in lec_send() xsk: fix an integer overflow in xp_create_and_assign_umem() net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data selftests: drv-net: use defer in the ping test phy: fix xa_alloc_cyclic() error handling dpll: fix xa_alloc_cyclic() error handling devlink: fix xa_alloc_cyclic() error handling ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create(). ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw(). net: ipv6: fix TCP GSO segmentation with NAT ...	2025-03-20 09:39:15 -07:00
Guillaume Nault	355d940f4d	Revert "selftests: Add IPv6 link-local address generation tests for GRE devices." This reverts commit `6f50175cca`. Commit `183185a18f` ("gre: Fix IPv6 link-local address generation.") is going to be reverted. So let's revert the corresponding kselftest first. Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-20 15:46:16 +01:00
Christian Brauner	4d5483a42c	selftests/pidfd: third test for multi-threaded exec polling Ensure that during a multi-threaded exec and premature thread-group leader exit no exit notification is generated. Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-4-da678ce805bf@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-20 15:32:43 +01:00
Christian Brauner	9b6f723db5	selftests/pidfd: second test for multi-threaded exec polling Ensure that during a multi-threaded exec and premature thread-group leader exit no exit notification is generated. Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-3-da678ce805bf@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-20 15:32:43 +01:00
Christian Brauner	db7ce91e22	selftests/pidfd: first test for multi-threaded exec polling Add first test for premature thread-group leader exit. Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-2-da678ce805bf@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-20 15:32:43 +01:00
Justin Iurman	3ed61b8938	selftests: net: test for lwtunnel dst ref loops As recently specified by commit `0ea09cbf83` ("docs: netdev: add a note on selftest posting") in net-next, the selftest is therefore shipped in this series. However, this selftest does not really test this series. It needs this series to avoid crashing the kernel. What it really tests, thanks to kmemleak, is what was fixed by the following commits: - commit `c71a192976` ("net: ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels") - commit `92191dd107` ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") - commit `c64a0727f9` ("net: ipv6: fix dst ref loop on input in seg6 lwt") - commit `13e55fbaec` ("net: ipv6: fix dst ref loop on input in rpl lwt") - commit `0e7633d7b9` ("net: ipv6: fix dst ref loop in ila lwtunnel") - commit `5da15a9c11` ("net: ipv6: fix missing dst ref drop in ila lwtunnel") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-20 11:25:52 +01:00
Geliang Tang	9cf0128e64	selftests: mptcp: add pm sysctl mapping tests This patch checks if the newly added net.mptcp.path_manager is mapped successfully from or to the old net.mptcp.pm_type in userspace_pm.sh. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-12-f4e4a88efc50@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-20 10:14:49 +01:00
Bastien Curutchet (eBPF Foundation)	f8df95e84c	selftests/bpf: Migrate test_xdp_vlan.sh into test_progs test_xdp_vlan.sh isn't used by the BPF CI. Migrate test_xdp_vlan.sh in prog_tests/xdp_vlan.c. It uses the same BPF programs located in progs/test_xdp_vlan.c and the same network topology. Remove test_xdp_vlan*.sh and their Makefile entries. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250221-xdp_vlan-v1-2-7d29847169af@bootlin.com/	2025-03-19 16:01:33 -07:00
Bastien Curutchet (eBPF Foundation)	0f9ff4cb68	selftests/bpf: test_xdp_vlan: Rename BPF sections The __load() helper expects BPF sections to be names 'xdp' or 'tc' Rename BPF sections so they can be loaded with the __load() helper in upcoming patch. Rename the BPF functions with their previous section's name. Update the 'ip link' commands in the script to use the program name instead of the section name to load the BPF program. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250221-xdp_vlan-v1-1-7d29847169af@bootlin.com	2025-03-19 15:59:27 -07:00
Oliver Upton	4f2774c57a	Merge branch 'kvm-arm64/writable-midr' into kvmarm/next * kvm-arm64/writable-midr: : Writable implementation ID registers, courtesy of Sebastian Ott : : Introduce a new capability that allows userspace to set the : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1, : and AIDR_EL1. Also plug a hole in KVM's trap configuration where : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not : support SME. KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable KVM: arm64: Copy guest CTR_EL0 into hyp VM KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR KVM: arm64: Allow userspace to change the implementation ID registers KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value KVM: arm64: Maintain per-VM copy of implementation ID regs KVM: arm64: Set HCR_EL2.TID1 unconditionally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-19 14:54:32 -07:00
Oliver Upton	d300b0168e	Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/next * kvm-arm64/pv-cpuid: : Paravirtualized implementation ID, courtesy of Shameer Kolothum : : Big-little has historically been a pain in the ass to virtualize. The : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim : of vCPU scheduling. This can be particularly annoying when the guest : needs to know the underlying implementation to mitigate errata. : : "Hyperscalers" face a similar scheduling problem, where VMs may freely : migrate between hosts in a pool of heterogenous hardware. And yes, our : server-class friends are equally riddled with errata too. : : In absence of an architected solution to this wart on the ecosystem, : introduce support for paravirtualizing the implementation exposed : to a VM, allowing the VMM to describe the pool of implementations that a : VM may be exposed to due to scheduling/migration. : : Userspace is expected to intercept and handle these hypercalls using the : SMCCC filter UAPI, should it choose to do so. smccc: kvm_guest: Fix kernel builds for 32 bit arm KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2 smccc/kvm_guest: Enable errata based on implementation CPUs arm64: Make _midr_in_range_list() an exported function KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2 KVM: arm64: Specify hypercall ABI for retrieving target implementations arm64: Modify _midr_range() functions to read MIDR/REVIDR internally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-19 14:53:16 -07:00
Oliver Upton	13f64f6d21	Merge branch 'kvm-arm64/nv-idregs' into kvmarm/next * kvm-arm64/nv-idregs: : Changes to exposure of NV features, courtesy of Marc Zyngier : : Apply NV-specific feature restrictions at reset rather than at the point : of KVM_RUN. This makes the true feature set visible to userspace, a : necessary step towards save/restore support or NV VMs. : : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV, : such that the VHE-ness of the VM can be applied to the feature set. KVM: arm64: selftests: Test that TGRAN_2 fields are writable KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN_2 KVM: arm64: Advertise FEAT_ECV when possible KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable KVM: arm64: Allow userspace to limit NV support to nVHE KVM: arm64: Move NV-specific capping to idreg sanitisation KVM: arm64: Enforce NV limits on a per-idregs basis KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available KVM: arm64: Consolidate idreg callbacks KVM: arm64: Advertise NV2 in the boot messages KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0 KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace arm64: cpufeature: Handle NV_frac as a synonym of NV2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-19 14:52:26 -07:00
Ingo Molnar	14d281db78	sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files We leave most of the defconfigs alone (there's over 70 of them), but let's remove CONFIG_SCHED_DEBUG from the scheduler self-test Kconfig files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/Z9szt3MpQmQ56TRd@gmail.com	2025-03-19 22:23:24 +01:00
Michael Jeanson	d047e32b8d	rseq/selftests: Fix namespace collision with rseq UAPI header When the rseq UAPI header is included, 'union rseq' clashes with 'struct rseq'. It's not the case in the rseq selftests but it does break the KVM selftests that also include this file. Rename 'union rseq' to 'union rseq_tls' to fix this. Fixes: `e6644c967d` ("rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes") Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20250319202144.1141542-1-mjeanson@efficios.com	2025-03-19 21:26:24 +01:00
Jonathan Lennox	756f88ff9c	tc-tests: Update tc police action tests for tc buffer size rounding fixes. Before tc's recent change to fix rounding errors, several tests which specified a burst size of "1m" would translate back to being 1048574 bytes (2b less than 1Mb). sprint_size prints this as "1024Kb". With the tc fix, the burst size is instead correctly reported as 1048576 bytes (precisely 1Mb), which sprint_size prints as "1Mb". This updates the expected output in the tests' matchPattern values to accept either the old or the new output. Signed-off-by: Jonathan Lennox <jonathan.lennox@8x8.com> Link: https://patch.msgid.link/20250312174804.313107-1-jonathan.lennox@8x8.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-19 18:40:24 +01:00
Jakub Kicinski	acf10a8c0b	selftests: drv-net: use defer in the ping test Make sure the test cleans up after itself. The XDP off statements at the end of the test may not be reached. Fixes: `75cc19c8ff` ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312131040.660386-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-19 18:21:13 +01:00
Kumar Kartikeya Dwivedi	60ba5b3ed7	selftests/bpf: Add tests for rqspinlock Introduce selftests that trigger AA, ABBA deadlocks, and test the edge case where the held locks table runs out of entries, since we then fallback to the timeout as the final line of defense. Also exercise verifier's AA detection where applicable. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250316040541.108729-26-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-19 08:03:06 -07:00
Paolo Bonzini	783e9cd05c	KVM selftests changes for 6.15, part 2 - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration. - Fix a few minor bugs related to handling of stats FDs. - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation). - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked). - Misc cleanups and fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZqZYACgkQOlYIJqCj N/2G0Q/5AZAnOPBU/xmIZrntaWSkPX0wYCBzTCFAbyuXTYdWmCMaVJUjj0W4h15U rAPxq76moGSHT06EQhw8clrNHbRSr4OZvsJIQMgGK5GBXlxfD+S1bNb/NKijpKm1 yjI4VIKA1ugpPLbb4KM522UQdAwkBI8KuPbhzWCsz2kzFf3Iy/B6ZgfevTjldZ9f KbocBQhCaEOmk0i1UZWMnuJU+rh06y/BNR/7a/vhaZGy4pPm/WZpSezKbti3j7F5 +/VXJ+fh2T1oPzlycdQ4i9phYAOMJONcNxlzhs0Hp71zkxNykVKyv1zxvCI3ioEn nnHFYQyremDfDJhxabOJRYB4ETlhLpX9nOlz2euIy7YeurDs73Qtjg2rixgAyq80 /FITPTFVlcHxjPr9YC+wtWzZB2qnGNMfA7QujnoG3XlGQJJsvnFTygXfBMj+W0G/ K8V93lvGTcwtT7x+9MCDhD0wZXfPBXorze8QrDfCWF96G1Y+5gZmYGs38Cu6IWrc 6q3tHcDmUaOGI1Mag97J1smSWw+ZC1+UuHJSW7DMZldOm/2fXkqrRpbHRXEzRQIa vZjBZKoYHpqGMKayQ+GIRql9R6o+ZGQ6OQXgtLl0fxPPA01/hKX1ITZayBnHDl3C wm3svDZy6hPWs17kwtLj+cRSMVYItbjOgpOj58gBo16w1NPNYTI= =t77Q -----END PGP SIGNATURE----- Merge tag 'kvm-x86-selftests-6.15' of https://github.com/kvm-x86/linux into HEAD KVM selftests changes for 6.15, part 2 - Fix a variety of flaws, bugs, and false failures/passes dirty_log_test, and improve its coverage by collecting all dirty entries on each iteration. - Fix a few minor bugs related to handling of stats FDs. - Add infrastructure to make vCPU and VM stats FDs available to tests by default (open the FDs during VM/vCPU creation). - Relax an assertion on the number of HLT exits in the xAPIC IPI test when running on a CPU that supports AMD's Idle HLT (which elides interception of HLT if a virtual IRQ is pending and unmasked). - Misc cleanups and fixes.	2025-03-19 09:05:34 -04:00
Paolo Bonzini	9b47f288eb	KVM selftests changes for 6.15, part 1 - Misc cleanups and prep work. - Annotate _no_printf() with "printf" so that pr_debug() statements are checked by the compiler for default builds (and pr_info() when QUIET). - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters. - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZpvcACgkQOlYIJqCj N/3a+BAAmv0iagaYSiigAWZ9EM/2+CZDGp0+g7sB95graJFKnrKCXSkpuUr1UIYr aDPLHAPBZ8szWNqrJARdSui0QP8cEd2erJkMNRDEQqx/OYKQJ+0W+gmlbleQ6GL6 7C/zUn0JdC0AhpdfePw6nIPpwIYBxDwHMQSuPUOxw5ZJmVCzQxn43MmPWTfL11Eo ARfh6VWmXSvp+5emyjnlGmcT1/a3UlfAGDRekK0gaPYmEtaf6LfKCXc+nX84q0fr T2N8nVcmEDg9VUO3JZfexoLcKpFMsTdwg4GbQ4beytYwPx3YELvCP9eFl5yo4lxk 9YL6uPdXyk0/Cgpv/QyB4YtnaQ4tktPmFLYjEtf8NIMLubouTS6dtYErKq9MOgsP 7UaOBVY6O7+cG+iOU2yGn1Mct0XxhKKLu3n5ZlGLJ/+8v1PUIegL3wUwpQaycp7O xgRd/narmKEmMYFr7D0y7epcUVBlkP17YsFH4w6z9moDjJNVz4Ziqft6ju0N3P1s a/8CA5gd2lplp/Yae62f2mgJNjw5kbEYtGO9ybINbhacznsl+3JsU85k1h+7dZrW 8lmfNJ8+B95UrZ9Wo0mWsv+UKYYFHRFoHjKlevuUTPQ/QxoghvpQ5iU3P+rT8E5n 858OhTi2vlAhg8a338XdfUOy799F73ecDaIKyPMdO74v7wq1swo= =RjiY -----END PGP SIGNATURE----- Merge tag 'kvm-x86-selftests_6.15-1' of https://github.com/kvm-x86/linux into HEAD KVM selftests changes for 6.15, part 1 - Misc cleanups and prep work. - Annotate _no_printf() with "printf" so that pr_debug() statements are checked by the compiler for default builds (and pr_info() when QUIET). - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. The theory is that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters. - Fix a flaw in the Intel PMU counters test where it asserts that an event is counting correctly without actually knowing what the event counts on the underlying hardware.	2025-03-19 09:05:22 -04:00
Paolo Bonzini	4d9a677596	KVM x86 misc changes for 6.15: - Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT. - Add a helper to consolidate handling of mp_state transitions, and use it to clear pv_unhalted whenever a vCPU is made RUNNABLE. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition. - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2). - Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS, as KVM can't get the current CPL. - Misc cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfZhoYACgkQOlYIJqCj N/1oSBAAlhsZzv4m6rHtWACjGsSBlAE5WM7HrpnwjOyMW+Desc0WLL4L6qAlxgT7 4ZkBvQ4zsiyUIdLv1XARvkBYHTUkcgG8fQSSJQ6grZit5OcFcMgWafvQrJYI6428 TyzF/x6t0gj8CjDluA6jM/eL5HhWZZDOIg0Wma+pyYpW+7y2tkphXyYyOQPBGwv3 geRUetbDHHXjf042k/8f1j1vjzrNNvAg3YyNyx1YbdU9XKsn5D+SeUW2eVfYk8G7 5QsCOGvUYcbbjrR8kbCZKexvoH6Np9J6YKDe4R9R2yDzgs/96qz6xkYTGCVkHA1y uursKqRHgbXBxzxa+ban073laT7Qt3S01Gd9bJW3IO7hzG89gl4qfX7fap8T9Yc2 yeBTYIgInpyx+NCdZ2Z/++BzPagBGfa77gFX/eIkmsVA9LWYi9CI3FSjtr/czvWm a4tfMPvTVBjsBQQ7t/lNksrq0O51lbb3iqqv3ToQpDOOqCWuMEU5xcihhPRr5NSZ dX4o/jIDhCV8EyXdtASyqMlYBXcuC45ojEZn1elh0QogzYAdSGQ2bIDyxuBtA//k kSbi+E4GB64jVfBWUyK2QeLOBnBkH7mh6Cg5UYr1Ln9Sm6l8vrcxhcbnchiWxXMI WCK7BJwI2HojBVpEZ04jMkjHvg36uSfjOzmMLT5yPXfFNebsGmA= =8SGF -----END PGP SIGNATURE----- Merge tag 'kvm-x86-misc-6.15' of https://github.com/kvm-x86/linux into HEAD KVM x86 misc changes for 6.15: - Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT. - Add a helper to consolidate handling of mp_state transitions, and use it to clear pv_unhalted whenever a vCPU is made RUNNABLE. - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to coalesce updates when multiple pieces of vCPU state are changing, e.g. as part of a nested transition. - Fix a variety of nested emulation bugs, and add VMX support for synthesizing nested VM-Exit on interception (instead of injecting #UD into L2). - Drop "support" for PV Async #PF with proctected guests without SEND_ALWAYS, as KVM can't get the current CPL. - Misc cleanups	2025-03-19 09:04:48 -04:00
Clément Léger	7a9827e777	KVM: riscv: selftests: Add Zaamo/Zalrsc extensions to get-reg-list test The KVM RISC-V allows Zaamo/Zalrsc extensions for Guest/VM so add these extensions to get-reg-list test. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240619153913.867263-6-cleger@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>	2025-03-19 12:03:46 +00:00
Ingo Molnar	89771319e0	Linux 6.14-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfXVtUeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGN/sH/i5423Gt/z51gDjA s4v5Z7GaBJ9zOGBahn2RWFe72zytTqKrEJmMnGfguirs0atD1DtQj4WAP7iFKP+e WyO663X6HF7i5y37ja0Yd4PZc31hwtqzKH8LjBf8f8tTy8UsEVqumdi5A4sS9KTM qm4kTyyVEY9D/s7oRY8ywjDlRJtO6nT0aKMp4kAqNEbrNUYbilT/a0hgXcgSmPyB uIjmjL2fZfutxGI5LgfbaSHCa1ElmhvTvivOMpaAmZSGCRVHCKGgT0CTNnHyn/7C dB145JkRO4ZOUqirCdO4PE/23id3ajq9fcixJGBzAv7c45y+B3JZ1r2kAfKalE8/ qrOKLys= =8r7a -----END PGP SIGNATURE----- Merge tag 'v6.14-rc7' into x86/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-03-19 11:03:06 +01:00
Yafang Shao	be16ddeaae	selftests/bpf: Add selftest for attaching fexit to __noreturn functions The reuslt: $ tools/testing/selftests/bpf/test_progs --name=fexit_noreturns #99/1 fexit_noreturns/noreturns:OK #99 fexit_noreturns:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20250318114447.75484-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-18 19:07:18 -07:00
Nicolin Chen	97717a1f28	iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage Trigger vEVENTs by feeding an idev ID and validating the returned output virt_ids whether they equal to the value that was set to the vDEVICE. Link: https://patch.msgid.link/r/e829532ec0a3927d61161b7674b20e731ecd495b.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Nicolin Chen	941d0719aa	iommufd/selftest: Require vdev_id when attaching to a nested domain When attaching a device to a vIOMMU-based nested domain, vdev_id must be present. Add a piece of code hard-requesting it, preparing for a vEVENTQ support in the following patch. Then, update the TEST_F. A HWPT-based nested domain will return a NULL new_viommu, thus no such a vDEVICE requirement. Link: https://patch.msgid.link/r/4051ca8a819e51cb30de6b4fe9e4d94d956afe3d.1741719725.git.nicolinc@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-18 14:17:48 -03:00
Yunhui Cui	36dec9e448	RISC-V: selftests: Add TEST_ZICBOM into CBO tests Add test for Zicbom and its block size into CBO tests, when Zicbom is present, test that cbo.clean/flush may be issued and works. As the software can't verify the clean/flush functions, we just judged that cbo.clean/flush isn't executed illegally. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20250226063206.71216-4-cuiyunhui@bytedance.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>	2025-03-18 12:44:17 +00:00
Linus Torvalds	76b6905c11	15 hotfixes. 7 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 13 are for MM and the other two are for squashfs and procfs. All are singletons. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ9jkDgAKCRDdBJ7gKXxA jtgjAP0ZIVQ8wEL/txkrNyMT5JrWCK1XfjsfPNJ0afkcl4Oo/QEA8rnJHT/hX8aY Wr+aY2CoCsFEOsW4oDRUPrUYpkUBqA8= =AwLJ -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-03-17-20-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes. 7 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 13 are for MM and the other two are for squashfs and procfs. All are singletons. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2025-03-17-20-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/page_alloc: fix memory accept before watermarks gets initialized mm: decline to manipulate the refcount on a slab page memcg: drain obj stock on cpu hotplug teardown mm/huge_memory: drop beyond-EOF folios with the right number of refs selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT mm: memcontrol: fix swap counter leak from offline cgroup mm/vma: do not register private-anon mappings with khugepaged during mmap squashfs: fix invalid pointer dereference in squashfs_cache_delete mm/migrate: fix shmem xarray update during migration mm/hugetlb: fix surplus pages in dissolve_free_huge_page() mm/damon/core: initialize damos->walk_completed in damon_new_scheme() mm/damon: respect core layer filters' allowance decision on ops layer filemap: move prefaulting out of hot write path proc: fix UAF in proc_get_inode()	2025-03-17 22:27:27 -07:00
Cyan Yang	f841ad9ca5	selftests/mm/cow: fix the incorrect error handling Error handling doesn't check the correct return value. This patch will fix it. Link: https://lkml.kernel.org/r/20250312043840.71799-1-cyan.yang@sifive.com Fixes: `f4b5fd6946` ("selftests/vm: anon_cow: THP tests") Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: David Hildenbrand <david@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:07:06 -07:00
Zi Yan	80a5c494c8	selftests/mm: add tests for folio_split(), buddy allocator like split It splits page cache folios to orders from 0 to 8 at different in-folio offset. Link: https://lkml.kernel.org/r/20250307174001.242794-9-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Kairui Song <kasong@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:07:00 -07:00
Zi Yan	3fec86f8aa	xarray: add xas_try_split() to split a multi-index entry Patch series "Buddy allocator like (or non-uniform) folio split", v10. This patchset adds a new buddy allocator like (or non-uniform) large folio split from a order-n folio to order-m with m < n. It reduces 1. the total number of after-split folios from 2^(n-m) to n-m+1; 2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to n/6-m/6, assuming XA_CHUNK_SHIFT=6; 3. keep more large folios after a split from all order-m folios to order-(n-1) to order-m folios. For example, to split an order-9 to order-0, folio split generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios. Instead of duplicating existing split_huge_page() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split(). xfstests quick group passed for both tmpfs and xfs. I also semi-replicated Hugh's test[12] and ran it without any issue for almost 24 hours. This patch (of 8): A preparation patch for non-uniform folio split, which always split a folio into half iteratively, and minimal xarray entry split. Currently, xas_split_alloc() and xas_split() always split all slots from a multi-index entry. They cost the same number of xa_node as the to-be-split slots. For example, to split an order-9 entry, which takes 2^(9-6)=8 slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8 xa_node are needed. Instead xas_try_split() is intended to be used iteratively to split the order-9 entry into 2 order-8 entries, then split one order-8 entry, based on the given index, to 2 order-7 entries, ..., and split one order-1 entry to 2 order-0 entries. When splitting the order-6 entry and a new xa_node is needed, xas_try_split() will try to allocate one if possible. As a result, xas_try_split() would only need 1 xa_node instead of 8. When a new xa_node is needed during the split, xas_try_split() can try to allocate one but no more. -ENOMEM will be return if a node cannot be allocated. -EINVAL will be return if a sibling node is split or cascade split happens, where two or more new nodes are needed, and these are not supported by xas_try_split(). xas_split_alloc() and xas_split() split an order-9 to order-0: --------------------------------- \| \| \| \| \| \| \| \| \| \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \| 6 \| 7 \| \| \| \| \| \| \| \| \| \| --------------------------------- \| \| \| \| ------- --- --- ------- \| \| ... \| \| V V V V ----------- ----------- ----------- ----------- \| xa_node \| \| xa_node \| ... \| xa_node \| \| xa_node \| ----------- ----------- ----------- ----------- xas_try_split() splits an order-9 to order-0: --------------------------------- \| \| \| \| \| \| \| \| \| \| 0 \| 1 \| 2 \| 3 \| 4 \| 5 \| 6 \| 7 \| \| \| \| \| \| \| \| \| \| --------------------------------- \| \| V ----------- \| xa_node \| ----------- Link: https://lkml.kernel.org/r/20250307174001.242794-1-ziy@nvidia.com Link: https://lkml.kernel.org/r/20250307174001.242794-2-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Kairui Song <kasong@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 22:06:59 -07:00
Mykyta Yatsenko	a024843d92	selftests/bpf: Test freplace from user namespace Add selftests to verify that it is possible to load freplace program from user namespace if BPF token is initialized by bpf_object__prepare before calling bpf_program__set_attach_target. Negative test is added as well. Modified type of the priv_prog to xdp, as kprobe did not work on aarch64 and s390x. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250317174039.161275-5-mykyta.yatsenko5@gmail.com	2025-03-17 13:45:12 -07:00
Wei Yang	ccaf3efcee	lib/interval_tree: add test case for span iteration Verify interval_tree_span_iter_xxx() helpers works as expected. Link: https://lkml.kernel.org/r/20250310074938.26756-6-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 12:17:01 -07:00
Wei Yang	16b1936ae6	lib/rbtree: add random seed Current test use pseudo rand function with fixed seed, which means the test data is the same pattern each time. Add random seed parameter to randomize the test. Link: https://lkml.kernel.org/r/20250310074938.26756-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 12:17:00 -07:00
Wei Yang	4164e1525d	lib/rbtree: enable userland test suite for rbtree related data structure Patch series "lib/interval_tree: add some test cases and cleanup", v2. Since rbtree/augmented tree/interval tree share similar data structure, besides new cases for interval tree, this patch set also does cleanup for others. This patch (of 7): Currently we have some tests for rbtree related data structure, e.g. rbtree, augmented rbtree, interval tree, in lib/ as kernel module. To facilitate the test and debug for those fundamental data structure, this patch enable those tests in userland. Link: https://lkml.kernel.org/r/20250310074938.26756-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20250310074938.26756-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-17 12:17:00 -07:00
Dave Jiang	1729808c54	cxl/test: Add Set Feature support to cxl_test Add emulation to support Set Feature mailbox command to cxl_test. The only feature supported is the device patrol scrub feature. The set feature allows activation of patrol scrub for the cxl_test emulated device. The command does not support partial data transfer even though the spec allows it. This restriction is to reduce complexity of the emulation given the patrol scrub feature is very minimal. Link: https://patch.msgid.link/r/20250307205648.1021626-8-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:41:37 -03:00
Dave Jiang	e77e9c1079	cxl/test: Add Get Feature support to cxl_test Add emulation of Get Feature command to cxl_test. The feature for device patrol scrub is returned by the emulation code. This is the only feature currently supported by cxl_test. It returns the information for the device patrol scrub feature. Link: https://patch.msgid.link/r/20250307205648.1021626-7-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:41:37 -03:00
Dave Jiang	858ce2f56b	cxl: Add FWCTL support to CXL Add fwctl support code to allow sending of CXL feature commands from userspace through as ioctls via FWCTL. Provide initial setup bits. The CXL PCI probe function will call devm_cxl_setup_fwctl() after the cxl_memdev has been enumerated in order to setup FWCTL char device under the cxl_memdev like the existing memdev char device for issuing CXL raw mailbox commands from userspace via ioctls. Link: https://patch.msgid.link/r/20250307205648.1021626-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-03-17 14:41:36 -03:00
Dave Jiang	3b5d43245f	Merge branch 'for-6.15/features' into cxl-for-next Add CXL Features support. Setup code for enabling in kernel usage of CXL Features. Expecting EDAC/RAS to utilize CXL Features in kernel for things such as memory sparing. Also prepartion for enabling of CXL FWCTL support to issue allowed Features from user space.	2025-03-17 09:22:59 -07:00
Lorenzo Stoakes	f3b92176f4	tools/selftests: add guard region test for /proc/$pid/pagemap Add a test to the guard region self tests to assert that the /proc/$pid/pagemap information now made availabile to the user correctly identifies and reports guard regions. As a part of this change, update vm_util.h to add the new bit (note there is no header file in the kernel where this is exposed, the user is expected to provide their own mask) and utilise the helper functions there for pagemap functionality. [lorenzo.stoakes@oracle.com: fixup define name] Link: https://lkml.kernel.org/r/32e83941-e6f5-42ee-9292-a44c16463cf1@lucifer.local Link: https://lkml.kernel.org/r/164feb0a43ae72650e6b20c3910213f469566311.1740139449.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:41 -07:00
Brendan Jackman	1ddae9d67e	selftests/mm/mlock: print error on failure It's not really possible to start diagnosing this without knowing the actual error. Also update the mlock2 helper to behave like libc would by setting errno and returning -1. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-12-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:40 -07:00
Brendan Jackman	5d2146a335	selftests/mm: skip mlock tests if nobody user can't read it If running from a directory that can't be read by unprivileged users, executing on-fault-test via the nobody user will fail. The kselftest build does give the file the correct permissions, but after being installed it might be in a directory without global execute permissions. Since the script can't safely fix that, just skip if it happens. Note that the stderr of the `ls` command is unfiltered meaning the user sees a "permission denied" error that can help inform them why the test was skipped. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-11-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:40 -07:00
Brendan Jackman	f896c6de83	selftests/mm: ensure uffd-wp-mremap gets pages of each size This test allocates a page of every available size and doesn't have any SKIP logic if the allocation fails. So, ensure it's available and skip the test if we can't do so. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-10-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:39 -07:00
Brendan Jackman	e9269b2cc4	selftests/mm: drop unnecessary sudo usage This script must be run as root anyway (see all the writing to privileged files in /proc etc). Remove the unnecessary use of sudo to avoid breaking on single-user systems that don't have sudo. This also avoids confusing readers. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-9-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:39 -07:00
Brendan Jackman	32b42970e8	selftests/mm: skip gup_longterm tests on weird filesystems Some filesystems don't support ftruncate()ing unlinked files. They return ENOENT. In that case, skip the test. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-8-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:39 -07:00
Brendan Jackman	571a4b62ed	selftests/mm: skip map_populate on weird filesystems It seems that 9pfs does not allow truncating unlinked files, Mark Brown has noted that NFS may also behave this way. It doesn't seem quite right to call this a "bug" but it's probably a special enough case that it makes sense for the test to just SKIP if it happens. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-7-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:39 -07:00
Brendan Jackman	bf6d575e24	selftests/mm: don't fail uffd-stress if too many CPUs This calculation divides a fixed parameter by an environment-dependent parameter i.e. the number of CPUs. The simple way to avoid machine-specific failures here is to just put a cap on the max value of the latter. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-6-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Suggested-by: Mateusz Guzik <mjguzik@gmail.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:38 -07:00
Brendan Jackman	db0f1c138f	selftests/mm: print some details when uffd-stress gets bad params So this can be debugged more easily. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-5-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:38 -07:00
Brendan Jackman	f3b5535abc	selftests/mm/uffd: rename nr_cpus -> nr_parallel A later commit will bound this variable so it no longer necessarily matches the number of CPUs. Rename it appropriately. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-4-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:38 -07:00
Brendan Jackman	f4b3e6c7f1	selftests/mm: skip uffd-wp-mremap if userfaultfd not available It's obvious that this should fail in that case, but still, save the reader the effort of figuring out that they've run into this by just SKIPping Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-3-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:38 -07:00
Brendan Jackman	0046dbed80	selftests/mm: skip uffd-stress if userfaultfd not available It's pretty obvious that the test wouldn't work if you don't have the feature enabled. But, it's still useful to SKIP instead of failing so the reader can immediately tell that this is the reason why. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-2-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:37 -07:00
Brendan Jackman	800ddf3cd7	selftests/mm: report errno when things fail in gup_longterm Patch series "selftests/mm: Some cleanups from trying to run them", v4. I never had much luck running mm selftests so I spent a few hours digging into why. Looks like most of the reason is missing SKIP checks, so this series is just adding a bunch of those that I found. I did not do anything like all of them, just the ones I spotted in gup_longterm, gup_test, mmap, userfaultfd and memfd_secret. It's a bit unfortunate to have to skip those tests when ftruncate() fails, but I don't have time to dig deep enough into it to actually make them pass. I have observed the issue on 9pfs and heard rumours that NFS has a similar problem. I'm now able to run these test groups successfully: - mmap - gup_test - compaction - migration - page_frag - userfaultfd - mlock I've never gone past "Waiting for hugetlb memory to get depleted", in the hugetlb tests. I don't know if they are stuck or if they would eventually work if I was patient enough (testing on a 1G machine). I have not investigated further. I had some issues with mlock tests failing due to -ENOSRCH from mlock2(), I can no longer reproduce that though, things work OK now. Of the remaining tests there may be others that work fine, but there's no convenient way to survey the whole output of run_vmtests.sh so I'm just going test by test here. In my spare moments I am slowly chipping away at a setup to run these tests continuously in a reasonably hermetic QEMU environment via virtme-ng: `5fad4b9c59/README.md` Hopefully that will eventually offer a way to provide a "canned" environment where the tests are known to work, which can be fairly easily reproduced by any developer. This patch (of 12): Just reporting failure doesn't tell you what went wrong. This can fail in different ways so report errno to help the reader get started debugging. Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-0-dec210a658f5@google.com Link: https://lkml.kernel.org/r/20250311-mm-selftests-v4-1-dec210a658f5@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:37 -07:00
Ujwal Kundur	af3b45aac5	selftests/mm: fix spelling Fix misspelling flagged by codespell. Link: https://lkml.kernel.org/r/20250215081803.1793-1-ujwal.kundur@gmail.com Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:23 -07:00
Suren Baghdasaryan	3104138517	mm: make vma cache SLAB_TYPESAFE_BY_RCU To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that object reuse before RCU grace period is over will be detected by lock_vma_under_rcu(). Current checks are sufficient as long as vma is detached before it is freed. The only place this is not currently happening is in exit_mmap(). Add the missing vma_mark_detached() in exit_mmap(). Another issue which might trick lock_vma_under_rcu() during vma reuse is vm_area_dup(), which copies the entire content of the vma into a new one, overriding new vma's vm_refcnt and temporarily making it appear as attached. This might trick a racing lock_vma_under_rcu() to operate on a reused vma if it found the vma before it got reused. To prevent this situation, we should ensure that vm_refcnt stays at detached state (0) when it is copied and advances to attached state only after it is added into the vma tree. Introduce vm_area_init_from() which preserves new vma's vm_refcnt and use it in vm_area_dup(). Since all vmas are in detached state with no current readers when they are freed, lock_vma_under_rcu() will not be able to take vm_refcnt after vma got detached even if vma is reused. vma_mark_attached() in modified to include a release fence to ensure all stores to the vma happen before vm_refcnt gets initialized. Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate vm_area_struct reuse and will minimize the number of call_rcu() calls. [surenb@google.com: remove atomic_set_release() usage in tools/] Link: https://lkml.kernel.org/r/20250217054351.2973666-1-surenb@google.com Link: https://lkml.kernel.org/r/20250213224655.1680278-18-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:21 -07:00
Suren Baghdasaryan	6bef4c2f97	mm: move lesser used vma_area_struct members into the last cacheline Move several vma_area_struct members which are rarely or never used during page fault handling into the last cacheline to better pack vm_area_struct. As a result vm_area_struct will fit into 3 as opposed to 4 cachelines. New typical vm_area_struct layout: struct vm_area_struct { union { struct { long unsigned int vm_start; /* 0 8 / long unsigned int vm_end; / 8 8 / }; / 0 16 / freeptr_t vm_freeptr; / 0 8 / }; / 0 16 / struct mm_struct vm_mm; /* 16 8 / pgprot_t vm_page_prot; / 24 8 / union { const vm_flags_t vm_flags; / 32 8 / vm_flags_t __vm_flags; / 32 8 / }; / 32 8 / unsigned int vm_lock_seq; / 40 4 / / XXX 4 bytes hole, try to pack / struct list_head anon_vma_chain; / 48 16 / / --- cacheline 1 boundary (64 bytes) --- / struct anon_vma anon_vma; /* 64 8 / const struct vm_operations_struct vm_ops; /* 72 8 / long unsigned int vm_pgoff; / 80 8 / struct file vm_file; /* 88 8 / void vm_private_data; /* 96 8 / atomic_long_t swap_readahead_info; / 104 8 / struct mempolicy vm_policy; /* 112 8 / struct vma_numab_state numab_state; /* 120 8 / / --- cacheline 2 boundary (128 bytes) --- / refcount_t vm_refcnt (__aligned__(64)); / 128 4 / / XXX 4 bytes hole, try to pack / struct { struct rb_node rb (__aligned__(8)); / 136 24 / long unsigned int rb_subtree_last; / 160 8 / } __attribute__((__aligned__(8))) shared; / 136 32 / struct anon_vma_name anon_name; /* 168 8 / struct vm_userfaultfd_ctx vm_userfaultfd_ctx; / 176 8 / / size: 192, cachelines: 3, members: 18 / / sum members: 176, holes: 2, sum holes: 8 / / padding: 8 / / forced alignments: 2, forced holes: 1, sum forced holes: 4 */ } __attribute__((__aligned__(64))); Memory consumption per 1000 VMAs becomes 48 pages: slabinfo after vm_area_struct changes: <name> ... <objsize> <objperslab> <pagesperslab> : ... vm_area_struct ... 192 42 2 : ... Link: https://lkml.kernel.org/r/20250213224655.1680278-14-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:20 -07:00
Suren Baghdasaryan	f35ab95ca0	mm: replace vm_lock and detached flag with a reference count rw_semaphore is a sizable structure of 40 bytes and consumes considerable space for each vm_area_struct. However vma_lock has two important specifics which can be used to replace rw_semaphore with a simpler structure: 1. Readers never wait. They try to take the vma_lock and fall back to mmap_lock if that fails. 2. Only one writer at a time will ever try to write-lock a vma_lock because writers first take mmap_lock in write mode. Because of these requirements, full rw_semaphore functionality is not needed and we can replace rw_semaphore and the vma->detached flag with a refcount (vm_refcnt). When vma is in detached state, vm_refcnt is 0 and only a call to vma_mark_attached() can take it out of this state. Note that unlike before, now we enforce both vma_mark_attached() and vma_mark_detached() to be done only after vma has been write-locked. vma_mark_attached() changes vm_refcnt to 1 to indicate that it has been attached to the vma tree. When a reader takes read lock, it increments vm_refcnt, unless the top usable bit of vm_refcnt (0x40000000) is set, indicating presence of a writer. When writer takes write lock, it sets the top usable bit to indicate its presence. If there are readers, writer will wait using newly introduced mm->vma_writer_wait. Since all writers take mmap_lock in write mode first, there can be only one writer at a time. The last reader to release the lock will signal the writer to wake up. refcount might overflow if there are many competing readers, in which case read-locking will fail. Readers are expected to handle such failures. In summary: 1. all readers increment the vm_refcnt; 2. writer sets top usable (writer) bit of vm_refcnt; 3. readers cannot increment the vm_refcnt if the writer bit is set; 4. in the presence of readers, writer must wait for the vm_refcnt to drop to 1 (plus the VMA_LOCK_OFFSET writer bit), indicating an attached vma with no readers; 5. vm_refcnt overflow is handled by the readers. While this vm_lock replacement does not yet result in a smaller vm_area_struct (it stays at 256 bytes due to cacheline alignment), it allows for further size optimization by structure member regrouping to bring the size of vm_area_struct below 192 bytes. [surenb@google.com: fix a crash due to vma_end_read() that should have been removed] Link: https://lkml.kernel.org/r/20250220200208.323769-1-surenb@google.com Link: https://lkml.kernel.org/r/20250213224655.1680278-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Matthew Wilcox <willy@infradead.org> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:20 -07:00
Suren Baghdasaryan	55e50223bf	mm: introduce vma_iter_store_attached() to use with attached vmas vma_iter_store() functions can be used both when adding a new vma and when updating an existing one. However for existing ones we do not need to mark them attached as they are already marked that way. With vma->detached being a separate flag, double-marking a vmas as attached or detached is not an issue because the flag will simply be overwritten with the same value. However once we fold this flag into the refcount later in this series, re-attaching or re-detaching a vma becomes an issue since these operations will be incrementing/decrementing a refcount. Introduce vma_iter_store_new() and vma_iter_store_overwrite() to replace vma_iter_store() and avoid re-attaching a vma during vma update. Add assertions in vma_mark_attached()/vma_mark_detached() to catch invalid usage. Update vma tests to check for vma detached state correctness. Link: https://lkml.kernel.org/r/20250213224655.1680278-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:18 -07:00
Suren Baghdasaryan	8ef95d8f15	mm: mark vma as detached until it's added into vma tree Current implementation does not set detached flag when a VMA is first allocated. This does not represent the real state of the VMA, which is detached until it is added into mm's VMA tree. Fix this by marking new VMAs as detached and resetting detached flag only after VMA is added into a tree. Introduce vma_mark_attached() to make the API more readable and to simplify possible future cleanup when vma->vm_mm might be used to indicate detached vma and vma_mark_attached() will need an additional mm parameter. Link: https://lkml.kernel.org/r/20250213224655.1680278-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:17 -07:00
Suren Baghdasaryan	7b6218ae12	mm: move per-vma lock into vm_area_struct Back when per-vma locks were introduces, vm_lock was moved out of vm_area_struct in [1] because of the performance regression caused by false cacheline sharing. Recent investigation [2] revealed that the regressions is limited to a rather old Broadwell microarchitecture and even there it can be mitigated by disabling adjacent cacheline prefetching, see [3]. Splitting single logical structure into multiple ones leads to more complicated management, extra pointer dereferences and overall less maintainable code. When that split-away part is a lock, it complicates things even further. With no performance benefits, there are no reasons for this split. Merging the vm_lock back into vm_area_struct also allows vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset. Move vm_lock back into vm_area_struct, aligning it at the cacheline boundary and changing the cache to be cacheline-aligned as well. With kernel compiled using defconfig, this causes VMA memory consumption to grow from 160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes: slabinfo before: <name> ... <objsize> <objperslab> <pagesperslab> : ... vma_lock ... 40 102 1 : ... vm_area_struct ... 160 51 2 : ... slabinfo after moving vm_lock: <name> ... <objsize> <objperslab> <pagesperslab> : ... vm_area_struct ... 256 32 2 : ... Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages, which is 5.5MB per 100000 VMAs. Note that the size of this structure is dependent on the kernel configuration and typically the original size is higher than 160 bytes. Therefore these calculations are close to the worst case scenario. A more realistic vm_area_struct usage before this change is: <name> ... <objsize> <objperslab> <pagesperslab> : ... vma_lock ... 40 102 1 : ... vm_area_struct ... 176 46 2 : ... Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages, which is 3.9MB per 100000 VMAs. This memory consumption growth can be addressed later by optimizing the vm_lock. [1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/ [2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/ [3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20250213224655.1680278-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:17 -07:00
Lorenzo Stoakes	0b6d4853d1	tools/selftests: add file/shmem-backed mapping guard region tests Extend the guard region self tests to explicitly assert that guard regions work correctly for functionality specific to file-backed and shmem mappings. In addition to testing all of the existing guard region functionality that is currently tested against anonymous mappings against file-backed and shmem mappings (except those which are exclusive to anonymous mapping), we now also: * Test that MADV_SEQUENTIAL does not cause unexpected readahead behaviour. * Test that MAP_PRIVATE behaves as expected with guard regions installed in both a shared and private mapping of an fd. * Test that a read-only file can correctly establish guard regions. * Test a probable fault-around case does not interfere with guard regions (or vice-versa). * Test that truncation does not eliminate guard regions. * Test that hole punching functions as expected in the presence of guard regions. * Test that a read-only mapping of a memfd write sealed mapping can have guard regions established within it and function correctly without violation of the seal. * Test that guard regions installed into a mapping of the anonymous zero page function correctly. Link: https://lkml.kernel.org/r/90c16bec5fcaafcd1700dfa3e9988c3e1aa9ac1d.1739469950.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:15 -07:00
Lorenzo Stoakes	272f37d3e9	tools/selftests: expand all guard region tests to file-backed Extend the guard region tests to allow for test fixture variants for anon, shmem, and local file files. This allows us to assert that each of the expected behaviours of anonymous memory also applies correctly to file-backed (both shmem and an a file created locally in the current working directory) and thus asserts the same correctness guarantees as all the remaining tests do. The fixture teardown is now performed in the parent process rather than child forked ones, meaning cleanup is always performed, including unlinking any generated temporary files. Additionally the variant fixture data type now contains an enum value indicating the type of backing store and the mmap() invocation is abstracted to allow for the mapping of whichever backing store the variant is testing. We adjust tests as necessary to account for the fact they may now reference files rather than anonymous memory. Link: https://lkml.kernel.org/r/ab42228d2bd9b8aa18e9faebcd5c88732a7e5820.1739469950.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:15 -07:00
Lorenzo Stoakes	ce1c0824fc	selftests/mm: rename guard-pages to guard-regions The feature formerly referred to as guard pages is more correctly referred to as 'guard regions', as in fact no pages are ever allocated in the process of installing the regions. To avoid confusion, rename the tests accordingly. [lorenzo.stoakes@oracle.com: fix guard regions invocation] Link: https://lkml.kernel.org/r/13426c71-d069-4407-9340-b227ff8b8736@lucifer.local Link: https://lkml.kernel.org/r/1c3cd04a3f69b5756b94bda701ac88325a9be18b.1739469950.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:15 -07:00
Mark Brown	85968b6a20	selftests/mm: allow tests to run with no huge pages support Currently the mm selftests refuse to run if huge pages are not available in the current system but this is an optional feature and not all the tests actually require them. Change the test during startup to be non-fatal and skip or omit tests which actually rely on having huge pages, allowing the other tests to be run. The gup_test does support using madvise() to configure huge pages but it ignores the error code so we just let it run. Link: https://lkml.kernel.org/r/20250212-kselftest-mm-no-hugepages-v1-2-44702f538522@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Mariano Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:14 -07:00
Eric Salem	3fae696393	selftests: mm: fix typo Fix misspelling. Link: https://lkml.kernel.org/r/77e0e915-36c3-4c95-84b8-0b73aaa17951@gmail.com Signed-off-by: Eric Salem <ericsalem@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:11 -07:00
Mark Brown	023fff71d8	selftests/mm: fix thuge-gen test name uniqueness The thuge-gen test_mmap() and test_shmget() tests are repeatedly run for a variety of sizes but always report the result of their test with the same name, meaning that automated sysetms running the tests are unable to distinguish between the various tests. Add the supplied sizes to the logged test names to distinguish between runs. My test automation was getting pretty confused about what was going on - the test names are a pretty important external interface. Link: https://lkml.kernel.org/r/20250204-kselftest-mm-fix-dups-v1-1-6afe417ef4bb@kernel.org Fixes: `b38bd9b2c4` ("selftests/mm: thuge-gen: conform to TAP format output") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:03 -07:00
Lorenzo Stoakes	c372473a54	mm: completely abstract unnecessary adj_start calculation The adj_start calculation has been a constant source of confusion in the VMA merge code. There are two cases to consider, one where we adjust the start of the vmg->middle VMA (i.e. the vmg->__adjust_middle_start merge flag is set), in which case adj_start is calculated as: (1) adj_start = vmg->end - vmg->middle->vm_start And the case where we adjust the start of the vmg->next VMA (i.e. the vmg->__adjust_next_start merge flag is set), in which case adj_start is calculated as: (2) adj_start = -(vmg->middle->vm_end - vmg->end) We apply (1) thusly: vmg->middle->vm_start = vmg->middle->vm_start + vmg->end - vmg->middle->vm_start Which simplifies to: vmg->middle->vm_start = vmg->end Similarly, we apply (2) as: vmg->next->vm_start = vmg->next->vm_start + -(vmg->middle->vm_end - vmg->end) Noting that for these VMAs to be mergeable vmg->middle->vm_end == vmg->next->vm_start and so this simplifies to: vmg->next->vm_start = vmg->next->vm_start + -(vmg->next->vm_start - vmg->end) Which simplifies to: vmg->next->vm_start = vmg->end Therefore in each case, we simply need to adjust the start of the VMA to vmg->end (!) and can do away with this adj_start calculation. The only caveat is that we must ensure we update the vm_pgoff field correctly. We therefore abstract this entire calculation to a new function vmg_adjust_set_range() which performs this calculation and sets the adjusted VMA's new range using the general vma_set_range() function. We also must update vma_adjust_trans_huge() which expects the now-abstracted adj_start parameter. It turns out this is wholly unnecessary. In vma_adjust_trans_huge() the relevant code is: if (adjust_next > 0) { struct vm_area_struct *next = find_vma(vma->vm_mm, vma->vm_end); unsigned long nstart = next->vm_start; nstart += adjust_next; split_huge_pmd_if_needed(next, nstart); } The only case where this is relevant is when vmg->__adjust_middle_start is specified (in which case adj_next would have been positive), i.e. the one in which the vma specified is vmg->prev and this the sought 'next' VMA would be vmg->middle. We can therefore eliminate the find_vma() invocation altogether and simply provide the vmg->middle VMA in this instance, or NULL otherwise. Again we have an adj_next offset calculation: next->vm_start + vmg->end - vmg->middle->vm_start Where next == vmg->middle this simplifies to vmg->end as previously demonstrated. Therefore nstart is equal to vmg->end, which is already passed to vma_adjust_trans_huge() via the 'end' parameter and so this code (rather delightfully) simplifies to: if (next) split_huge_pmd_if_needed(next, end); With these changes in place, it becomes silly for commit_merge() to return vmg->target, as it is always the same and threaded through vmg, so we finally change commit_merge() to return an error value once again. This patch has no change in functional behaviour. Link: https://lkml.kernel.org/r/7bce2cd4b5afb56211822835d145471280c3dccc.1738326519.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:02 -07:00
Lorenzo Stoakes	fe3e9cf0d7	mm: eliminate adj_start parameter from commit_merge() Introduce internal vmg->__adjust_middle_start and vmg->__adjust_next_start merge flags, enabling us to indicate to commit_merge() that we are performing a merge which either spans only part of vmg->middle, or part of vmg->next respectively. In the former instance, we change the start of vmg->middle to match the attributes of vmg->prev, without spanning all of vmg->middle. This implies that vmg->prev->vm_end and vmg->middle->vm_start are both increased to form the new merged VMA (vmg->prev) and the new subsequent VMA (vmg->middle). In the latter case, we change the end of vmg->middle to match the attributes of vmg->next, without spanning all of vmg->next. This implies that vmg->middle->vm_end and vmg->next->vm_start are both decreased to form the new merged VMA (vmg->next) and the new prior VMA (vmg->middle). Since we now have a stable set of prev, middle, next VMAs threaded through vmg and with these flags set know what is happening, we can perform the calculation in commit_merge() instead. This allows us to drop the confusing adj_start parameter and instead pass semantic information to commit_merge(). In the latter case the -(middle->vm_end - start) calculation becomes -(middle->vm-end - vmg->end), however this is correct as vmg->end is set to the start parameter. This is because in this case (rather confusingly), we manipulate vmg->middle, but ultimately return vmg->next, whose range will be correctly specified. At this point vmg->start, end is the new range for the prior VMA rather than the merged one. This patch has no change in functional behaviour. Link: https://lkml.kernel.org/r/bcec0cd980b373a5eb02236cb033034ce1effe42.1738326519.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:02 -07:00
Lorenzo Stoakes	6ab2d9c7c6	mm: further refactor commit_merge() The current VMA merge mechanism contains a number of confusing mechanisms around removal of VMAs on merge and the shrinking of the VMA adjacent to vma->target in the case of merges which result in a partial merge with that adjacent VMA. Since we now have a STABLE set of VMAs - prev, middle, next - we are now able to have the caller of commit_merge() explicitly tell us which VMAs need deleting, using newly introduced internal VMA merge flags. Doing so allows us to embed this state within the VMG and remove the confusing remove, remove2 parameters from commit_merge(). We additionally are able to eliminate the highly confusing and misleading 'expanded' parameter - a parameter that in reality refers to whether or not the return VMA is the target one or the one immediately adjacent. We can infer which is the case from whether or not the adj_start parameter is negative. This also allows us to simplify further logic around iterator configuration and VMA iterator stores. Doing so means we can also eliminate the adjust parameter, as we are able to infer which VMA ought to be adjusted from adj_start - a positive value implies we adjust the start of 'middle', a negative one implies we adjust the start of 'next'. We are then able to have commit_merge() explicitly return the target VMA, or NULL on inability to pre-allocate memory. Errors were previously filtered so behaviour does not change. We additionally move from the slightly odd use of a bitwise-flag enum vmg->merge_flags field to vmg bitfields. This patch has no change in functional behaviour. Link: https://lkml.kernel.org/r/7bf2ed24af68aac18672b7acebbd9102f48c5b03.1738326519.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:02 -07:00
Lorenzo Stoakes	3a75ccba04	mm: simplify vma merge structure and expand comments Patch series "mm: further simplify VMA merge operation", v3. While significant efforts have been made to improve the VMA merge operation, there remains remnants of the bad (or rather confusing) old days, which make the code difficult to understand, more bug prone and thus harder to modify. This series attempts to significantly improve matters in a number of respects - with a focus on simplifying the commit_merge() function which actually actions the merge operation - and importantly, adjusting the two most confusing merge cases - those in which we 'adjust' the VMA immediately adjacent to the one being merged. One source of confusion are the VMAs being threaded through the operation themselves - vmg->prev, vmg->vma and vmg->next. At the start of the operation, vmg->vma is either NULL if a new VMA is propose to be added, or if not then a pointer to an existing VMA being modified, and prev/next are (perhaps not present) VMAs sat immediately before and after the range specified in vmg->start, end, respectively. However, during the VMA merge operation, we change vmg->start, end and pgoff to span the newly merged range and vmg->vma to either be: a. The ultimately returned VMA (in most cases) or b. A VMA which we will manipulate, but ultimately instead return vmg->next. Case b. especially here is confusing for somebody reading this code, but the fact we update this state, along with vmg->start, end, pgoff only makes matters worse. We simplify things by replacing vmg->vma with vmg->middle and never changing it - this is always either NULL (for a new VMA) or the VMA being modified between vmg->prev and vmg->next. We further simplify by placing the merged VMA in a new vmg->target field - whether case b. above is the case or not. The reader of the code can now simply rely on vmg->middle being the middle VMA and vmg->target being the ultimately merged VMA. We additionally tackle the confusing cases where we 'adjust' VMAs other than the one we ultimately return as the merged VMA (this includes case b. above). These are: (1) merge <-----------> \|------\|\|--------\| \|------------\|---\| \| prev \|\| middle \| -> \| target \| m \| \|------\|\|--------\| \|------------\|---\| In which case middle must be adjusted so middle->vm_start is increased as well as performing the merge. (2) (equivalent to case b. above) <-------------> \|---------\|\|------\| \|---\|-------------\| \| middle \|\| next \| -> \| m \| target \| \|---------\|\|------\| \|---\|-------------\| In which case next must be adjusted so next->vm_start is decreased as well as performing the merge. This cases have previously been performed by calculating and passing around a dubious and confusing 'adj_start' parameter along side a pointer to an 'adjust' VMA indicating which VMA requires additional adjustment (middle in case 1 and next in case 2). With the VMG structure in place we are able to avoid this by simply setting a merge flag to describe each case: (1) Sets the vmg->__adjust_middle_start flag (2) Sets the vmg->__adjust_next_start flag By doing so it turns out we can vastly simplify the logic and calculate what is required to perform the operation. Taken together the refactorings make it far easier to understand what is being done even in these more confusing cases, make the code far more maintainable, debuggable, and testable, providing more internal state indicating what is happening in the merge operation. The changes have no functional net impact on the merge operation and everything should still behave as it did before. This patch (of 5): The merge code, while much improved, still has a number of points of confusion. As part of a broader series cleaning this up to make this more maintainable, we start by addressing some confusion around vma_merge_struct fields. So far, the caller either provides no vmg->vma (a new VMA) or supplies the existing VMA which is being altered, setting vmg->start,end,pgoff to the proposed VMA dimensions. vmg->vma is then updated, as are vmg->start,end,pgoff as the merge process proceeds and the appropriate merge strategy is determined. This is rather confusing, as vmg->vma starts off as the 'middle' VMA between vmg->prev,next, but becomes the 'target' VMA, except in one specific edge case (merge next, shrink middle). Int his patch we introduce vmg->middle to describe the VMA that is between vmg->prev and vmg->next, and does NOT change during the merge operation. We replace vmg->vma with vmg->target, and use this only during the merge operation itself. Aside from the merge right, shrink middle case, this becomes the VMA that forms the basis of the VMA that is returned. This edge case can be addressed in a future commit. We also add a number of comments to explain what is going on. Finally, we adjust the ASCII diagrams showing each merge case in vma_merge_existing_range() to be clearer - the arrow range previously showed the vmg->start, end spanned area, but it is clearer to change this to show the final merged VMA. This patch has no change in functional behaviour. Link: https://lkml.kernel.org/r/cover.1738326519.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/4dfe60f1419d55e5d0516f56349695d73a57184c.1738326519.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:01 -07:00
Zi Yan	ad4c9bb541	selftests/mm: test splitting file-backed THP to any lower order Now split_huge_page*() supports shmem THP split to any lower order. Test it. The test now reads file content out after split to check if the split corrupts the file data. Link: https://lkml.kernel.org/r/20250122161928.1240637-3-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:05:56 -07:00
Zi Yan	035a112e5f	selftests/mm: make file-backed THP split work by writing PMD size data Commit `acd7ccb284` ("mm: shmem: add large folio support for tmpfs") changes huge=always to allocate THP/mTHP based on write size and split_huge_page_test does not write PMD size data, so file-back THP is not created during the test. Fix it by writing PMD size data. Link: https://lkml.kernel.org/r/20250122161928.1240637-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:05:56 -07:00
Rafael Aquini	67a2f86846	selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation We noticed that uffd-stress test was always failing to run when invoked for the hugetlb profiles on x86_64 systems with a processor count of 64 or bigger: ... # ------------------------------------ # running ./uffd-stress hugetlb 128 32 # ------------------------------------ # ERROR: invalid MiB (errno=9, @uffd-stress.c:459) ... # [FAIL] not ok 3 uffd-stress hugetlb 128 32 # exit=1 ... The problem boils down to how run_vmtests.sh (mis)calculates the size of the region it feeds to uffd-stress. The latter expects to see an amount of MiB while the former is just giving out the number of free hugepages halved down. This measurement discrepancy ends up violating uffd-stress' assertion on number of hugetlb pages allocated per CPU, causing it to bail out with the error above. This commit fixes that issue by adjusting run_vmtests.sh's half_ufd_size_MB calculation so it properly renders the region size in MiB, as expected, while maintaining all of its original constraints in place. Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com Fixes: `2e47a445d7` ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation") Signed-off-by: Rafael Aquini <raquini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-16 17:40:25 -07:00
Rae Moar	2e0cf2b32f	kunit: tool: add test to check parsing late test plan Add test to check for the infinite loop caused by the inability to parse a late test plan. The test parses the following output: TAP version 13 ok 4 test4 1..4 Link: https://lore.kernel.org/r/20250313192714.1380005-1-rmoar@google.com Signed-off-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2025-03-15 18:13:43 -06:00
Rae Moar	1d4c06d519	kunit: tool: Fix bug in parsing test plan A bug was identified where the KTAP below caused an infinite loop: TAP version 13 ok 4 test_case 1..4 The infinite loop was caused by the parser not parsing a test plan if following a test result line. Fix this bug by parsing test plan line to avoid the infinite loop. Link: https://lore.kernel.org/r/20250313192714.1380005-1-rmoar@google.com Signed-off-by: Rae Moar <rmoar@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <shuah@kernel.org>	2025-03-15 18:13:38 -06:00
Saket Kumar Bhaskar	8c10109a97	selftests/bpf: Fix sockopt selftest failure on powerpc The SO_RCVLOWAT option is defined as 18 in the selftest header, which matches the generic definition. However, on powerpc, SO_RCVLOWAT is defined as 16. This discrepancy causes sol_socket_sockopt() to fail with the default switch case on powerpc. This commit fixes by defining SO_RCVLOWAT as 16 for powerpc. Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/bpf/20250311084647.3686544-1-skb99@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:49:24 -07:00
Viktor Malik	de07b18289	selftests/bpf: Fix string read in strncmp benchmark The strncmp benchmark uses the bpf_strncmp helper and a hand-written loop to compare two strings. The values of the strings are filled from userspace. One of the strings is non-const (in .bss) while the other is const (in .rodata) since that is the requirement of bpf_strncmp. The problem is that in the hand-written loop, Clang optimizes the reads from the const string to always return 0 which breaks the benchmark. Use barrier_var to prevent the optimization. The effect can be seen on the strncmp-no-helper variant. Before this change: # ./bench strncmp-no-helper Setting up benchmark 'strncmp-no-helper'... Benchmark 'strncmp-no-helper' started. Iter 0 (112.309us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 1 (-23.238us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 2 ( 58.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 3 (-30.466us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 4 ( 29.996us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 5 ( 16.949us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Iter 6 (-60.035us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s After this change: # ./bench strncmp-no-helper Setting up benchmark 'strncmp-no-helper'... Benchmark 'strncmp-no-helper' started. Iter 0 ( 77.711us): hits 5.534M/s ( 5.534M/prod), drops 0.000M/s, total operations 5.534M/s Iter 1 ( 11.215us): hits 6.006M/s ( 6.006M/prod), drops 0.000M/s, total operations 6.006M/s Iter 2 (-14.253us): hits 5.931M/s ( 5.931M/prod), drops 0.000M/s, total operations 5.931M/s Iter 3 ( 59.087us): hits 6.005M/s ( 6.005M/prod), drops 0.000M/s, total operations 6.005M/s Iter 4 (-21.379us): hits 6.010M/s ( 6.010M/prod), drops 0.000M/s, total operations 6.010M/s Iter 5 (-20.310us): hits 5.861M/s ( 5.861M/prod), drops 0.000M/s, total operations 5.861M/s Iter 6 ( 53.937us): hits 6.004M/s ( 6.004M/prod), drops 0.000M/s, total operations 6.004M/s Summary: hits 5.969 ± 0.061M/s ( 5.969M/prod), drops 0.000 ± 0.000M/s, total operations 5.969 ± 0.061M/s Fixes: `9c42652f8b` ("selftests/bpf: Add benchmark for bpf_strncmp() helper") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20250313122852.1365202-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:49:24 -07:00
Kumar Kartikeya Dwivedi	1f375aef6c	selftests/bpf: Fix arena_spin_lock compilation on PowerPC Venkat reported a compilation error for BPF selftests on PowerPC [0]. The crux of the error is the following message: In file included from progs/arena_spin_lock.c:7: /root/bpf-next/tools/testing/selftests/bpf/bpf_arena_spin_lock.h:122:8: error: member reference base type '__attribute__((address_space(1))) u32' (aka '__attribute__((address_space(1))) unsigned int') is not a structure or union 122 \| old = atomic_read(&lock->val); This is because PowerPC overrides the qspinlock type changing the lock->val member's type from atomic_t to u32. To remedy this, import the asm-generic version in the arena spin lock header, name it __qspinlock (since it's aliased to arena_spinlock_t, the actual name hardly matters), and adjust the selftest to not depend on the type in vmlinux.h. [0]: https://lore.kernel.org/bpf/7bc80a3b-d708-4735-aa3b-6a8c21720f9d@linux.ibm.com Fixes: `88d706ba7c` ("selftests/bpf: Introduce arena spin lock") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/bpf/20250311154244.3775505-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:59 -07:00
Blaise Boscaccy	7987f1627e	selftests/bpf: Add a kernel flag test for LSM bpf hook This test exercises the kernel flag added to security_bpf by effectively blocking light-skeletons from loading while allowing normal skeletons to function as-is. Since this should work with any arbitrary BPF program, an existing program from LSKELS_EXTRA was used as a test payload. Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250310221737.821889-3-bboscaccy@linux.microsoft.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:58 -07:00
Blaise Boscaccy	082f1db02c	security: Propagate caller information in bpf hooks Certain bpf syscall subcommands are available for usage from both userspace and the kernel. LSM modules or eBPF gatekeeper programs may need to take a different course of action depending on whether or not a BPF syscall originated from the kernel or userspace. Additionally, some of the bpf_attr struct fields contain pointers to arbitrary memory. Currently the functionality to determine whether or not a pointer refers to kernel memory or userspace memory is exposed to the bpf verifier, but that information is missing from various LSM hooks. Here we augment the LSM hooks to provide this data, by simply passing a boolean flag indicating whether or not the call originated in the kernel, in any hook that contains a bpf_attr struct that corresponds to a subcommand that may be called from the kernel. Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20250310221737.821889-2-bboscaccy@linux.microsoft.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:58 -07:00
Chen Ni	a03d375330	selftests/bpf: Convert comma to semicolon Replace comma between expressions with semicolons. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it is seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/bpf/20250310032045.651068-1-nichen@iscas.ac.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:58 -07:00
Anton Protopopov	caa4237a79	selftests/bpf: Fix selection of static vs. dynamic LLVM The Makefile uses the exit code of the `llvm-config --link-static --libs` command to choose between statically-linked and dynamically-linked LLVMs. The stdout and stderr of that command are redirected to /dev/null. To redirect the output the "&>" construction is used, which might not be supported by /bin/sh, which is executed by make for $(shell ...) commands. On such systems the test will fail even if static LLVM is actually supported. Replace "&>" by ">/dev/null 2>&1" to fix this. Fixes: `2a9d30fac8` ("selftests/bpf: Support dynamically linking LLVM if static is not available") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/20250310145112.1261241-1-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:58 -07:00
Emil Tsalapatis	c06707ff07	selftests: bpf: fix duplicate selftests in cpumask_success. The BPF cpumask selftests are currently run twice in test_progs/cpumask.c, once by traversing cpumask_success_testcases, and once by invoking RUN_TESTS(cpumask_success). Remove the invocation of RUN_TESTS to properly run the selftests only once. Now that the tests are run only through cpumask_success_testscases, add to it the missing test_refcount_null_tracking testcase. Also remove the __success annotation from it, since it is now loaded and invoked by the runner. Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250309230427.26603-5-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:58 -07:00
Emil Tsalapatis	918ba2636d	selftests: bpf: add bpf_cpumask_populate selftests Add selftests for the bpf_cpumask_populate helper that sets a bpf_cpumask to a bit pattern provided by a BPF program. Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250309230427.26603-3-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Bastien Curutchet (eBPF Foundation)	1041b8bc9f	selftests/bpf: lwt_seg6local: Move test to test_progs test_lwt_seg6local.sh isn't used by the BPF CI. Add a new file in the test_progs framework to migrate the tests done by test_lwt_seg6local.sh. It uses the same network topology and the same BPF programs located in progs/test_lwt_seg6local.c. Use the network helpers instead of `nc` to exchange the final packet. Remove test_lwt_seg6local.sh and its Makefile entry. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20250307-seg6local-v1-2-990fff8f180d@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Bastien Curutchet (eBPF Foundation)	b1d85ff517	selftests/bpf: lwt_seg6local: Remove unused routes Some routes in fb00:: are initialized during setup, even though they aren't needed by the test as the UDP packets will travel through the lightweight tunnels. Remove these unnecessary routes. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20250307-seg6local-v1-1-990fff8f180d@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Feng Yang	339c1f8ea1	selftests/bpf: Fix cap_enable_effective() return code The caller of cap_enable_effective() expects negative error code. Fix it. Before: failed to restore CAP_SYS_ADMIN: -1, Unknown error -1 After: failed to restore CAP_SYS_ADMIN: -3, No such process failed to restore CAP_SYS_ADMIN: -22, Invalid argument Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250305022234.44932-1-yangfeng59949@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Amery Hung	8bda5b787d	selftests/bpf: Fix dangling stdout seen by traffic monitor thread Traffic monitor thread may see dangling stdout as the main thread closes and reassigns stdout without protection. This happens when the main thread finishes one subtest and moves to another one in the same netns_new() scope. The issue can be reproduced by running test_progs repeatedly with traffic monitor enabled: for ((i=1;i<=100;i++)); do ./test_progs -a flow_dissector_skb* -m '*' done For restoring stdout in crash_handler(), since it does not really care about closing stdout, simlpy flush stdout and restore it to the original one. Then, Fix the issue by consolidating stdio_restore_cleanup() and stdio_restore(), and protecting the use/close/assignment of stdout with a lock. The locking in the main thread is always performed regradless of whether traffic monitor is running or not for simplicity. It won't have any side-effect. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250305182057.2802606-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Amery Hung	34a25aabcd	selftests/bpf: Allow assigning traffic monitor print function Allow users to change traffic monitor's print function. If not provided, traffic monitor will print to stdout by default. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250305182057.2802606-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Amery Hung	aeb5bbb025	selftests/bpf: Clean up call sites of stdio_restore() reset_affinity() and save_ns() are only called in run_one_test(). There is no need to call stdio_restore() in reset_affinity() and save_ns() if stdio_restore() is moved right after a test finishes in run_one_test(). Also remove an unnecessary check of env.stdout_saved in crash_handler() by moving env.stdout_saved assignment to the beginning of main(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250305182057.2802606-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Bastien Curutchet (eBPF Foundation)	f5e288943e	selftests/bpf: Move test_lwt_ip_encap to test_progs test_lwt_ip_encap.sh isn't used by the BPF CI. Add a new file in the test_progs framework to migrate the tests done by test_lwt_ip_encap.sh. It uses the same network topology and the same BPF programs located in progs/test_lwt_ip_encap.c. Rework the GSO part to avoid using nc and dd. Remove test_lwt_ip_encap.sh and its Makefile entry. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250304-lwt_ip-v1-1-8fdeb9e79a56@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:57 -07:00
Kumar Kartikeya Dwivedi	2dfc8186d6	selftests/bpf: Add tests for arena spin lock Add some basic selftests for qspinlock built over BPF arena using cond_break_label macro. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250306035431.2186189-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:56 -07:00
Kumar Kartikeya Dwivedi	88d706ba7c	selftests/bpf: Introduce arena spin lock Implement queued spin lock algorithm as BPF program for lock words living in BPF arena. The algorithm is copied from kernel/locking/qspinlock.c and adapted for BPF use. We first implement abstract helpers for portable atomics and acquire/release load instructions, by relying on X86_64 presence to elide expensive barriers and rely on implementation details of the JIT, and fall back to slow but correct implementations elsewhere. When support for acquire/release load/stores lands, we can improve this state. Then, the qspinlock algorithm is adapted to remove dependence on multi-word atomics due to lack of support in BPF ISA. For instance, xchg_tail cannot use 16-bit xchg, and needs to be a implemented as a 32-bit try_cmpxchg loop. Loops which are seemingly infinite from verifier PoV are annotated with cond_break_label macro to return an error. Only 1024 NR_CPUs are supported. Note that the slow path is a global function, hence the verifier doesn't know the return value's precision. The recommended way of usage is to always test against zero for success, and not ret < 0 for error, as the verifier would assume ret > 0 has not been accounted for. Add comments in the function documentation about this quirk. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250306035431.2186189-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:56 -07:00
Kumar Kartikeya Dwivedi	4b7ede0be3	selftests/bpf: Introduce cond_break_label Add a new cond_break_label macro that jumps to the specified label when the cond_break termination check fires, and allows us to better handle the uncontrolled termination of the loop. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250306035431.2186189-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:56 -07:00
Eduard Zingerman	871ef8d50e	bpf: correct use/def for may_goto instruction may_goto instruction does not use any registers, but in compute_insn_live_regs() it was treated as a regular conditional jump of kind BPF_K with r0 as source register. Thus unnecessarily marking r0 as used. Fixes: `14c8552db6` ("bpf: simple DFA-based live registers analysis") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250305085436.2731464-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:30 -07:00
Eduard Zingerman	2ea8f6a1cd	selftests/bpf: test cases for compute_live_registers() Cover instructions from each kind: - assignment - arithmetic - store/load - endian conversion - atomics - branches, conditional branches, may_goto, calls - LD_ABS/LD_IND - address_space_cast Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250304195024.2478889-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:30 -07:00
Eduard Zingerman	14c8552db6	bpf: simple DFA-based live registers analysis Compute may-live registers before each instruction in the program. The register is live before the instruction I if it is read by I or some instruction S following I during program execution and is not overwritten between I and S. This information would be used in the next patch as a hint in func_states_equal(). Use a simple algorithm described in [1] to compute this information: - define the following: - I.use : a set of all registers read by instruction I; - I.def : a set of all registers written by instruction I; - I.in : a set of all registers that may be alive before I execution; - I.out : a set of all registers that may be alive after I execution; - I.successors : a set of instructions S that might immediately follow I for some program execution; - associate separate empty sets 'I.in' and 'I.out' with each instruction; - visit each instruction in a postorder and update corresponding 'I.in' and 'I.out' sets as follows: I.out = U [S.in for S in I.successors] I.in = (I.out / I.def) U I.use (where U stands for set union, / stands for set difference) - repeat the computation while I.{in,out} changes for any instruction. On implementation side keep things as simple, as possible: - check_cfg() already marks instructions EXPLORED in post-order, modify it to save the index of each EXPLORED instruction in a vector; - represent I.{in,out,use,def} as bitmasks; - don't split the program into basic blocks and don't maintain the work queue, instead: - do fixed-point computation by visiting each instruction; - maintain a simple 'changed' flag if I.{in,out} for any instruction change; Measurements show that even such simplistic implementation does not add measurable verification time overhead (for selftests, at-least). Note on check_cfg() ex_insn_beg/ex_done change: To avoid out of bounds access to env->cfg.insn_postorder array, it should be guaranteed that instruction transitions to EXPLORED state only once. Previously this was not the fact for incorrect programs with direct calls to exception callbacks. The 'align' selftest needs adjustment to skip computed insn/live registers printout. Otherwise it matches lines from the live registers printout. [1] https://en.wikipedia.org/wiki/Live-variable_analysis Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250304195024.2478889-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:29 -07:00
Peilin Ye	ff3afe5da9	selftests/bpf: Add selftests for load-acquire and store-release instructions Add several ./test_progs tests: - arena_atomics/load_acquire - arena_atomics/store_release - verifier_load_acquire/* - verifier_store_release/* - verifier_precision/bpf_load_acquire - verifier_precision/bpf_store_release The last two tests are added to check if backtrack_insn() handles the new instructions correctly. Additionally, the last test also makes sure that the verifier "remembers" the value (in src_reg) we store-release into e.g. a stack slot. For example, if we take a look at the test program: #0: r1 = 8; /* store_release((u64 )(r10 - 8), r1); / #1: .8byte %[store_release]; #2: r1 = (u64 )(r10 - 8); #3: r2 = r10; #4: r2 += r1; #5: r0 = 0; #6: exit; At #1, if the verifier doesn't remember that we wrote 8 to the stack, then later at #4 we would be adding an unbounded scalar value to the stack pointer, which would cause the program to be rejected: VERIFIER LOG: ============= ... math between fp pointer and register with unbounded min value is not allowed For easier CI integration, instead of using built-ins like __atomic_{load,store}_n() which depend on the new __BPF_FEATURE_LOAD_ACQ_STORE_REL pre-defined macro, manually craft load-acquire/store-release instructions using __imm_insn(), as suggested by Eduard. All new tests depend on: (1) Clang major version >= 18, and (2) ENABLE_ATOMICS_TESTS is defined (currently implies -mcpu=v3 or v4), and (3) JIT supports load-acquire/store-release (currently arm64 and x86-64) In .../progs/arena_atomics.c: /* 8-byte-aligned / __u8 __arena_global load_acquire8_value = 0x12; / 1-byte hole */ __u16 __arena_global load_acquire16_value = 0x1234; That 1-byte hole in the .addr_space.1 ELF section caused clang-17 to crash: fatal error: error in backend: unable to write nop sequence of 1 bytes To work around such llvm-17 CI job failures, conditionally define __arena_global variables as 64-bit if __clang_major__ < 18, to make sure .addr_space.1 has no holes. Ideally we should avoid compiling this file using clang-17 at all (arena tests depend on __BPF_FEATURE_ADDR_SPACE_CAST, and are skipped for llvm-17 anyway), but that is a separate topic. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/1b46c6feaf0f1b6984d9ec80e500cc7383e9da1a.1741049567.git.yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:29 -07:00
Kumar Kartikeya Dwivedi	2fb761823e	bpf, x86: Add x86 JIT support for timed may_goto Implement the arch_bpf_timed_may_goto function using inline assembly to have control over which registers are spilled, and use our special protocol of using BPF_REG_AX as an argument into the function, and as the return value when going back. Emit call depth accounting for the call made from this stub, and ensure we don't have naked returns (when rethunk mitigations are enabled) by falling back to the RET macro (instead of retq). After popping all saved registers, the return address into the BPF program should be on top of the stack. Since the JIT support is now enabled, ensure selftests which are checking the produced may_goto sequences do not break by adjusting them. Make sure we still test the old may_goto sequence on other architectures, while testing the new sequence on x86_64. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250304003239.2390751-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:28 -07:00
Mykyta Yatsenko	6419d08b6c	selftests/bpf: Add tests for bpf_object__prepare Add selftests, checking that running bpf_object__prepare successfully creates maps before load step. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250303135752.158343-5-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:28 -07:00
Bastien Curutchet (eBPF Foundation)	a54e700696	selftests/bpf: test_tunnel: Remove test_tunnel.sh All tests from test_tunnel.sh have been migrated into test test_progs. The last test remaining in the script is the test_ipip() that is already covered in the test_prog framework by the NONE case of test_ipip_tunnel(). Remove the test_tunnel.sh script and its Makefile entry Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-10-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	05cd60ab57	selftests/bpf: test_tunnel: Move ip6tnl tunnel tests to test_progs ip6tnl tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test ip6tnl tunnels. It uses the same network topology and the same BPF programs than the script. Remove test_ipip6() and test_ip6ip6() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-9-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	260f2da62d	selftests/bpf: test_tunnel: Move ip6geneve tunnel test to test_progs ip6geneve tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test ip6geneve tunnels. It uses the same network topology and the same BPF programs than the script. Remove test_ip6geneve() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-8-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	bd477738e6	selftests/bpf: test_tunnel: Move geneve tunnel test to test_progs geneve tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test geneve tunnels. It uses the same network topology and the same BPF programs than the script. Remove test_geneve() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-7-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	ea60b6a524	selftests/bpf: test_tunnel: Move ip6erspan tunnel test to test_progs ip6erspan tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test ip6erspan tunnels. It uses the same network topology and the same BPF programs than the script. Remove test_ip6erspan() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-6-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	cadb08a4d3	selftests/bpf: test_tunnel: Move erspan tunnel tests to test_progs erspan tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test erspan tunnels. It uses the same network topology and the same BPF programs than the script. Remove test_erspan() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-5-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	856818b28f	selftests/bpf: test_tunnel: Move ip6gre tunnel test to test_progs ip6gre tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test ip6gre tunnels. It uses the same network topology and the same BPF programs than the script. Disable the IPv6 DAD feature because it can take lot of time and cause some tests to fail depending on the environment they're run on. Remove test_ip6gre() and test_ip6gretap() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-4-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	257dfd1c6b	selftests/bpf: test_tunnel: Move gre tunnel test to test_progs gre tunnels are tested in the test_tunnel.sh but not in the test_progs framework. Add a new test in test_progs to test gre tunnels. It uses the same network topology and the same BPF programs than the script. Remove test_gre() and test_gre_no_tunnel_key() from the script. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-3-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:27 -07:00
Bastien Curutchet (eBPF Foundation)	fcb39996a2	selftests/bpf: test_tunnel: Add ping helpers All tests use more or less the same ping commands as final validation. Also test_ping()'s return value is checked with ASSERT_OK() while this check is already done by the SYS() macro inside test_ping(). Create helpers around test_ping() and use them in the tests to avoid code duplication. Remove the unnecessary ASSERT_OK() from the tests. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-2-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:26 -07:00
Bastien Curutchet (eBPF Foundation)	5d6aa606c1	selftests/bpf: test_tunnel: Add generic_attach* helpers A fair amount of code duplication is present among tests to attach BPF programs. Create generic_attach* helpers that attach BPF programs to a given interface. Use ASSERT_OK_FD() instead of ASSERT_GE() to check fd's validity. Use these helpers in all the available tests. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-1-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:26 -07:00
Eduard Zingerman	346e4ca462	veristat: Report program type guess results to sdterr In order not to pollute CSV output, e.g.: $ ./veristat -o csv exceptions_ext.bpf.o > test.csv Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/extension... Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/throwing_extension... Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Link: https://lore.kernel.org/bpf/20250301000147.1583999-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:26 -07:00
Eduard Zingerman	2d95b3f582	veristat: Strerror expects positive number (errno) Before: ./veristat -G @foobar iters.bpf.o Failed to open presets in 'foobar': Unknown error -2 ... After: ./veristat -G @foobar iters.bpf.o Failed to open presets in 'foobar': No such file or directory ... Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Link: https://lore.kernel.org/bpf/20250301000147.1583999-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:26 -07:00
Eduard Zingerman	c0d078da7a	veristat: @files-list.txt notation for object files list Allow reading object file list from file. E.g. the following command: ./veristat @list.txt Is equivalent to the following invocation: ./veristat line-1 line-2 ... line-N Where line-i corresponds to lines from list.txt. Lines starting with '#' are ignored. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Link: https://lore.kernel.org/bpf/20250301000147.1583999-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:26 -07:00
Kumar Kartikeya Dwivedi	72ed076abf	selftests/bpf: Add tests for extending sleepable global subprogs Add tests for freplace behavior with the combination of sleepable and non-sleepable global subprogs. The changes_pkt_data selftest did all the hardwork, so simply rename it and include new support for more summarization tests for might_sleep bit. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250301151846.1552362-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:25 -07:00
Kumar Kartikeya Dwivedi	b2bb703434	selftests/bpf: Test sleepable global subprogs in atomic contexts Add tests for rejecting sleepable and accepting non-sleepable global function calls in atomic contexts. For spin locks, we still reject all global function calls. Once resilient spin locks land, we will carefully lift in cases where we deem it safe. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250301151846.1552362-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:25 -07:00
Alexis Lothoré (eBPF Foundation)	93cf4e537e	bpf/selftests: test_select_reuseport_kern: Remove unused header test_select_reuseport_kern.c is currently including <stdlib.h>, but it does not use any definition from there. Remove stdlib.h inclusion from test_select_reuseport_kern.c Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250227-remove_wrong_header-v1-1-bc94eb4e2f73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:25 -07:00
Yonghong Song	2222aa1c89	selftests/bpf: Add selftests allowing cgroup prog pre-ordering Add a few selftests with cgroup prog pre-ordering. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250224230121.283601-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:25 -07:00
Jiayuan Chen	7f260af1f2	selftests/bpf: Fixes for test_maps test BPF CI has failed 3 times in the last 24 hours. Add retry for ENOMEM. It's similar to the optimization plan: commit `2f553b032c` ("selftsets/bpf: Retry map update for non-preallocated per-cpu map") Failed CI: https://github.com/kernel-patches/bpf/actions/runs/13549227497/job/37868926343 https://github.com/kernel-patches/bpf/actions/runs/13548089029/job/37865812030 https://github.com/kernel-patches/bpf/actions/runs/13553536268/job/37883329296 selftests/bpf: Fixes for test_maps test Fork 100 tasks to 'test_update_delete' Fork 100 tasks to 'test_update_delete' Fork 100 tasks to 'test_update_delete' Fork 100 tasks to 'test_update_delete' ...... test_task_storage_map_stress_lookup:PASS test_maps: OK, 0 SKIPPED Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250227142646.59711-4-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:25 -07:00
Jiayuan Chen	93a279b65a	selftests/bpf: Allow auto port binding for bpf nf Allow auto port binding for bpf nf test to avoid binding conflict. ./test_progs -a bpf_nf 24/1 bpf_nf/xdp-ct:OK 24/2 bpf_nf/tc-bpf-ct:OK 24/3 bpf_nf/alloc_release:OK 24/4 bpf_nf/insert_insert:OK 24/5 bpf_nf/lookup_insert:OK 24/6 bpf_nf/set_timeout_after_insert:OK 24/7 bpf_nf/set_status_after_insert:OK 24/8 bpf_nf/change_timeout_after_alloc:OK 24/9 bpf_nf/change_status_after_alloc:OK 24/10 bpf_nf/write_not_allowlisted_field:OK 24 bpf_nf:OK Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250227142646.59711-3-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:24 -07:00
Jiayuan Chen	acf0d6f681	selftests/bpf: Allow auto port binding for cgroup connect Allow auto port binding for cgroup connect test to avoid binding conflict. Result: ./test_progs -a cgroup_v1v2 59 cgroup_v1v2:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250227142646.59711-2-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:24 -07:00
Mykyta Yatsenko	064e9aacfd	selftests/bpf: Add tests for bpf_dynptr_copy Add XDP setup type for dynptr tests, enabling testing for non-contiguous buffer. Add 2 tests: - test_dynptr_copy - verify correctness for the fast (contiguous buffer) code path. - test_dynptr_copy_xdp - verifies code paths that handle non-contiguous buffer. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250226183201.332713-4-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-03-15 11:48:20 -07:00
Alison Schofield	114a89b433	cxl/test: Define a CFMWS capable of a 3 way HB interleave The CXL unit test cxl-xor-region.sh is skipping a 1+1+1 region interleave test case because the window is not defined. Additionally, upcoming expansion of 3 way HB interleave test cases (like 2+2+2) require the same window. Replace an unused CFMWS with a 3-way capable CFMWS in the set of CFMWS's loaded when interleave_arithmetic=1. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20250226221931.2352061-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2025-03-14 16:27:40 -07:00
Dave Jiang	763e15d047	Merge branch 'for-6.15/extended-linear-cache' into cxl-for-next2 Add support for Extended Linear Cache for CXL. Add enumeration support of the cache. Add MCE notification of the aliased memory address.	2025-03-14 16:22:34 -07:00
Dave Jiang	d781a45270	Merge branch 'for-6.15/dirty-shutdown' into cxl-for-next2 Add support for Global Persistent Flush (GPF) and dirty shutdown accounting.	2025-03-14 16:11:42 -07:00
Davidlohr Bueso	6eb52f63ea	tools/testing/cxl: Set Shutdown State support Add support to emulate the CXL Set Shutdown State operation. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250220220235.276831-5-dave@stgolabs.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2025-03-14 15:55:27 -07:00
Yuquan Wang	2da9ad027e	cxl/pmem: debug invalid serial number data In a nvdimm interleave-set each device with an invalid or zero serial number may cause pmem region initialization to fail, but in cxl case such device could still set cookies of nd_interleave_set and create a nvdimm pmem region. This adds the validation of serial number in cxl pmem region creation. The event of no serial number would cause to fail to set the cookie and pmem region. For cxl-test to work properly, always +1 on mock device's serial number. Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250219040029.515451-2-wangyuquan1236@phytium.com.cn Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2025-03-14 15:01:04 -07:00
Dave Jiang	9387c6aec0	Merge branch 'for-6.15/fw-first-error-logging' into cxl-for-next2 Add logging support for CXL CPER endpoint and port protocol errors. Including the 2 patches that was completed later. Link: https://lore.kernel.org/linux-cxl/20250123084421.127697-1-Smita.KoralahalliChannabasappa@amd.com/ Link: https://lore.kernel.org/linux-cxl/20250310223839.31342-1-Smita.KoralahalliChannabasappa@amd.com/	2025-03-14 14:27:17 -07:00
Smita Koralahalli	36f257e3b0	acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors. The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors. Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing. Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations. Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors. [DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2025-03-14 14:21:45 -07:00
Tamir Duberstein	97c1f302f2	scanf: convert self-test to KUnit Convert the scanf() self-test to a KUnit test. In the interest of keeping the patch reasonably-sized this doesn't refactor the tests into proper parameterized tests - it's all one big test case. Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Link: https://lore.kernel.org/r/20250307-scanf-kunit-convert-v9-3-b98820fa39ff@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>	2025-03-14 13:55:37 -07:00
Niklas Cassel	24a42582b0	selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header In order to improve readability, use the IRQ_TYPE_* defines from the UAPI header rather than using raw values. No functional change. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-12-cassel@kernel.org	2025-03-14 16:12:04 +00:00
Matthieu Baerts (NGI0)	a1e36ec363	selftests: drv-net: fix merge conflicts resolution After the recent merge between net-next and net, I got some conflicts on my side because the merge resolution was different from Stephen's one [1] I applied on my side in the MPTCP tree. It looks like the code that is now in net-next is using the old way to retrieve the local and remote addresses. This patch is now using the new way, like what was in Stephen's email [1]. Also, in get_interface_info(), there were no conflicts in this area, because that was new code from 'net', but a small adaptation was needed there as well to get the remote address. Fixes: `941defcea7` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Link: https://lore.kernel.org/20250311115758.17a1d414@canb.auug.org.au [1] Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250314-net-next-drv-net-ping-fix-merge-v1-1-0d5c19daf707@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-14 10:24:14 +01:00
Paolo Abeni	941defcea7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: tools/testing/selftests/drivers/net/ping.py `75cc19c8ff` ("selftests: drv-net: add xdp cases for ping.py") `de94e86974` ("selftests: drv-net: store addresses in dict indexed by ipver") https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/ net/core/devmem.c `a70f891e0f` ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") `1d22d3060b` ("net: drop rtnl_lock for queue_mgmt operations") https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/ Adjacent changes: tools/testing/selftests/net/Makefile `6f50175cca` ("selftests: Add IPv6 link-local address generation tests for GRE devices.") `2e5584e0f9` ("selftests/net: expand cmsg_ipv6.sh with ipv4") drivers/net/ethernet/broadcom/bnxt/bnxt.c `661958552e` ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") `fe96d717d3` ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-13 23:08:11 +01:00
Jason Xing	a1e0783e10	selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN Add selftests for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN BPF socket cases. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250312153523.9860-5-kerneljasonxing@gmail.com	2025-03-13 14:42:53 -07:00
Linus Torvalds	4003c9e787	Including fixes from netfilter, bluetooth and wireless. No known regressions outstanding. Current release - regressions: - wifi: nl80211: fix assoc link handling - eth: lan78xx: sanitize return values of register read/write functions Current release - new code bugs: - ethtool: tsinfo: fix dump command - bluetooth: btusb: configure altsetting for HCI_USER_CHANNEL - eth: mlx5: DR, use the right action structs for STEv3 Previous releases - regressions: - netfilter: nf_tables: make destruction work queue pernet - gre: fix IPv6 link-local address generation. - wifi: iwlwifi: fix TSO preparation - bluetooth: revert "bluetooth: hci_core: fix sleeping function called from invalid context" - ovs: revert "openvswitch: switch to per-action label counting in conntrack" - eth: ice: fix switchdev slow-path in LAG - eth: bonding: fix incorrect MAC address setting to receive NS messages Previous releases - always broken: - core: prevent TX of unreadable skbs - sched: prevent creation of classes with TC_H_ROOT - netfilter: nft_exthdr: fix offset with ipv4_find_option() - wifi: cfg80211: cancel wiphy_work before freeing wiphy - mctp: copy headers if cloned - phy: nxp-c45-tja11xx: add errata for TJA112XA/B - eth: bnxt: fix kernel panic in the bnxt_get_queue_stats{rx \| tx} - eth: mlx5: bridge, fix the crash caused by LAG state check Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmfS+yMSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOknd8P/3XdIaDXrwVl9BSi37qToRs40i+6qPep 5MZN+wakJkuHQqooADF/UarqHWEtQwFOTJalgsPbrUpsespA0lhmzSRHsk+2Xy// 8o8RuZ2VngN1/UVBaoXb0QOY28CgtDs0SdW6/jST/dEp3bHvW+ZTDR2aaivKAWtJ gdWIWR+Ctw1lpetKJer3loPXjPd2doDedSlf64CiHw7dRFTwiE2AcrTRkYdd18Ml HJjiBkqKMnUTxqWxOU3GJxdl7Qru7CvAfn3XQ0PjCO9hADPJb/uWZDRENauk3/jE 6lJc+EP5AoG3Ce8MLzHr/aayZR3YtyhJ2TaWmyPbe6BJq7tPH84ydKMZnF7UnOd5 Jp4T6rykkJOYnUdiYGiOPHcTxQxUugKUrL3rNvGJep+x3JAi/DJJkHNWEx+EBB55 3YbDTOUiY1bCGOoGJIZRwNzw52O0+7qINvCtAN21sMu4pKr/nYMzO8L9ldFExllj EuOcxr813V09NyoJs+wWcy9FdlVmcPwCsT4Pylrdb+mA1XohPzMwibH3yHk4XCZ6 Aijz5Bb78i4vg+kZzNZmVY11J5yL0/eC8DefMgUe3+JKwyM2TiSn5hHK6tYbTLtM yAjbTkeu/FD6alSRAnIAL+YBHg4ri3fXXjU4vLZA5n3wWIjfqpL7JAFgnK57Bd8Z 1X5k+52VjQzj =lIrc -----END PGP SIGNATURE----- Merge tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter, bluetooth and wireless. No known regressions outstanding. Current release - regressions: - wifi: nl80211: fix assoc link handling - eth: lan78xx: sanitize return values of register read/write functions Current release - new code bugs: - ethtool: tsinfo: fix dump command - bluetooth: btusb: configure altsetting for HCI_USER_CHANNEL - eth: mlx5: DR, use the right action structs for STEv3 Previous releases - regressions: - netfilter: nf_tables: make destruction work queue pernet - gre: fix IPv6 link-local address generation. - wifi: iwlwifi: fix TSO preparation - bluetooth: revert "bluetooth: hci_core: fix sleeping function called from invalid context" - ovs: revert "openvswitch: switch to per-action label counting in conntrack" - eth: - ice: fix switchdev slow-path in LAG - bonding: fix incorrect MAC address setting to receive NS messages Previous releases - always broken: - core: prevent TX of unreadable skbs - sched: prevent creation of classes with TC_H_ROOT - netfilter: nft_exthdr: fix offset with ipv4_find_option() - wifi: cfg80211: cancel wiphy_work before freeing wiphy - mctp: copy headers if cloned - phy: nxp-c45-tja11xx: add errata for TJA112XA/B - eth: - bnxt: fix kernel panic in the bnxt_get_queue_stats{rx \| tx} - mlx5: bridge, fix the crash caused by LAG state check" * tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) net: mana: cleanup mana struct after debugfs_remove() net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices net/mlx5: Bridge, fix the crash caused by LAG state check net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs net/mlx5: HWS, Rightsize bwc matcher priority net/mlx5: DR, use the right action structs for STEv3 Revert "openvswitch: switch to per-action label counting in conntrack" net: openvswitch: remove misbehaving actions length check selftests: Add IPv6 link-local address generation tests for GRE devices. gre: Fix IPv6 link-local address generation. netfilter: nft_exthdr: fix offset with ipv4_find_option() selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT net_sched: Prevent creation of classes with TC_H_ROOT ipvs: prevent integer overflow in do_ip_vs_get_ctl() selftests: netfilter: skip br_netfilter queue tests if kernel is tainted netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree() wifi: mac80211: fix MPDU length parsing for EHT 5/6 GHz qlcnic: fix memory leak issues in qlcnic_sriov_common.c rtase: Fix improper release of ring list entries in rtase_sw_reset ...	2025-03-13 07:58:48 -10:00
Tamir Duberstein	7a79e7daa8	printf: convert self-test to KUnit Convert the printf() self-test to a KUnit test. In the interest of keeping the patch reasonably-sized this doesn't refactor the tests into proper parameterized tests - it's all one big test case. Signed-off-by: Tamir Duberstein <tamird@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250307-printf-kunit-convert-v6-1-4d85c361c241@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>	2025-03-13 10:26:33 -07:00
Paolo Abeni	2409fa66e2	netfilter pull request 25-03-13 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmfSqyUACgkQ1w0aZmrP KyGocBAAu0+rdAcW1wwSYk53tRN663WpbbYg/AchJ9ilLH2IXdCzzbeuekRzvSmS /gS2WNANmckL314300RV5vTd9+PCbANjh1248O4S8ZQnhmomoiB3xD4QM9fu8HKZ RE+koHmnHESl+TiiUzgw31OtZEdSpjn7zUzm+qAznBeDXfyBE1YWQ9q8LALi0+3m RW/waYykwRktzHCANBYNF0oqt6siBrHnI9KHgXIfQCEK0mc3A/s8MfO1IJqC177Y It/+4a6qsaGZZsSwGCrQ/40CO3rItzF7jefpeAjv/di224EQ+p2sWZgEiLPCMtvk 6WSfYhKNiDnKGAe1u9hCG2RdaSQ8Og7BsBWFeB56gZbK7gPoFY2VCM/hATsBTYG8 bn8xQKFoA9y7dYp8AP6a33OlzV+/qOfJa2kXQ90yjS4yWnMY+q59j1D564z/WhZ9 gVZjrcUDSsv3ETtp7vk7s+0SoI3EvSLb/umcjHZR6oeLkNFWpQ4r9NqY0GX6+zDC 89YR2w/RbO9WMRb52OAqY3HdKmOflwEIN5mDWCAB4sIPUgCpoDzvIjC2xLdr6lUW ydoisH2B87luG0p7zKP7iGE1KIp8wm+zO+XciOrcjyie5YNMHyCFRbzGAsd47CLB UjNrrlAS0tU65Nb+JzsrZOYye2cghE+KC8fKHvpwjk54lsaPHl0= =OTtx -----END PGP SIGNATURE----- Merge tag 'nf-25-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Missing initialization of cpu and jiffies32 fields in conncount, from Kohei Enju. 2) Skip several tests in case kernel is tainted, otherwise tests bogusly report failure too as they also check for tainted kernel, from Florian Westphal. 3) Fix a hyphothetical integer overflow in do_ip_vs_get_ctl() leading to bogus error logs, from Dan Carpenter. 4) Fix incorrect offset in ipv4 option match in nft_exthdr, from Alexey Kashavkin. netfilter pull request 25-03-13 * tag 'nf-25-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_exthdr: fix offset with ipv4_find_option() ipvs: prevent integer overflow in do_ip_vs_get_ctl() selftests: netfilter: skip br_netfilter queue tests if kernel is tainted netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree() ==================== Link: https://patch.msgid.link/20250313095636.2186-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-13 15:07:39 +01:00
Thomas Gleixner	8e63360d86	selftests/timers/posix-timers: Add a test for exact allocation mode The exact timer ID allocation mode is used by CRIU to restore timers with a given ID. Add a test case for it. It's skipped on older kernels when the prctl() fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/8734fl2tkx.ffs@tglx	2025-03-13 12:07:18 +01:00
Guillaume Nault	6f50175cca	selftests: Add IPv6 link-local address generation tests for GRE devices. GRE devices have their special code for IPv6 link-local address generation that has been the source of several regressions in the past. Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an IPv6 link-link local address in accordance with the net.ipv6.conf.<dev>.addr_gen_mode sysctl. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/2d6772af8e1da9016b2180ec3f8d9ee99f470c77.1741375285.git.gnault@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-13 10:17:42 +01:00
Sebastian Ott	edfd826b8b	KVM: arm64: selftests: Test that TGRAN*_2 fields are writable Userspace can write to these fields for non-NV guests; add test that do just that. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/kvmarm/20250306184013.30008-1-sebott@redhat.com/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev>	2025-03-12 13:40:53 -07:00
Jakub Kicinski	188107b2c4	selftests: net: bump GRO timeout for gro/setup_veth Commit `51bef03e1a` ("selftests/net: deflake GRO tests") recently switched to NAPI suspension, and lowered the timeout from 1ms to 100us. This started causing flakes in netdev-run CI. Let's bump it to 200us. In a quick test of a debug kernel I see failures with 100us, with 200us in 5 runs I see 2 completely clean runs and 3 with a single retry (GRO test will retry up to 5 times). Reviewed-by: Kevin Krakauer <krakauer@google.com> Link: https://patch.msgid.link/20250310110821.385621-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-12 13:20:04 -07:00
Cong Wang	bb7737de5f	selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT Integrate the reproduer from Mingi to TDC. All test results: 1..4 ok 1 0385 - Create DRR with default setting ok 2 2375 - Delete DRR with handle ok 3 3092 - Show DRR class ok 4 4009 - Reject creation of DRR class with classid TC_H_ROOT Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250306232355.93864-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-12 12:52:14 -07:00
Florian Westphal	c21b02fd9c	selftests: netfilter: skip br_netfilter queue tests if kernel is tainted These scripts fail if the kernel is tainted which leads to wrong test failure reports in CI environments when an unrelated test triggers some splat. Check taint state at start of script and SKIP if its already dodgy. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2025-03-12 15:28:34 +01:00
Michael Jeanson	e6644c967d	rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes Adding the aligned(1024) attribute to the definition of __rseq_abi did not increase its size to 1024, for this attribute to impact the size of __rseq_abi it would need to be added to the declaration of 'struct rseq_abi'. We only want to increase the size of the TLS allocation to ensure registration will succeed with future extended ABI. Use a union with a dummy member to ensure we allocate 1024 bytes. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20250311192222.323453-1-mjeanson@efficios.com	2025-03-12 13:19:47 +01:00
Madhavan Srinivasan	73276cee1a	selftest/powerpc/mm/pkey: fix build-break introduced by commit `00894c3fc9` Build break was reported in the powerpc mailing list for next-20250218 with below errors make[1]: Nothing to be done for 'all'. BUILD_TARGET=/root/venkat/linux-next/tools/testing/selftests/powerpc/mm; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C mm all CC pkey_exec_prot In file included from pkey_exec_prot.c:18: /root/venkat/linux-next/tools/testing/selftests/powerpc/include/pkeys.h: In function ‘pkeys_unsupported’: /root/venkat/linux-next/tools/testing/selftests/powerpc/include/pkeys.h:96:34: error: ‘PKEY_UNRESTRICTED’ undeclared (first use in this function) 96 \| pkey = sys_pkey_alloc(0, PKEY_UNRESTRICTED); \| ^~~~~~~~~~~~~~~~~ https://lore.kernel.org/all/20250113170619.484698-2-yury.khrustalev@arm.com/ patchset has been queued to arm64/for-next/pkey_unrestricted which is causing a build break in the selftest/powerpc builds. Commit `6d61527d93` ("mm/pkey: Add PKEY_UNRESTRICTED macro") added a macro PKEY_UNRESTRICTED to handle implicit literal value of 0x0 (which is "unrestricted"). Add the same to selftest/powerpc/pkeys.h to fix the reported build break. Fixes: `00894c3fc9` ("selftests/powerpc: Use PKEY_UNRESTRICTED macro") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/lkml/3267ea6e-5a1a-4752-96ef-8351c912d386@linux.ibm.com/T/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://lore.kernel.org/r/20250311084129.39308-1-maddy@linux.ibm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-11 15:16:10 +00:00
Hangbin Liu	9318dc2357	selftests: bonding: fix incorrect mac address The correct mac address for NS target 2001:db8::254 is 33:33:ff:00:02:54, not 33:33:00:00:02:54. The same with client maddress. Fixes: `86fb6173d1` ("selftests: bonding: add ns multicast group testing") Acked-by: Jay Vosburgh <jv@jvosburgh.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250306023923.38777-3-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-03-11 13:19:27 +01:00
Miklos Szeredi	e1c24b52ad	selftests: add tests for mount notification Provide coverage for all mnt_notify_add() instances. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250307204046.322691-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-11 12:04:02 +01:00
Ming Lei	390174c91d	selftests: ublk: improve test usability Add UBLK_TEST_QUIET, so we can print test result(PASS/SKIP/FAIL) only. Also always run from test script's current directory, then the same test script can be started from other work directory. This way helps a lot to reuse this test source code and scripts for other projects(liburing, blktests, ...) Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-12-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	af83ccc7db	selftests: ublk: add stress test for covering IO vs. killing ublk server Add stress_test_01 for running IO vs. killing ublk server, so io_uring exit & cancel code path can be covered, same with ublk's cancel code path. Especially IO buffer lifetime is one big thing for ublk zero copy, the added test can verify if this area works as expected. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-11-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	c60ac48eab	selftests: ublk: add one stress test for covering IO vs. removing device Add stress_test_01 for running IO vs. removing device for verifying that ublk device removal can work as expected when heavy IO workloads are in progress. null, loop and loop/zc are covered in this tests. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-10-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	87a9265213	selftests: ublk: load/unload ublk_drv when preparing & cleaning up tests Load ublk_drv module in _prep_test(), and unload it in _cleanup_test(), so that test can always be done in consistent state. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	c2cb669a86	selftests: ublk: move zero copy feature check into _add_ublk_dev() Move zero copy feature check into _add_ublk_dev() since we will have more tests which requires to cover zero copy. Then one check function of _check_add_dev() has to be added for dealing with cleanup since '_add_ublk_dev()' is run in sub-shell, and we can't exit from it to terminal shell. Meantime always return error code from _add_ublk_dev(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	c83b089a70	selftests: ublk: don't pass ${dev_id} to _cleanup_test() More devices can be created in single tests, so simply remove all ublk devices in _cleanup_test(), meantime remove the ${dev_id} argument of _cleanup_test(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	632051ffbd	selftests: ublk: support shellcheck and fix all warning Add shellcheck, meantime fixes all warnings. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250303124324.3563605-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:42 -06:00
Ming Lei	5b2db7a8c7	selftests: ublk: fix parsing '-a' argument The argument of '-a' doesn't follow any value, so fix it by putting it with '-z' together. Fixes: `bedc9cbc5f` ("selftests: ublk: add ublk zero copy test") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250303124324.3563605-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 16:24:18 -06:00
Taehee Yoo	75cc19c8ff	selftests: drv-net: add xdp cases for ping.py ping.py has 3 cases, test_v4, test_v6 and test_tcp. But these cases are not executed on the XDP environment. So, it adds XDP environment, existing tests(test_v4, test_v6, and test_tcp) are executed too on the below XDP environment. So, it adds XDP cases. 1. xdp-generic + single-buffer 2. xdp-generic + multi-buffer 3. xdp-native + single-buffer 4. xdp-native + multi-buffer 5. xdp-offload It also makes test_{v4 \| v6 \| tcp} sending large size packets. this may help to check whether multi-buffer is working or not. Note that the physical interface may be down and then up when xdp is attached or detached. This takes some period to activate traffic. So sleep(10) is added if the test interface is the physical interface. netdevsim and veth type interfaces skip sleep. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250309134219.91670-9-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:31:12 -07:00
Willem de Bruijn	0922cb68ed	selftests/net: expand cmsg_ip with MSG_MORE UDP send with MSG_MORE takes a slightly different path than the lockless fast path. For completeness, add coverage to this case too. Pass MSG_MORE on the initial sendmsg, then follow up with a zero byte write to unplug the cork. Unrelated: also add two missing endlines in usage(). Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250307033620.411611-4-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:13:04 -07:00
Greg Kroah-Hartman	993a47bd7b	Linux 6.14-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+ t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc eWLFUeg= =Wypt -----END PGP SIGNATURE----- Merge 6.14-rc6 into driver-core-next We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-10 17:37:25 +01:00
Ming Lei	2ecdcdfee5	selftests: ublk: add --foreground command line Add --foreground command for helping to debug. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250303124324.3563605-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 09:17:13 -06:00
Ming Lei	9d80f48c5e	selftests: ublk: fix build failure Fixes the following build failure: ublk//file_backed.c: In function ‘backing_file_tgt_init’: ublk//file_backed.c:28:42: error: ‘O_DIRECT’ undeclared (first use in this function); did you mean ‘O_DIRECTORY’? 28 \| fd = open(file, O_RDWR \| O_DIRECT); \| ^~~~~~~~ \| O_DIRECTORY when trying to reuse this same utility for liburing test. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250303124324.3563605-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 09:17:13 -06:00
Ming Lei	9894e0eaae	selftests: ublk: make ublk_stop_io_daemon() more reliable Improve ublk_stop_io_daemon() in the following ways: - don't wait if ->ublksrv_pid becomes -1, which means that the disk has been stopped - don't wait if ublk char device doesn't exist any more, so we can avoid to rely on inoitfy for wait until the char device is closed And this way may reduce time of delete command a lot. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250303124324.3563605-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-03-10 09:17:13 -06:00
Linus Torvalds	a382b06d29	KVM/arm64 fixes for 6.14, take #4 * Fix a couple of bugs affecting pKVM's PSCI relay implementation when running in the hVHE mode, resulting in the host being entered with the MMU in an unknown state, and EL2 being in the wrong mode. x86: * Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow. * Ensure DEBUGCTL is context switched on AMD to avoid running the guest with the host's value, which can lead to unexpected bus lock #DBs. * Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly emulate BTF. KVM's lack of context switching has meant BTF has always been broken to some extent. * Always save DR masks for SNP vCPUs if DebugSwap is supported, as the guest can enable DebugSwap without KVM's knowledge. * Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO memory" phase without actually generating a write-protection fault. * Fix a printf() goof in the SEV smoke test that causes build failures with -Werror. * Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2 isn't supported by KVM. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmfNSeUUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNKngf/cLgQAT9AF4nFqcwh5b5uucKHVJ8W uTiGlWqLAf2UN53L63eZ/7vKQWGQYkOTFvormR14Jam6IYtytsZw1xLBH4fGtUyB qVjk0EPzaKGqn3LrgyneQNCXdyxJv7EBVBgoOKH0pvOksoW2E5ZizhhtRFtL7nCE Yk8FQKpP0mIBk04RMsvzJVEFKIb4OZgJadWo0gryg1oF2aAv7mxQjyqUWsBDsb3q 99c0ElSBfV39FeT8xeok4k7S5jbBWii2KiaH72ZsNiBu0rYmEuLwIoygCNNWL9Wu FPdQ+r//YrzfCJSXwGPfdUaRaF4p2642S6oiXQuusNNUmhK6/MRo3mZo8A== =XQHm -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "arm64: - Fix a couple of bugs affecting pKVM's PSCI relay implementation when running in the hVHE mode, resulting in the host being entered with the MMU in an unknown state, and EL2 being in the wrong mode x86: - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with the host's value, which can lead to unexpected bus lock #DBs - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly emulate BTF. KVM's lack of context switching has meant BTF has always been broken to some extent - Always save DR masks for SNP vCPUs if DebugSwap is supported, as the guest can enable DebugSwap without KVM's knowledge - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO memory" phase without actually generating a write-protection fault - Fix a printf() goof in the SEV smoke test that causes build failures with -Werror - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2 isn't supported by KVM" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM KVM: selftests: Fix printf() format goof in SEV smoke test KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3 KVM: SVM: Save host DR masks on CPUs with DebugSwap KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu() KVM: arm64: Initialize HCR_EL2.E2H early KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled KVM: x86: Snapshot the host's DEBUGCTL in common x86 KVM: SVM: Suppress DEBUGCTL.BTF on AMD KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value KVM: selftests: Assert that STI blocking isn't set after event injection KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow	2025-03-09 09:04:08 -10:00
Paolo Bonzini	ea9bd29a9c	KVM x86 fixes for 6.14-rcN #2 - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow. - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with the host's value, which can lead to unexpected bus lock #DBs. - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly emulate BTF. KVM's lack of context switching has meant BTF has always been broken to some extent. - Always save DR masks for SNP vCPUs if DebugSwap is supported, as the guest can enable DebugSwap without KVM's knowledge. - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO memory" phase without actually generating a write-protection fault. - Fix a printf() goof in the SEV smoke test that causes build failures with -Werror. - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2 isn't supported by KVM. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmfLlhUACgkQOlYIJqCj N/0x7w/+MhqJdHbshL7Gzw+rcXwCROiCkqsxFP+YoTXte8uaHS5CEfcMYjE8SuGp KBpgLo4Lj1dVTXiCjemlY5sn6CDiuSs74X8A88ksuu5hVsFByJUgyWU9iw8J/crZ B2vj8huhqa8OCEPe5JujWfnfyAkKE5tUA4GFi73vhHMcftTNj+ftxT33/Pfg7y7M xOvWFWS6ZshKrouRKzI7ZFEYLwp0lr4U3dzO5rCRAd5J4MSBWRx6Dx2um5dyEYKJ xgwl4ylM4S/+78u1+0nQnToM0UWHJ3e7x8nze6UXYTZIrBr/lSeKlbhOPnEWJcJB Eemnur9ORI2BRPUReqBKluCZsSK+E5B/HPCVt5cxtuRIuUOD+kW17LPgnPyE4Sso eVt+XAvQc7EjrpWDSHr3ZQZZM89l9zHhuSAQ0npO6y71s0FzEVZQoDamNmOLAPjH Qg+qhBV2l6pyfqhqiLzADasYLOl57cJsfiMjM331ALLqAn57jzd+B8c4hdB2Xg4s KPuy8w8uBaY9zpd9YDBLLr7JJVs35KexNZMjT2vqBYXcScyLgmAuSQXy3hub6Mzn gI5ZXIKG8eO9v2jejfClI6/OEdtEwgSGEVwuBKB16pMrIxqpguMTMTWLVRn5G+oo qA8anmKaac62GaB66JE/Wjy069OPIGYnHSU2nal0Tej6kG0xv6E= =as6u -----END PGP SIGNATURE----- Merge tag 'kvm-x86-fixes-6.14-rcN.2' of https://github.com/kvm-x86/linux into HEAD KVM x86 fixes for 6.14-rcN #2 - Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow. - Ensure DEBUGCTL is context switched on AMD to avoid running the guest with the host's value, which can lead to unexpected bus lock #DBs. - Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly emulate BTF. KVM's lack of context switching has meant BTF has always been broken to some extent. - Always save DR masks for SNP vCPUs if DebugSwap is supported, as the guest can enable DebugSwap without KVM's knowledge. - Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO memory" phase without actually generating a write-protection fault. - Fix a printf() goof in the SEV smoke test that causes build failures with -Werror. - Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2 isn't supported by KVM.	2025-03-09 03:44:06 -04:00
Linus Torvalds	1110ce6a1e	33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 26 are for MM and 7 are for non-MM. - "mm: memory_failure: unmap poisoned folio during migrate properly" from Ma Wupeng fixes a couple of two year old bugs involving the migration of hwpoisoned folios. - "selftests/damon: three fixes for false results" from SeongJae Park fixes three one year old bugs in the SAMON selftest code. The remainder are singletons and doubletons. Please see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ8zgnAAKCRDdBJ7gKXxA jmTzAP9LsTIpkPkRXDpwxPR/Si5KPwOkE6sGj4ETEqbX3vUvcAEA/Lp7oafc7Vqr XxlC1VFush1ZK29Tecxzvnapl2/VSAs= =zBvY -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 26 are for MM and 7 are for non-MM. - "mm: memory_failure: unmap poisoned folio during migrate properly" from Ma Wupeng fixes a couple of two year old bugs involving the migration of hwpoisoned folios. - "selftests/damon: three fixes for false results" from SeongJae Park fixes three one year old bugs in the SAMON selftest code. The remainder are singletons and doubletons. Please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits) mm/page_alloc: fix uninitialized variable rapidio: add check for rio_add_net() in rio_scan_alloc_net() rapidio: fix an API misues when rio_add_net() fails MAINTAINERS: .mailmap: update Sumit Garg's email address Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone" mm: fix finish_fault() handling for large folios mm: don't skip arch_sync_kernel_mappings() in error paths mm: shmem: remove unnecessary warning in shmem_writepage() userfaultfd: fix PTE unmapping stack-allocated PTE copies userfaultfd: do not block on locking a large folio with raised refcount mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages mm: shmem: fix potential data corruption during shmem swapin mm: fix kernel BUG when userfaultfd_move encounters swapcache selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms selftests/damon/damos_quota: make real expectation of quota exceeds include/linux/log2.h: mark is_power_of_2() with __always_inline NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback mm, swap: avoid BUG_ON in relocate_cluster() mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() ...	2025-03-08 14:34:06 -10:00
Kunihiko Hayashi	a28d2f2398	selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test Add GET_IRQTYPE API checks to each interrupt test. While at it, change pci_ep_ioctl() to get the appropriate return value from ioctl(). Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Link: https://lore.kernel.org/r/20250225110252.28866-2-hayashi.kunihiko@socionext.com [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>	2025-03-08 14:36:00 +00:00
Niklas Cassel	af1451b673	selftests: pci_endpoint: Skip disabled BARs Currently BARs that have been disabled by the endpoint controller driver will result in a test FAIL. Returning FAIL for a BAR that is disabled seems overly pessimistic. There are EPC that disables one or more BARs intentionally. One reason for this is that there are certain EPCs that are hardwired to expose internal PCIe controller registers over a certain BAR, so the EPC driver disables such a BAR, such that the host will not overwrite random registers during testing. Such a BAR will be disabled by the EPC driver's init function, and the BAR will be marked as BAR_RESERVED, such that it will be unavailable to endpoint function drivers. Let's return FAIL only for BARs that are actually enabled and failed the test, and let's return skip for BARs that are not even enabled. Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250123120147.3603409-4-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>	2025-03-08 14:35:57 +00:00
Jakub Kicinski	93b1e05517	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCZ8qHFwAKCRAraS+Z+3y5 E6QRAQCI/irYxpT6nBgT3ejyO6CIHjbHjdNP5KRkE8JT61zmJgD8Diet4dwwOxLK c1Pq5Jnl9KLpEPG99TcjtH3c++N3fww= =ltml -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-03-06 We've added 6 non-merge commits during the last 13 day(s) which contain a total of 6 files changed, 230 insertions(+), 56 deletions(-). The main changes are: 1) Add XDP metadata support for tun driver, from Marcus Wichelmann. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Fix file descriptor assertion in open_tuntap helper selftests/bpf: Add test for XDP metadata support in tun driver selftests/bpf: Refactor xdp_context_functional test and bpf program selftests/bpf: Move open_tuntap to network helpers net: tun: Enable transfer of XDP metadata to skb net: tun: Enable XDP metadata support ==================== Link: https://patch.msgid.link/20250307055335.441298-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:08:49 -08:00
Willem de Bruijn	7ae495a537	selftests/net: add proc_net_pktgen to .gitignore Ensure git doesn't pick up this new target. Fixes: `03544faad7` ("selftest: net: add proc_net_pktgen") Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250307031356.368350-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-07 19:08:14 -08:00
Linus Torvalds	2a520073e7	s390 updates for 6.14-rc6 - Fix return address recovery of traced function in ftrace to ensure reliable stack unwinding - Fix compiler warnings and runtime crashes of vDSO selftests on s390 by introducing a dedicated GNU hash bucket pointer with correct 32-bit entry size - Fix test_monitor_call() inline asm, which misses CC clobber, by switching to an instruction that doesn't modify CC -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfLedEACgkQjYWKoQLX FBhXMQf/bIaJEOV/Ww5PWY+AqtHPpQjhP2cvOfSRoh2py38Gp1sy/6ddpn8lcl3R skyeW4XHLpARYGg6NNOAOgFr43c7wQYrr+7uSW0zska+hOlSOi5TtycEOgu7o+VE Nig5XRuBPc01CBokFc+gbvMTjp2fTVlzTQ2/F/B7py62j2c21Y2/n7yZ4hhfQolM oiWlRDXzCRHM/sfa3FlML6sSgfIcHhhMSZLpdmRDhir9vBm1VpeGHa+FM5V8Pzsk KsiuoH7+ylbw+swVn2V/p3fqfS7JoopYdng8h40eLRkeHaTi052+NmMwv8gYAnCk BxfC1re5KlZ6kgWh0CDLUoYQHdwHxw== =KCjH -----END PGP SIGNATURE----- Merge tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix return address recovery of traced function in ftrace to ensure reliable stack unwinding - Fix compiler warnings and runtime crashes of vDSO selftests on s390 by introducing a dedicated GNU hash bucket pointer with correct 32-bit entry size - Fix test_monitor_call() inline asm, which misses CC clobber, by switching to an instruction that doesn't modify CC * tag 's390-6.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/ftrace: Fix return address recovery of traced function selftests/vDSO: Fix GNU hash table entry size for s390x s390/traps: Fix test_monitor_call() inline assembly	2025-03-07 16:21:02 -10:00
Thomas Weißschuh	a782d45c86	selftests/nolibc: stop testing constructor order The execution order of constructors in undefined and depends on the toolchain. While recent toolchains seems to have a stable order, it doesn't work for older ones and may also change at any time. Stop validating the order and instead only validate that all constructors are executed. Reported-by: Willy Tarreau <w@1wt.eu> Closes: https://lore.kernel.org/lkml/20250301110735.GA18621@1wt.eu/ Link: https://lore.kernel.org/r/20250306-nolibc-constructor-order-v1-1-68fd161cc5ec@weissschuh.net Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>	2025-03-07 07:34:12 +01:00
Jakub Kicinski	1cc3462159	selftests: openvswitch: don't hardcode the drop reason subsys WiFi removed one of their subsys entries from drop reasons, in commit `286e696770` ("wifi: mac80211: Drop cooked monitor support") SKB_DROP_REASON_SUBSYS_OPENVSWITCH is now 2 not 3. The drop reasons are not uAPI, read the correct value from debug info. We need to enable vmlinux BTF, otherwise pahole needs a few GB of memory to decode the enum name. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250304180615.945945-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-06 16:45:43 -08:00
Jakub Kicinski	56a586961b	selftests: net: bpf_offload: add 'libbpf_global' to ignored maps After installing pahole on the CI image we have a new map created by libbpf. Ignore it otherwise we see: Exception: Time out waiting for map counts to stabilize want 2, have 3 Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250304233204.1139251-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-06 15:41:37 -08:00
Jakub Kicinski	f9d2f5ddd4	selftests: net: fix error message in bpf_offload We hit a following exception on timeout, nmaps is never set: Test bpftool bound info reporting (own ns)... Traceback (most recent call last): File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 1128, in <module> check_dev_info(False, "") File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 583, in check_dev_info maps = bpftool_map_list_wait(expected=2, ns=ns) File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 215, in bpftool_map_list_wait raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps)) NameError: name 'nmaps' is not defined Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250304233204.1139251-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-06 15:41:37 -08:00
Louis Taylor	6e406202a4	selftests/nolibc: use O_RDONLY flag instead of 0 This doesn't matter much, but is what the standard says. Signed-off-by: Louis Taylor <louis@kragniz.eu> Link: https://lore.kernel.org/r/20250306184147.208723-5-louis@kragniz.eu Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>	2025-03-06 22:30:21 +01:00
Louis Taylor	b2edaad7f5	tools/nolibc: add support for openat(2) openat is useful to avoid needing to construct relative paths, so expose a wrapper for using it directly. Signed-off-by: Louis Taylor <louis@kragniz.eu> Link: https://lore.kernel.org/r/20250306184147.208723-1-louis@kragniz.eu Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>	2025-03-06 22:30:20 +01:00
Ingo Molnar	82354fce16	Merge branch 'sched/urgent' into sched/core, to pick up dependent commits Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-03-06 22:26:36 +01:00
Jakub Kicinski	2525e16a2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: net/ethtool/cabletest.c `2bcf4772e4` ("net: ethtool: try to protect all callback with netdev instance lock") `637399bf7e` ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device") No Adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-06 13:03:35 -08:00
Marcus Wichelmann	49306d5bfc	selftests/bpf: Fix file descriptor assertion in open_tuntap helper The open_tuntap helper function uses open() to get a file descriptor for /dev/net/tun. The open(2) manpage writes this about its return value: On success, open(), openat(), and creat() return the new file descriptor (a nonnegative integer). On error, -1 is returned and errno is set to indicate the error. This means that the fd > 0 assertion in the open_tuntap helper is incorrect and should rather check for fd >= 0. When running the BPF selftests locally, this incorrect assertion was not an issue, but the BPF kernel-patches CI failed because of this: open_tuntap:FAIL:open(/dev/net/tun) unexpected open(/dev/net/tun): actual 0 <= expected 0 Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250305213438.3863922-7-marcus.wichelmann@hetzner-cloud.de	2025-03-06 12:31:08 -08:00
Marcus Wichelmann	73eeecc3cd	selftests/bpf: Add test for XDP metadata support in tun driver Add a selftest that creates a tap device, attaches XDP and TC programs, writes a packet with a test payload into the tap device and checks the test result. This test ensures that the XDP metadata support in the tun driver is enabled and that the metadata size is correctly passed to the skb. See the previous commit ("selftests/bpf: refactor xdp_context_functional test and bpf program") for details about the test design. The test runs in its own network namespace. This provides some extra safety against conflicting interface names. Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250305213438.3863922-6-marcus.wichelmann@hetzner-cloud.de	2025-03-06 12:31:08 -08:00
Marcus Wichelmann	b46aa22b66	selftests/bpf: Refactor xdp_context_functional test and bpf program The existing XDP metadata test works by creating a veth pair and attaching XDP & TC programs that drop the packet when the condition of the test isn't fulfilled. The test then pings through the veth pair and succeeds when the ping comes through. While this test works great for a veth pair, it is hard to replicate for tap devices to test the XDP metadata support of them. A similar test for the tun driver would either involve logic to reply to the ping request, or would have to capture the packet to check if it was dropped or not. To make the testing of other drivers easier while still maximizing code reuse, this commit refactors the existing xdp_context_functional test to use a test_result map. Instead of conditionally passing or dropping the packet, the TC program is changed to copy the received metadata into the value of that single-entry array map. Tests can then verify that the map value matches the expectation. This testing logic is easy to adapt to other network drivers as the only remaining requirement is that there is some way to send a custom Ethernet packet through it that triggers the XDP & TC programs. The Ethernet header of that custom packet is all-zero, because it is not required to be valid for the test to work. The zero ethertype also helps to filter out packets that are not related to the test and would otherwise interfere with it. The payload of the Ethernet packet is used as the test data that is expected to be passed as metadata from the XDP to the TC program and written to the map. It has a fixed size of 32 bytes which is a reasonable size that should be supported by both drivers. Additional packet headers are not necessary for the test and were therefore skipped to keep the testing code short. This new testing methodology no longer requires the veth interfaces to have IP addresses assigned, therefore these were removed. Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250305213438.3863922-5-marcus.wichelmann@hetzner-cloud.de	2025-03-06 12:31:08 -08:00
Marcus Wichelmann	d5ca409c86	selftests/bpf: Move open_tuntap to network helpers To test the XDP metadata functionality of the tun driver, it's necessary to create a new tap device first. A helper function for this already exists in lwt_helpers.h. Move it to the common network helpers header, so it can be reused in other tests. Signed-off-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250305213438.3863922-4-marcus.wichelmann@hetzner-cloud.de	2025-03-06 12:31:08 -08:00
SeongJae Park	582ccf78f6	selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries damon_nr_regions.py starts DAMON, periodically collect number of regions in snapshots, and see if it is in the requested range. The check code assumes the numbers are sorted on the collection list, but there is no such guarantee. Hence this can result in false positive test success. Sort the list before doing the check. Link: https://lkml.kernel.org/r/20250225222333.505646-4-sj@kernel.org Fixes: `781497347d` ("selftests/damon: implement test for min/max_nr_regions") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-05 21:36:16 -08:00
SeongJae Park	695469c07a	selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms damon_nr_regions.py updates max_nr_regions to a number smaller than expected number of real regions and confirms DAMON respect the harsh limit. To give time for DAMON to make changes for the regions, 3 aggregation intervals (300 milliseconds) are given. The internal mechanism works with not only the max_nr_regions, but also sz_limit, though. It avoids merging region if that casn make region of size larger than sz_limit. In the test, sz_limit is set too small to achive the new max_nr_regions, unless it is updated for the new min_nr_regions. But the update is done only once per operations set update interval, which is one second by default. Hence, the test randomly incurs false positive failures. Fix it by setting the ops interval same to aggregation interval, to make sure sz_limit is updated by the time of the check. Link: https://lkml.kernel.org/r/20250225222333.505646-3-sj@kernel.org Fixes: `8bf890c816` ("selftests/damon/damon_nr_regions: test online-tuned max_nr_regions") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-05 21:36:16 -08:00
SeongJae Park	1c684d77df	selftests/damon/damos_quota: make real expectation of quota exceeds Patch series "selftests/damon: three fixes for false results". Fix three DAMON selftest bugs that cause two and one false positive failures and successes. This patch (of 3): damos_quota.py assumes the quota will always exceeded. But whether quota will be exceeded or not depend on the monitoring results. Actually the monitored workload has chaning access pattern and hence sometimes the quota may not really be exceeded. As a result, false positive test failures happen. Expect how much time the quota will be exceeded by checking the monitoring results, and use it instead of the naive assumption. Link: https://lkml.kernel.org/r/20250225222333.505646-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250225222333.505646-2-sj@kernel.org Fixes: `51f58c9da1` ("selftests/damon: add a test for DAMOS quota") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-05 21:36:16 -08:00
SeongJae Park	349db086a6	selftests/damon/damos_quota_goal: handle minimum quota that cannot be further reduced damos_quota_goal.py selftest see if DAMOS quota goals tuning feature increases or reduces the effective size quota for given score as expected. The tuning feature sets the minimum quota size as one byte, so if the effective size quota is already one, we cannot expect it further be reduced. However the test is not aware of the edge case, and fails since it shown no expected change of the effective quota. Handle the case by updating the failure logic for no change to see if it was the case, and simply skips to next test input. Link: https://lkml.kernel.org/r/20250217182304.45215-1-sj@kernel.org Fixes: `f1c07c0a16` ("selftests/damon: add a test for DAMOS quota goal") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202502171423.b28a918d-lkp@intel.com Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> [6.10.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-05 21:36:12 -08:00
John Hubbard	0a7565ee6e	Revert "selftests/mm: remove local __NR_* definitions" This reverts commit `a5c6bc5900`. The general approach described in commit `e076eaca59` ("selftests: break the dependency upon local header files") was taken one step too far here: it should not have been extended to include the syscall numbers. This is because doing so would require per-arch support in tools/include/uapi, and no such support exists. This revert fixes two separate reports of test failures, from Dave Hansen[1], and Li Wang[2]. An excerpt of Dave's report: Before this commit (`a5c6bc5900`) things are fine. But after, I get: running PKEY tests for unsupported CPU/OS An excerpt of Li's report: I just found that mlock2_() return a wrong value in mlock2-test [1] https://lore.kernel.org/dc585017-6740-4cab-a536-b12b37a7582d@intel.com [2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA@mail.gmail.com Link: https://lkml.kernel.org/r/20250214033850.235171-1-jhubbard@nvidia.com Fixes: `a5c6bc5900` ("selftests/mm: remove local __NR_* definitions") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Li Wang <liwang@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Andrei Vagin <avagin@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rich Felker <dalias@libc.org> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-03-05 21:36:12 -08:00
Atish Patra	ee4e778c58	KVM: riscv: selftests: Allow number of interrupts to be configurable It is helpful to vary the number of the LCOFI interrupts generated by the overflow test. Allow additional argument for overflow test to accommodate that. It can be easily cross-validated with /proc/interrupts output in the host. Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-4-41d177e45929@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>	2025-03-06 09:57:07 +05:30
Atish Patra	4b506adfea	KVM: riscv: selftests: Change command line option The PMU test commandline option takes an argument to disable a certain test. The initial assumption behind this was a common use case is just to run all the test most of the time. However, running a single test seems more useful instead. Especially, the overflow test has been helpful to validate PMU virtualizaiton interrupt changes. Switching the command line option to run a single test instead of disabling a single test also allows to provide additional test specific arguments to the test. The default without any options remains unchanged which continues to run all the tests. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-3-41d177e45929@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>	2025-03-06 09:57:07 +05:30
Atish Patra	1f6bbe1255	KVM: riscv: selftests: Do not start the counter in the overflow handler There is no need to start the counter in the overflow handler as we intend to trigger precise number of LCOFI interrupts through these tests. The overflow irq handler has already stopped the counter. As a result, the stop call from the test function may return already stopped error which is fine as well. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-2-41d177e45929@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>	2025-03-06 09:57:07 +05:30
Catalin Marinas	306219d59b	kselftest/arm64: mte: Skip the hugetlb tests if MTE not supported on such mappings While the kselftest was added at the same time with the kernel support for MTE on hugetlb mappings, the tests may be run on older kernels. Skip the tests if PROT_MTE is not supported on MAP_HUGETLB mappings. Fixes: `27879e8cb6` ("selftests: arm64: add hugetlb mte tests") Cc: Yang Shi <yang@os.amperecomputing.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250221093331.2184245-3-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-05 18:54:02 +00:00
Catalin Marinas	7ae95109c6	kselftest/arm64: mte: Use the correct naming for tag check modes in check_hugetlb_options.c The architecture doesn't define precise/imprecise MTE tag check modes, only synchronous and asynchronous. Use the correct naming and also ensure they match the MTE_{ASYNC,SYNC}_ERR type. Fixes: `27879e8cb6` ("selftests: arm64: add hugetlb mte tests") Cc: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250221093331.2184245-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2025-03-05 18:54:02 +00:00
Wojtek Wasko	76868642e4	testptp: Add option to open PHC in readonly mode PTP Hardware Clocks no longer require WRITE permission to perform readonly operations, such as listing device capabilities or listening to EXTTS events once they have been enabled by a process with WRITE permissions. Add '-r' option to testptp to open the PHC in readonly mode instead of the default read-write mode. Skip enabling EXTTS if readonly mode is requested. Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Wojtek Wasko <wwasko@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-03-05 12:43:54 +00:00
Christian Brauner	56f235da15	selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-16-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 13:26:21 +01:00
Christian Brauner	1c4f2dbe42	selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-15-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 13:26:21 +01:00
Christian Brauner	2adf6ca638	selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-14-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 13:26:21 +01:00
Christian Brauner	2e94e4c649	selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-13-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 13:26:21 +01:00
Christian Brauner	a79975f05e	selftests/pidfd: add third PIDFD_INFO_EXIT selftest Add a selftest for PIDFD_INFO_EXIT behavior. Link: https://lore.kernel.org/r/20250305-work-pidfs-kill_on_last_close-v3-12-c8c3d8361705@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-05 13:26:21 +01:00

... 3 4 5 6 7 ...

18741 Commits