linux

Commit Graph

Author	SHA1	Message	Date
Paolo Bonzini	4361f5aa8b	KVM x86 fixes for 6.18: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit `3ccbf6f470` ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabbilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjyszoACgkQOlYIJqCj N/03+g/9FPoIPxGL9+tJyGahdH2Mygiip8Q3tUTlGVkskp+dplf+6T51ogdqBOkS tvlGjccxAOVW73ijn8ox7UtGSY9B1IJ+rj/uhsBOMFQptyUJEv4iEujFKB/t5RIF gOTVVR6Z/mcrYJY7F21qBPCHPbz++rEXrfgyAsosz6tpS/nL6vrNdp1LZlcsLM/k 5DuhkQHLEwJoUXO5VUsBWRa3gRdOuZ+SJGd4C1mfDwXagxe8uFYRO/vraOV9dKJx l9RxKPIhf42cfu9tIeJYDJyXIDgzXUEND4smVw/ito1SeakBlC9rQ15ya8ETLAEX tHzmj38RPOfjsycGKIRhlLzQx77b+t7FhNdse19QC3p3u9Jn8FuMyFcGZuiaP5kK e9Xrp/zcaCNfHB6gGKEud7h7fpV9yB8SNlCsP81if73buq3qHx0+jeVg3jCS4mkb zGY7CG+oKJOdTSN8JAh8nJMM4bUv5m8myr6yYGU+SsGBQsyPyqQdRWuKdQyOQIVC RZSWXSjKfmT5FwSs0KRI0yUjMeUYbgwrsysFuS3qX62mZpr0vLaPoiYFvuffe9gB W3Tt98QNPtBmJdINNncgKvbn3sp/CzUHirygaI0APZwh6QQAkL1Dp2beA1i95uVI 8uin4zUjRbRjUpMUaEtAQIaVMTVQqgfiPAvtcDCRBBeOGaNr81M= =lBtj -----END PGP SIGNATURE----- Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD KVM x86 fixes for 6.18: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit `3ccbf6f470` ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabbilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory.	2025-10-18 10:25:43 +02:00
Marc Zyngier	5c7cf1e44e	KVM: arm64: selftests: Fix misleading comment about virtual timer encoding The userspace-visible encoding for CNTV_CVAL_EL0 and CNTVCNT_EL0 have been swapped for as long as usersapce has had access to the registers. This is documented in arch/arm64/include/uapi/asm/kvm.h. Despite that, the get_reg_list test has unhelpful comments indicating the wrong register for the encoding. Replace this with definitions exposed in the include file, and a comment explaining again the brokenness. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:43:12 +01:00
Marc Zyngier	4da5a9af78	KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list Add yet another configuration, this time dealing E2H=0. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	6418330c84	KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit The hyp virtual timer registers only exist when VHE is present, Similarly, VNCR_EL2 only exists when NV2 is present. Make these dependencies explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Oliver Upton	d5e6310a0d	KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress vgic_lpi_stress rather hilariously leaves IRQs disabled for the duration of the test. While the ITS translation of MSIs happens regardless of this, for completeness the guest should actually handle the LPIs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:28:27 +01:00
Zenghui Yu	2192d348c0	KVM: arm64: selftests: Allocate vcpus with correct size vcpus array contains pointers to struct kvm_vcpu {}. It is way overkill to allocate the array with (nr_cpus * sizeof(struct kvm_vcpu)). Fix the allocation by using the correct size. Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:27:55 +01:00
Zenghui Yu	9a7f87eb58	KVM: arm64: selftests: Sync ID_AA64PFR1, MPIDR, CLIDR in guest We forgot to sync several registers (ID_AA64PFR1, MPIDR, CLIDR) in guest to make sure that the guest had seen the written value. Add them to the list. Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Reviewed-By: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	a133052666	KVM: selftests: Fix irqfd_test for non-x86 architectures The KVM_IRQFD ioctl fails if no irqchip is present in-kernel, which isn't too surprising as there's not much KVM can do for an IRQ if it cannot resolve a destination. As written the irqfd_test assumes that a 'default' VM created in selftests has an in-kernel irqchip created implicitly. That may be the case on x86 but it isn't necessarily true on other architectures. Add an arch predicate indicating if 'default' VMs get an irqchip and make the irqfd_test depend on it. Work around arm64 VGIC initialization requirements by using vm_create_with_one_vcpu(), ignoring the created vCPU as it isn't used for the test. Reported-by: Sebastian Ott <sebott@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Sean Christopherson <seanjc@google.com> Fixes: `7e9b231c40` ("KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Sean Christopherson	cb49b7b862	KVM: arm64: selftests: Track width of timer counter as "int", not "uint64_t" Store the width of arm64's timer counter as an "int", not a "uint64_t". ilog2() returns an "int", and more importantly using what is an "unsigned long" under the hood makes clang unhappy due to a type mismatch when clamping the width to a sane value. arm64/arch_timer_edge_cases.c:1032:10: error: comparison of distinct pointer types ('typeof (width) ' (aka 'unsigned long ') and 'typeof (56) ' (aka 'int ')) [-Werror,-Wcompare-distinct-pointer-types] 1032 \| width = clamp(width, 56, 64); \| ^~~~~~~~~~~~~~~~~~~~ tools/include/linux/kernel.h:47:45: note: expanded from macro 'clamp' 47 \| #define clamp(val, lo, hi) min((typeof(val))max(val, lo), hi) \| ^~~~~~~~~~~~ tools/include/linux/kernel.h:33:17: note: expanded from macro 'max' 33 \| (void) (&_max1 == &_max2); \ \| ~~~~~~ ^ ~~~~~~ tools/include/linux/kernel.h:39:9: note: expanded from macro 'min' 39 \| typeof(x) _min1 = (x); \ \| ^ Fixes: `fad4cf9448` ("KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases") Cc: Sebastian Ott <sebott@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	890c608b4d	KVM: arm64: selftests: Test effective value of HCR_EL2.AMO A defect against the architecture now allows an implementation to treat AMO as 1 when HCR_EL2.{E2H, TGE} = {1, 0}. KVM now takes advantage of this interpretation to address a quality of emulation issue w.r.t. SError injection. Add a corresponding test case and expect a pending SError to be taken. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Sean Christopherson	505f5224b1	KVM: selftests: Verify that reads to inaccessible guest_memfd VMAs SIGBUS Expand the guest_memfd negative testcases for overflow and MAP_PRIVATE to verify that reads to inaccessible memory also get a SIGBUS. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Lisa Wang <wyihan@google.com> Tested-by: Lisa Wang <wyihan@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:30 -07:00
Sean Christopherson	19942d4fd9	KVM: selftests: Verify that faulting in private guest_memfd memory fails Add a guest_memfd testcase to verify that faulting in private memory gets a SIGBUS. For now, test only the case where memory is private by default since KVM doesn't yet support in-place conversion. Deliberately run the CoW test with and without INIT_SHARED set as KVM should disallow MAP_PRIVATE regardless of whether the memory itself is private from a CoCo perspective. Cc: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:30 -07:00
Sean Christopherson	f91187c0ec	KVM: selftests: Add wrapper macro to handle and assert on expected SIGBUS Extract the guest_memfd test's SIGBUS handling functionality into a common TEST_EXPECT_SIGBUS() macro in anticipation of adding more SIGBUS testcases. Eating a SIGBUS isn't terrible difficult, but it requires a non-trivial amount of boilerplate code, and using a macro allows selftests to print out the exact action that failed to generate a SIGBUS without the developer needing to remember to add a useful error message. Explicitly mark the SIGBUS handler as "used", as gcc-14 at least likes to discard the function before linking. Opportunistically use TEST_FAIL(...) instead of TEST_ASSERT(false, ...), and fix the write path of the guest_memfd test to use the local "val" instead of hardcoding the literal value a second time. Suggested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Lisa Wang <wyihan@google.com> Tested-by: Lisa Wang <wyihan@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:29 -07:00
Sean Christopherson	505c953009	KVM: selftests: Isolate the guest_memfd Copy-on-Write negative testcase Move the guest_memfd Copy-on-Write (CoW) testcase to its own function to better separate positive testcases from negative testcases. No functional change intended. Suggested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:28 -07:00
Sean Christopherson	61cee97f40	KVM: selftests: Add wrappers for mmap() and munmap() to assert success Add and use wrappers for mmap() and munmap() that assert success to reduce a significant amount of boilerplate code, to ensure all tests assert on failure, and to provide consistent error messages on failure. No functional change intended. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:28 -07:00
Ackerley Tng	df0d9923f7	KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAP If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will run all test cases with GUEST_MEMFD_FLAG_MMAP set. This leaves the code path for creating a non-mmap()-able guest_memfd on a VM that supports mappable guest memfds untested. Refactor the test to run the main test suite with a given set of flags. Then, for VM types that support the mappable capability, invoke the test suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP set. This ensures both creation paths are properly exercised on capable VMs. Run test_guest_memfd_flags() only once per VM type since it depends only on the set of valid/supported flags, i.e. iterating over an arbitrary set of flags is both unnecessary and wrong. Signed-off-by: Ackerley Tng <ackerleytng@google.com> [sean: use double-underscores for the inner helper] Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:27 -07:00
Sean Christopherson	21d602ed61	KVM: selftests: Create a new guest_memfd for each testcase Refactor the guest_memfd selftest to improve test isolation by creating a a new guest_memfd for each testcase. Currently, the test reuses a single guest_memfd instance for all testcases, and thus creates dependencies between tests, e.g. not truncating folios from the guest_memfd instance at the end of a test could lead to unexpected results (see the PUNCH_HOLE purging that needs to done by in-flight the NUMA testcases[1]). Invoke each test via a macro wrapper to create and close a guest_memfd to cut down on the boilerplate copy+paste needed to create a test. Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com Reported-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:26 -07:00
Sean Christopherson	3a6c08538c	KVM: selftests: Stash the host page size in a global in the guest_memfd test Use a global variable to track the host page size in the guest_memfd test so that the information doesn't need to be constantly passed around. The state is purely a reflection of the underlying system, i.e. can't be set by the test and is constant for a given invocation of the test, and thus explicitly passing the host page size to individual testcases adds no value, e.g. doesn't allow testing different combinations. Making page_size a global will simplify an upcoming change to create a new guest_memfd instance per testcase. No functional change intended. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:26 -07:00
Sean Christopherson	fe2bf6234e	KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set Add a guest_memfd flag to allow userspace to state that the underlying memory should be configured to be initialized as shared, and reject user page faults if the guest_memfd instance's memory isn't shared. Because KVM doesn't yet support in-place private<=>shared conversions, all guest_memfd memory effectively follows the initial state. Alternatively, KVM could deduce the initial state based on MMAP, which for all intents and purposes is what KVM currently does. However, implicitly deriving the default state based on MMAP will result in a messy ABI when support for in-place conversions is added. For x86 CoCo VMs, which don't yet support MMAP, memory is currently private by default (otherwise the memory would be unusable). If MMAP implies memory is shared by default, then the default state for CoCo VMs will vary based on MMAP, and from userspace's perspective, will change when in-place conversion support is added. I.e. to maintain guest<=>host ABI, userspace would need to immediately convert all memory from shared=>private, which is both ugly and inefficient. The inefficiency could be avoided by adding a flag to state that memory is _private_ by default, irrespective of MMAP, but that would lead to an equally messy and hard to document ABI. Bite the bullet and immediately add a flag to control the default state so that the effective behavior is explicit and straightforward. Fixes: `3d3a04fad2` ("KVM: Allow and advertise support for host mmap() on guest_memfd files") Cc: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:23 -07:00
Sean Christopherson	d2042d8f96	KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGS Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't require a new capability, and so that developers aren't tempted to bundle multiple flags into a single capability. Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit value, but that limitation can be easily circumvented by adding e.g. KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more than 32 flags. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-10 14:25:22 -07:00
Yan Zhao	1bcc3f8791	KVM: selftests: Test prefault memory during concurrent memslot removal Expand the prefault memory selftest to add a regression test for a KVM bug where KVM's retry logic would result in (breakable) deadlock due to the memslot deletion waiting on prefaulting to release SRCU, and prefaulting waiting on the memslot to fully disappear (KVM uses a two-step process to delete memslots, and KVM x86 retries page faults if a to-be-deleted, a.k.a. INVALID, memslot is encountered). To exercise concurrent memslot remove, spawn a second thread to initiate memslot removal at roughly the same time as prefaulting. Test memslot removal for all testcases, i.e. don't limit concurrent removal to only the success case. There are essentially three prefault scenarios (so far) that are of interest: 1. Success 2. ENOENT due to no memslot 3. EAGAIN due to INVALID memslot For all intents and purposes, #1 and #2 are mutually exclusive, or rather, easier to test via separate testcases since writing to non-existent memory is trivial. But for #3, making it mutually exclusive with #1 _or_ #2 is actually more complex than testing memslot removal for all scenarios. The only requirement to let memslot removal coexist with other scenarios is a way to guarantee a stable result, e.g. that the "no memslot" test observes ENOENT, not EAGAIN, for the final checks. So, rather than make memslot removal mutually exclusive with the ENOENT scenario, simply restore the memslot and retry prefaulting. For the "no memslot" case, KVM_PRE_FAULT_MEMORY should be idempotent, i.e. should always fail with ENOENT regardless of how many times userspace attempts prefaulting. Pass in both the base GPA and the offset (instead of the "full" GPA) so that the worker can recreate the memslot. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250924174255.2141847-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-10-07 09:18:22 -07:00
Paolo Bonzini	12abeb81c8	KVM x86 CET virtualization support for 6.18 Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect Branch Tracking (IBT), that can be utilized by software to help provide Control-flow integrity (CFI). SHSTK defends against backward-edge attacks (a.k.a. Return-oriented programming (ROP)), while IBT defends against forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)). Attackers commonly use ROP and COP/JOP methodologies to redirect the control- flow to unauthorized targets in order to execute small snippets of code, a.k.a. gadgets, of the attackers choice. By chaining together several gadgets, an attacker can perform arbitrary operations and circumvent the system's defenses. SHSTK defends against backward-edge attacks, which execute gadgets by modifying the stack to branch to the attacker's target via RET, by providing a second stack that is used exclusively to track control transfer operations. The shadow stack is separate from the data/normal stack, and can be enabled independently in user and kernel mode. When SHSTK is is enabled, CALL instructions push the return address on both the data and shadow stack. RET then pops the return address from both stacks and compares the addresses. If the return addresses from the two stacks do not match, the CPU generates a Control Protection (#CP) exception. IBT defends against backward-edge attacks, which branch to gadgets by executing indirect CALL and JMP instructions with attacker controlled register or memory state, by requiring the target of indirect branches to start with a special marker instruction, ENDBRANCH. If an indirect branch is executed and the next instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH behaves as a NOP if IBT is disabled or unsupported. From a virtualization perspective, CET presents several problems. While SHSTK and IBT have two layers of enabling, a global control in the form of a CR4 bit, and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS. Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable option due to complexity, and outright disallowing use of XSTATE to context switch SHSTK/IBT state would render the features unusable to most guests. To limit the overall complexity without sacrificing performance or usability, simply ignore the potential virtualization hole, but ensure that all paths in KVM treat SHSTK/IBT as usable by the guest if the feature is supported in hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow userspace to advertise one of SHSTK or IBT if both are supported in hardware, even though doing so would allow a misbehaving guest to use the unadvertised feature. Fully emulating SHSTK and IBT would also require significant complexity, e.g. to track and update branch state for IBT, and shadow stack state for SHSTK. Given that emulating large swaths of the guest code stream isn't necessary on modern CPUs, punt on emulating instructions that meaningful impact or consume SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation of such instructions so that KVM's emulator can't be abused to circumvent CET. Disable support for SHSTK and IBT if KVM is configured such that emulation of arbitrary guest instructions may be required, specifically if Unrestricted Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR. Lastly disable SHSTK support if shadow paging is enabled, as the protections for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so that they can't be directly modified by software), i.e. would require non-trivial support in the Shadow MMU. Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT in SVM requires KVM modifications. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXbisACgkQOlYIJqCj N/373w//ckB4c9MjS6eDRp+LtTXQfXyAs8eMcs9YTs7yD3uMvqcbaNuDsf1U2cI6 i2qcuOdxlnKSJphn6oH2JKDWPjRAfHhCqmYghUPaJwgeYqsTfork9s8rzU2tC82q 38mQ6BhAuOwa/plodvDp/+POEIoXUyexSoWX+cngGVTmFWdbfA4NNGjWMZOl1XG2 qLBck6t+IxxUTs1Ij+OsexlAKdY7FcZZ85Ok6I/VE4/lITEhuTJkwkYdh8td3KK/ IVVk1jb1Z7t8lGQ5fi3+N/D8iHJ/0ladmOux6Yxzw88uyj6XLIFOOFsdK09GyhUS QzV06syFkV2vU68VDYiOcMZIdeGmYR5jDpmy9N+o0s86YLU6rKKEaXRP7vW5yHj/ 99AU+DfRHvhqKwWyQ51B+rhr80F3EQrkZXI0QBr8KO7sseFZvZNNVozwKjSyZtNH VBhxjIlVQm5Z1rjucKjc573sONK95z9XUSZjYnCUwB1NH7VsvdULQmJBucCmzW/p 9j49CpmShwggceV6LcYg4Miuvjl/bL1B8Go5Fg+1Fdg7L6Nepi16yywxHmyPqreJ Wx/6N0gqZ3LKDdl5CFYxAxvJoldJR6lbw/AGjvFkre8A+TGGRdz3uS9XXqGHvtbu W5wKhnvGov69lm4xYbxbI+rvxYmmQLm9SgQXel23icbKJ5kmE48= =zsBl -----END PGP SIGNATURE----- Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEAD KVM x86 CET virtualization support for 6.18 Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect Branch Tracking (IBT), that can be utilized by software to help provide Control-flow integrity (CFI). SHSTK defends against backward-edge attacks (a.k.a. Return-oriented programming (ROP)), while IBT defends against forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)). Attackers commonly use ROP and COP/JOP methodologies to redirect the control- flow to unauthorized targets in order to execute small snippets of code, a.k.a. gadgets, of the attackers choice. By chaining together several gadgets, an attacker can perform arbitrary operations and circumvent the system's defenses. SHSTK defends against backward-edge attacks, which execute gadgets by modifying the stack to branch to the attacker's target via RET, by providing a second stack that is used exclusively to track control transfer operations. The shadow stack is separate from the data/normal stack, and can be enabled independently in user and kernel mode. When SHSTK is is enabled, CALL instructions push the return address on both the data and shadow stack. RET then pops the return address from both stacks and compares the addresses. If the return addresses from the two stacks do not match, the CPU generates a Control Protection (#CP) exception. IBT defends against backward-edge attacks, which branch to gadgets by executing indirect CALL and JMP instructions with attacker controlled register or memory state, by requiring the target of indirect branches to start with a special marker instruction, ENDBRANCH. If an indirect branch is executed and the next instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH behaves as a NOP if IBT is disabled or unsupported. From a virtualization perspective, CET presents several problems. While SHSTK and IBT have two layers of enabling, a global control in the form of a CR4 bit, and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS. Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable option due to complexity, and outright disallowing use of XSTATE to context switch SHSTK/IBT state would render the features unusable to most guests. To limit the overall complexity without sacrificing performance or usability, simply ignore the potential virtualization hole, but ensure that all paths in KVM treat SHSTK/IBT as usable by the guest if the feature is supported in hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow userspace to advertise one of SHSTK or IBT if both are supported in hardware, even though doing so would allow a misbehaving guest to use the unadvertised feature. Fully emulating SHSTK and IBT would also require significant complexity, e.g. to track and update branch state for IBT, and shadow stack state for SHSTK. Given that emulating large swaths of the guest code stream isn't necessary on modern CPUs, punt on emulating instructions that meaningful impact or consume SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation of such instructions so that KVM's emulator can't be abused to circumvent CET. Disable support for SHSTK and IBT if KVM is configured such that emulation of arbitrary guest instructions may be required, specifically if Unrestricted Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR. Lastly disable SHSTK support if shadow paging is enabled, as the protections for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so that they can't be directly modified by software), i.e. would require non-trivial support in the Shadow MMU. Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT in SVM requires KVM modifications.	2025-09-30 13:37:14 -04:00
Paolo Bonzini	d05ca6b793	KVM x86 changes for 6.18 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's intercepts and trigger a variety of WARNs in KVM. - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs. - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs. - Clean up KVM's vector hashing code for delivering lowest priority IRQs. - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are actually "fast", as opposed to handling those that KVM _hopes_ are fast, and in the process of doing so add fastpath support for TSC_DEADLINE writes on AMD CPUs. - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs. - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios). - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support. - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs, as KVM can't faithfully emulate an I/O APIC for such guests. - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation for mediated vPMU support, as KVM will need to recalculate MSR intercepts in response to PMU refreshes for guests with mediated vPMUs. - Misc cleanups and minor fixes. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXIr0ACgkQOlYIJqCj N/1bbhAAxHzxN7IcizgAYf1BZWMjRU4zJgwlkoGuBeH/IgUOODPjs93L9kyrzvVL tcFgIe9o5fZRGmUfyZbCKnJaQi/4u/2QPRSGhsYt7vyDjCoXzO5CJPMYIqDz5Z2r qg+GNMlLtWI8EbcDd4qT22SWC8GufoXFEQnX6PUNhasOHeKit5ye8wmttcG+zvYV KeIkPluddQkQ2JKyG53IFNmm1lkY05oAibv61hkxqUSwCIJKsQFuDjl4GVouAd/H eu0+pzNmzPUTQ/qJzr2cNL5Nqz08DGp2OCFFRO6bgXaWkvHnFG3EAEHlhTAUh92t LPJxmhb6R8SUc+z8rYTgyF/zVpgeJcJO7F44FrXa7r2iV58ds3TfuO53hVaEfyNp 1GUMH0m8N2vfjtFyUVP1KwZHuFxiGKLd1wZ1h0yKpj1Eg1FjR2cEontqwH44tHn2 ENq8MIbWIBhvCsz5fIbM4y591JSevJUrDlYu60Lz7VyXHAw8Cq92t/dN9O7oH5mJ pIyoracU1g0Q6bbATZYsOGhkCTYLtdelZaBb5AYIgQ+U4C1TA4GpgEBUSVH8HXDy kXzVqSFlL0v5rrFkBPjiNFb5WD3iLjJIM3DLGoNegOM8+79r/USGHUY+XU3z/kCH rV8JBlTnLBCrNOHEiHJUI2kwBQ9C9/l88X/VwvRUNv7SthuExSo= =9IB0 -----END PGP SIGNATURE----- Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD KVM x86 changes for 6.18 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's intercepts and trigger a variety of WARNs in KVM. - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs. - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs. - Clean up KVM's vector hashing code for delivering lowest priority IRQs. - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are actually "fast", as opposed to handling those that KVM _hopes_ are fast, and in the process of doing so add fastpath support for TSC_DEADLINE writes on AMD CPUs. - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs. - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios). - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support. - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs, as KVM can't faithfully emulate an I/O APIC for such guests. - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation for mediated vPMU support, as KVM will need to recalculate MSR intercepts in response to PMU refreshes for guests with mediated vPMUs. - Misc cleanups and minor fixes.	2025-09-30 13:36:41 -04:00
Paolo Bonzini	473badf5c4	KVM selftests changes for 6.18 - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmjXfNMACgkQOlYIJqCj N/1Y2xAAgCTIpiarbgqbeyeH/ugTMmy861i00bdbzlMG5ViUSrMJ/XNBZcXFDoA5 WUpnKVno0n7FItqYtt1UTpsw6RU2FjqoZhL8GAfgTzwkvSy1xF/tEm8VtyBcdzQ5 V/V/H9wCINkGuAU/t7NdiAKOT8IqOuwIjQlv69UOHCzvCRA4OBr1h/d8o6DTGhpf zmdLNBoHYV4jTW/mWsKdgsOPgAXj3Id7OgjvNF5kq4KQ1jZ+DtMm9N+PW1M4PjA7 7Bx6k0HwmeC9UXzsfpnfMWkmRtNEk5DEf2T01qFO5Ij/+JlXPkMw8wmkDx4udpRF 0tVd6FxnGdy62NAxHETOs8ZOacRvs0SR4MHJRepIeO4Ey+4vTN3NEIu5/U24VwG+ 4oKKJ9XlYKHzY/RwgVtnqIPXcHixHt+1uGjX7dBnp8Wku5Oh9i1v3Hwbi0iDXKwu CdKCb4hry/PtKc4FkFhNFPIQTSKxrqj84GRGC/BHFdCSVLib0crFfFF+HHGsygnB BbkeOjPt/+DvflkI55EyxionDcojhjosuB5hHJoQr5RXwR0wfbDnYcIQ8ZeutMSR TiK2CpnEteHCKLDOF13Lydw9E1Byo9no62qzmmOPYpcuxS/VfXcKDK1BNzJ/WHti rrxy9mEWJ/tDsSDkq/MLLbWAhRldpmUumpfMLjseSay1jK2z6Gc= =A/nr -----END PGP SIGNATURE----- Merge tag 'kvm-x86-selftests-6.18' of https://github.com/kvm-x86/linux into HEAD KVM selftests changes for 6.18 - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). - Minor cleanups and improvements	2025-09-30 13:23:54 -04:00
Paolo Bonzini	924ccf1d09	KVM/riscv changes for 6.18 - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test - Added SBI v3.0 PMU enhancements in KVM and perf driver -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmjU1XEACgkQrUjsVaLH LAfahQ//aPSwG0jbPwRSmWtoqdm38VvKYuWOOxeCjKyas5w0cZPkMQCh3zV/Ks92 YKNIe6wlDeLIlT+qIzAi5VLwHHiC3ggoJsuNhf7/qmJJvks+1quvl1clUDkVGbRu G+FKFkRpBa1HF2O3NASnvBnRLYzXa/wKOuArwsY5DEG4+irnJU3AegbqRuhyPBFr VX+LwpfUUHipFGg/0ivflsVBjcz42Y3i1VQ4oXzuHqsRn3Ig3eSy9dGTHIOqJ0A3 leNbDUSdZxnMlj3Sfab7hH4Vxlr8kiCEgMogHyYpnlyPvG8zJobMWDE9Ziy86D7f 130zMcHuF0ZkeuaKX+3o2L7yGNTnN/JNtns7VtClRShGqSA8Dtn2xLhnvyray7RH CIYjv//34z1BjWyCBfuN5kFIfX5K7cNQHlDP7RVfgw91k7rbCGCJF743nkxz1WBX 98G05/Rnmn+KCtU8t2pRoG9Qkq+bE3iz8Ka8thmiUNciqO78nn/p7a6OEMcjn7jH C+VUNST519UBGCjLehBrCZuUOtrPRozHz7Adx3VTJT5wBX2hd/QLoNoMssEi9VmS 1j/abh7HJcbe7aTUDWUhOK9I+bPHJRsbeNyL5r9jpoC2Xg5Njh/V3aQZ/uuLzO5+ pYjciRrrmdZ30xkInulWI6wORHkmki34RRmsM9gqYIJBMV+Rcxg= =BCyU -----END PGP SIGNATURE----- Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD KVM/riscv changes for 6.18 - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test - Added SBI v3.0 PMU enhancements in KVM and perf driver	2025-09-30 13:23:36 -04:00
Paolo Bonzini	924ebaefce	KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmjVhSsACgkQI9DQutE9 ekOSIg//YdbA/zo17GrLnJnlGdUnS0e3/357n3e5Lypx1UFRDTmNacpVPw4VG/jt eVpQn7AgYwyvKfCq46eD+hBBqNv1XTn4DeXttv7CmVqhCRythsEvDkTBWSt7oYUZ xfYXCMKNhqUElH4AbYYx3y7nb2E9/KVGr+NBn6Vf5c14OZ3MGVc/fyp4jM1ih5dR kcV2onAYohlIGvFEyZMBtJ+jkYkqfIbfxfqCL0RAET0aEBFcmM1aXybWZj47hlLM f2j+E6cFQ0ZzUt+3pFhT75wo43lHGtIFDjVd60uishyU+NXTVvqRmXDTRU4k546W 18HHX1yijbzuXIatVhVRo2hIq3jKU37T9wtj46BejbDHRdAPENEyN/Qopm7rNS+X mCwOT7He6KR+H4rU6nFaTcsS7bNRCvIbZP9i9zb6NElbvXu5QnM8BUQsYFCDUa/n xtbtQlckbo/7zeoUsBDrGj2XmCf0d45FTHb7fdWOYEmMSmJhXYpUKdM4JcLyhKoQ DD0ox2S+pt2lwNw3XOSABdES0KJxCvDASAMIgn2h2sGpY8FxsBcVW/BufXopdafG UeInxWaILp2iCDM4tH2GLjKqlvMAOwcA+mAEZToXypxlJAnYA6J1pXCF8WEaM6+D BGrLli8Zd8JRs87byq6K7tp8oLNzZdliJp73j5jfOHTJA4MnvFI= =iAL3 -----END PGP SIGNATURE----- Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.	2025-09-30 13:23:28 -04:00
Paolo Bonzini	8cbb0df294	KVM/arm64 changes for 6.17, round #3 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier - Fixes to the TLB match process and TLB invalidation range for managing the VCNR pseudo-TLB - Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1 - Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a raw spinlock - Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation - Call raw_spin_unlock() instead of the internal spinlock API - Fix parameter ordering when assigning VBAR_EL1 -----BEGIN PGP SIGNATURE----- iI0EABYIADUWIQSNXHjWXuzMZutrKNKivnWIJHzdFgUCaMHCvxccb2xpdmVyLnVw dG9uQGxpbnV4LmRldgAKCRCivnWIJHzdFqbCAP9ygpb3tkpAPkbaB68IwyJgGD/C 59f2NTUGkzak20SjHAD+IcvQV8GemDsGNvSvjpb08KrNWMUxeuOBiAp/IvqrqwU= =VrcD -----END PGP SIGNATURE----- Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #3 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier - Fixes to the TLB match process and TLB invalidation range for managing the VCNR pseudo-TLB - Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1 - Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a raw spinlock - Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation - Call raw_spin_unlock() instead of the internal spinlock API - Fix parameter ordering when assigning VBAR_EL1 [Pull into kvm/master to fix conflicts. - Paolo]	2025-09-30 13:23:06 -04:00
Marc Zyngier	10fd028530	Merge branch kvm-arm64/selftests-6.18 into kvmarm-master/next * kvm-arm64/selftests-6.18: : . : KVM/arm64 selftest updates for 6.18: : : - Large update to run EL1 selftests at EL2 when possible : (20250917212044.294760-1-oliver.upton@linux.dev) : : - Work around lack of ID_AA64MMFR4_EL1 trapping on CPUs : without FEAT_FGT : (20250923173006.467455-1-oliver.upton@linux.dev) : : - Additional fixes and cleanups : (20250920-kvm-arm64-id-aa64isar3-el1-v1-0-1764c1c1c96d@kernel.org) : . KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:35:50 +01:00
Mark Brown	b02a2c060b	KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs We have a couple of writable bitfields in ID_AA64ISAR3_EL1 but the set_id_regs selftest does not cover this register at all, add coverage. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:24:57 +01:00
Mark Brown	5a070fc376	KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs Currently we list the main set of registers with bits we test three times, once in the test_regs array which is used at runtime, once in the guest code and once in a list of ARRAY_SIZE() operations we use to tell kselftest how many tests we plan to execute. This is needlessly fiddly, when adding new registers as the test_cnt calculation is formatted with two registers per line. Instead count the number of bitfields in the register arrays at runtime. The existing code subtracts ARRAY_SIZE(test_regs) from the number of tests to account for the terminating FTR_REG_END entries in the per register arrays, the new code accounts for this when enumerating. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:24:57 +01:00
Oliver Upton	75b2fdc1a8	KVM: arm64: selftests: Cope with arch silliness in EL2 selftest Implementations without FEAT_FGT aren't required to trap the entire ID register space when HCR_EL2.TID3 is set. This is a terrible idea, as the hypervisor may need to advertise the absence of a feature to the VM using a negative value in a signed field, FEAT_E2H0 being a great example of this. Cope with uncooperative implementations in the EL2 selftest by accepting a zero value when FEAT_FGT is absent and otherwise only tolerating the expected nonzero value. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:24:02 +01:00
Oliver Upton	f677b0efa9	KVM: arm64: selftests: Add basic test for running in VHE EL2 Add an embarrassingly simple selftest for sanity checking KVM's VHE EL2 and test that the ID register bits are consistent with HCR_EL2.E2H being RES1. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	2de21fb623	KVM: arm64: selftests: Enable EL2 by default Take advantage of VHE to implicitly promote KVM selftests to run at EL2 with only slight modification. Update the smccc_filter test to account for this now that the EL2-ness of a VM is visible to tests. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	05c93cbe66	KVM: arm64: selftests: Initialize HCR_EL2 Initialize HCR_EL2 such that EL2&0 is considered 'InHost', allowing the use of (mostly) unmodified EL1 selftests at EL2. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	7ae44d1cda	KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters Configuring the number of implemented counters via PMCR_EL0.N was a bad idea in retrospect as it interacts poorly with nested. Migrate the selftest to use the vCPU attribute instead of the KVM_SET_ONE_REG mechanism. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	0910778e49	KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 Arch timer registers are redirected to their hypervisor counterparts when running in VHE EL2. This is great, except for the fact that the hypervisor timers use different PPIs. Use the correct INTIDs when that is the case. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	d72543ac72	KVM: arm64: selftests: Select SMCCC conduit based on current EL HVCs are taken within the VM when EL2 is in use. Ensure tests use the SMC instruction when running at EL2 to interact with the host. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	a1b91ac238	KVM: arm64: selftests: Provide helper for getting default vCPU target The default vCPU target in KVM selftests is pretty boring in that it doesn't enable any vCPU features. Expose a helper for getting the default target to prepare for cramming in more features. Call KVM_ARM_PREFERRED_TARGET directly from get-reg-list as it needs fine-grained control over feature flags. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	1c9604ba23	KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts FEAT_VHE has the somewhat nice property of implicitly redirecting EL1 register aliases to their corresponding EL2 representations when E2H=1. Unfortunately, there's no such abstraction for userspace and EL2 registers are always accessed by their canonical encoding. Introduce a helper that applies EL2 redirections to sysregs and use aggressive inlining to catch misuse at compile time. Go a little past the architectural definition for ease of use for test authors (e.g. the stack pointer). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	8911c7dbc6	KVM: arm64: selftests: Create a VGICv3 for 'default' VMs Start creating a VGICv3 by default unless explicitly opted-out by the test. While having an interrupt controller is nice, the real benefit here is clearing a hurdle for EL2 VMs which mandate the presence of a VGIC. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	b8daa7ceac	KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation vgic_v3_setup() has a good bit of sanity checking internally to ensure that vCPUs have actually been created and match the dimensioning of the vgic itself. Spin off an unsanitised setup and initialization helper so vgic initialization can be wired in around a 'default' VM's vCPU creation. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	b712afa7a1	KVM: arm64: selftests: Add helper to check for VGICv3 support Introduce a proper predicate for probing VGICv3 by performing a 'test' creation of the device on a dummy VM. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:32 +01:00
Oliver Upton	a5022da5f9	KVM: arm64: selftests: Initialize VGICv3 only once vgic_v3_setup() unnecessarily initializes the vgic twice. Keep the initialization after configuring MMIO frames and get rid of the other. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:31 +01:00
Oliver Upton	7326348209	KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code In order to compel the default usage of EL2 in selftests, move kvm_arch_vm_post_create() to library code and expose an opt-in for using MTE by default. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-09-24 19:23:31 +01:00
Sean Christopherson	947ab90c91	KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Add a check in the MSRs test to verify that KVM's reported support for MSRs with feature bits is consistent between KVM's MSR save/restore lists and KVM's supported CPUID. To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET, track the "second" feature to avoid false failures when running on a CPU with only one of IBT or SHSTK. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-51-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 10:03:02 -07:00
Sean Christopherson	3469fd203b	KVM: selftests: Add coverage for KVM-defined registers in MSRs test Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs test. While _KVM's_ goal is to not tie the uAPI of KVM-defined registers to any particular internal implementation, i.e. to not commit in uAPI to handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing purposes is a-ok and is a naturally fit given the semantics of SSP. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-50-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:59:44 -07:00
Sean Christopherson	80c2b6d8e7	KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test When KVM_{G,S}ET_ONE_REG are supported, verify that MSRs can be accessed via ONE_REG and through the dedicated MSR ioctls. For simplicity, run the test twice, e.g. instead of trying to get MSR values into the exact right state when switching write methods. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-49-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:52:18 -07:00
Sean Christopherson	a8b9cca99c	KVM: selftests: Extend MSRs test to validate vCPUs without supported features Add a third vCPUs to the MSRs test that runs with all features disabled in the vCPU's CPUID model, to verify that KVM does the right thing with respect to emulating accesses to MSRs that shouldn't exist. Use the same VM to verify that KVM is honoring the vCPU model, e.g. isn't looking at per-VM state when emulating MSR accesses. Link: https://lore.kernel.org/r/20250919223258.1604852-48-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:52:17 -07:00
Sean Christopherson	27c4135306	KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Extend the MSRs test to support {S,U}_CET, which are a bit of a pain to handled due to the MSRs existing if IBT or SHSTK is supported. To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET, track the second feature, but skip only RDMSR #GP tests to avoid false failures when running on a CPU with only one of IBT or SHSTK (the WRMSR #GP tests are still valid since the enable bits are per-feature). Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-47-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:51:59 -07:00
Sean Christopherson	9c38ddb3df	KVM: selftests: Add an MSR test to exercise guest/host and read/write Add a selftest to verify reads and writes to various MSRs, from both the guest and host, and expect success/failure based on whether or not the vCPU supports the MSR according to supported CPUID. Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but provides more coverage with respect to host accesses, and will be extended to provide addition testing of CPUID-based features, save/restore lists, and KVM_{G,S}ET_ONE_REG, all which are extremely difficult to validate in KUT. If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as KVM's ABI is a mess; what exactly is supposed to be ignored, and when, varies wildly. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20250919223258.1604852-46-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>	2025-09-23 09:51:56 -07:00

1 2 3 4 5 ...

1996 Commits