Commit Graph

92 Commits

Author SHA1 Message Date
Marc Zyngier 105485a182 KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
Booting an EL2 guest on a system only supporting a subset of the
possible page sizes leads to interesting situations.

For example, on a system that only supports 4kB and 64kB, and is
booted with a 4kB kernel, we end up advertising 16kB support at
stage-2, which is pretty weird.

That's because we consider that any S2 granule bigger than our base granule
is fair game, irrespective of what the HW actually supports. While this
is not impossible to support (KVM would happily handle it), it is likely
to be confusing for the guest.

Add new checks that will verify that this granule size is actually
supported before publishing it to the guest.

Fixes: e7ef6ed458 ("KVM: arm64: Enforce NV limits on a per-idregs basis")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-03 10:39:24 +01:00
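To illustrate the check being described, a minimal sketch (the helper is an assumption; the field offsets are architectural: ID_AA64MMFR0_EL1 reports stage-1 granule support in TGRAN4 [31:28], TGRAN64 [27:24] and TGRAN16 [23:20]):

    static bool host_has_granule(u64 mmfr0, unsigned int shift,
                                 u8 not_implemented)
    {
            u8 field = (mmfr0 >> shift) & 0xf;

            /* A granule absent at S1 must not be advertised at S2 */
            return field != not_implemented;
    }

    /* e.g. 16kB: host_has_granule(mmfr0, 20, 0x0), as 0b0000 is
     * TGRAN16's "not implemented" encoding */
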
Marc Zyngier 8800b7c4bb KVM: arm64: Add RMW specific sysreg accessor
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.

Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.

Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05 14:18:01 +01:00
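As a self-contained sketch of the idea (made-up names, not the accessor actually added): the RESx sanitisation is applied once, to the final value, instead of once on the read and once on the write.

    struct sysreg {
            u64 val;
            u64 res0;       /* bits forced to 0 */
            u64 res1;       /* bits forced to 1 */
    };

    static inline u64 sysreg_sanitise(const struct sysreg *sr, u64 v)
    {
            return (v | sr->res1) & ~sr->res0;
    }

    /* Read-Modify-Write: masks applied exactly once, at the end */
    static inline void sysreg_rmw(struct sysreg *sr, u64 set, u64 clr)
    {
            u64 v = sr->val;                /* raw read, no masking */

            v = (v | set) & ~clr;           /* the "modify" step */
            sr->val = sysreg_sanitise(sr, v);
    }
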
Marc Zyngier 6673047405 KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
When handling a TLBI VA* instruction that potentially targets a
VNCR page mapping, we fail to mask out the top bits that contain
the ASID and TTL fields, hence potentially failing the VA check
in the TLB code.

An additional wrinkle is that we fail to sign-extend the VA,
again leading to failed VA checks.

Fix both in one go by sign-extending the VA from bit 48, making
it comparable to the way we interpret VNCR_EL2.BADDR.

Fixes: 4ffa72ad8f ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30 09:09:16 +01:00
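In sketch form (helper name assumed; the operand layout is the architectural TLBI VA* format, with VA[55:12] in bits [43:0], TTL in [47:44] and the ASID in [63:48]; GENMASK_ULL() and sign_extend64() are the stock helpers from <linux/bits.h> and <linux/bitops.h>):

    static u64 tlbi_va_to_addr(u64 op)
    {
            /* Strip ASID and TTL, recover the byte address... */
            u64 va = (op & GENMASK_ULL(43, 0)) << 12;

            /* ...and sign-extend from bit 48, matching BADDR */
            return sign_extend64(va, 48);
    }
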
Marc Zyngier 7f3225fe8b Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
  : .
  : Flick the switch on the NV support by adding the missing piece
  : in the form of the VNCR page management. From the cover letter:
  :
  : "This is probably the most interesting bit of the whole NV adventure.
  : So far, everything else has been a walk in the park, but this one is
  : where the real fun takes place.
  :
  : With FEAT_NV2, most of the NV support revolves around tricking a guest
  : into accessing memory while it tries to access system registers. The
  : hypervisor's job is to handle the context switch of the actual
  : registers with the state in memory as needed."
  : .
  KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
  KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
  KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
  KVM: arm64: Document NV caps and vcpu flags
  KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
  KVM: arm64: nv: Remove dead code from ERET handling
  KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
  KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
  KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
  KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
  KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
  KVM: arm64: nv: Handle VNCR_EL2-triggered faults
  KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
  KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
  KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
  KVM: arm64: nv: Move TLBI range decoding to a helper
  KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
  KVM: arm64: nv: Extract translation helper from the AT code
  KVM: arm64: nv: Allocate VNCR page when required
  arm64: sysreg: Add layout for VNCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23 10:58:57 +01:00
Marc Zyngier 538fbac740 KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
The conversion to kvm_release_faultin_page() missed the requirement
for this to be called within a critical section with mmu_lock held
for write. Move this call up to satisfy this requirement.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 11:40:12 +01:00
Marc Zyngier beab7d0583 KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
Calling invalidate_vncr_va() without the mmu_lock held for write
is a bad idea, and lockdep tells you about that.

Fixes: 4ffa72ad8f ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 11:40:12 +01:00
Marc Zyngier d43548f422 KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
When translating a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.

A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Bad things
can result from this, such as a BUG() in pgshift_level_to_ttl().

Fix it by taking the mmu_lock for write to perform this local
invalidation, and use invalidate_vncr() instead of open-coding
the write to the 'valid' flag.

Fixes: 069a05e535 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21 09:53:08 +01:00
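The shape of the fix, roughly (the locking primitives are the stock kvm->mmu_lock rwlock; 'vt' follows the naming used above):

    write_lock(&vcpu->kvm->mmu_lock);
    invalidate_vncr(vt);        /* clears vt->valid under the lock */
    write_unlock(&vcpu->kvm->mmu_lock);
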
Marc Zyngier 4bc0fe0898 KVM: arm64: Add sanitisation for FEAT_FGT2 registers
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. This is a large update, but a fairly mechanical one.

The config dependencies are extracted from the 2025-03 JSON drop.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:36:10 +01:00
Marc Zyngier b2a324ff01 KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

An additional complexity stems from the status of some bits such
as E2H and RW, which do not have a RESx status, but still take
a fixed value due to implementation choices in KVM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:30 +01:00
Marc Zyngier beed444841 KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCRX_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:30 +01:00
Marc Zyngier c6cbe6a4c1 KVM: arm64: Use FGT feature maps to drive RES0 bits
Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.

Let's apply this principle to the guest's view of the FGT registers.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 11:35:00 +01:00
Marc Zyngier 4ffa72ad8f KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR
and potentially knock the fixmap mapping. Even worse, that TLBI
must be able to work cross-vcpu.

For that, we track on a per-VM basis if any VNCR is mapped, using
an atomic counter. Whenever a TLBI S1E2 occurs while this counter
is non-zero, we take the long road all the way back to the core code.

There, we iterate over all vcpus and check whether this particular
invalidation has any damaging effect. If it does, we nuke the pseudo
TLB and the corresponding fixmap.

Yes, this is costly.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
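In sketch form (the counter's name and the invalidation helper's signature are assumptions; kvm_for_each_vcpu() is the stock iterator):

    /* Only take the slow path if any VNCR page is mapped at all */
    if (atomic_read(&kvm->arch.vncr_map_count)) {
            struct kvm_vcpu *vcpu;
            unsigned long i;

            /* Cross-vcpu check: does the TLBI hit a pseudo-TLB? */
            kvm_for_each_vcpu(i, vcpu, kvm)
                    invalidate_vncr_va(vcpu, decoded_va);
    }
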
Marc Zyngier 7270cc9157 KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
During an invalidation triggered by an MMU notifier, we need to
make sure we can drop the *host* mapping that would have been
translated by the stage-2 mapping being invalidated.

For the moment, the invalidation is pretty brutal, as we nuke
the full IPA range, and therefore any VNCR_EL2 mapping.

At some point, we'll be more lightweight, once the code is able
to deal with something more targeted.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
Marc Zyngier 2a359e0725 KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
Now that we can handle faults triggered through VNCR_EL2, we need
to map the corresponding page at EL2. But where, you'll ask?

Since each CPU in the system can run a vcpu, we need a per-CPU
mapping. For that, we carve a NR_CPUS range in the fixmap, giving
us a per-CPU VA at which to map the guest's VNCR page.

The mapping occurs both on vcpu load and on the back of a fault,
both generating a request that will take care of the mapping.
That mapping will also get dropped on vcpu put.

Yes, this is a bit heavy-handed, but it is simple. Eventually,
we may want to have a per-VM, per-CPU mapping, which would avoid
all the TLBI overhead.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
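The per-CPU address computation might look like this (FIX_VNCR as the base of the carved-out range is an assumption; __fix_to_virt() is the standard fixmap helper):

    static unsigned long vncr_fixmap_va(int cpu)
    {
            /* one fixmap slot per possible CPU */
            return __fix_to_virt(FIX_VNCR + cpu);
    }
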
Marc Zyngier 069a05e535 KVM: arm64: nv: Handle VNCR_EL2-triggered faults
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults.

These faults can have multiple sources:

- We haven't mapped anything on the host: we need to compute the
  resulting translation, populate a TLB, and eventually map
  the corresponding page

- The permissions are out of whack: we need to tell the guest about
  this state of affairs

Note that the kernel doesn't support S1POE for itself yet, so
the particular case of a VNCR page mapped with no permissions
or with write-only permissions is not correctly handled yet.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 08:01:19 +01:00
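Schematically, with entirely hypothetical helper names, the fault path splits along the two cases listed above:

    ret = walk_guest_s1(vcpu, gva, &trans); /* compute translation */
    if (ret)
            return ret;

    if (trans.permission_fault) {
            /* The guest's S1 says no: reflect the fault back */
            inject_vncr_abort(vcpu, gva);
            return 1;
    }

    populate_vncr_tlb(vcpu, &trans);        /* cache the walk result */
    return map_vncr_page(vcpu, &trans);     /* back it with a page */
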
Marc Zyngier 6fb75733f1 KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits,
and make it accessible to userspace when the VM is configured to
support FEAT_NV2.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier ea8d3cf46d KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.

As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.

It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.

All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.

No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.

Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
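A guess at the rough shape of such a structure (field list and names are illustrative, not the actual definition):

    struct vncr_tlb {
            u64     gva;    /* guest VA from VNCR_EL2.BADDR */
            u64     hpa;    /* host PA it resolves to */
            /* snapshot of the S1 walk: ASID tagging, TTL, perms */
            struct s1_walk_result wr;
            bool    valid;  /* false once a TLBI/notifier hits us */
    };
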
Marc Zyngier 469c4713d4 KVM: arm64: nv: Allocate VNCR page when required
If running an NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19 07:59:46 +01:00
Marc Zyngier ef6d7d2682 KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
We do not have a computed table for HCRX_EL2, so statically define
the bits we know about. A warning will fire if the architecture
grows bits that are not handled yet.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10 11:04:35 +01:00
Marc Zyngier 7ed43d84c1 KVM: arm64: Use computed masks as sanitisers for FGT registers
Now that we have computed RES0 bits, use them to sanitise the
guest view of FGT registers.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:25 +01:00
Marc Zyngier 0f013a524b arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.

Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.

Yes, this is painful.

The registers themselves are generated from the JSON file in
an automated way.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06 17:35:03 +01:00
Oliver Upton 13f64f6d21 Merge branch 'kvm-arm64/nv-idregs' into kvmarm/next
* kvm-arm64/nv-idregs:
  : Changes to exposure of NV features, courtesy of Marc Zyngier
  :
  : Apply NV-specific feature restrictions at reset rather than at the point
  : of KVM_RUN. This makes the true feature set visible to userspace, a
  : necessary step towards save/restore support of NV VMs.
  :
  : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
  : such that the VHE-ness of the VM can be applied to the feature set.
  KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
  KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
  KVM: arm64: Advertise FEAT_ECV when possible
  KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
  KVM: arm64: Allow userspace to limit NV support to nVHE
  KVM: arm64: Move NV-specific capping to idreg sanitisation
  KVM: arm64: Enforce NV limits on a per-idregs basis
  KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
  KVM: arm64: Consolidate idreg callbacks
  KVM: arm64: Advertise NV2 in the boot messages
  KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
  KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
  KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
  arm64: cpufeature: Handle NV_frac as a synonym of NV2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19 14:52:26 -07:00
Marc Zyngier 4b1b97f0d7 KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
An interrupt being delivered to L1 while running L2 must result
in the correct exception being delivered to L1.

This means that if, on entry to L2, we found ourselves with pending
interrupts in the L1 distributor, we need to take immediate action.
This is done by posting a request which will prevent the entry in
L2, and deliver an IRQ exception to L1, forcing the switch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:57:10 -08:00
Marc Zyngier 21d29cd814 KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
As ICH_HCR_EL2 is a VNCR accessor when running NV, add some
sanitising to what gets written. Crucially, mark TDIR as RES0
if the HW doesn't support it (unlikely, but hey...), as well
as anything GICv4 related, since we only expose a GICv3 to the
guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03 14:55:10 -08:00
Marc Zyngier 8b0b98ebf3 KVM: arm64: Advertise FEAT_ECV when possible
We can advertise support for FEAT_ECV if supported on the HW as
long as we limit it to the basic trap bits, and not advertise
CNTPOFF_EL2 support, even if the host has it (the short story
being that CNTPOFF_EL2 is not virtualisable).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-13-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:33:34 -08:00
Marc Zyngier f83c41fb3d KVM: arm64: Allow userspace to limit NV support to nVHE
NV is hard. No kidding.

In order to make things simpler, we have established that NV would
support two mutually exclusive configurations:

- VHE-only, and supporting recursive virtualisation

- nVHE-only, and not supporting recursive virtualisation

For that purpose, introduce a new vcpu feature flag that denotes
the second configuration. We use this flag to limit the idregs
further.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:30:17 -08:00
Marc Zyngier 94f296dcd6 KVM: arm64: Move NV-specific capping to idreg sanitisation
Instead of applying the NV idreg limits at run time, switch to
doing it at reset time, as part of the VM initialisation.

This will make things much simpler once we introduce vcpu-driven
variants of NV.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:28:43 -08:00
Marc Zyngier e7ef6ed458 KVM: arm64: Enforce NV limits on a per-idregs basis
As we are about to change the way the idreg reset values are computed,
move all the NV limits into a function that initialises one register
at a time.

This will be most useful in the upcoming patches. We take this opportunity
to remove the NV_FTR() macro and rely on the generated names instead.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:28:37 -08:00
Marc Zyngier 8f8d6084f5 KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
Enforce HCR_EL2.{NV*,AT} being RES0 when NV2 is disabled, so that
we can actually rely on these bits never being flipped behind our back.

This of course relies on our earlier ID reg sanitising.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:06:55 -08:00
Marc Zyngier d9f943f765 KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
Enforce HCR_EL2.E2H being RES0 when VHE is disabled, so that we can
actually rely on that bit never being flipped behind our back.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250220134907.554085-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24 11:06:55 -08:00
Marc Zyngier 5417a2e9b1 KVM: arm64: Fix nested S2 MMU structures reallocation
For each vcpu that userspace creates, we allocate a number of
s2_mmu structures that will eventually contain our shadow S2
page tables.

Since this is a dynamically allocated array, we reallocate
the array and initialise the newly allocated elements. Once
everything is correctly initialised, we adjust pointer and size
in the kvm structure, and move on.

But should that initialisation fail *and* the reallocation triggered
a copy to another location, we end up returning early, with the
kvm structure still containing the (now stale) old pointer. Weeee!

Cure it by assigning the pointer early, and use this to perform
the initialisation. If everything succeeds, we adjust the size.
Otherwise, we just leave the size as it was, no harm done, and the
new memory is as good as the ol' one (we hope...).

Fixes: 4f128f8e1a ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04 15:02:16 +00:00
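The pattern of the fix, in generic form (the initialiser name is an assumption; the array fields match the description above):

    tmp = krealloc(kvm->arch.nested_mmus,
                   num * sizeof(*kvm->arch.nested_mmus), GFP_KERNEL);
    if (!tmp)
            return -ENOMEM;

    /* Publish the new pointer *before* anything can fail */
    kvm->arch.nested_mmus = tmp;

    for (i = kvm->arch.nested_mmus_size; i < num; i++) {
            ret = init_nested_s2_mmu(kvm, &tmp[i]);
            if (ret)
                    return ret; /* size untouched: old entries valid */
    }

    kvm->arch.nested_mmus_size = num;
    return 0;
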
Marc Zyngier fa5e4043e9 Merge branch kvm-arm64/misc-6.14 into kvmarm-master/next
* kvm-arm64/misc-6.14:
  : .
  : Misc KVM/arm64 changes for 6.14
  :
  : - Don't expose AArch32 EL0 capability when NV is enabled
  :
  : - Update documentation to reflect the full gamut of kvm-arm.mode
  :   behaviours
  :
  : - Use the hypervisor VA bit width when dumping stacktraces
  :
  : - Decouple the hypervisor stack size from PAGE_SIZE, at least
  :   on the surface...
  :
  : - Make use of str_enabled_disabled() when advertising GICv4.1 support
  :
  : - Explicitly handle BRBE traps as UNDEFINED
  : .
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
  arm64: kvm: Introduce nvhe stack size constants
  KVM: arm64: Fix nVHE stacktrace VA bits mask
  Documentation: Update the behaviour of "kvm-arm.mode"
  KVM: arm64: nv: Advertise the lack of AArch32 EL0 support

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17 11:06:50 +00:00
Marc Zyngier 3643b334aa Merge branch kvm-arm64/nv-resx-fixes-6.14 into kvmarm-master/next
* kvm-arm64/nv-resx-fixes-6.14:
  : .
  : Fixes for NV sysreg accessors. From the cover letter:
  :
  : "Joey recently reported that some rather basic tests were failing on
  : NV, and managed to track it down to critical register fields (such as
  : HCR_EL2.E2H) not having their expected value.
  :
  : Further investigation has outlined a couple of critical issues:
  :
  : - Evaluating HCR_EL2.E2H must always be done with a sanitising
  :   accessor, no ifs, no buts. Given that KVM assumes a fixed value for
  :   this bit, we cannot leave it to the guest to mess with.
  :
  : - Resetting the sysreg file must result in the RESx bits taking
  :   effect. Otherwise, we may end-up making the wrong decision (see
  :   above), and we definitely expose invalid values to the guest. Note
  :   that because we compute the RESx masks very late in the VM setup, we
  :   need to apply these masks at that particular point as well.
  : [...]"
  : .
  KVM: arm64: nv: Apply RESx settings to sysreg reset values
  KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors

Signed-off-by: Marc Zyngier <maz@kernel.org>

# Conflicts:
#	arch/arm64/kvm/nested.c
2025-01-17 11:06:33 +00:00
Marc Zyngier 080612b294 Merge branch kvm-arm64/nv-timers into kvmarm-master/next
* kvm-arm64/nv-timers:
  : .
  : Nested Virt support for the EL2 timers. From the initial cover letter:
  :
  : "Here's another batch of NV-related patches, this time bringing in most
  : of the timer support for EL2 as well as nested guests.
  :
  : The code is pretty convoluted for a bunch of reasons:
  :
  : - FEAT_NV2 breaks the timer semantics by redirecting HW controls to
  :   memory, meaning that a guest could setup a timer and never see it
  :   firing until the next exit
  :
  : - We do try hard to reflect the timer state in memory, but that's not
  :   great.
  :
  : - With FEAT_ECV, we can finally correctly emulate the virtual timer,
  :   but this emulation is pretty costly
  :
  : - As a way to make things suck less, we handle timer reads as early as
  :   possible, and only defer writes to the normal trap handling
  :
  : - Finally, some implementations are badly broken, and require some
  :   hand-holding, irrespective of NV support. So we try and reuse the NV
  :   infrastructure to make them usable. This could be further optimised,
  :   but I'm running out of patience for this sort of HW.
  :
  : [...]"
  : .
  KVM: arm64: nv: Fix doc header layout for timers
  KVM: arm64: nv: Document EL2 timer API
  KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosity
  KVM: arm64: nv: Sanitise CNTHCTL_EL2
  KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bits
  KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}
  KVM: arm64: Handle counter access early in non-HYP context
  KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor context
  KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in use
  KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
  KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
  KVM: arm64: nv: Sync nested timer state with FEAT_NV2
  KVM: arm64: nv: Add handling of EL2-specific timer registers

Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17 11:04:53 +00:00
Marc Zyngier 36f998de85 KVM: arm64: nv: Apply RESx settings to sysreg reset values
While we have sanitisation in place for the guest sysregs, we lack
that sanitisation out of reset. So some of the fields could be
evaluated and not reflect their RESx status, which sounds like
a very bad idea.

Apply the RESx masks to the sysreg file in two situations:

- when going via a reset of the sysregs

- after having computed the RESx masks

Having this separate reset phase from the actual reset handling is
a bit grotty, but we need to apply this after the ID registers are
final.

Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-14 11:33:09 +00:00
Marc Zyngier d1e37a50e1 KVM: arm64: nv: Sanitise CNTHCTL_EL2
Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle
more than we advertise to the guest.

Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02 19:19:10 +00:00
Marc Zyngier e891432cf7 KVM: arm64: nv: Advertise the lack of AArch32 EL0 support
Although we never supported 32bit anywhere in NV, we fail to
advertise so for EL0, probably owing to the relative lack of
hardware supporting both NV2 and 32bit EL0.

Add some sanitising to ID_AA64PFR0_EL1.EL0, and reaffirm that
"in 64bit-only we trust".

Reported-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-22 10:09:43 +00:00
Fuad Tabba aac64ad369 KVM: arm64: Use kvm_vcpu_has_feature() directly for struct kvm
Now that we have introduced kvm_vcpu_has_feature(), use it in the
remaining code that checks for features in struct kvm, instead of
using the __vcpu_has_feature() helper.

No functional change intended.

Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241216105057.579031-18-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20 13:54:12 +00:00
Marc Zyngier 0f3a0f23f5 KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure
When compiling with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, set_sysreg_masks()
fails to compile thanks to:

	BUILD_BUG_ON(!__builtin_constant_p(sr));

as the compiler doesn't identify sr as a constant, despite all the
callers passing constants.

Fix the issue by always inlining this function, which allows GCC to
do the right thing.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411201857.ZNudtGJl-lkp@intel.com/
Fixes: a016202009 ("KVM: arm64: Extend masking facility to arbitrary registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241120111516.304250-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20 17:22:00 -08:00
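In pattern form (signature approximated from the commit text):

    static __always_inline void set_sysreg_masks(struct kvm *kvm, int sr,
                                                 u64 res0, u64 res1)
    {
            /* Only provable if 'sr' is constant-folded into callers */
            BUILD_BUG_ON(!__builtin_constant_p(sr));
            /* ... record res0/res1 for register 'sr' ... */
    }
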
Oliver Upton 6d4b81e2e7 Merge branch kvm-arm64/nv-pmu into kvmarm/next
* kvm-arm64/nv-pmu:
  : Support for vEL2 PMU controls
  :
  : Align the vEL2 PMU support with the current state of non-nested KVM,
  : including:
  :
  :  - Trap routing, with the annoying complication of EL2 traps that apply
  :    in Host EL0
  :
  :  - PMU emulation, using the correct configuration bits depending on
  :    whether a counter falls in the hypervisor or guest range of PMCs
  :
  :  - Perf event swizzling across nested boundaries, as the event filtering
  :    needs to be remapped to cope with vEL2
  KVM: arm64: nv: Reprogram PMU events affected by nested transition
  KVM: arm64: nv: Apply EL2 event filtering when in hyp context
  KVM: arm64: nv: Honor MDCR_EL2.HLP
  KVM: arm64: nv: Honor MDCR_EL2.HPME
  KVM: arm64: Add helpers to determine if PMC counts at a given EL
  KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN
  KVM: arm64: Rename kvm_pmu_valid_counter_mask()
  KVM: arm64: nv: Advertise support for FEAT_HPMN0
  KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN
  KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0
  KVM: arm64: nv: Reinject traps that take effect in Host EL0
  KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY
  KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps
  KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
  arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1
  arm64: sysreg: Migrate MDCR_EL2 definition to table
  arm64: sysreg: Describe ID_AA64DFR2_EL1 fields

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11 18:49:02 +00:00
Oliver Upton 166b77a2f4 KVM: arm64: nv: Advertise support for FEAT_HPMN0
Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not
only that, the emulation is capable of doing FEAT_HPMN0. Advertise
support for the feature in the VM's ID registers. It is possible to
emulate FEAT_HPMN0 on hardware that doesn't support it since KVM
currently traps all PMU registers. Having said that, let's only
advertise the feature on supporting hardware in case KVM ever provides
'direct' PMU support to VMs w/o involving host perf.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 19:00:40 +00:00
Oliver Upton eb609638da KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2
Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits
according to the feature set exposed to the VM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 19:00:39 +00:00
Marc Zyngier 26e89dccdf KVM: arm64: Add kvm_has_s1poe() helper
Just like we have kvm_has_s1pie(), add its S1POE counterpart,
making the code slightly more readable.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-31-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 02:44:22 +00:00
Mark Brown a68cddbe47 KVM: arm64: Hide S1PIE registers from userspace when disabled for guests
When the guest does not support S1PIE we should not allow any access
to the system registers it adds in order to ensure that we do not create
spurious issues with guest migration. Add a visibility operation for these
registers.

Fixes: 86f9de9db1 ("KVM: arm64: Save/restore PIE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-3-376624fa829c@kernel.org
[maz: simplify by using __el2_visibility(), kvm_has_s1pie() throughout]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-26-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 02:44:21 +00:00
Mark Brown 0fcb4eea53 KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guests
When the guest does not support FEAT_TCR2 we should not allow any access
to it in order to ensure that we do not create spurious issues with guest
migration. Add a visibility operation for it.

Fixes: fbff560682 ("KVM: arm64: Save/restore TCR2_EL1")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-2-376624fa829c@kernel.org
[maz: simplify by using __el2_visibility(), kvm_has_tcr2() throughout]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-25-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 02:44:21 +00:00
Marc Zyngier a016202009 KVM: arm64: Extend masking facility to arbitrary registers
We currently only use the masking (RES0/RES1) facility for VNCR
registers, as they are memory-based and thus easy to sanitise.

But we could apply the same thing to other registers if we:

- split the sanitisation from __VNCR_START__
- apply the sanitisation when reading from a HW register

This involves a new "marker" in the vcpu_sysreg enum, which
defines the point at which the sanitisation applies (the VNCR
registers being of course after this marker).

While we are at it, rename kvm_vcpu_sanitise_vncr_reg() to
kvm_vcpu_apply_reg_masks(), which is vaguely more explicit,
and harden set_sysreg_masks() against setting masks for
random registers...

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241023145345.1613824-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 02:42:30 +00:00
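The marker idea, sketched (member names other than __VNCR_START__ are assumptions):

    enum vcpu_sysreg {
            /* ... registers mirrored from HW, never sanitised ... */
            __SANITISED_REG_START__,
            /* ... registers sanitised when read back from HW ... */
            __VNCR_START__,
            /* ... memory-backed (VNCR) registers, also sanitised ... */
            NR_SYS_REGS,
    };

    #define sysreg_is_sanitised(r)  ((r) >= __SANITISED_REG_START__)
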
Marc Zyngier ad4f6ef0fa KVM: arm64: Sanitise TCR2_EL2
TCR2_EL2 is a bag of control bits, all of which are only valid if
certain features are present, and RES0 otherwise.

Describe these constraints and register them with the masking
infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241023145345.1613824-13-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31 02:42:30 +00:00
Oliver Upton c268f204f7 KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
Currently, when a nested MMU is repurposed for some other MMU context,
KVM unmaps everything during vcpu_load() while holding the MMU lock for
write. This is quite a performance bottleneck for large nested VMs, as
all vCPU scheduling will spin until the unmap completes.

Start punting the MMU cleanup to a vCPU request, where it is then
possible to periodically release the MMU lock and CPU in the presence of
contention.

Ensure that no vCPU winds up using a stale MMU by tracking the pending
unmap on the S2 MMU itself and requesting an unmap on every vCPU that
finds it.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08 10:40:27 +01:00
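The request plumbing follows the usual KVM pattern (the request name and the pending flag are assumptions):

    /* When repurposing an S2 MMU: flag it and notify the vcpus */
    mmu->pending_unmap = true;
    kvm_make_all_cpus_request(kvm, KVM_REQ_NESTED_S2_UNMAP);

    /* In the vcpu run loop, where the lock/CPU can be released */
    if (kvm_check_request(KVM_REQ_NESTED_S2_UNMAP, vcpu))
            kvm_nested_s2_unmap(vcpu->kvm);
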
Oliver Upton 3c164eb946 KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
Right now the nested code allows unmap operations on a shadow stage-2 to
block unconditionally. This is wrong in a couple places, such as a
non-blocking MMU notifier or on the back of a sched_in() notifier as
part of shadow MMU recycling.

Carry through whether or not blocking is allowed to
kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU
reclaim would precipitate a stack overflow from a pile of kvm_sched_in()
callbacks, all trying to recycle a stage-2 MMU.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08 10:40:27 +01:00
Oliver Upton 6ded46b5a4 KVM: arm64: nv: Keep reference on stage-2 MMU when scheduled out
If a vCPU is scheduling out and not in WFI emulation, it is highly
likely it will get scheduled again soon and reuse the MMU it had before.
Dropping the MMU at vcpu_put() can have some unfortunate consequences,
as the MMU could get reclaimed and used in a different context, forcing
another 'cold start' on an otherwise active MMU.

Avoid that altogether by keeping a reference on the MMU if the vCPU is
scheduling out, ensuring that another vCPU cannot reclaim it while the
current vCPU is away. Since there are more MMUs than vCPUs, this does
not affect the guarantee that an unused MMU is available at any time.

Furthermore, this makes the vcpu->arch.hw_mmu ~stable in preemptible
code, at least for where it matters in the stage-2 abort path. Yes, the
MMU can change across WFI emulation, but there isn't even a use case
where this would matter.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08 10:40:27 +01:00
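Sketched with assumed field and flag names: the reference is only dropped when the vCPU is genuinely parking for WFI emulation, not on an ordinary preemption.

    static void vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
    {
            /* Blocking in WFI: another vcpu may reclaim this MMU */
            if (vcpu_get_flag(vcpu, IN_WFI))
                    atomic_dec(&vcpu->arch.hw_mmu->refcnt);

            /* Otherwise keep the reference; we'll likely run soon */
    }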