mirror of https://github.com/torvalds/linux.git
49696 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
5d5d62298b |
- Update Kirill's email address
- Allow hugetlb PMD sharing only on 64-bit as it doesn't make a whole lotta sense on 32-bit - Add fixes for a misconfigured AMD Zen2 client which wasn't even supposed to run Linux -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhziVAACgkQEsHwGGHe VUoPLRAAqnv0D8pKO/UPUp05bOvKvEvYarK3Va4MV1QrqOgPIvabGbJOzYStU9+Q 4FZ5ZCZbi0eV1sZuNP1Zk7Ryp/bYipR6gLX+jg06VTXXTrjKnUN3ofBLPQf0+fE9 AZDShoSjS+6ifzt6BaUWW3uDMLOzwv50X/xwtLG+Nrprshs7HzfvJq2oFFQX6drQ kg7Cj9N8WNHl1kp6CVy2DXVRzv4VR9+yxeNfOCPOJiCVEmzRlMulPzQYWWagYidB +U+IYDJiG2p8YNL9aiCmnrRNpSfA4Podn8ZJVPKDwXSpmuUmfcLPur0c1Tjt97h0 85ovsJs+RqBBzD3ixkbNSpdNLRBFVX7q5mx4n4+1DuR5ygZrBbDyjZce9gwY2YPh h1c2dnxxxkp9LAnBFJcaWiv8jzScRRbkqwHprBidkCS4plJiGhrD6MC78fMf8kE5 i+dydBefrsYnBwe3ciyCZh/fCvPHk6OmegSdT1+0jlz2YJOGlD1uSPSMeE6YFFvW 64R7MV3BLllBkpRx57zafx8tsdRiH9mZM6naltlcOcQV3JkHZhDJ5aNkfvBJpXw3 RZtHHAG0noCGSMAhl/crQUkZOany2TdkDQn6SQpcM+iY/E0OQSH+/QM4Rw/s+05/ FEtmkC2FqJ00RODhzSqKIwkKSgiwSCokR4pty5OJqBirUqDFNiw= =d3UC -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Update Kirill's email address - Allow hugetlb PMD sharing only on 64-bit as it doesn't make a whole lotta sense on 32-bit - Add fixes for a misconfigured AMD Zen2 client which wasn't even supposed to run Linux * tag 'x86_urgent_for_v6.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Update Kirill Shutemov's email address for TDX x86/mm: Disable hugetlb page table sharing on 32-bit x86/CPU/AMD: Disable INVLPGB on Zen2 x86/rdrand: Disable RDSEED on AMD Cyan Skillfish |
|
|
|
73d7cf0710 |
ARM:
- Remove the last leftovers of the ill-fated FPSIMD host state
mapping at EL2 stage-1
- Fix unexpected advertisement to the guest of unimplemented S2 base
granule sizes
- Gracefully fail initialising pKVM if the interrupt controller isn't
GICv3
- Also gracefully fail initialising pKVM if the carveout allocation
fails
- Fix the computing of the minimum MMIO range required for the host on
stage-2 fault
- Fix the generation of the GICv3 Maintenance Interrupt in nested mode
x86:
- Reject SEV{-ES} intra-host migration if one or more vCPUs are actively
being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM.
- Use a pre-allocated, per-vCPU buffer for handling de-sparsification of
vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue.
- Allow out-of-range/invalid Xen event channel ports when configuring IRQ
routing, to avoid dictating a specific ioctl() ordering to userspace.
- Conditionally reschedule when setting memory attributes to avoid soft
lockups when userspace converts huge swaths of memory to/from private.
- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest.
- Add a missing field in struct sev_data_snp_launch_start that resulted in
the guest-visible workarounds field being filled at the wrong offset.
- Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid
VM-Fail on INVVPID.
- Advertise supported TDX TDVMCALLs to userspace.
- Pass SetupEventNotifyInterrupt arguments to userspace.
- Fix TSC frequency underflow.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhurKgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNxHggApTP4vw+oOzfN7UoNmgR9XZMI1p2a
R8AzQ1zDyVbEVWq3xTKvXtld+dKeO0yKB/XeI/1JLck1OiHxY57I3X6k5AnsurEr
CBzeAhAjXivF8woMgmlP+30aqpomcPACdQm0gRnWkRDDJfXqSUas/iE/s9Ct1dT4
4w3PtFLsSsU8vX/RttR+CqF1AQ6SeV/NRvA8hzPGMGZoQ2um74j4ZsM/3xh77Kdw
Z2vOnZOIA4dk0074JjO/Yb9l00Ib4hn+MWG5jVJ+6i2HRRYd2knnB29apVS/ARdL
X20j+LvtYj/jrPPdYwqjvxbIXyLbJrLCZyjKhfueN+rnisPNvzR+7YE4ZQ==
=NduO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Many patches, pretty much all of them small, that accumulated while I
was on vacation.
ARM:
- Remove the last leftovers of the ill-fated FPSIMD host state
mapping at EL2 stage-1
- Fix unexpected advertisement to the guest of unimplemented S2 base
granule sizes
- Gracefully fail initialising pKVM if the interrupt controller isn't
GICv3
- Also gracefully fail initialising pKVM if the carveout allocation
fails
- Fix the computing of the minimum MMIO range required for the host
on stage-2 fault
- Fix the generation of the GICv3 Maintenance Interrupt in nested
mode
x86:
- Reject SEV{-ES} intra-host migration if one or more vCPUs are
actively being created, so as not to create a non-SEV{-ES} vCPU in
an SEV{-ES} VM
- Use a pre-allocated, per-vCPU buffer for handling de-sparsification
of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
large" issue
- Allow out-of-range/invalid Xen event channel ports when configuring
IRQ routing, to avoid dictating a specific ioctl() ordering to
userspace
- Conditionally reschedule when setting memory attributes to avoid
soft lockups when userspace converts huge swaths of memory to/from
private
- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest
- Add a missing field in struct sev_data_snp_launch_start that
resulted in the guest-visible workarounds field being filled at the
wrong offset
- Skip non-canonical address when processing Hyper-V PV TLB flushes
to avoid VM-Fail on INVVPID
- Advertise supported TDX TDVMCALLs to userspace
- Pass SetupEventNotifyInterrupt arguments to userspace
- Fix TSC frequency underflow"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: avoid underflow when scaling TSC frequency
KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
KVM: arm64: Don't free hyp pages with pKVM on GICv2
KVM: arm64: Fix error path in init_hyp_mode()
KVM: arm64: Adjust range correctly during host stage-2 faults
KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
Documentation: KVM: Fix unexpected unindent warnings
KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
KVM: Allow CPU to reschedule while setting per-page memory attributes
KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
|
|
|
|
4578a747f3 |
KVM: x86: avoid underflow when scaling TSC frequency
In function kvm_guest_time_update(), __scale_tsc() is used to calculate
a TSC *frequency* rather than a TSC value. With low-enough ratios,
a TSC value that is less than 1 would underflow to 0 and to an infinite
while loop in kvm_get_time_scale():
kvm_guest_time_update(struct kvm_vcpu *v)
if (kvm_caps.has_tsc_control)
tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz,
v->arch.l1_tsc_scaling_ratio);
__scale_tsc(u64 ratio, u64 tsc)
ratio=122380531, tsc=2299998, N=48
ratio*tsc >> N = 0.999... -> 0
Later in the function:
Call Trace:
<TASK>
kvm_get_time_scale arch/x86/kvm/x86.c:2458 [inline]
kvm_guest_time_update+0x926/0xb00 arch/x86/kvm/x86.c:3268
vcpu_enter_guest.constprop.0+0x1e70/0x3cf0 arch/x86/kvm/x86.c:10678
vcpu_run+0x129/0x8d0 arch/x86/kvm/x86.c:11126
kvm_arch_vcpu_ioctl_run+0x37a/0x13d0 arch/x86/kvm/x86.c:11352
kvm_vcpu_ioctl+0x56b/0xe60 virt/kvm/kvm_main.c:4188
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0x12d/0x190 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x59/0x110 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x78/0xe2
This can really happen only when fuzzing, since the TSC frequency
would have to be nonsensically low.
Fixes:
|
|
|
|
76303ee8d5 |
x86/mm: Disable hugetlb page table sharing on 32-bit
Only select ARCH_WANT_HUGE_PMD_SHARE on 64-bit x86. Page table sharing requires at least three levels because it involves shared references to PMD tables; 32-bit x86 has either two-level paging (without PAE) or three-level paging (with PAE), but even with three-level paging, having a dedicated PGD entry for hugetlb is only barely possible (because the PGD only has four entries), and it seems unlikely anyone's actually using PMD sharing on 32-bit. Having ARCH_WANT_HUGE_PMD_SHARE enabled on non-PAE 32-bit X86 (which has 2-level paging) became particularly problematic after commit |
|
|
|
a74bb5f202 |
x86/CPU/AMD: Disable INVLPGB on Zen2
AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an issue
that causes system oopses and panics when performing TLB flush using
INVLPGB.
However, the problem is that that machine has misconfigured CPUID and
should not report the INVLPGB bit in the first place. So zap the
kernel's representation of the flag so that nothing gets confused.
[ bp: Massage. ]
Fixes:
|
|
|
|
5b937a1ed6 |
x86/rdrand: Disable RDSEED on AMD Cyan Skillfish
AMD Cyan Skillfish (Family 17h, Model 47h, Stepping 0h) has an error that causes RDSEED to always return 0xffffffff, while RDRAND works correctly. Mask the RDSEED cap for this CPU so that both /proc/cpuinfo and direct CPUID read report RDSEED as unavailable. [ bp: Move to amd.c, massage. ] Signed-off-by: Mikhail Paulyshka <me@mixaill.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/20250524145319.209075-1-me@mixaill.net |
|
|
|
6e9128ff9d |
Add the mitigation logic for Transient Scheduler Attacks (TSA)
TSA are new aspeculative side channel attacks related to the execution timing of instructions under specific microarchitectural conditions. In some cases, an attacker may be able to use this timing information to infer data from other contexts, resulting in information leakage. Add the usual controls of the mitigation and integrate it into the existing speculation bugs infrastructure in the kernel. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhSsvQACgkQEsHwGGHe VUrWNw//V+ZabYq3Nnvh4jEe6Altobnpn8bOIWmcBx6I3xuuArb9bLqcbKerDIcC POVVW6zrdNigDe/U4aqaJXE7qCRX55uTYbhp8OLH0zzqX3Pjl/hUnEXWtMtlXj/G CIM5mqjqEFp5JRGXetdjjuvjG1IPf+CbjKqj2WXbi//T6F3LiAFxkzdUhd+clBF/ ztWchjwUmqU0WJd6+Smb8ZnvWrLoZuOFldjhFad820B7fqkdJhzjHMmwBHJKUEZu oABv8B0/4IALrx6LenCspWS4OuTOGG7DKyIgzitByXygXXb4L3ZUKpuqkxBU7hFx bscwtOP7e5HIYAekx6ZSLZoZpYQXr1iH0aRGrjwapi3ASIpUwI0UA9ck2PdGo0IY 0GvmN0vbybskewBQyG819BM+DCau5pOLWuL7cYmaD2eTNoOHOknMDNlO8VzXqJxa NnignSuEWFm2vNV1FXEav2YbVjlanV6JleiPDGBe5Xd9dnxZTvg9HuP2NkYio4dZ mb/kEU/kTcN8nWh0Q96tX45kmj0vCbBgrSQkmUpyAugp38n69D1tp3ii9D/hyQFH hKGcFC9m+rYVx1NLyAxhTGxaEqF801d5Qawwud8HsnQudTpCdSXD9fcBg9aCbWEa FymtDpIeUQrFAjDpVEp6Syh3odKvLXsGEzL+DVvqKDuA8r6DxFo= =2cLl -----END PGP SIGNATURE----- Merge tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU speculation fixes from Borislav Petkov: "Add the mitigation logic for Transient Scheduler Attacks (TSA) TSA are new aspeculative side channel attacks related to the execution timing of instructions under specific microarchitectural conditions. In some cases, an attacker may be able to use this timing information to infer data from other contexts, resulting in information leakage. Add the usual controls of the mitigation and integrate it into the existing speculation bugs infrastructure in the kernel" * tag 'tsa_x86_bugs_for_6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/process: Move the buffer clearing before MONITOR x86/microcode/AMD: Add TSA microcode SHAs KVM: SVM: Advertise TSA CPUID bits to guests x86/bugs: Add a Transient Scheduler Attacks mitigation x86/bugs: Rename MDS machinery to something more generic |
|
|
|
5fc2e891a5 |
- Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which
prevents their TSCs from going skewed from the hypervisor's -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhqMCYACgkQEsHwGGHe VUrz6A/9EMN2fEdpeStGrk5t0BvYPhnIApr2x2QuvVN6qXgyGWQXhnF5G94SFNn8 ykNxcGi97R/wxqk2uK3RfBg4P0ScoPDOLKKSeaqO0LVuHDTVX72fwB1F3qdaPNbp EIEL+OOEwUAwviT2GSH4mwTb1C7TuJnOZH2lC6yDkWwN5BLnIA4P4C0Wr4pIQ+MT TMzxGMT01yTnCAGHGOD2NRUIv/29qeJl+18uOqDO4A64RPT5Pp0yLJ72U4grGzOr 2C49+/XD0qjXMs8vRz/CbeBK47ZaE1v/ui3g5ofbZ2YtcrfJXDY2/SwoJ0Oz+liM TSEXj8IpFZ3aq3+Pvgp9Qibu5QnFxJi7xTzrGCG1OSouHXTH2eFSVIXCiojVU12Y s+pKCBTXs9wVJN4z/FaSSwmTvQolld7oozShgPieZsYNBfJeeWIm+6LZRT4Zr/7Y UVsYEc/7m36ggKK+XFHsea2ZnmUFV18kEHPuWAXwmH3DW3dDfI5nm/s811jsbS+6 2RaLZPiKBsYmNZ7iCujrY3GEmE5Eyemr8Ricj2zSGTCH2EYNeODDQBOn+hYgQOTK WJFWqpC5JqI5oJapmhugCkjfT75e+XTgO8Dox7HdlJR4UAb61xxf1zGFbThH8T45 LZgbIKtLwwLShg0FwzDl7swnADJ/SiaKl049Q8Z5YthhllHy6zI= =6yl8 -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which prevents their TSCs from going skewed from the hypervisor's * tag 'x86_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Use TSC_FACTOR for Secure TSC frequency calculation |
|
|
|
bdde3141ce |
- Do not remove the MCE sysfs hierarchy if thresholding sysfs nodes init
fails due to new/unknown banks present, which in itself is not fatal anyway; add default names for new banks - Make sure MCE polling settings are honored after CMCI storms - Make sure MCE threshold limit is reset after the thresholding interrupt has been serviced - Clean up properly and disable CMCI banks on shutdown so that a second/kexec-ed kernel can rediscover those banks again -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhqLrkACgkQEsHwGGHe VUqm/hAAlg6nX/uJIeszPpZK1o4j6WP3IZ0RAo5yxd9mQC+zNP35Xqv3MOUBZVGE 1EhrQSgrqeC9NIbT8a9ansnc3BKxODOB8raoNA3HpViTG3LeQ5ycmpJv5qWqcr8B EV14BpZHAEx3AXAiDcGXkuKYM9RrpHtOwtyKjib4ScT+xCkzQzJtrO2hPTk8Slr/ 8Hly14+st7Iqvnh9nH1TeMOjogGg+lr9cIlG6rzZGj3IPbfEWbvmHhjJut7p4TfU g5AY3djzJ8eyrOv3aKxgVDkJ7qet283sc+mvTzrhAnoJEYI9v7tde4g6AKJj0BLA +u0tTOu47c/wijLcHPpaS+zwifo4BxKDIG8q+tHT6ixMMPT3Mev8nwanjAotjgz7 WSN3eL9jie+IJPq4c0eN2z2um5tiqxFHF5M7q4Ol5VJiUU8Wa31B/pzPZymQoCWZ F/SC5VVq+ZwfRqzQMAK5dHQSj1zvkbtb0HOMYoFTOU1JocF7Px1gxx401UQ6pKdG Qw2rE1SUKxaQT4HZ2+SRvO2egJItsQw4r+ZT/7sMQhII9v9qK500kD8o30HcCEh3 o8kT+dQwKEv0KLga5vYFnITT29XM9GySdAEI7HxKi/kpnwaUppUljuycfhzAMtYe xz86txluUsk7sUW8eUPgjuKPYO3a20FkY8VB5TYWTHxJdMAeb4E= =JI7d -----END PGP SIGNATURE----- Merge tag 'ras_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Borislav Petkov: - Do not remove the MCE sysfs hierarchy if thresholding sysfs nodes init fails due to new/unknown banks present, which in itself is not fatal anyway; add default names for new banks - Make sure MCE polling settings are honored after CMCI storms - Make sure MCE threshold limit is reset after the thresholding interrupt has been serviced - Clean up properly and disable CMCI banks on shutdown so that a second/kexec-ed kernel can rediscover those banks again * tag 'ras_urgent_for_v6.16_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make sure CMCI banks are cleared during shutdown on Intel x86/mce/amd: Fix threshold limit reset x86/mce/amd: Add default names for MCA banks and blocks x86/mce: Ensure user polling settings are honored when restarting timer x86/mce: Don't remove sysfs if thresholding sysfs init fails |
|
|
|
df46426745 |
platform-drivers-x86 for v6.16-3
Fixes and New HW Support
- amd/isp4: Improve swnode graph (new driver exception)
- asus-nb-wmi: Use duo keyboard quirk for Zenbook Duo UX8406CA
- dell-lis3lv02d: Add Latitude 5500 accelerometer address
- dell-wmi-sysman: Fix WMI data block retrieval and class dev unreg
- hp-bioscfg: Fix class device unregistration
- i2c: piix4: Re-enable on non-x86 + move FCH header under platform_data/
- intel/hid: Wildcat Lake support
- mellanox:
- mlxbf-pmc: Fix duplicate event ID
- mlxbf-tmfifo: Fix vring_desc.len assignment
- mlxreg-lc: Fix bit-not-set logic check
- nvsw-sn2201: Fix bus number in error message & spelling errors
- portwell-ec: Move watchdog device under correct platform hierarchy
- think-lmi: Error handling fixes (sysfs, kset, kobject, class dev unreg)
- thinkpad_acpi: Handle HKEY 0x1402 event (2025 Thinkpads)
- wmi: Fix WMI event enablement
The following is an automated shortlog grouped by driver:
asus-nb-wmi:
- add DMI quirk for ASUS Zenbook Duo UX8406CA
dell-lis3lv02d:
- Add Latitude 5500
dell-wmi-sysman:
- Fix class device unregistration
- Fix WMI data block retrieval in sysfs callbacks
hp-bioscfg:
- Fix class device unregistration
i2c:
- Re-enable piix4 driver on non-x86
intel/hid:
- Add Wildcat Lake support
mellanox:
- Fix spelling and comment clarity in Mellanox drivers
mlxbf-pmc:
- Fix duplicate event ID for CACHE_DATA1
mlxbf-tmfifo:
- fix vring_desc.len assignment
mlxreg-lc:
- Fix logic error in power state check
Move FCH header to a location accessible by all archs:
- Move FCH header to a location accessible by all archs
nvsw-sn2201:
- Fix bus number in adapter error message
portwell-ec:
- Move watchdog device under correct platform hierarchy
think-lmi:
- Create ksets consecutively
- Fix class device unregistration
- Fix kobject cleanup
- Fix sysfs group cleanup
thinkpad_acpi:
- handle HKEY 0x1402 event
Update swnode graph for amd isp4:
- Update swnode graph for amd isp4
wmi:
- Fix WMI event enablement
- Update documentation of WCxx/WExx ACPI methods
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSCSUwRdwTNL2MhaBlZrE9hU+XOMQUCaGfkwwAKCRBZrE9hU+XO
MVK1AQCK3C21auqcEbiZrx67hr5ir6VwTAZ9S6IR8R2FKqw8YwEAinUOcHSbmP6a
eXV0v5xVRPxZV7JBO5aN7FESqVHpBQ4=
=uxUH
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers fixes from Ilpo Järvinen:
"Mostly a few lines fixed here and there except amd/isp4 which improves
swnodes relationships but that is a new driver not in any stable
kernels yet. The think-lmi driver changes also look relatively large
but there are just many fixes to it.
The i2c/piix4 change is a effectively a revert of the commit
|
|
|
|
52e1a03e6c |
x86/sev: Use TSC_FACTOR for Secure TSC frequency calculation
When using Secure TSC, the GUEST_TSC_FREQ MSR reports a frequency based on
the nominal P0 frequency, which deviates slightly (typically ~0.2%) from
the actual mean TSC frequency due to clocking parameters.
Over extended VM uptime, this discrepancy accumulates, causing clock skew
between the hypervisor and a SEV-SNP VM, leading to early timer interrupts as
perceived by the guest.
The guest kernel relies on the reported nominal frequency for TSC-based
timekeeping, while the actual frequency set during SNP_LAUNCH_START may
differ. This mismatch results in inaccurate time calculations, causing the
guest to perceive hrtimers as firing earlier than expected.
Utilize the TSC_FACTOR from the SEV firmware's secrets page (see "Secrets
Page Format" in the SNP Firmware ABI Specification) to calculate the mean
TSC frequency, ensuring accurate timekeeping and mitigating clock skew in
SEV-SNP VMs.
Use early_ioremap_encrypted() to map the secrets page as
ioremap_encrypted() uses kmalloc() which is not available during early TSC
initialization and causes a panic.
[ bp: Drop the silly dummy var:
https://lore.kernel.org/r/20250630192726.GBaGLlHl84xIopx4Pt@fat_crate.local ]
Fixes:
|
|
|
|
b1c26e0595
|
Move FCH header to a location accessible by all archs
A new header fch.h was created to store registers used by different AMD drivers. This header was included by i2c-piix4 in commit |
|
|
|
cc69ac7a65 |
- Make sure DR6 and DR7 are initialized to their architectural values and not
accidentally cleared, leading to misconfigurations -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhg/UQACgkQEsHwGGHe VUrmXBAAtOrpEtR4geeBeZtCEaUxE4DE8Zvj36dr+sAScHTXNTzYK94mAy/AHU22 V3rF12/kuyyZSrwROXLBD6PgkHEn8u0WLztSeqP/SisnoLjMTV9H9TuGYBUoz1NS clkoElQ6DJP5BVzmYpZlJrcofqNjkS/mAxfMRoIAq+LzKkb3iL/Lge+Ox/IDUg8z L9wRlKh/IaJ5EETWlqh0gkFeS/M9DXYmfkasDQeVkUxnKFeXBdyUGc2jFzBGX5RA rsdnz+C3x3ow2U9N+ZMVr+n06yTZvh+fAiU8emeBQm0q5fZBBHWDbnZZtWf+KG6s 43tlWyVqic5yzyQbUpRC2sttOkIAtOCMx36XexbGm1eKRNNc6fTz9IlgO/97HkuE lYBNq0zd/p5Kb53lXb3uwBVy4sjIEZUyD/K5DO4YfTgamcwXl8BP5xnKtNPqImI5 aaF3xKKLOUDOTL1CcK5YG0joaU1k+I0F0KO7HYqkDi8Uf5naWZSUNil8nPQn8RX7 3f3LJx0e3j2o0f60AHI4mjUAUJHsxExmpaTl079k03wt8YVE3ucNaUN6se6nidVz H5q0JU4q3C3DCu0I3Ub4wa5QXGA+TOHKuqhJCapKAVAAQDlbV2z8GxA/3B2YbRM+ eZ6/RVyk++VrRXIyfmfwLPu0CLVoSNhaUhu/hFrYbjA3NX+85qw= =fjgT -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure DR6 and DR7 are initialized to their architectural values and not accidentally cleared, leading to misconfigurations * tag 'x86_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Initialize DR7 by writing its architectural reset value x86/traps: Initialize DR6 by writing its architectural reset value |
|
|
|
30ad231a50 |
x86/mce: Make sure CMCI banks are cleared during shutdown on Intel
CMCI banks are not cleared during shutdown on Intel CPUs. As a side effect, when a kexec is performed, CPUs coming back online are unable to rediscover/claim these occupied banks which breaks MCE reporting. Clear the CPU ownership during shutdown via cmci_clear() so the banks can be reclaimed and MCE reporting will become functional once more. [ bp: Massage commit message. ] Reported-by: Aijay Adams <aijay@meta.com> Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/20250627174935.95194-1-inwardvessel@gmail.com |
|
|
|
5f6e3b7206 |
x86/mce/amd: Fix threshold limit reset
The MCA threshold limit must be reset after servicing the interrupt. Currently, the restart function doesn't have an explicit check for this. It makes some assumptions based on the current limit and what's in the registers. These assumptions don't always hold, so the limit won't be reset in some cases. Make the reset condition explicit. Either an interrupt/overflow has occurred or the bank is being initialized. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-4-236dd74f645f@amd.com |
|
|
|
d66e1e90b1 |
x86/mce/amd: Add default names for MCA banks and blocks
Ensure that sysfs init doesn't fail for new/unrecognized bank types or if
a bank has additional blocks available.
Most MCA banks have a single thresholding block, so the block takes the same
name as the bank.
Unified Memory Controllers (UMCs) are a special case where there are two
blocks and each has a unique name.
However, the microarchitecture allows for five blocks. Any new MCA bank types
with more than one block will be missing names for the extra blocks. The MCE
sysfs will fail to initialize in this case.
Fixes:
|
|
|
|
00c092de6f |
x86/mce: Ensure user polling settings are honored when restarting timer
Users can disable MCA polling by setting the "ignore_ce" parameter or by
setting "check_interval=0". This tells the kernel to *not* start the MCE
timer on a CPU.
If the user did not disable CMCI, then storms can occur. When these
happen, the MCE timer will be started with a fixed interval. After the
storm subsides, the timer's next interval is set to check_interval.
This disregards the user's input through "ignore_ce" and
"check_interval". Furthermore, if "check_interval=0", then the new timer
will run faster than expected.
Create a new helper to check these conditions and use it when a CMCI
storm ends.
[ bp: Massage. ]
Fixes:
|
|
|
|
4c113a5b28 |
x86/mce: Don't remove sysfs if thresholding sysfs init fails
Currently, the MCE subsystem sysfs interface will be removed if the thresholding sysfs interface fails to be created. A common failure is due to new MCA bank types that are not recognized and don't have a short name set. The MCA thresholding feature is optional and should not break the common MCE sysfs interface. Also, new MCA bank types are occasionally introduced, and updates will be needed to recognize them. But likewise, this should not break the common sysfs interface. Keep the MCE sysfs interface regardless of the status of the thresholding sysfs interface. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-1-236dd74f645f@amd.com |
|
|
|
fa787ac07b |
KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
In KVM guests with Hyper-V hypercalls enabled, the hypercalls
HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST and HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX
allow a guest to request invalidation of portions of a virtual TLB.
For this, the hypercall parameter includes a list of GVAs that are supposed
to be invalidated.
However, when non-canonical GVAs are passed, there is currently no
filtering in place and they are eventually passed to checked invocations of
INVVPID on Intel / INVLPGA on AMD. While AMD's INVLPGA silently ignores
non-canonical addresses (effectively a no-op), Intel's INVVPID explicitly
signals VM-Fail and ultimately triggers the WARN_ONCE in invvpid_error():
invvpid failed: ext=0x0 vpid=1 gva=0xaaaaaaaaaaaaa000
WARNING: CPU: 6 PID: 326 at arch/x86/kvm/vmx/vmx.c:482
invvpid_error+0x91/0xa0 [kvm_intel]
Modules linked in: kvm_intel kvm 9pnet_virtio irqbypass fuse
CPU: 6 UID: 0 PID: 326 Comm: kvm-vm Not tainted 6.15.0 #14 PREEMPT(voluntary)
RIP: 0010:invvpid_error+0x91/0xa0 [kvm_intel]
Call Trace:
vmx_flush_tlb_gva+0x320/0x490 [kvm_intel]
kvm_hv_vcpu_flush_tlb+0x24f/0x4f0 [kvm]
kvm_arch_vcpu_ioctl_run+0x3013/0x5810 [kvm]
Hyper-V documents that invalid GVAs (those that are beyond a partition's
GVA space) are to be ignored. While not completely clear whether this
ruling also applies to non-canonical GVAs, it is likely fine to make that
assumption, and manual testing on Azure confirms "real" Hyper-V interprets
the specification in the same way.
Skip non-canonical GVAs when processing the list of address to avoid
tripping the INVVPID failure. Alternatively, KVM could filter out "bad"
GVAs before inserting into the FIFO, but practically speaking the only
downside of pushing validation to the final processing is that doing so
is suboptimal for the guest, and no well-behaved guest will request TLB
flushes for non-canonical addresses.
Fixes:
|
|
|
|
8948941276 |
um: Use correct data source in fpregs_legacy_set()
Read from the buffer pointed to by 'from' instead of '&buf', as
'buf' contains no valid data when 'ubuf' is NULL.
Fixes:
|
|
|
|
fa7d0f83c5 |
x86/traps: Initialize DR7 by writing its architectural reset value
Initialize DR7 by writing its architectural reset value to always set bit 10, which is reserved to '1', when "clearing" DR7 so as not to trigger unanticipated behavior if said bit is ever unreserved, e.g. as a feature enabling flag with inverted polarity. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Sean Christopherson <seanjc@google.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com |
|
|
|
5f465c148c |
x86/traps: Initialize DR6 by writing its architectural reset value
Initialize DR6 by writing its architectural reset value to avoid
incorrectly zeroing DR6 to clear DR6.BLD at boot time, which leads
to a false bus lock detected warning.
The Intel SDM says:
1) Certain debug exceptions may clear bits 0-3 of DR6.
2) BLD induced #DB clears DR6.BLD and any other debug exception
doesn't modify DR6.BLD.
3) RTM induced #DB clears DR6.RTM and any other debug exception
sets DR6.RTM.
To avoid confusion in identifying debug exceptions, debug handlers
should set DR6.BLD and DR6.RTM, and clear other DR6 bits before
returning.
The DR6 architectural reset value 0xFFFF0FF0, already defined as
macro DR6_RESERVED, satisfies these requirements, so just use it to
reinitialize DR6 whenever needed.
Since clear_all_debug_regs() no longer zeros all debug registers,
rename it to initialize_debug_regs() to better reflect its current
behavior.
Since debug_read_clear_dr6() no longer clears DR6, rename it to
debug_read_reset_dr6() to better reflect its current behavior.
Fixes:
|
|
|
|
a7f4dff21f |
KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
To avoid imposing an ordering constraint on userspace, allow 'invalid' event channel targets to be configured in the IRQ routing table. This is the same as accepting interrupts targeted at vCPUs which don't exist yet, which is already the case for both Xen event channels *and* for MSIs (which don't do any filtering of permitted APIC ID targets at all). If userspace actually *triggers* an IRQ with an invalid target, that will fail cleanly, as kvm_xen_set_evtchn_fast() also does the same range check. If KVM enforced that the IRQ target must be valid at the time it is *configured*, that would force userspace to create all vCPUs and do various other parts of setup (in this case, setting the Xen long_mode) before restoring the IRQ table. Cc: stable@vger.kernel.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/e489252745ac4b53f1f7f50570b03fb416aa2065.camel@infradead.org [sean: massage comment] Signed-off-by: Sean Christopherson <seanjc@google.com> |
|
|
|
0b6f4a5f08 |
KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
Use a preallocated per-vCPU bitmap for tracking the unpacked set of vCPUs
being targeted for Hyper-V's paravirt TLB flushing. If KVM_MAX_NR_VCPUS
is set to 4096 (which is allowed even for MAXSMP=n builds), putting the
vCPU mask on-stack pushes kvm_hv_flush_tlb() past the default FRAME_WARN
limit.
arch/x86/kvm/hyperv.c:2001:12: error: stack frame size (1288) exceeds limit (1024)
in 'kvm_hv_flush_tlb' [-Werror,-Wframe-larger-than]
2001 | static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
| ^
1 error generated.
Note, sparse_banks was given the same treatment by commit
|
|
|
|
48f15f6241 |
KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
When creating an SEV-ES vCPU for intra-host migration, set its vmsa_pa to INVALID_PAGE to harden against doing VMRUN with a bogus VMSA (KVM checks for a valid VMSA page in pre_sev_run()). Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/20250602224459.41505-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> |
|
|
|
ecf371f8b0 |
KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
Reject migration of SEV{-ES} state if either the source or destination VM
is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the
section between incrementing created_vcpus and online_vcpus. The bulk of
vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs
in parallel, and so sev_info.es_active can get toggled from false=>true in
the destination VM after (or during) svm_vcpu_create(), resulting in an
SEV{-ES} VM effectively having a non-SEV{-ES} vCPU.
The issue manifests most visibly as a crash when trying to free a vCPU's
NULL VMSA page in an SEV-ES VM, but any number of things can go wrong.
BUG: unable to handle page fault for address: ffffebde00000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP KASAN NOPTI
CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE
Tainted: [U]=USER, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024
RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline]
RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline]
RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline]
RIP: 0010:PageHead include/linux/page-flags.h:866 [inline]
RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067
Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0
RSP: 0018:ffff8984551978d0 EFLAGS: 00010246
RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000
RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000
R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000
R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000
FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169
svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515
kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396
kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline]
kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490
kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895
kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310
kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369
__fput+0x3e4/0x9e0 fs/file_table.c:465
task_work_run+0x1a9/0x220 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x7f0/0x25b0 kernel/exit.c:953
do_group_exit+0x203/0x2d0 kernel/exit.c:1102
get_signal+0x1357/0x1480 kernel/signal.c:3034
arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218
do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f87a898e969
</TASK>
Modules linked in: gq(O)
gsmi: Log Shutdown Reason 0x03
CR2: ffffebde00000000
---[ end trace 0000000000000000 ]---
Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing
the host is likely desirable due to the VMSA being consumed by hardware.
E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a
bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered
away thanks to L1TF, but panicking in this scenario is preferable to
potentially running with corrupted state.
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Fixes:
|
|
|
|
5c00eca95a |
- Make sure the array tracking which kernel text positions need to be
alternatives-patched doesn't get mishandled by out-of-order modifications, leading to it overflowing and causing page faults when patching - Avoid an infinite loop when early code does a ranged TLB invalidation before the broadcast TLB invalidation count of how many pages it can flush, has been read from CPUID - Fix a CONFIG_MODULES typo - Disable broadcast TLB invalidation when PTI is enabled to avoid an overflow of the bitmap tracking dynamic ASIDs which need to be flushed when the kernel switches between the user and kernel address space - Handle the case of a CPU going offline and thus reporting zeroes when reading top-level events in the resctrl code -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhXwaMACgkQEsHwGGHe VUr6OQ//ThzgO6GdONGcLgTdiQ/c3Ug9tUf3k3uUeZhGp7MypEZgwBzHcU6jbkcX XYOdV5wJw+eetyj1/ueJqXBvQRWox6lOT064RZDwmmtJCRcSSAOxa5lbrHfnlgcd y9j7ml5cS/NzIfldiSynlE523SiYz2EOV0x0kY3xGkHdOT98VJHOpahgaaWmizvQ MEnv1AosYFGlwonq5kGu0vV8V4N0VRIe7/FjAxSuSNLUHwVdACqX9oyAdYbyw09p oNZjpozBJBNe70Wkg35ijQKlNGKkngwnPILmRgjZwHDH2FIRzrr6CqmHx6z336b+ FkThRmVzJZfhtQVFM9V/prBB3GW8j/zIm61j67iz2yJW1Biz3xJKnUT+s7jd4ARr gGBAB+41UoaPsVKrrSQYlCsrFkUWn6b4AFT+l1eQDgnuc+ajX+OyHHXC+qVrBUbK lL80EoYyPrEMh8pP9wx7hdtVCvOTOSSSdFPe7RrnaT7satoBI/Aemz9tZqVzmkCs 48iLZ4Xbwswo8U9mKIwap3IksG43LvDLWK1ydw+U6OzBuYGmu148JcSBKO3G1uhm YkRvmykNTxKO62mki8Wc86VP5IB5KxBcS6nO6GAGT3vsKPrkzi2GkqvaDi6Z0bEL 1kjIaN/XKG4OQglba/DWoRLBXmd9LtsVrCJ1QGuY+ejYUJYLLMA= =NT0q -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure the array tracking which kernel text positions need to be alternatives-patched doesn't get mishandled by out-of-order modifications, leading to it overflowing and causing page faults when patching - Avoid an infinite loop when early code does a ranged TLB invalidation before the broadcast TLB invalidation count of how many pages it can flush, has been read from CPUID - Fix a CONFIG_MODULES typo - Disable broadcast TLB invalidation when PTI is enabled to avoid an overflow of the bitmap tracking dynamic ASIDs which need to be flushed when the kernel switches between the user and kernel address space - Handle the case of a CPU going offline and thus reporting zeroes when reading top-level events in the resctrl code * tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Fix int3 handling failure from broken text_poke array x86/mm: Fix early boot use of INVPLGB x86/its: Fix an ifdef typo in its_alloc() x86/mm: Disable INVLPGB when PTI is enabled x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem |
|
|
|
17ef32ae66 |
- Avoid a crash on a heterogeneous machine where not all cores support the
same hw events features - Avoid a deadlock when throttling events - Document the perf event states more - Make sure a number of perf paths switching off or rescheduling events call perf_cgroup_event_disable() - Make sure perf does task sampling before its userspace mapping is torn down, and not after -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmhXvYYACgkQEsHwGGHe VUotdg//TXchNnZ9xcGKSTFphDQMWVIy1cRWUffWC5ewUhjE9H7+FMZvCmvih8uc uvAsZ92GXE64fuzF0tU/5ybWEgca6HPbgI8aOhnk+vo9Yzxj9/0eO0SKK8qqSvzo ecn/p9yX4/jD86kIo6K279z7ZX8/0tSLselnicrGy1r4RGuaebEAvXDEzZm8p/c6 0MjaTGC4TzkZkGyEeWXRt7jewiWvXO+91TqqwMyrhmIG3cs2TCbPhSn0QowXUZsF PdCJA5Z+vKp6j8n8fohRTFoATSRw5xAoqT+JRmPZ2K3QOCwtf1X0MbM6ZKkapgZO Y4Tp3HPw9yHUu8cyvEEwqU0jDn4J0EaqFgwCrxzvQj9ufkHBlPgNahjXW5upcw4k TV3qEp6KKfywTWWExh6Gjie7y7Hq3aHOkJVCg/ZeQjwMXhpZg7z+mGwh7x08Jn/2 9/bpLG8Gl8eto3G6L1px/NUMc4poZTbSheKrjEMt3Z6ErNoAR4gb7SO547Lvf8HK bty5NZftDUNv42bqqXI0GY7YXKkr1AtHdRDlTeLlc5YmPzhIyG3LgEi4BqN3gyFf emh/CFG/1KT8GWxNCrPW6d01TBRswZjFyBDHL89HO3i0r2nDe98+2fLmllnl2Bv2 EadgGE1XWv6RB5APJ726HXqMgtXM9cHRMogKMhiHNZnwkQba+ug= =nnMl -----END PGP SIGNATURE----- Merge tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Avoid a crash on a heterogeneous machine where not all cores support the same hw events features - Avoid a deadlock when throttling events - Document the perf event states more - Make sure a number of perf paths switching off or rescheduling events call perf_cgroup_event_disable() - Make sure perf does task sampling before its userspace mapping is torn down, and not after * tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix crash in icl_update_topdown_event() perf: Fix the throttle error of some clock events perf: Add comment to enum perf_event_state perf/core: Fix WARN in perf_cgroup_switch() perf: Fix dangling cgroup pointer in cpuctx perf: Fix cgroup state vs ERROR perf: Fix sample vs do_exit() |
|
|
|
e669e322c5 |
ARM:
- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
missing synchronisation
- A small fix for the irqbypass hook fixes, tightening the check and
ensuring that we only deal with MSI for both the old and the new
route entry
- Rework the way the shadow LRs are addressed in a nesting
configuration, plugging an embarrassing bug as well as simplifying
the whole process
- Add yet another fix for the dreaded arch_timer_edge_cases selftest
RISC-V:
- Fix the size parameter check in SBI SFENCE calls
- Don't treat SBI HFENCE calls as NOPs
x86 TDX:
- Complete API for handling complex TDVMCALLs in userspace. This was
delayed because the spec lacked a way for userspace to deny supporting
these calls; the new exit code is now approved.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhXq3YUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM6JAf/YG2NrLK22eZuNILBu6lwf4+yrUfU
vKcQLHjyX477xRIDZOiAqFOw6JVfqUj7gHbPM+wiVIAg3JT931TkeaZQFuOnMg+k
HalkddeSa9oupd9nhmogcRfGovOXmBQFOleJ+h3GcSeMT/w+6JtgXmgC6/l29f3X
xswi9Lb6yRbcnd2E8XOnJ+ZBqA3ABguuM+0vcmN2xH6EIBaRzjuDPPHV3NGf2aBK
BTAylYzU5P4tZYEHiH/9WlCM19gaeIb2A1LSPDNASpUufvw4VIpmU9MwLDa5LtNB
kVIQxotgFHEoO9qnTX8sGsBrNuStHOuzZA17Z8O6OzTYISK5a7CdworR1w==
=upFj
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
missing synchronisation
- A small fix for the irqbypass hook fixes, tightening the check and
ensuring that we only deal with MSI for both the old and the new
route entry
- Rework the way the shadow LRs are addressed in a nesting
configuration, plugging an embarrassing bug as well as simplifying
the whole process
- Add yet another fix for the dreaded arch_timer_edge_cases selftest
RISC-V:
- Fix the size parameter check in SBI SFENCE calls
- Don't treat SBI HFENCE calls as NOPs
x86 TDX:
- Complete API for handling complex TDVMCALLs in userspace.
This was delayed because the spec lacked a way for userspace to
deny supporting these calls; the new exit code is now approved"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: TDX: Exit to userspace for GetTdVmCallInfo
KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs
KVM: arm64: VHE: Centralize ISBs when returning to host
KVM: arm64: Remove cpacr_clear_set()
KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd()
KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync()
KVM: arm64: Reorganise CPTR trap manipulation
KVM: arm64: VHE: Synchronize CPTR trap deactivation
KVM: arm64: VHE: Synchronize restore of host debug registers
KVM: arm64: selftests: Close the GIC FD in arch_timer_edge_cases
KVM: arm64: Explicitly treat routing entry type changes as changes
KVM: arm64: nv: Fix tracking of shadow list registers
RISC-V: KVM: Don't treat SBI HFENCE calls as NOPs
RISC-V: KVM: Fix the size parameter check in SBI SFENCE calls
|
|
|
|
28224ef02b |
KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
Allow userspace to advertise TDG.VP.VMCALL subfunctions that the kernel also supports. For each output register of GetTdVmCallInfo's leaf 1, add two fields to KVM_TDX_CAPABILITIES: one for kernel-supported TDVMCALLs (userspace can set those blindly) and one for user-supported TDVMCALLs (userspace can set those if it knows how to handle them). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
|
|
4580dbef5c |
KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
|
|
25e8b1dd48 |
KVM: TDX: Exit to userspace for GetTdVmCallInfo
Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX, to allow userspace to provide information about the support of TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API. GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>, <ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>, <Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and <Instruction.WRMSR>. They must be supported by VMM to support TDX guests. For GetTdVmCallInfo - When leaf (r12) to enumerate TDVMCALL functionality is set to 0, successful execution indicates all GHCI base TDVMCALLs listed above are supported. Update the KVM TDX document with the set of the GHCI base APIs. - When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it indicates the TDX guest is querying the supported TDVMCALLs beyond the GHCI base TDVMCALLs. Exit to userspace to let userspace set the TDVMCALL sub-function bit(s) accordingly to the leaf outputs. KVM could set the TDVMCALL bit(s) supported by itself when the TDVMCALLs don't need support from userspace after returning from userspace and before entering guest. Currently, no such TDVMCALLs implemented, KVM just sets the values returned from userspace. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
|
|
cf207eac06 |
KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
Handle TDVMCALL for GetQuote to generate a TD-Quote. GetQuote is a doorbell-like interface used by TDX guests to request VMM to generate a TD-Quote signed by a service hosting TD-Quoting Enclave operating on the host. A TDX guest passes a TD Report (TDREPORT_STRUCT) in a shared-memory area as parameter. Host VMM can access it and queue the operation for a service hosting TD-Quoting enclave. When completed, the Quote is returned via the same shared-memory area. KVM only checks the GPA from the TDX guest has the shared-bit set and drops the shared-bit before exiting to userspace to avoid bleeding the shared-bit into KVM's exit ABI. KVM forwards the request to userspace VMM (e.g. QEMU) and userspace VMM queues the operation asynchronously. KVM sets the return code according to the 'ret' field set by userspace to notify the TDX guest whether the request has been queued successfully or not. When the request has been queued successfully, the TDX guest can poll the status field in the shared-memory area to check whether the Quote generation is completed or not. When completed, the generated Quote is returned via the same buffer. Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is required to handle the KVM exit reason as the initial support for TDX, by reentering KVM to ensure that the TDVMCALL is complete. While at it, add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
|
|
b5aafcb4ef |
KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs
Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and return it for unimplemented TDVMCALL subfunctions. Returning TDVMCALL_STATUS_INVALID_OPERAND when a subfunction is not implemented is vague because TDX guests can't tell the error is due to the subfunction is not supported or an invalid input of the subfunction. New GHCI spec adds TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED to avoid the ambiguity. Use it instead of TDVMCALL_STATUS_INVALID_OPERAND. Before the change, for common guest implementations, when a TDX guest receives TDVMCALL_STATUS_INVALID_OPERAND, it has two cases: 1. Some operand is invalid. It could change the operand to another value retry. 2. The subfunction is not supported. For case 1, an invalid operand usually means the guest implementation bug. Since the TDX guest can't tell which case is, the best practice for handling TDVMCALL_STATUS_INVALID_OPERAND is stopping calling such leaf, treating the failure as fatal if the TDVMCALL is essential or ignoring it if the TDVMCALL is optional. With this change, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED could be sent to old TDX guest that do not know about it, but it is expected that the guest will make the same action as TDVMCALL_STATUS_INVALID_OPERAND. Currently, no known TDX guest checks TDVMCALL_STATUS_INVALID_OPERAND specifically; for example Linux just checks for success. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> [Return it for untrapped KVM_HC_MAP_GPA_RANGE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
|
|
|
2aebf5ee43 |
x86/alternatives: Fix int3 handling failure from broken text_poke array
Since smp_text_poke_single() does not expect there is another
text_poke request is queued, it can make text_poke_array not
sorted or cause a buffer overflow on the text_poke_array.vec[].
This will cause an Oops in int3 because of bsearch failing;
CPU 0 CPU 1 CPU 2
----- ----- -----
smp_text_poke_batch_add()
smp_text_poke_single() <<-- Adds out of order
<int3>
[Fails o find address
in text_poke_array ]
OOPS!
Or unhandled page fault because of a buffer overflow;
CPU 0 CPU 1
----- -----
smp_text_poke_batch_add() <<+
... |
smp_text_poke_batch_add() <<-- Adds TEXT_POKE_ARRAY_MAX times.
smp_text_poke_single() {
__smp_text_poke_batch_add() <<-- Adds entry at
TEXT_POKE_ARRAY_MAX + 1
smp_text_poke_batch_finish()
[Unhandled page fault because
text_poke_array.nr_entries is
overwritten]
BUG!
}
Use smp_text_poke_batch_add() instead of __smp_text_poke_batch_add()
so that it correctly flush the queue if needed.
Closes: https://lore.kernel.org/all/CA+G9fYsLu0roY3DV=tKyqP7FEKbOEETRvTDhnpPxJGbA=Cg+4w@mail.gmail.com/
Fixes:
|
|
|
|
cb6075bc62 |
x86/mm: Fix early boot use of INVPLGB
The INVLPGB instruction has limits on how many pages it can invalidate
at once. That limit is enumerated in CPUID, read by the kernel, and
stored in 'invpgb_count_max'. Ranged invalidation, like
invlpgb_kernel_range_flush() break up their invalidations so
that they do not exceed the limit.
However, early boot code currently attempts to do ranged
invalidation before populating 'invlpgb_count_max'. There is a
for loop which is basically:
for (...; addr < end; addr += invlpgb_count_max*PAGE_SIZE)
If invlpgb_kernel_range_flush is called before the kernel has read
the value of invlpgb_count_max from the hardware, the normally
bounded loop can become an infinite loop if invlpgb_count_max is
initialized to zero.
Fix that issue by initializing invlpgb_count_max to 1.
This way INVPLGB at early boot time will be a little bit slower
than normal (with initialized invplgb_count_max), and not an
instant hang at bootup time.
Fixes:
|
|
|
|
3c902383f2 |
x86/its: Fix an ifdef typo in its_alloc()
Commit |
|
|
|
94a17f2dc9 |
x86/mm: Disable INVLPGB when PTI is enabled
PTI uses separate ASIDs (aka. PCIDs) for kernel and user address
spaces. When the kernel needs to flush the user address space, it
just sets a bit in a bitmap and then flushes the entire PCID on
the next switch to userspace.
This bitmap is a single 'unsigned long' which is plenty for all 6
dynamic ASIDs. But, unfortunately, the INVLPGB support brings along a
bunch more user ASIDs, as many as ~2k more. The bitmap can't address
that many.
Fortunately, the bitmap is only needed for PTI and all the CPUs
with INVLPGB are AMD CPUs that aren't vulnerable to Meltdown and
don't need PTI. The only way someone can run into an issue in
practice is by booting with pti=on on a newer AMD CPU.
Disable INVLPGB if PTI is enabled. Avoid overrunning the small
bitmap.
Note: this will be fixed up properly by making the bitmap bigger.
For now, just avoid the mostly theoretical bug.
Fixes:
|
|
|
|
8e786a85c0 |
x86/process: Move the buffer clearing before MONITOR
Move the VERW clearing before the MONITOR so that VERW doesn't disarm it and the machine never enters C1. Original idea by Kim Phillips <kim.phillips@amd.com>. Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> |
|
|
|
2329f250e0 |
x86/microcode/AMD: Add TSA microcode SHAs
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> |
|
|
|
31272abd59 |
KVM: SVM: Advertise TSA CPUID bits to guests
Synthesize the TSA CPUID feature bits for guests. Set TSA_{SQ,L1}_NO on
unaffected machines.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
|
|
d8010d4ba4 |
x86/bugs: Add a Transient Scheduler Attacks mitigation
Add the required features detection glue to bugs.c et all in order to support the TSA mitigation. Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
|
|
|
594902c986 |
x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of a SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning
zero values for "top-level events" without any error indication.
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
valid CPUs for reading RMID counter via the MSR interface.
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
Fixes:
|
|
|
|
9afe652958 |
* Further fixups for ITS mitigation
* Avoid using large pages for kernel mappings when PSE is not enumerated
* Avoid ever making indirect calls to TDX assembly helpers
* Fix a FRED single step issue when not using an external debugger
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmhQWhsACgkQaDWVMHDJ
krBOLg/8CQ4VVzSqaE5llP/nYkehQblmWM03UFHaOr8gzzPF4Pi+Aov9SpPu/X5E
k0va6s4SuV2XVuWAyKkzJlPxjKBzp8gmOI0XgerEqKMTEihElmSkb89d5O5EhqqM
OYNHe5dIhfDbESrbUep2HFvSWR9Q5Df1Gt8gMDhBzjxT7PlJF/U/sle2/G33Ydhj
9IJvAyh349sJTF9+N8nVUB9YYNE2L6ozf/o14lDF+SoBytLJWugYUVgAukwrQeII
kjcfLYuUGPLeWZcczFA/mqKYWQ+MJ+2Qx8Rh2P00HQhIAdGGKyoIla2b4Lw9MfAk
xYPuWWjvOyI0IiumdKfMfnKRgHpqC8IuCZSPlooNiB8Mk2Nd+0L3C0z96Ov08cwg
nKgKQVS1E0FGq35S2VzS01pitasgxFsBBQft9fTIfL+EBtYNfm+TK3bAr6Ep9b4M
RqcnE997AkFx2D4AWnBLY3lKMi2XcP6b0GPGSXSJAHQlzPZNquYXPNEzI8jOQ4IH
E0F8f5P26XDvFCX5P7EfV9TjACSPYRxZtHtwN9OQWjFngbK8cMmP94XmORSFyFec
AFBZ5ZcgLVfQmNzjKljTUvfZvpQCNIiC8oADAVlcfUsm2B45cnYwpvsuz8vswmdQ
uDrDa87Mh20d8JRMLIMZ2rh7HrXaB2skdxQ9niS/uv0L8jKNq9o=
=BEsb
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Dave Hansen:
"This is a pretty scattered set of fixes. The majority of them are
further fixups around the recent ITS mitigations.
The rest don't really have a coherent story:
- Some flavors of Xen PV guests don't support large pages, but the
set_memory.c code assumes all CPUs support them.
Avoid problems with a quick CPU feature check.
- The TDX code has some wrappers to help retry calls to the TDX
module. They use function pointers to assembly functions and the
compiler usually generates direct CALLs. But some new compilers,
plus -Os turned them in to indirect CALLs and the assembly code was
not annotated for indirect calls.
Force inlining of the helper to fix it up.
- Last, a FRED issue showed up when single-stepping. It's fine when
using an external debugger, but was getting stuck returning from a
SIGTRAP handler otherwise.
Clear the FRED 'swevent' bit to ensure that forward progress is
made"
* tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "mm/execmem: Unify early execmem_cache behaviour"
x86/its: explicitly manage permissions for ITS pages
x86/its: move its_pages array to struct mod_arch_specific
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/mm/pat: don't collapse pages without PSE set
x86/virt/tdx: Avoid indirect calls to TDX assembly functions
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
|
|
|
|
f9af88a3d3 |
x86/bugs: Rename MDS machinery to something more generic
It will be used by other x86 mitigations. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
|
|
|
f688b599d7 |
Power management updates for 6.16-rc2
- Implement CpuId Rust abstraction and use it to fix doctest failure
related to the recently introduced cpumask abstraction (Viresh Kumar).
- Do minor cleanups in the `# Safety` sections for cpufreq abstractions
added recently (Viresh Kumar).
- Unbreak cpupower systemd service units installation on some systems
by adding a unitdir variable for specifying the location to install
them (Francesco Poli).
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the 6.16 merge window due to a problem with
handling "dead" SMT siblings, but this time prevent leaving them in
C1 after initialization by taking them online and back offline when
a proper cpuidle driver for the platform has been registered (Rafael
Wysocki).
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition
after recent changes (Uros Bizjak).
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmhMgawSHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1dA0H/j7V4sfA383pZfWwegRC4MQeiW4EVdIx
2G6d33Gsfv+zEzTSsPvtKghR4eUIldTdco04bqusMcI+qIfgdWIEBpi2tQhJU2Tt
Bgc24Kya0n85KKNLHs60xm0WXhkAyu3TFad+4yTGXRZEmAD+O6lyUPjum+mn+gbx
HuLE6KE9D/qzzYOU03kjCsJExBf7vv0bBNIqGqNyuFLYOaoqZd5rLhNhhxm2AkYi
hZ5wmYBY+2SJRzwryNNHQKKmZ1jk9HGapnIVrxQ2Pjc0AhX+tRW7FI5lDDvUWtW+
RN/226Y3OQ3JHXAj4S0K64t0ZEpiEKS2oPatjGzosNmdyI0f+CRACQs=
=OcND
-----END PGP SIGNATURE-----
Merge tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the cpupower utility installation, fix up the recently added
Rust abstractions for cpufreq and OPP, restore the x86 update
eliminating mwait_play_dead_cpuid_hint() that has been reverted during
the 6.16 merge window along with preventing the failure caused by it
from happening, and clean up mwait_idle_with_hints() usage in
intel_idle:
- Implement CpuId Rust abstraction and use it to fix doctest failure
related to the recently introduced cpumask abstraction (Viresh
Kumar)
- Do minor cleanups in the `# Safety` sections for cpufreq
abstractions added recently (Viresh Kumar)
- Unbreak cpupower systemd service units installation on some systems
by adding a unitdir variable for specifying the location to install
them (Francesco Poli)
- Eliminate mwait_play_dead_cpuid_hint() again after reverting its
elimination during the 6.16 merge window due to a problem with
handling "dead" SMT siblings, but this time prevent leaving them in
C1 after initialization by taking them online and back offline when
a proper cpuidle driver for the platform has been registered
(Rafael Wysocki)
- Update data types of variables passed as arguments to
mwait_idle_with_hints() to match the function definition after
recent changes (Uros Bizjak)"
* tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
rust: cpu: Add CpuId::current() to retrieve current CPU ID
rust: Use CpuId in place of raw CPU numbers
rust: cpu: Introduce CpuId abstraction
intel_idle: Update arguments of mwait_idle_with_hints()
cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
cpupower: split unitdir from libdir in Makefile
Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"
ACPI: processor: Rescan "dead" SMT siblings during initialization
intel_idle: Rescan "dead" SMT siblings during initialization
x86/smp: PM/hibernate: Split arch_resume_nosmt()
intel_idle: Use subsys_initcall_sync() for initialization
|
|
|
|
dd3581853c |
Merge branch 'pm-cpuidle'
Merge cpuidle updates for 6.16-rc2: - Update data types of variables passed as arguments to mwait_idle_with_hints() to match the function definition after recent changes (Uros Bizjak). - Eliminate mwait_play_dead_cpuid_hint() again after reverting its elimination during the merge window due to a problem with handling "dead" SMT siblings, but this time prevent leaving them in C1 after initialization by taking them online and back offline when a proper cpuidle driver for the platform has been registered (Rafael Wysocki). * pm-cpuidle: intel_idle: Update arguments of mwait_idle_with_hints() Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()" ACPI: processor: Rescan "dead" SMT siblings during initialization intel_idle: Rescan "dead" SMT siblings during initialization x86/smp: PM/hibernate: Split arch_resume_nosmt() intel_idle: Use subsys_initcall_sync() for initialization |
|
|
|
dde6379705 |
ARM:
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
x86:
- Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to pass
only the "untouched" addresses and flipping the shared/private bit
in the implementation.
- Disable SEV-SNP support on initialization failure
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhMVEQUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNCYwgAoqYz+RTcIx6EwHD/kN/9maUcQVl1
MMmXqfF2jQYmTvk7ocEW1qwx/SV0kB+H9LOThV8SWTvVxDbNqAaUWRDz+wcz3zaO
6/sUwz4dtU4XaTgxYhhB82lsPtJHyM+FM+bNL4rFFnrA1tZ93wRsMEeryZ5h960V
C1Bc+PLBdpj+S3gQGvxeMxnG/n0oOAcecUqQa3ViIOKSfZEc/11+BjjvfvkYqExq
s206faKSqor8xVXUbgtOk3LZreIExj/mD+pwMiUBvG0H0g4wnaG7Arc41QCFMowF
4l4sQVMWFZiKQvQZSfdQOeNsXcepWw0qISK7UeoWpLnpM78uUfCS6iG1rA==
=Hc3G
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
x86:
- Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to
pass only the "untouched" addresses and flipping the shared/private
bit in the implementation.
- Disable SEV-SNP support on initialization failure
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases
KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases
KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: Fix help text for arch_timer_edge_cases
KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
KVM: arm64: Add RMW specific sysreg accessor
KVM: arm64: Add assignment-specific sysreg accessor
|
|
|
|
b0823d5fba |
perf/x86/intel: Fix crash in icl_update_topdown_event()
The perf_fuzzer found a hard-lockup crash on a RaptorLake machine:
Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000
CPU: 23 UID: 0 PID: 0 Comm: swapper/23
Tainted: [W]=WARN
Hardware name: Dell Inc. Precision 9660/0VJ762
RIP: 0010:native_read_pmc+0x7/0x40
Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ...
RSP: 000:fffb03100273de8 EFLAGS: 00010046
....
Call Trace:
<TASK>
icl_update_topdown_event+0x165/0x190
? ktime_get+0x38/0xd0
intel_pmu_read_event+0xf9/0x210
__perf_event_read+0xf9/0x210
CPUs 16-23 are E-core CPUs that don't support the perf metrics feature.
The icl_update_topdown_event() should not be invoked on these CPUs.
It's a regression of commit:
|
|
|
|
8046d29dde |
KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
Only let userspace pass the same addresses that were used in KVM_SET_USER_MEMORY_REGION (or KVM_SET_USER_MEMORY_REGION2); gpas in the the upper half of the address space are an implementation detail of TDX and KVM. Extracted from a patch by Sean Christopherson <seanjc@google.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |