mirror of https://github.com/torvalds/linux.git
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
switch a memory area between guest_memfd and regular anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in confidential VMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
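
The overflow hardening in the first Generic item can be illustrated in userspace. The sketch below is not the kernel's memdup_array_user(); it is a hypothetical analogue (the name dup_array_checked is made up for illustration) that assumes GCC or Clang for __builtin_mul_overflow and uses malloc()/memcpy() in place of the kernel's user-copy helpers. It only shows the idea: check that n * size does not wrap before allocating and copying.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of an overflow-checked array copy.
 * Returns a heap copy of n elements of `size` bytes each, or NULL with
 * errno set to EOVERFLOW if n * size would wrap around size_t.
 */
static void *dup_array_checked(const void *src, size_t n, size_t size)
{
	size_t bytes;
	void *dst;

	assert(src != NULL);

	/* GCC/Clang builtin: returns true if the multiplication overflowed. */
	if (__builtin_mul_overflow(n, size, &bytes)) {
		errno = EOVERFLOW;
		return NULL;
	}

	/* malloc(0) may legally return NULL, so request at least one byte. */
	dst = malloc(bytes ? bytes : 1);
	if (!dst)
		return NULL;

	memcpy(dst, src, bytes);
	return dst;
}
```

A caller that passes an attacker-controlled element count gets a clean failure instead of a short allocation followed by an out-of-bounds copy.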
x86:
- Support for "software-protected VMs" that can use the new guest_memfd
and page attributes infrastructure. This is mostly useful for testing,
since there is no pKVM-like infrastructure to provide a meaningfully
reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually care
about whether the caller is a reader or a writer.
- Let Xen guests opt out of having PV clock reported as "based on a stable TSC",
because some of them don't expect the "TSC stable" bit (added to the pvclock
ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
flushes on nested transitions, i.e. always satisfies flush requests. This
allows running bleeding edge versions of VMware Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
- On AMD machines with vNMI, always rely on hardware instead of intercepting
IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fails to stop/reset counters and other state
prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events using a
dedicated field instead of snapshotting the "previous" counter. If the
hardware PMC count triggers overflow that is recognized in the same VM-Exit
that KVM manually bumps an event count, KVM would pend PMIs for both the
hardware-triggered overflow and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertently enabled by non-KVM developers, which can be problematic for
subsystems that require no regressions for W=1 builds.
- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
"features".
- Don't force a masterclock update when a vCPU synchronizes to the current TSC
generation, as updating the masterclock can cause kvmclock's time to "jump"
unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
partly as a super minor optimization, but mostly to make KVM play nice with
position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargeting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
LoongArch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is deleted/moved due to a missing flag
in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix the
various bugs that were lurking due to lack of said annotation.
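
The last Selftests item relies on compiler format checking. In the kernel, __printf(a, b) expands to the GCC/Clang attribute format(printf, a, b). A minimal userspace sketch follows, with a hypothetical helper (format_msg) and a local macro (my_printf) standing in for the kernel's; neither name comes from the selftests.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Local stand-in for the kernel macro; assumes GCC or Clang. */
#define my_printf(fmt_idx, first_arg_idx) \
	__attribute__((format(printf, fmt_idx, first_arg_idx)))

/*
 * Hypothetical helper that formats into buf like snprintf(). Argument 3 is
 * the format string and the variadic arguments start at position 4, so the
 * compiler can type-check every call site when -Wformat is enabled.
 */
static my_printf(3, 4) int format_msg(char *buf, size_t len,
				      const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(fmt != NULL);

	va_start(ap, fmt);
	ret = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return ret;
}
```

With the annotation in place, a call such as format_msg(buf, sizeof(buf), "%s", 42) draws a -Wformat warning; without it, the mismatch compiles silently, which is how such bugs can go unnoticed.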
There are two non-KVM patches buried in the middle of guest_memfd support:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be
resized. guest_memfd files do however support PUNCH_HOLE, which can
be used to switch a memory area between guest_memfd and regular
anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in confidential VMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that
guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new
guest_memfd and page attributes infrastructure. This is mostly
useful for testing, since there is no pKVM-like infrastructure to
provide a meaningfully reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages
during CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in
non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually
care about whether the caller is a reader or a writer.
- Let Xen guests opt out of having PV clock reported as "based on a
stable TSC", because some of them don't expect the "TSC stable" bit
(added to the pvclock ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for
TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM
always flushes on nested transitions, i.e. always satisfies flush
requests. This allows running bleeding edge versions of VMware
Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV
support.
- On AMD machines with vNMI, always rely on hardware instead of
intercepting IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fails to stop/reset counters
and other state prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events
using a dedicated field instead of snapshotting the "previous"
counter. If the hardware PMC count triggers overflow that is
recognized in the same VM-Exit that KVM manually bumps an event
count, KVM would pend PMIs for both the hardware-triggered overflow
and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertently enabled by non-KVM developers, which can be
problematic for subsystems that require no regressions for W=1
builds.
- Advertise all of the host-supported CPUID bits that enumerate
IA32_SPEC_CTRL "features".
- Don't force a masterclock update when a vCPU synchronizes to the
current TSC generation, as updating the masterclock can cause
kvmclock's time to "jump" unexpectedly, e.g. when userspace
hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter
fault paths, partly as a super minor optimization, but mostly to
make KVM play nice with position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the
code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
"emulation" at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with a prefix
branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargeting the NV support to
that version of the architecture.
- A small set of vgic fixes and associated cleanups.
LoongArch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list
selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is deleted/moved due to a missing
flag in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix
the various bugs that were lurking due to lack of said annotation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
x86/kvm: Do not try to disable kvmclock if it was not enabled
KVM: x86: add missing "depends on KVM"
KVM: fix direction of dependency on MMU notifiers
KVM: introduce CONFIG_KVM_COMMON
KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
RISC-V: KVM: selftests: Add get-reg-list test for STA registers
RISC-V: KVM: selftests: Add steal_time test support
RISC-V: KVM: selftests: Add guest_sbi_probe_extension
RISC-V: KVM: selftests: Move sbi_ecall to processor.c
RISC-V: KVM: Implement SBI STA extension
RISC-V: KVM: Add support for SBI STA registers
RISC-V: KVM: Add support for SBI extension registers
RISC-V: KVM: Add SBI STA info to vcpu_arch
RISC-V: KVM: Add steal-update vcpu request
RISC-V: KVM: Add SBI STA extension skeleton
RISC-V: paravirt: Implement steal-time support
RISC-V: Add SBI STA extension definitions
RISC-V: paravirt: Add skeleton for pv-time support
RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
...
| Name | | |
|---|---|---|
| dec | ||
| fw | ||
| ip32 | ||
| mach-ath25 | ||
| mach-ath79 | ||
| mach-au1x00 | ||
| mach-bcm47xx | ||
| mach-bcm63xx | ||
| mach-bmips | ||
| mach-cavium-octeon | ||
| mach-cobalt | ||
| mach-db1x00 | ||
| mach-dec | ||
| mach-generic | ||
| mach-ingenic | ||
| mach-ip22 | ||
| mach-ip27 | ||
| mach-ip28 | ||
| mach-ip30 | ||
| mach-ip32 | ||
| mach-jazz | ||
| mach-lantiq | ||
| mach-loongson2ef | ||
| mach-loongson32 | ||
| mach-loongson64 | ||
| mach-malta | ||
| mach-n64 | ||
| mach-pic32 | ||
| mach-ralink | ||
| mach-rc32434 | ||
| mach-rm | ||
| mach-sibyte | ||
| mach-tx49xx | ||
| mips-boards | ||
| octeon | ||
| pci | ||
| sgi | ||
| sibyte | ||
| sn | ||
| txx9 | ||
| vdso | ||
| xtalk | ||
| Kbuild | ||
| abi.h | ||
| addrspace.h | ||
| amon.h | ||
| arch_hweight.h | ||
| asm-eva.h | ||
| asm-offsets.h | ||
| asm-prototypes.h | ||
| asm.h | ||
| asmmacro-32.h | ||
| asmmacro-64.h | ||
| asmmacro.h | ||
| atomic.h | ||
| barrier.h | ||
| bcache.h | ||
| bitops.h | ||
| bitrev.h | ||
| bmips-spaces.h | ||
| bmips.h | ||
| bootinfo.h | ||
| branch.h | ||
| break.h | ||
| bug.h | ||
| bugs.h | ||
| cache.h | ||
| cacheflush.h | ||
| cacheops.h | ||
| cdmm.h | ||
| cevt-r4k.h | ||
| checksum.h | ||
| clocksource.h | ||
| cmp.h | ||
| cmpxchg.h | ||
| compat-signal.h | ||
| compat.h | ||
| compiler.h | ||
| cop2.h | ||
| cpu-features.h | ||
| cpu-info.h | ||
| cpu-type.h | ||
| cpu.h | ||
| cpufeature.h | ||
| debug.h | ||
| delay.h | ||
| div64.h | ||
| dma-direct.h | ||
| dma-mapping.h | ||
| dma.h | ||
| dmi.h | ||
| ds1287.h | ||
| dsemul.h | ||
| dsp.h | ||
| edac.h | ||
| elf.h | ||
| elfcore-compat.h | ||
| errno.h | ||
| eva.h | ||
| exec.h | ||
| extable.h | ||
| fb.h | ||
| fixmap.h | ||
| floppy.h | ||
| fpregdef.h | ||
| fpu.h | ||
| fpu_emulator.h | ||
| ftrace.h | ||
| futex.h | ||
| ginvt.h | ||
| gio_device.h | ||
| gt64120.h | ||
| hardirq.h | ||
| hazards.h | ||
| highmem.h | ||
| hpet.h | ||
| hugetlb.h | ||
| hw_irq.h | ||
| i8259.h | ||
| idle.h | ||
| inst.h | ||
| io.h | ||
| irq.h | ||
| irq_cpu.h | ||
| irq_gt641xx.h | ||
| irq_regs.h | ||
| irqflags.h | ||
| isa-rev.h | ||
| isadep.h | ||
| jazz.h | ||
| jazzdma.h | ||
| jump_label.h | ||
| kdebug.h | ||
| kexec.h | ||
| kgdb.h | ||
| kprobes.h | ||
| kvm_host.h | ||
| kvm_types.h | ||
| linkage.h | ||
| local.h | ||
| maar.h | ||
| machine.h | ||
| mc146818-time.h | ||
| mc146818rtc.h | ||
| mips-cm.h | ||
| mips-cpc.h | ||
| mips-cps.h | ||
| mips-gic.h | ||
| mips-r2-to-r6-emul.h | ||
| mips_mt.h | ||
| mipsmtregs.h | ||
| mipsprom.h | ||
| mipsregs.h | ||
| mmiowb.h | ||
| mmu.h | ||
| mmu_context.h | ||
| mmzone.h | ||
| module.h | ||
| msa.h | ||
| msc01_ic.h | ||
| paccess.h | ||
| page.h | ||
| pci.h | ||
| perf_event.h | ||
| pgalloc.h | ||
| pgtable-32.h | ||
| pgtable-64.h | ||
| pgtable-bits.h | ||
| pgtable.h | ||
| pm-cps.h | ||
| pm.h | ||
| prefetch.h | ||
| processor.h | ||
| prom.h | ||
| ptrace.h | ||
| r4k-timer.h | ||
| r4kcache.h | ||
| reboot.h | ||
| reg.h | ||
| regdef.h | ||
| rtlx.h | ||
| seccomp.h | ||
| setup.h | ||
| sgialib.h | ||
| sgiarcs.h | ||
| shmparam.h | ||
| sigcontext.h | ||
| signal.h | ||
| sim.h | ||
| smp-cps.h | ||
| smp-ops.h | ||
| smp.h | ||
| sni.h | ||
| socket.h | ||
| sparsemem.h | ||
| spinlock.h | ||
| spinlock_types.h | ||
| spram.h | ||
| stackframe.h | ||
| stackprotector.h | ||
| stacktrace.h | ||
| string.h | ||
| switch_to.h | ||
| sync.h | ||
| syscall.h | ||
| syscalls.h | ||
| thread_info.h | ||
| time.h | ||
| timex.h | ||
| tlb.h | ||
| tlbdebug.h | ||
| tlbex.h | ||
| tlbflush.h | ||
| tlbmisc.h | ||
| topology.h | ||
| traps.h | ||
| txx9irq.h | ||
| txx9pio.h | ||
| txx9tmr.h | ||
| types.h | ||
| uaccess.h | ||
| uasm.h | ||
| unaligned-emul.h | ||
| unistd.h | ||
| unroll.h | ||
| uprobes.h | ||
| vdso.h | ||
| vermagic.h | ||
| vga.h | ||
| vmalloc.h | ||
| vpe.h | ||
| watch.h | ||
| wbflush.h | ||
| yamon-dt.h | ||