Commit Graph

19869 Commits

Author SHA1 Message Date
Linus Torvalds 3efc57369a x86:
* KVM currently invalidates the entirety of the page tables, not just
   those for the memslot being touched, when a memslot is moved or deleted.
   This does not traditionally have particularly noticeable overhead, but
   Intel's TDX will require the guest to re-accept private pages if they are
   dropped from the secure EPT, which is a non-starter.  Actually,
   the only reason why this is not already being done is a bug which
   was never fully investigated and caused VM instability with assigned
   GeForce GPUs, so allow userspace to opt into the new behavior.
 
 * Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
   functionality that is on the horizon).
 
 * Rework common MSR handling code to suppress errors on userspace accesses to
   unsupported-but-advertised MSRs.  This will allow removing (almost?) all of
   KVM's exemptions for userspace access to MSRs that shouldn't exist based on
   the vCPU model (the actual cleanup is non-trivial future work).
 
 * Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
   64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
   stores the entire 64-bit value at the ICR offset.
 
 * Fix a bug where KVM would fail to exit to userspace if one was triggered by
   a fastpath exit handler.
 
 * Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
   there's already a pending wake event at the time of the exit.
 
 * Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
   state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
   (the SHUTDOWN hits the VM altogether, not the nested guest)
 
 * Overhaul the "unprotect and retry" logic to more precisely identify cases
   where retrying is actually helpful, and to harden all retry paths against
   putting the guest into an infinite retry loop.
 
 * Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
   the shadow MMU.
 
 * Refactor pieces of the shadow MMU related to aging SPTEs in preparation for
   adding multi-generation LRU support in KVM.
 
 * Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
   i.e. when the CPU has already flushed the RSB.
 
 * Trace the per-CPU host save area as a VMCB pointer to improve readability
   and clean up the retrieval of the SEV-ES host save area.
 
 * Remove unnecessary accounting of temporary nested VMCB related allocations.
 
 * Set FINAL/PAGE in the page fault error code for EPT violations if and only
   if the GVA is valid.  If the GVA is NOT valid, there is no guest-side page
   table walk and so stuffing paging related metadata is nonsensical.
 
 * Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
   emulating posted interrupt delivery to L2.
 
 * Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
 
 * Harden eVMCS loading against an impossible NULL pointer deref (really truly
   should be impossible).
 
 * Minor SGX fix and a cleanup.
 
 * Misc cleanups
 
 Generic:
 
 * Register KVM's cpuhp and syscore callbacks when enabling virtualization in
   hardware, as the sole purpose of said callbacks is to disable and re-enable
   virtualization as needed.
 
 * Enable virtualization when KVM is loaded, not right before the first VM
   is created.  Together with the previous change, this greatly simplifies
   the logic of the callbacks, because their very existence implies
   virtualization is enabled.
 
 * Fix a bug that results in KVM prematurely exiting to userspace for coalesced
   MMIO/PIO in many cases, clean up the related code, and add a testcase.
 
 * Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
   the gpa+len crosses a page boundary, which thankfully is guaranteed to not
   happen in the current code base.  Add WARNs in more helpers that read/write
   guest memory to detect similar bugs.
 
 Selftests:
 
 * Fix a goof that caused some Hyper-V tests to be skipped when run on bare
   metal, i.e. NOT in a VM.
 
 * Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
 
 * Explicitly include one-off assets in .gitignore.  Past Sean was completely
   wrong about not being able to detect missing .gitignore entries.
 
 * Verify userspace single-stepping works when KVM happens to handle a VM-Exit
   in its fastpath.
 
 * Misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb201AUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOM1gf+Ij7dpCh0KwoNYlHfW2aCHAv3PqQd
 cKMDSGxoCernbJEyPO/3qXNUK+p4zKedk3d92snW3mKa+cwxMdfthJ3i9d7uoNiw
 7hAgcfKNHDZGqAQXhx8QcVF3wgp+diXSyirR+h1IKrGtCCmjMdNC8ftSYe6voEkw
 VTVbLL+tER5H0Xo5UKaXbnXKDbQvWLXkdIqM8dtLGFGLQ2PnF/DdMP0p6HYrKf1w
 B7LBu0rvqYDL8/pS82mtR3brHJXxAr9m72fOezRLEUbfUdzkTUi/b1vEe6nDCl0Q
 i/PuFlARDLWuetlR0VVWKNbop/C/l4EmwCcKzFHa+gfNH3L9361Oz+NzBw==
 =Q7kz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
 "x86:

   - KVM currently invalidates the entirety of the page tables, not just
     those for the memslot being touched, when a memslot is moved or
     deleted.

     This does not traditionally have particularly noticeable overhead,
     but Intel's TDX will require the guest to re-accept private pages
     if they are dropped from the secure EPT, which is a non-starter.

     Actually, the only reason why this is not already being done is a
     bug which was never fully investigated and caused VM instability
     with assigned GeForce GPUs, so allow userspace to opt into the new
     behavior.

   - Advertise AVX10.1 to userspace (effectively prep work for the
     "real" AVX10 functionality that is on the horizon)

   - Rework common MSR handling code to suppress errors on userspace
     accesses to unsupported-but-advertised MSRs

     This will allow removing (almost?) all of KVM's exemptions for
     userspace access to MSRs that shouldn't exist based on the vCPU
     model (the actual cleanup is non-trivial future work)

   - Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
     splits the 64-bit value into the legacy ICR and ICR2 storage,
     whereas Intel (APICv) stores the entire 64-bit value at the ICR
     offset

   - Fix a bug where KVM would fail to exit to userspace if one was
     triggered by a fastpath exit handler

   - Add fastpath handling of HLT VM-Exit to expedite re-entering the
     guest when there's already a pending wake event at the time of the
     exit

   - Fix a WARN caused by RSM entering a nested guest from SMM with
     invalid guest state, by forcing the vCPU out of guest mode prior to
     signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
     nested guest)

   - Overhaul the "unprotect and retry" logic to more precisely identify
     cases where retrying is actually helpful, and to harden all retry
     paths against putting the guest into an infinite retry loop

   - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
     rmaps in the shadow MMU

   - Refactor pieces of the shadow MMU related to aging SPTEs in
     preparation for adding multi-generation LRU support in KVM

   - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
     enabled, i.e. when the CPU has already flushed the RSB

   - Trace the per-CPU host save area as a VMCB pointer to improve
     readability and clean up the retrieval of the SEV-ES host save area

   - Remove unnecessary accounting of temporary nested VMCB related
     allocations

   - Set FINAL/PAGE in the page fault error code for EPT violations if
     and only if the GVA is valid. If the GVA is NOT valid, there is no
     guest-side page table walk and so stuffing paging related metadata
     is nonsensical

   - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
     instead of emulating posted interrupt delivery to L2

   - Add a lockdep assertion to detect unsafe accesses of vmcs12
     structures

   - Harden eVMCS loading against an impossible NULL pointer deref
     (really truly should be impossible)

   - Minor SGX fix and a cleanup

   - Misc cleanups

  Generic:

   - Register KVM's cpuhp and syscore callbacks when enabling
     virtualization in hardware, as the sole purpose of said callbacks
     is to disable and re-enable virtualization as needed

   - Enable virtualization when KVM is loaded, not right before the
     first VM is created

     Together with the previous change, this greatly simplifies the logic
     of the callbacks, because their very existence implies
     virtualization is enabled

   - Fix a bug that results in KVM prematurely exiting to userspace for
     coalesced MMIO/PIO in many cases, clean up the related code, and
     add a testcase

   - Fix a bug in kvm_clear_guest() where it would trigger a buffer
     overflow _if_ the gpa+len crosses a page boundary, which thankfully
     is guaranteed to not happen in the current code base. Add WARNs in
     more helpers that read/write guest memory to detect similar bugs

  Selftests:

   - Fix a goof that caused some Hyper-V tests to be skipped when run on
     bare metal, i.e. NOT in a VM

   - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
     guest

   - Explicitly include one-off assets in .gitignore. Past Sean was
     completely wrong about not being able to detect missing .gitignore
     entries

   - Verify userspace single-stepping works when KVM happens to handle a
     VM-Exit in its fastpath

   - Misc cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  Documentation: KVM: fix warning in "make htmldocs"
  s390: Enable KVM_S390_UCONTROL config in debug_defconfig
  selftests: kvm: s390: Add VM run test case
  KVM: SVM: let alternatives handle the cases when RSB filling is required
  KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
  KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  KVM: x86: Update retry protection fields when forcing retry on emulation failure
  KVM: x86: Apply retry protection to "unprotect on failure" path
  KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  ...
2024-09-28 09:20:14 -07:00
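
Note on the kvm_clear_guest() item in the merge above: the overflow described there could only arise when gpa+len crosses a page boundary. The following is a minimal Python sketch of the general technique of splitting a guest-physical range into per-page chunks so that no single access crosses a boundary. It is not KVM's code; the function name and the 4096-byte page size are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def split_at_page_boundaries(gpa, length, page_size=PAGE_SIZE):
    """Yield (gpa, seg_len) chunks, none of which crosses a page boundary.

    Hypothetical helper: it only models the arithmetic, not guest memory.
    """
    while length > 0:
        offset = gpa % page_size                    # position inside the current page
        seg_len = min(page_size - offset, length)   # stop at the end of the page
        yield gpa, seg_len
        gpa += seg_len
        length -= seg_len


# A 10-byte range starting 6 bytes before a page boundary is split in two.
chunks = list(split_at_page_boundaries(4090, 10))
```

Each chunk can then be handled with a buffer of at most one page, which is the invariant the added WARNs are meant to check.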
Linus Torvalds 653608c67a xen: branch for v6.12-rc1a
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZvZ8dgAKCRCAXGG7T9hj
 vhirAQCR1LAU+czZlqmx6jmKRPTGff1ss66vh04XbtgTjH+8PQEA8O5KvD/KnnxY
 AnrOvrx6fTLwR6iTN7ANVvPO3kGK/w0=
 =0Tol
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:
 "A second round of Xen related changes and features:

   - a small fix of the xen-pciback driver for a warning issued by
     sparse

   - support PCI passthrough when using a PVH dom0

   - enable loading the kernel in PVH mode at arbitrary addresses,
     avoiding conflicts with the memory map when running as a Xen dom0
     using the host memory layout"

* tag 'for-linus-6.12-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/pvh: Add 64bit relocation page tables
  x86/kernel: Move page table macros to header
  x86/pvh: Set phys_base when calling xen_prepare_pvh()
  x86/pvh: Make PVH entrypoint PIC for x86-64
  xen: sync elfnote.h from xen tree
  xen/pciback: fix cast to restricted pci_ers_result_t and pci_power_t
  xen/privcmd: Add new syscall to get gsi from dev
  xen/pvh: Setup gsi for passthrough device
  xen/pci: Add a function to reset device for xen
2024-09-27 09:55:30 -07:00
Al Viro cb787f4ac0 [tree-wide] finally take no_llseek out
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\<no_llseek\>/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27 08:18:43 -07:00
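
Note on the commit above: the quoted sed expression deletes every line that contains no_llseek as a whole word. A rough Python equivalent of that per-file step is sketched below, assuming the file contents are already in a string. The function name and the sample struct are hypothetical, and Python's \b is used as an approximation of sed's \< and \> word anchors.

```python
import re

# Whole-word match, approximating sed's \<no_llseek\>.
NO_LLSEEK = re.compile(r"\bno_llseek\b")


def drop_no_llseek_lines(source):
    """Return source with every line mentioning no_llseek removed."""
    kept = [
        line
        for line in source.splitlines(keepends=True)
        if not NO_LLSEEK.search(line)
    ]
    return "".join(kept)


# Hypothetical file_operations initializer, used only as sample input.
sample = (
    "static const struct file_operations demo_fops = {\n"
    "\t.open = demo_open,\n"
    "\t.llseek = no_llseek,\n"
    "\t.release = demo_release,\n"
    "};\n"
)
cleaned = drop_no_llseek_lines(sample)
```

Because every instance has the form ".llseek = no_llseek,", dropping the whole line leaves a valid initializer, which is why the commit calls the removal obviously safe.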
Linus Torvalds 348325d644 asm-generic updates for 6.12
These are only two small patches, one cleanup for arch/alpha
 and a preparation patch cleaning up the handling of runtime
 constants in the linker scripts.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmboHV0ACgkQYKtH/8kJ
 UifHfhAAqTHHxxe+HiphGBPHN0ODyLVUs7fOQHtLOSmJlQa6x1TCR/+1nL1kTDbe
 j6EcIRxZrllQZ+jZBA8z2XsAmjjBLUxCB4yu6oxYJh8OdFyqeVM/myZEr2TAyb0o
 A3D9b+rfnY8sr9XaFHSHGWbh4c33cGQhACumHVAjtPvU06Voskq4pAf9ZnpGkNBe
 AdKNTVG6+w84dKUNuzXcexP8d7SnsXNfd6T9+evtW/M+fziWzs3aPQr+GZED96E5
 8IRldXi2nzIwm9LT5IzZAt+QvpVb2Qob1+rej9p5WpptGp840CROTo61SwaYHCMV
 DDxTlmADsApWJQ3B5gDu6QS2jXT4eeOrY3JI2baeCyOV6auj15UXKiWc2QVoHOVU
 6+PzlSFuLatI6WsxXfOcD0o3bfQXMKS6zCC/4eD7Y/SmmMqBbL5+d9sU5lwkiOFl
 swoswF4HTwo5d6NdkSuJOt6KA/V8a68lBhKYBXHu2yuLi/LDNOaipEvBHQLzfnlY
 91e5DtDiHK9CYDNkwiR+bV9rQnhA535JSlfR8VtpU/SJTTjyF+dkt9JGPdivXoIA
 8Zv+DN/oyrahUtCrgzzPXahOuBrfD/WfIajsvpEK6vNPuBhscsZFg/thc70FMIXo
 qn8Dmpi/CnDWFNOy0xO0cbYWrGBGn9E7kzbSZ78tUIjPUmmEKfk=
 =OOMl
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "These are only two small patches, one cleanup for arch/alpha and a
  preparation patch cleaning up the handling of runtime constants in the
  linker scripts"

* tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  runtime constants: move list of constants to vmlinux.lds.h
  alpha: no need to include asm/xchg.h twice
2024-09-26 11:54:40 -07:00
Jason Andryuk e3e8cd90f8 x86/kernel: Move page table macros to header
The PVH entry point will need an additional set of prebuilt page tables.
Move the macros and defines to pgtable_64.h, so they can be re-used.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Message-ID: <20240823193630.2583107-5-jason.andryuk@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 16:06:03 +02:00
Linus Torvalds f8ffbc365f struct fd layout change (and conversion to accessor helpers)
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ
 63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d
 592+iDgLsema/H/0/CqfqlaNtDNY8Q0=
 =HUl5
 -----END PGP SIGNATURE-----

Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' updates from Al Viro:
 "Just the 'struct fd' layout change, with conversion to accessor
  helpers"

* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add struct fd constructors, get rid of __to_fd()
  struct fd: representation change
  introduce fd_file(), convert all accessors to it.
2024-09-23 09:35:36 -07:00
Linus Torvalds 617a814f14 Along with the usual shower of singleton patches, notable patch series in
this pull request are:
 
 "Align kvrealloc() with krealloc()" from Danilo Krummrich.  Adds
 consistency to the APIs and behaviour of these two core allocation
 functions.  This also simplifies/enables Rustification.
 
 "Some cleanups for shmem" from Baolin Wang.  No functional changes - more
 code reuse, better function naming, logic simplifications.
 
 "mm: some small page fault cleanups" from Josef Bacik.  No functional
 changes - code cleanups only.
 
 "Various memory tiering fixes" from Zi Yan.  A small fix and a little
 cleanup.
 
 "mm/swap: remove boilerplate" from Yu Zhao.  Code cleanups and
 simplifications and .text shrinkage.
 
 "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt.  This
 is a feature: it adds new fields to /proc/vmstat such as
 
     $ grep kstack /proc/vmstat
     kstack_1k 3
     kstack_2k 188
     kstack_4k 11391
     kstack_8k 243
     kstack_16k 0
 
 which tells us that 11391 processes used 4k of stack while none at all
 used 16k.  Useful for some system tuning things, but particularly useful
 for "the dynamic kernel stack project".
 
 "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
 Teaches kmemleak to detect leakage of percpu memory.
 
 "mm: memcg: page counters optimizations" from Roman Gushchin.  "3
 independent small optimizations of page counters".
 
 "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
 Hildenbrand.  Improves PTE/PMD splitlock detection, makes powerpc/8xx work
 correctly by design rather than by accident.
 
 "mm: remove arch_make_page_accessible()" from David Hildenbrand.  Some
 folio conversions which make arch_make_page_accessible() unneeded.
 
 "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David Finkel.
 Cleans up and fixes our handling of the resetting of the cgroup/process
 peak-memory-use detector.
 
 "Make core VMA operations internal and testable" from Lorenzo Stoakes.
 Rationalization and encapsulation of the VMA manipulation APIs, with a
 view to better enabling testing of the VMA functions, even from a
 userspace-only harness.
 
 "mm: zswap: fixes for global shrinker" from Takero Funaki.  Fix issues in
 the zswap global shrinker, resulting in improved performance.
 
 "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao.  Fill in
 some missing info in /proc/zoneinfo.
 
 "mm: replace follow_page() by folio_walk" from David Hildenbrand.  Code
 cleanups and rationalizations (conversion to folio_walk()) resulting in
 the removal of follow_page().
 
 "improving dynamic zswap shrinker protection scheme" from Nhat Pham.  Some
 tuning to improve zswap's dynamic shrinker.  Significant reductions in
 swapin and improvements in performance are shown.
 
 "mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
 Improvements to the new unaccepted memory feature.
 
 "mm/mprotect: Fix dax puds" from Peter Xu.  Implements mprotect on DAX
 PUDs.  This was missing, although nobody seems to have noticed yet.
 
 "Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
 Cleanups and modest performance improvements for the maple tree library
 code.
 
 "memcg: further decouple v1 code from v2" from Shakeel Butt.  Move more
 cgroup v1 remnants away from the v2 memcg code.
 
 "memcg: initiate deprecation of v1 features" from Shakeel Butt.  Adds
 various warnings telling users that memcg v1 features are deprecated.
 
 "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
 Greatly improves the success rate of the mTHP swap allocation.
 
 "mm: introduce numa_memblks" from Mike Rapoport.  Moves various disparate
 per-arch implementations of numa_memblk code into generic code.
 
 "mm: batch free swaps for zap_pte_range()" from Barry Song.  Greatly
 improves the performance of munmap() of swap-filled ptes.
 
 "support large folio swap-out and swap-in for shmem" from Baolin Wang.
 With this series we no longer split shmem large folios into single-page
 folios when swapping out shmem.
 
 "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao.  Nice performance
 improvements and code reductions for gigantic folios.
 
 "support shmem mTHP collapse" from Baolin Wang.  Adds support for
 khugepaged's collapsing of shmem mTHP folios.
 
 "mm: Optimize mseal checks" from Pedro Falcato.  Fixes an mprotect()
 performance regression due to the addition of mseal().
 
 "Increase the number of bits available in page_type" from Matthew Wilcox.
 Increases the number of bits available in page_type!
 
 "Simplify the page flags a little" from Matthew Wilcox.  Many legacy page
 flags are now folio flags, so the page-based flags and their
 accessors/mutators can be removed.
 
 "mm: store zero pages to be swapped out in a bitmap" from Usama Arif.  An
 optimization which permits us to avoid writing/reading zero-filled zswap
 pages to backing store.
 
 "Avoid MAP_FIXED gap exposure" from Liam Howlett.  Fixes a race window
 which occurs when a MAP_FIXED operation is occurring during an unrelated
 vma tree walk.
 
 "mm: remove vma_merge()" from Lorenzo Stoakes.  Major rotorooting of the
 vma_merge() functionality, making it cleaner, more testable and better
 tested.
 
 "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.  Minor
 fixups of DAMON selftests and kunit tests.
 
 "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.  Code
 cleanups and folio conversions.
 
 "Shmem mTHP controls and stats improvements" from Ryan Roberts.  Cleanups
 for shmem controls and stats.
 
 "mm: count the number of anonymous THPs per size" from Barry Song.  Expose
 additional anon THP stats to userspace for improved tuning.
 
 "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
 conversions and removal of now-unused page-based APIs.
 
 "replace per-quota region priorities histogram buffer with per-context
 one" from SeongJae Park.  DAMON histogram rationalization.
 
 "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
 Park.  DAMON documentation updates.
 
 "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
 related doc and warn" from Jason Wang: fixes usage of page allocator
 __GFP_NOFAIL and GFP_ATOMIC flags.
 
 "mm: split underused THPs" from Yu Zhao.  Improve THP=always policy - this
 was overprovisioning THPs in sparsely accessed memory areas.
 
 "zram: introduce custom comp backends API" from Sergey Senozhatsky.  Add
 support for zram run-time compression algorithm tuning.
 
 "mm: Care about shadow stack guard gap when getting an unmapped area" from
 Mark Brown.  Fix up the various arch_get_unmapped_area() implementations
 to better respect guard areas.
 
 "Improve mem_cgroup_iter()" from Kinsey Ho.  Improve the reliability of
 mem_cgroup_iter() and various code cleanups.
 
 "mm: Support huge pfnmaps" from Peter Xu.  Extends the usage of huge
 pfnmap support.
 
 "resource: Fix region_intersects() vs add_memory_driver_managed()" from
 Huang Ying.  Fix a bug in region_intersects() for systems with CXL memory.
 
 "mm: hwpoison: two more poison recovery" from Kefeng Wang.  Teaches a
 couple more code paths to correctly recover from the encountering of
 poisoned memory.
 
 "mm: enable large folios swap-in support" from Barry Song.  Support the
 swapin of mTHP memory into appropriately-sized folios, rather than into
 single-page folios.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
 jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
 =s0T+
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     more code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature: it adds new fields to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     particularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leakage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" from David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalization and encapsulation of the VMA manipulation
     APIs, with a view to better enabling testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature.

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have noticed
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     single-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operation is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making it cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" from Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memory.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Linus Torvalds 19a519ca87 xen: branch for v6.12-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZuu+BAAKCRCAXGG7T9hj
 vs3bAP4mp0NnxnDbvPObWoPKmLk5OvHdfY9cV+/M+r/UObfyswD+OYaZH0hVCHP6
 L96RzSHE+Q1pKPNpQfMOPcCDFmO3wwI=
 =cN0H
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - fix a boot problem as a Xen dom0 on some AMD systems

 - fix Xen PVH boot problems with KASAN enabled

 - fix for a build warning

 - fixes to swiotlb-xen

* tag 'for-linus-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/swiotlb: fix allocated size
  xen/swiotlb: add alignment check for dma buffers
  xen/pci: Avoid -Wflex-array-member-not-at-end warning
  xen/xenbus: Convert to use ERR_CAST()
  xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh()
  x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base()
  xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT)
  xen: tolerate ACPI NVS memory overlapping with Xen allocated memory
  xen: allow mapping ACPI data using a different physical address
  xen: add capability to remap non-RAM pages to different PFNs
  xen: move max_pfn in xen_memory_setup() out of function scope
  xen: move checks for e820 conflicts further up
  xen: introduce generic helper checking for memory map conflicts
  xen: use correct end address of kernel for conflict checking
2024-09-19 08:20:31 +02:00
Paolo Bonzini 43d97b2ebd Merge tag 'kvm-x86-pat_vmx_msrs-6.12' of https://github.com/kvm-x86/linux into HEAD
KVM VMX and x86 PAT MSR macro cleanup for 6.12:

 - Add common defines for the x86 architectural memory types, i.e. the types
   that are shared across PAT, MTRRs, VMCSes, and EPTPs.

 - Clean up the various VMX MSR macros to make the code self-documenting
   (inasmuch as possible), and to make it less painful to add new macros.
2024-09-17 12:40:39 -04:00
Linus Torvalds fc1dc0d507 Updates for x86 timers:
   - Use the topology information of number of packages for making the
     decision about TSC trust instead of using the number of online nodes
     which is not reflecting the real topology.
 
   - Stop the PIT timer 0 when it's not in use so as to stop pointless
     emulation in the VMM.
 
   - Fix the PIT timer stop sequence for timer 0 so it truly stops both real
     hardware and buggy VMM emulations.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpN3MTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVAKEADAr379sye4HNn9STpFGKsLWGzsZlch
 u5QaR0Nq0WvjO9Rd7+CfeA4AnvXCVwhG70Ut5hEfQEqlpJ62CZrjnAp4YSyaTdyA
 16X22z0Pcy7iq0FeaB5C1HK11AMNfpJyQsj3zLWqIrHcwPmPppCRhHpL6RC/pOrL
 QEPsG12+kAzfqQVTb6jkNaCezlLHZauJxdQMYqm74uQByfn/jFi4DdNLXgUrY8mJ
 gCBBubbF80aBxA6/ZY8aV19zXfklHyxp/u0Y+pVUMgCdyVmh1+Yb5vF4f9J/wbQk
 h5k3Z04I4n7/uH9USA6A5MG/6Wsj2fV5JAa2QH+9jM7dLMDAviPyMhsmaCSdOXlQ
 fjZczvXTCx5JwIFyGU5sL/ma3mrPkUugiq8LA17rfrclS8KxsxHVOh8TLueF8cIe
 5URYIlGg3uDn567rLgUDqieA7HxDxx2Ykqq3aiagNTSaHETFC41oef7Ju01ueriy
 KiWb7Q6kPifZ1Z5L+UJGKK/HPp2+ilCQqQmhwToEWmRKCuZgeje2wq37bjk6Z7sV
 XAXuxW16qn+2y6aHay/OAK6XAfxk3ZX7YGd1yXYuOfC8phJygCkXWq9rsjufLokz
 KTwH2Zj8MlMjfiqvG87aoJkEPy3hIUgIIem+MID4Ff4ERFo0pIL1PAOROnIa/0KN
 KDsLPVW4e/S0jA==
 =1vKt
 -----END PGP SIGNATURE-----

Merge tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 timer updates from Thomas Gleixner:

 - Use the topology information of number of packages for making the
   decision about TSC trust instead of using the number of online nodes
   which is not reflecting the real topology.

 - Stop the PIT timer 0 when it's not in use so as to stop pointless
   emulation in the VMM.

 - Fix the PIT timer stop sequence for timer 0 so it truly stops both
   real hardware and buggy VMM emulations.

* tag 'x86-timers-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Check for sockets instead of CPUs to make code match comment
  clockevents/drivers/i8253: Fix stop sequence for timer 0
  x86/i8253: Disable PIT timer 0 when not in use
  x86/tsc: Use topology_max_packages() to get package number
2024-09-17 15:27:01 +02:00
Linus Torvalds b507535474 Miscellaneous updates for x86:
   - Rework kcpuid to handle the autogenerated CSV file correctly and
     update the CSV file to cover the whole zoo of CPUID.
 
   - Avoid memcpy() for ia32 syscall_get_arguments() and use direct
     assignments as fortified memcpy() is unhappy about writing/reading
     beyond the end of the addressed destination/source struct member
 
   - A few new PCI IDs for AMD
 
   - Update MAINTAINERS to cover x86 specific selftests
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpOZ8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofVUEACt8JjMxanswpMy1O6HbJcdVf2wwZ3q
 n30BKIFXucvqE6Opc7tWy5THh1+YjHuNXZMkfuuEe2Qjc69z2m3YwUmF0oAB9/AI
 6HU4yoePHTbEiPbTjNZMaKL+9CaYJbWkgoEjQpdQGWmo6gJqJxoRF5fY2assLfdJ
 zik2faebMNj3l1C1R1w646Zu3CScfZUE8512zwBfOxTqkpVBO4uDrspTzLYljlQN
 +gPZ41XDvQKu6SVoVC/TH/oRdshtLBg74fUDoL14yMkWqx3N5IKulFIMCeD2dEHv
 pJcbYb8x0pJ1iLx8q/k+spzbvTewY3sAAzbo5JLvcHy1PhW8jc+uCWorMpqLEhH0
 LzH1XZwC+kYvJytzZ9EEyYJAAMbh3KRBaphEXmRVec19tujwRy2NGjhRyVmLyqYr
 aShIGEVqigCGY8dF0mJgyVu5kd7X4vDZw4xH92c5/G41Ui19cXp1nXh61KMs1WMR
 sQm9FDvtRgcX9Pc89RyRRgYz2U75p3gcNyXKio4Oa2VfIlGRYUB5kg5/qDx3RjJx
 kZZ44TqPA/oJjpJyNjVrYqD6Gd3WUsjuH2gn6IAohKiSEKDdGTtHu7LEnKEcdkQk
 TomxWk1fTR8513GNXgEy2YhXdRN8iTlhgRI9G2BA5c4B6MCGHzPRFzWrosogB3+g
 tAOsEN8Sp3ea+g==
 =XVR5
 -----END PGP SIGNATURE-----

Merge tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Thomas Gleixner:

 - Rework kcpuid to handle the autogenerated CSV file correctly and
   update the CSV file to cover the whole zoo of CPUID.

 - Avoid memcpy() for ia32 syscall_get_arguments() and use direct
   assignments as fortified memcpy() is unhappy about writing/reading
   beyond the end of the addressed destination/source struct member

 - A few new PCI IDs for AMD

 - Update MAINTAINERS to cover x86 specific selftests

* tag 'x86-misc-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add selftests/x86 entry
  x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h-70h
  x86/syscall: Avoid memcpy() for ia32 syscall_get_arguments()
  MAINTAINERS: Add x86 cpuid database entry
  tools/x86/kcpuid: Introduce a complete cpuid bitfields CSV file
  tools/x86/kcpuid: Parse subleaf ranges if provided
  tools/x86/kcpuid: Recognize all leaves with subleaves
  tools/x86/kcpuid: Strip bitfield names leading/trailing whitespace
  tools/x86/kcpuid: Protect against faulty "max subleaf" values
  tools/x86/kcpuid: Set max possible subleaves count to 64
  tools/x86/kcpuid: Properly align long-description columns
  tools/x86/kcpuid: Remove unused variable
  x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
2024-09-17 15:18:45 +02:00
Linus Torvalds 70f43ea3a3 Updates for x86 memory management:
  - Make LAM enablement safe vs. kernel threads using a process mm
    temporarily as switching back to the process would not update CR3 and
    therefore not enable LAM causing faults in user space when using tagged
    pointers. Cure it by synchronizing LAM enablement via IPIs to all CPUs
    which use the related mm.
 
  - Cure a harmless LAM inconsistency between CR3 and the state during
    context switch. It's both confusing and prone to lead to real bugs
 
  - Fix alt stack handling for threads which run with a non-zero
    protection key. The non-zero key prevents the kernel from accessing the
    alternate stack. Cure it by temporarily enabling all protection keys for
    the alternate stack setup/restore operations.
 
  - Provide an EFI config table identity mapping for kexec kernel to prevent
    kexec failures because the new kernel cannot access the config table array
 
  - Use GB pages only when a full GB is mapped in the identity map as
    otherwise the CPU can speculate into reserved areas after the end of
    memory which causes malfunction on UV systems.
 
  - Remove the noisy and pointless SRAT table dump during boot
 
  - Use is_ioremap_addr() for iounmap() address range checks instead of
    high_memory. is_ioremap_addr() is more precise.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpPpYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYddD/9HeH5/rpWS3JU4ZVC+huY28uJuwAFW
 ER48zniRbmuz8y+dZZ6K8uvqoWB+ro+yNjA9Jhm9nHUzhs7kE5O8+bmkUi6HXViW
 6zS6PW95+u80dmSGy1Gna0SU3158OyBf2X61SySJABLLek7WwrR7jakkgrDBVtL5
 ILKS/dUwIrUPoVlszCh9uE0Kj6gdFquooE06sif5EIibnhSgSXfr2EbGj0Qq/YYf
 FYfpggSSVpTXFSkZSB2VCEqK66jaGUfKzZ6v1DkSioChUCsky2OO6zD9pk0dMixO
 a/0XvRUo3OhiXZbj1tPUtxaEBgJdigpsxke7xQSVxSl+DNNuapiybpgAzFM5Xh+m
 yFcP66nIpJcHE10vjVR3jSUlTSb2zk+v9d1Ujj10G1h8RHLTfsTCRHgzs7P0/nkE
 NJleWstYVRV5rFpPLoY0ryQmjW/PzYokkaqWKI12Lhxg4ojijZso3pS8WfOsk1/B
 081tOZERWeGnJEOOJwwYE1wt0Qq8th4S9b2/fz3vk2fsEHIf42s4fKQwy1CxKopb
 PyIrgnZyWx6ueX9QaIGIzGV1GsY4FKMgFJVOyVb0D0stMnr1ty2m3993eNs/nCXy
 +rHPMwFteLcwiWp/C3hq5IQd7uEvmRt/mYJ5hdvCj5wCIkXI3JtgsXfLSVs3Ln4f
 R6HvZehYmbJoNQ==
 =VZcR
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 memory management updates from Thomas Gleixner:

 - Make LAM enablement safe vs. kernel threads using a process mm
   temporarily as switching back to the process would not update CR3 and
   therefore not enable LAM causing faults in user space when using
   tagged pointers. Cure it by synchronizing LAM enablement via IPIs to
   all CPUs which use the related mm.

 - Cure a harmless LAM inconsistency between CR3 and the state during
   context switch. It's both confusing and prone to lead to real bugs

 - Fix alt stack handling for threads which run with a non-zero
   protection key. The non-zero key prevents the kernel from accessing
   the alternate stack. Cure it by temporarily enabling all protection
   keys for the alternate stack setup/restore operations.

 - Provide an EFI config table identity mapping for kexec kernel to
   prevent kexec failures because the new kernel cannot access the
   config table array

 - Use GB pages only when a full GB is mapped in the identity map as
   otherwise the CPU can speculate into reserved areas after the end of
   memory which causes malfunction on UV systems.

 - Remove the noisy and pointless SRAT table dump during boot

 - Use is_ioremap_addr() for iounmap() address range checks instead of
   high_memory. is_ioremap_addr() is more precise.
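
A minimal sketch of the GB page rule from the list above, written as
user-space C. The helper name can_use_gbpage() and the [start, end) range
convention are assumptions made for illustration; this is not the kernel's
ident_map code. It only shows the idea that a 1 GiB page is acceptable
when the whole aligned 1 GiB region containing the address lies inside the
range that is supposed to be mapped.

```c
#include <stdbool.h>
#include <stdint.h>

#define GB_PAGE_SIZE (1ULL << 30)

/*
 * Hypothetical helper: return true if the 1 GiB aligned region that
 * contains 'addr' lies completely inside the range [start, end).
 * Only then would a GB page map nothing beyond the requested range.
 */
static bool can_use_gbpage(uint64_t addr, uint64_t start, uint64_t end)
{
	/* Round the address down to its 1 GiB boundary. */
	uint64_t base = addr & ~(GB_PAGE_SIZE - 1);

	/* The comparison is written as a subtraction to avoid overflow. */
	return base >= start && end >= base && end - base >= GB_PAGE_SIZE;
}
```

For a range that ends 4 KiB past 3 GiB, the sketch accepts a GB page at
2 GiB but rejects one at 3 GiB, where smaller pages would have to be used
so that nothing past the end of the range gets mapped.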

* tag 'x86-mm-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Improve iounmap() address range checks
  x86/mm: Remove duplicate check from build_cr3()
  x86/mm: Remove unused NX related declarations
  x86/mm: Remove unused CR3_HW_ASID_BITS
  x86/mm: Don't print out SRAT table information
  x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
  x86/kexec: Add EFI config table identity mapping for kexec kernel
  selftests/mm: Add new testcases for pkeys
  x86/pkeys: Restore altstack access in sigreturn()
  x86/pkeys: Update PKRU to enable all pkeys before XSAVE
  x86/pkeys: Add helper functions to update PKRU on the sigframe
  x86/pkeys: Add PKRU as a parameter in signal handling functions
  x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking
  x86/mm: Fix LAM inconsistency during context switch
  x86/mm: Use IPIs to synchronize LAM enablement
2024-09-17 15:03:01 +02:00
Linus Torvalds b136021126 Updates for x86 FRED:
   - Enable FRED right after init_mem_mapping() because at that point the
     early IDT fault handler is replaced by the real fault handler. The real
     fault handler retrieves the faulting address from the stack frame and
     not from CR2 when the FRED feature is set. But that obviously only
     works when FRED is enabled in the CPU as well.
 
   - Set SS to __KERNEL_DS when enabling FRED to prevent a corner case where
     ERETS can observe an SS mismatch and raise a #GP.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpNZITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobh3EACsU/WhmWG0pjqNs+92i/Hjd5QHRxX8
 WkyB+j0FQ3ZtQ0aqn73G/VxITxCMAE1fwC2iERlN/9eXjGXcwxeaM9upsMs9gq7v
 HmiOPSixn6hH7ulQ6WzDnM478pSnN4lmaZVY2ll1O3z8r79dW2Kz34zSqQCxDGcQ
 3sCJkHr7F0YClUaYxH/dok68F69aZXhU4V9URE30Ec74hnomYd4VuFkHwuA77rHG
 k81lHxSY9/Ttha91CPiK3/lU+lbehYNNZQ+PzUxkNmm9dlzXI8Vl5JRPJGIlYpWQ
 A9L1ZjV4kZcB+tcXPV1bOW+lVSefGVquAia5RgCyUylIFCOtsR/wCoezS3f17Zhf
 Ry+kfkYwuDgD0IYNVp6L3+Fx0LtBJT3BorhnS7YhhiqvLW0EpGe/bBzzRFntp4oR
 TmRAA3nNn3DBCky3rfGg0TWwqfvy/7c6SPY1Zw1SEmqtDdHB/DyKGt+BVQQ2kqWO
 tCtGAMjcE7Cfgca7mI7wILjY7MFirTQW0js6UL5mw22rhZxKV5S9m7N8KkUnFh3S
 acjQ1nL5ZBQ9cKdEGrLNHQjfSSc9ju7aXsGXm5c+vrqKbMG8+Nj+1cvzxaLL5xVY
 LLKACw5rl0LVXHU5H3IwvS+GMipklrmouikdoI4P8vHMd9GBquR4znO3MzqaLtg2
 F1IBXL07s2SYrw==
 =cKRu
 -----END PGP SIGNATURE-----

Merge tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 FRED updates from Thomas Gleixner:

 - Enable FRED right after init_mem_mapping() because at that point the
   early IDT fault handler is replaced by the real fault handler. The
   real fault handler retrieves the faulting address from the stack
   frame and not from CR2 when the FRED feature is set. But that
   obviously only works when FRED is enabled in the CPU as well.

 - Set SS to __KERNEL_DS when enabling FRED to prevent a corner case
   where ERETS can observe an SS mismatch and raise a #GP.

* tag 'x86-fred-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry: Set FRED RSP0 on return to userspace instead of context switch
  x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism
  x86/entry: Test ti_work for zero before processing individual bits
  x86/fred: Set SS to __KERNEL_DS when enabling FRED
  x86/fred: Enable FRED right after init_mem_mapping()
  x86/fred: Move FRED RSP initialization into a separate function
  x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param()
2024-09-17 14:55:59 +02:00
Linus Torvalds c3056a7d14 Provide FPU buffer layout in core dumps:
   Debuggers have to guess the FPU buffer layout in core dumps, which is
   error prone. This is because AMD and Intel layouts differ.
 
   To avoid buggy heuristics add an ELF section which describes the buffer
   layout which can be retrieved by tools.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpOuwTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTRAEACGHPdAYFp5A396c9qUbHUE2gEKIad2
 iuq15TZKLPY/LFqfTwnkp9/nqKtZ0gj4D6XCIucWZjwWJuPgvgGf/tC9Fk+H+C6X
 9+rycP3GdqxU28qLxA428SN2Pg3lvqG4rryVWeHUXQ4x8A0DSMV+3pkNY5YgJ+2+
 fTzNzVi2tkPRAXhKmj3EdcFcgDPiFQBMm1QNBpc+FqrXk4rjJb9Axln0oT8xemDv
 TtJ5BMhFpR73naaiS4IrK8Tk3oFCa8CmafCQfl1zAOor/+EemPQKwMuGeiXE7dLG
 eE+OTw5zuxYwlc9WoaPmM/ZiEc5JptpHQUtyHDBN7BaK87VKjsupAXXVOh6XMRCt
 R2coqq7fqDqMANwWpUKddky3vSwbst1GZpXGAENOy64yU4VoFutr616WSj3sJfUi
 knBauPqLAFeZLhMn/kKr5a0rBgm7VuQSlGPYEhqVdaM3Eb/zJEupFL/bTpqQbbz/
 8lo2hYcfDslhShcEZYBwm4eUg+ytZ96K3ciZ5YgNih9LFBxEOo0SY1CqbQJiRtpB
 3DmgldYtzRdQq5/JtFGNv717uMESn5khG3qHUpXtrDhWfD8spMWiY1yO/cwWvLFJ
 ZS5ATp1dAt1Pbv2MC6r9jQBbW3V7xNNAOJdzUvIZPP04PKeV0ObFOplxhabOzUDj
 OLquyIrjpxeisg==
 =Vqqo
 -----END PGP SIGNATURE-----

Merge tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Thomas Gleixner:
 "Provide FPU buffer layout in core dumps:

  Debuggers have to guess the FPU buffer layout in core dumps, which is
  error prone. This is because AMD and Intel layouts differ.

  To avoid buggy heuristics add an ELF section which describes the
  buffer layout which can be retrieved by tools"

* tag 'x86-fpu-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/elf: Add a new FPU buffer layout info to x86 core files
2024-09-17 14:46:17 +02:00
Linus Torvalds dea435d397 Enable UBSAN traps for x86, which provides better reporting through
metadata encoded into UD1.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpM6ITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoU/kEACWS7Z9mQrWB3r22ufTTPoN+hNudth+
 CP8wluXZGvLPh1Pq9dpB9ZniBUN8levYoGyj3NTdr6VtoMJ6NYcZVuH98lCCEMXO
 1UmDpydSGZ3BqVgmf4h0eYAJgEiA5qTflXMsh6SfsaPQR7jniJTE451hgJdRIogG
 DvgWeVTYn5vt0+oRHJp6ogRLR9oOUgdp94fIwaW34OpesbVJeWUW9zAvBcqdNrDT
 KJIM7ta6eivEakFRxriQZTKRc+3ElvZ2fdWNdo9qrRd64MTIOTXAj3G0lXt3YtpZ
 06pfJ1CfQ+nwHKfxmmy4gz4eJG7KcpMM+KFZTR3NoSAz4oMTzAvVTxAuEt+pahx6
 bmLzaY/I/gRB/Rt+e5oEZSEIq+Sh/Lm3IZoQUhK0+HeJBjwPghBZw3BjkFJvEsMw
 S0arvklH2x37gP9rnzOODf2QG7aIAqLTrvRJS610fctwadR4k+2UIE8ZGHOTt55J
 UdiK/QhU4gMVaRTebTcPquu3IMmnJjla/bEWdIrBtOSiGtVd1BnAp/kvmkdQH3eI
 ZUqJbnfofN4rzSufFqSVY88ORVIcQMnNDLM0qyJofIC79u7OiU40icoDxWS6mDHQ
 wQSEszInhwNzyAxoHnNkXDunjDVKhATQPOde0F4TxLcrYD9KRpvJag/1j5fCQi+0
 ftODZflfGS2UjQ==
 =Z5Hg
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core update from Thomas Gleixner:
 "Enable UBSAN traps for x86, which provides better reporting through
  metadata encoded into UD1"

* tag 'x86-core-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/traps: Enable UBSAN traps on x86
2024-09-17 13:17:27 +02:00
Linus Torvalds 61d1ea914b Updates for the x86 APIC code:
   - Handle an allocation failure in the IO/APIC code gracefully instead of
     crashing the machine.
 
   - Remove support for APIC logical destination mode on 64bit
 
     Logical destination mode of the local APIC is used for systems with up
     to 8 CPUs. It has an advantage over physical destination mode as it
     allows to target multiple CPUs at once with IPIs. That advantage was
     definitely worth it when systems with up to 8 CPUs were state of the
     art for servers and workstations, but that's history.
 
     In the recent past there were quite a few reports of new laptops failing
     to boot with logical destination mode, but they work fine with physical
     destination mode. That's not a surprise because physical destination
     mode is guaranteed to work as it's the only way to get a CPU up and
     running via the INIT/INIT/STARTUP sequence. Some of the affected
     systems were cured by BIOS updates, but not all OEMs provide them.
 
     As the number of CPUs keeps increasing, logical destination mode becomes
     less used and the benefit for small systems, like laptops, is not
     really worth the trouble. So just remove logical destination mode
     support for 64bit and be done with it.
 
   - Code and comment cleanups in the APIC area.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpL0gTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYob/VD/984H4Ku5/Djq9HkhBO11hfRTIVz/uf
 1/b5ogd3eN0dK5nAv79/Gj7E/zntVsvCjuCYckXz51xPxkQH2LxUhDKqeUwg5lmz
 xQV0mKK4fIS/g5yymQGplKc7FfjRAnVL9ZZRRvMkvtqbr1+dA665XrfjFAPkp929
 zLaBUbNC6YxYfSddsV+fE8711QP6NzCYdeEBIdZ3NuBrlGfiLy1g1OWCk8za7zjM
 cLJfGnU63MNXI4smrZWrQwJDBOiQl1wPbJYWL216OPHofLzLNGNZFXm4y8OJcyN0
 WPWn1TliAwpRYx18Z/cEPgkoES8mXqqpPcoo0yBjOmPLl31J6QYU7QQhDb3HOnM/
 ALgnnuhoWll5YjNBPJkONAa7lpnmfTbEg82WxaipEscz9CyEBoeOLvYBGPl/YqV+
 B8wMOZHDH+BchJ6rYXDA1AmkD+9q86F+ddbiVOKj09dVm/QeLrGjwox1O7yGALGZ
 hZPQx9MsTOJqQIh40PsqFko6OiMKuMBIebacFb4NqmVA2/WbRbcmkzRyxk+kkBFv
 UMZX5O6sQhat615WZkxTnjmdnXETTIlv4nRQURBd/LF6ECRkXXG11dWaZfTXZ9iW
 8NNlHw8mIbGmzn7wWXHlhk7N7vuhWCikAf7V2y+eZUVtE56qGM2volJNCmTZacP2
 rrjmltwEGR+5gg==
 =Y3a/
 -----END PGP SIGNATURE-----

Merge tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 APIC updates from Thomas Gleixner:

 - Handle an allocation failure in the IO/APIC code gracefully instead
   of crashing the machine.

 - Remove support for APIC logical destination mode on 64bit

   Logical destination mode of the local APIC is used for systems with
   up to 8 CPUs. It has an advantage over physical destination mode as
   it allows to target multiple CPUs at once with IPIs. That advantage
   was definitely worth it when systems with up to 8 CPUs were state of
   the art for servers and workstations, but that's history.

   In the recent past there were quite a few reports of new laptops
   failing to boot with logical destination mode, but they work fine
   with physical destination mode. That's not a surprise because physical
   destination mode is guaranteed to work as it's the only way to get a
   CPU up and running via the INIT/INIT/STARTUP sequence. Some of the
   affected systems were cured by BIOS updates, but not all OEMs provide
   them.

   As the number of CPUs keeps increasing, logical destination mode
   becomes less used and the benefit for small systems, like laptops, is
   not really worth the trouble. So just remove logical destination mode
   support for 64bit and be done with it.

 - Code and comment cleanups in the APIC area.
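
The 8 CPU limit of logical destination mode mentioned above follows from
how flat logical mode addresses CPUs: the destination is an 8-bit mask
with one bit per CPU, which is also what allows a single IPI to target
several CPUs at once. A minimal user-space C sketch, assuming the simple
assignment "CPU n owns bit n"; the helper name flat_logical_dest() is
hypothetical and not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Flat logical mode: each CPU owns one bit of an 8-bit logical ID. */
#define FLAT_LOGICAL_MAX_CPUS 8

/*
 * Hypothetical helper: build the 8-bit logical destination for a set
 * of CPU numbers. Returns 0 on success and -1 if any CPU number does
 * not fit into the 8-bit mask.
 */
static int flat_logical_dest(const unsigned int *cpus, size_t n, uint8_t *dest)
{
	uint8_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		if (cpus[i] >= FLAT_LOGICAL_MAX_CPUS)
			return -1;
		mask |= (uint8_t)(1u << cpus[i]);
	}

	*dest = mask;
	return 0;
}
```

Under that assumption, CPUs 0, 3 and 7 can be addressed together with the
mask 0x89, while a CPU number of 8 or above cannot be expressed at all and
needs physical destination mode.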

* tag 'x86-apic-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Fix comment on IRQ vector layout
  x86/apic: Remove unused extern declarations
  x86/apic: Remove logical destination mode for 64-bit
  x86/apic: Remove unused inline function apic_set_eoi_cb()
  x86/ioapic: Cleanup remaining coding style issues
  x86/ioapic: Cleanup line breaks
  x86/ioapic: Cleanup bracket usage
  x86/ioapic: Cleanup comments
  x86/ioapic: Move replace_pin_at_irq_node() to the call site
  iommu/vt-d: Cleanup apic_printk()
  x86/mpparse: Cleanup apic_printk()s
  x86/ioapic: Cleanup guarded debug printk()s
  x86/ioapic: Cleanup apic_printk()s
  x86/apic: Cleanup apic_printk()s
  x86/apic: Provide apic_printk() helpers
  x86/ioapic: Use guard() for locking where applicable
  x86/ioapic: Cleanup structs
  x86/ioapic: Mark mp_alloc_timer_irq() __init
  x86/ioapic: Handle allocation failures gracefully
2024-09-17 13:09:49 +02:00
Linus Torvalds 0279aa780d A set of cleanups across x86:
   - Use memremap() for the EISA probe instead of ioremap(). EISA is
     strictly memory and not MMIO
 
   - Cleanups and enhancements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpMzcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoa1WD/4txviyFr+1IY/P/JLxE8cBCW3R3aDY
 7+15lGBHiWyJ+uamzlAv8OQab/brgh5ofnRQjkrvK7pLVb7XgBacncFT8tF/j83w
 Yw+36NMAkeVAt2rJbWz1ZdgpK+StFMFmXcclv+BL5m0aTuGP1IsJX3KbbpMAYlyY
 ju++UAm0c/CSjRyuks1HgqADZ2Q8pjQv3dN723BRBxgRv0b3IcFAl7bBdZGf/w5w
 PBC7mFg7x0dAVW3Dpb73VeeNuAJ1LolTasS+OZglo/fhNx1hVHTYInewZ24t37px
 xDSDoYSJq0qQsG6T660gEduVqay80A8Jwu9Mwu+0G7krbuSafqDOqcPlFWPMUbiy
 VP6EPUh1FaJsH+IxloU5nyfmU6DaukYh1cPkGJBfUyCLG4KDyodIxL5c1c3cG90Y
 umK+Ggy3vNbgcLBGJWUgqS9ET55qcxMc+X3DMlnQl+pGhFdkC9cHCTUqSJRwLeuj
 4Dvk76zX1VNGmPmr77kP+rIZl9hqmfw4I2hekUaETSuWOAsf/xHzH/TlcOnPVSr0
 jidxNvHQ0kuRziCeBH7RUU8jpZyepCY4SIvJt+C2f6pZv/82lOao/ZIqVhyNR5Jh
 +zLr+UU6PtxNYyYjg1zcL0FCa6jz40Z2el0cPChoK0xqwOVAPGu/HiqCQW0AmXJR
 +Dl/gGrb68vFsg==
 =aN01
 -----END PGP SIGNATURE-----

Merge tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Thomas Gleixner:
 "A set of cleanups across x86:

   - Use memremap() for the EISA probe instead of ioremap(). EISA is
     strictly memory and not MMIO

   - Cleanups and enhancements all over the place"

* tag 'x86-cleanups-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/EISA: Dereference memory directly instead of using readl()
  x86/extable: Remove unused declaration fixup_bug()
  x86/boot/64: Strip percpu address space when setting up GDT descriptors
  x86/cpu: Clarify the error message when BIOS does not support SGX
  x86/kexec: Add comments around swap_pages() assembly to improve readability
  x86/kexec: Fix a comment of swap_pages() assembly
  x86/sgx: Fix a W=1 build warning in function comment
  x86/EISA: Use memremap() to probe for the EISA BIOS signature
  x86/mtrr: Remove obsolete declaration for mtrr_bp_restore()
  x86/cpu_entry_area: Annotate percpu_setup_exception_stacks() as __init
2024-09-17 13:00:12 +02:00
Linus Torvalds 5ba202a7c9 Updates for KCOV instrumentation on x86:
   - Prevent spurious KCOV coverage in common_interrupt()
 
   - Fixup the KCOV Makefile directive which got stale due to a source file
     rename
 
   - Exclude stack unwinding from KCOV as it creates large amounts of
     uninteresting coverage
 
   - Provide a self test to validate that KCOV coverage of the interrupt
     handling code does not start before the preempt count is updated.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbpMeITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaOeD/4oO3g0soK0LIcDIwzaG0ap0hx0nucw
 aVSAESuY+ZaSbRbV0fNoYdHORvLdErs67SeyeJRSxTzSNqGH2dGoFrfbkRSXq951
 RdCSPP60T7xgqAme1YLDiChfXt/gkbWk/8V5Q7sG3oq3GaVcPUyZgPo4M4HQMdfg
 Mla3VPikW5Np3fvs0IZYWQ5VdY0fFOHY5JGMhKJznJxf+Ud+VAtxsbJUcO4MEYWW
 A9CVJNHGEXssGA6vm5kgtLu6n2QFuoSj6En/WqLEaJb8f/V332e04Xj2ZHUaOOjV
 2abVeDovv+dwUYb4SgrGVg9gfEwwcLPDnmOuuQJmQBB5kU4mJsCqI5TTS6c1fgU4
 x8tQsGSOKHFQAI14ZWtitrL4rS2uFcBkAFXo0dF8J5o4989RA8cpfeWVSVUb/UXd
 u38BWpc9iHiihHKMmMQgsa1bUMwdSUTvN5XFHkeP4oqUdMiEiWn8iM5+zXd/lfTs
 9mrTv+kcLA7mjFOmn4JyE2b+NuiPdgS2FCBGLycHvGwvJoJlO2UmSpF89AJ5vdKs
 F8vWLkV+gno/HtwS5o949cAwjYiCodfc7u1W0xj2VDAbx0RbaBw1SDhXMQcLxLgn
 BTt4yHKKIeLX++WH3fpeyL91+UJWubUzNzY4rAmLkz5DedWAkpES+45fatp1buIz
 Lp/hGiIsG9p5xw==
 =tiXT
 -----END PGP SIGNATURE-----

Merge tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 build updates from Thomas Gleixner:
 "Updates for KCOV instrumentation on x86:

   - Prevent spurious KCOV coverage in common_interrupt()

   - Fixup the KCOV Makefile directive which got stale due to a source
     file rename

   - Exclude stack unwinding from KCOV as it creates large amounts of
     uninteresting coverage

   - Provide a self test to validate that KCOV coverage of the interrupt
     handling code does not start before the preempt count is updated"

* tag 'x86-build-2024-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Ignore stack unwinding in KCOV
  module: Fix KCOV-ignored file name
  kcov: Add interrupt handling self test
  x86/entry: Remove unwanted instrumentation in common_interrupt()
2024-09-17 12:40:34 +02:00
Linus Torvalds 9ea925c806 Updates for timers and timekeeping:
   - Core:
 
 	- Overhaul of posix-timers in preparation of removing the
 	  workaround for periodic timers which have signal delivery
 	  ignored.
 
         - Remove the historical extra jiffy in msleep()
 
 	  msleep() adds an extra jiffy to the timeout value to ensure
 	  minimal sleep time. The timer wheel ensures minimal sleep
 	  time since the large rewrite to a non-cascading wheel, but the
 	  extra jiffy in msleep() remained unnoticed. Remove it.
 
         - Make the timer slack handling correct for realtime tasks.
 
 	  The procfs interface is inconsistent and neither reflects
 	  reality nor conforms to the man page. Show the correct 0 slack
 	  for real time tasks and enforce it at the core level instead of
 	  having inconsistent individual checks in various timer setup
 	  functions.
 
         - The usual set of updates and enhancements all over the place.
 
   - Drivers:
 
         - Allow the ACPI PM timer to be turned off during suspend
 
 	- No new drivers
 
 	- The usual updates and enhancements in various drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmbn7jQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobqnD/9COlU0nwsulABI/aNIrsh6iYvnCC9v
 14CcNta7Qn+157Wfw9BWOyHdNhR1/fPCXE8jJ71zTyIOeW27HV2JyTtxTwe9ZcdK
 ViHAaj7YcIjcVUEC3StCoRCPnvLslEw4qJA5AOQuDyMivdQn+YVa2c0baJxKaXZt
 xk4HZdMj4NAS0jRKnoZSwtKW/+Oz6rR4GAWrZo+Zs1/8ur3HfqnQfi8lJ1hJtLLW
 V7XDCVRvamVi6Ah3ocYPPp/1P6yeQDA1ge9aMddqaza5STWISXRtSnFMUmYP3rbS
 FaL8TyL+ilfny8pkGB2WlG6nLuSbtvogtdEh1gG1k1RmZt44kAtk8ba/KiWFPBSb
 zK9cjojRMBS71f9G4kmb5F4rnXoLsg1YbD1Nzhz3wq2Cs1Z90dc2QwMren0zoQ1x
 Fn56ueRyAiagBlnrSaKyso/2RvqJTNoSdi3RkpjYeAph0UoDCqvTvKjGAf1mWiw1
 T/1lUWSVqWHnzZbM7XXzzajIN9bl6A7bbqlcAJ2O9vZIDt7273DG+bQym9Vh6Why
 0LTGGERHxzKBsG7WRg+2Gmvv6S18UPKRo8tLtlA758rHlFuPTZCShWrIriwSNl1K
 Hxon+d4BparSnm1h9W/NHPKJA574UbWRCBjdk58IkAj8DxZZY4ORD9SMP+ggkV7G
 F6p9cgoDNP9KFg==
 =jE0N
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Core:

   - Overhaul of posix-timers in preparation of removing the workaround
     for periodic timers which have signal delivery ignored.

   - Remove the historical extra jiffy in msleep()

     msleep() adds an extra jiffy to the timeout value to ensure
     minimal sleep time. The timer wheel ensures minimal sleep time
     since the large rewrite to a non-cascading wheel, but the extra
     jiffy in msleep() remained unnoticed. Remove it.

   - Make the timer slack handling correct for realtime tasks.

     The procfs interface is inconsistent and neither reflects
     reality nor conforms to the man page. Show the correct 0 slack for
     real time tasks and enforce it at the core level instead of having
     inconsistent individual checks in various timer setup functions.

   - The usual set of updates and enhancements all over the place.

  Drivers:

   - Allow the ACPI PM timer to be turned off during suspend

   - No new drivers

   - The usual updates and enhancements in various drivers"
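
The effect of the msleep() change described above can be illustrated with
a little tick arithmetic. This is a user-space C sketch, not kernel code:
ms_to_ticks() is a hypothetical stand-in for the kernel's rounding-up
conversion from milliseconds to jiffies, and the tick rate is passed in
as a plain parameter instead of using the kernel's HZ.

```c
#include <stdint.h>

/* Hypothetical helper: convert milliseconds to timer ticks, rounding up. */
static unsigned long ms_to_ticks(unsigned int msecs, unsigned int hz)
{
	return (unsigned long)(((uint64_t)msecs * hz + 999) / 1000);
}

/* Timeout as computed before the change: one extra tick on top. */
static unsigned long msleep_ticks_old(unsigned int msecs, unsigned int hz)
{
	return ms_to_ticks(msecs, hz) + 1;
}

/* Timeout after the change: the converted value is used as is. */
static unsigned long msleep_ticks_new(unsigned int msecs, unsigned int hz)
{
	return ms_to_ticks(msecs, hz);
}
```

With a tick rate of 250, a 10 ms sleep converts to 3 ticks; the old
calculation asked for 4 ticks and the new one asks for 3. The extra tick
weighs most on short sleeps at low tick rates, where a single tick is
several milliseconds long.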

* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  ntp: Make sure RTC is synchronized when time goes backwards
  treewide: Fix wrong singular form of jiffies in comments
  cpu: Use already existing usleep_range()
  timers: Rename next_expiry_recalc() to be unique
  platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
  clocksource/drivers/jcore: Use request_percpu_irq()
  clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
  clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
  clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
  clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
  platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
  clocksource: acpi_pm: Add external callback for suspend/resume
  clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
  dt-bindings: timer: rockchip: Add rk3576 compatible
  timers: Annotate possible non critical data race of next_expiry
  timers: Remove historical extra jiffie for timeout in msleep()
  hrtimer: Use and report correct timerslack values for realtime tasks
  hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
  timers: Add sparse annotation for timer_sync_wait_running().
  signal: Replace BUG_ON()s
  ...
2024-09-17 07:25:37 +02:00
Linus Torvalds 02824a5fd1 Power management updates for 6.12-rc1
  - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).
 
  - Add support for Granite Rapids and Sierra Forest in OOB mode to the
    intel_pstate cpufreq driver (Srinivas Pandruvada).
 
  - Add basic support for CPU capacity scaling on x86 and make the
    intel_pstate driver set asymmetric CPU capacity on hybrid systems
    without SMT (Rafael Wysocki).
 
  - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
    driver (Jeff Johnson).
 
  - Several OF related cleanups in cpufreq drivers (Rob Herring).
 
  - Enable COMPILE_TEST for ARM drivers (Rob Herring).
 
  - Introduce quirks for syscon failures and use socinfo to get revision
    for TI cpufreq driver (Dhruva Gole, Nishanth Menon).
 
  - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
    Ugwekar).
 
  - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
    (Danila Tikhonov, Huacai Chen, and Liu Jing).
 
  - Make amd-pstate validate return of any attempt to update EPP limits,
    which fixes the masking hardware problems (Mario Limonciello).
 
  - Move the calculation of the AMD boost numerator outside of amd-pstate,
    correcting acpi-cpufreq on systems with preferred cores (Mario
    Limonciello).
 
  - Harden preferred core detection in amd-pstate to avoid potential
    false positives (Mario Limonciello).
 
  - Add extra unit test coverage for mode state machine (Mario
    Limonciello).
 
  - Fix an "Uninitialized variables" issue in amd-pstate (Qianqiang Liu).
 
  - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy).
 
  - Disable promotion to C1E on Jasper Lake and Elkhart Lake in
    intel_idle (Kai-Heng Feng).
 
  - Use scoped device node handling to fix missing of_node_put() and
    simplify walking OF children in the riscv-sbi cpuidle driver (Krzysztof
    Kozlowski).
 
  - Remove dead code from cpuidle_enter_state() (Dhruva Gole).
 
  - Change an error pointer to NULL to fix error handling in the
    intel_rapl power capping driver (Dan Carpenter).
 
  - Fix off by one in get_rpi() in the intel_rapl power capping
    driver (Dan Carpenter).
 
  - Add support for ArrowLake-U to the intel_rapl power capping
    driver (Sumeet Pawnikar).
 
  - Fix the energy-pkg event for AMD CPUs in the intel_rapl power capping
    driver (Dhananjay Ugwekar).
 
  - Add support for AMD family 1Ah processors to the intel_rapl power
    capping driver (Dhananjay Ugwekar).
 
  - Remove unused stub for saveable_highmem_page() and remove deprecated
    macros from power management documentation (Andy Shevchenko).
 
  - Use sysfs_emit() and sysfs_emit_at() in "show" functions in the PM
    sysfs interface (Xueqin Luo).
 
  - Update the maintainers information for the operating-points-v2-ti-cpu DT
    binding (Dhruva Gole).
 
  - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring).
 
  - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
    Johnson).
 
  - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
    Moon).
 
  - Use of_property_present() instead of of_get_property() in the imx-bus
    devfreq driver (Rob Herring).
 
  - Update directory handling and installation process in the pm-graph
    Makefile and add .gitignore to ignore sleepgraph.py artifacts to
    pm-graph (Amit Vadhavana, Yo-Jung Lin).
 
  - Make cpupower display residency value in idle-info (Aboorva
    Devarajan).
 
  - Add missing powercap_set_enabled() stub function to cpupower (John
    B. Wyatt IV).
 
  - Add SWIG support to cpupower (John B. Wyatt IV).

Merge tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "By the number of new lines of code, the most visible change here is
  the addition of hybrid CPU capacity scaling support to the
  intel_pstate driver. Next are the amd-pstate driver changes related to
  the calculation of the AMD boost numerator and preferred core
  detection.

  As far as new hardware support is concerned, the intel_idle driver
  will now handle Granite Rapids Xeon processors natively, the
  intel_rapl power capping driver will recognize family 1Ah of AMD
  processors and Intel ArrowLake-U chips, and intel_pstate will handle
  Granite Rapids and Sierra Forest chips in the out-of-band (OOB) mode.

  Apart from the above, there is a usual collection of assorted fixes
  and code cleanups in many places and there are tooling updates.

  Specifics:

   - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef)

   - Add support for Granite Rapids and Sierra Forest in OOB mode to the
     intel_pstate cpufreq driver (Srinivas Pandruvada)

   - Add basic support for CPU capacity scaling on x86 and make the
     intel_pstate driver set asymmetric CPU capacity on hybrid systems
     without SMT (Rafael Wysocki)

   - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
     driver (Jeff Johnson)

   - Several OF related cleanups in cpufreq drivers (Rob Herring)

   - Enable COMPILE_TEST for ARM drivers (Rob Herring)

   - Introduce quirks for syscon failures and use socinfo to get
     revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon)

   - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
     Ugwekar)

   - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
     (Danila Tikhonov, Huacai Chen, and Liu Jing)

   - Make amd-pstate validate return of any attempt to update EPP
     limits, which fixes the masking hardware problems (Mario
     Limonciello)

   - Move the calculation of the AMD boost numerator outside of
     amd-pstate, correcting acpi-cpufreq on systems with preferred cores
     (Mario Limonciello)

   - Harden preferred core detection in amd-pstate to avoid potential
     false positives (Mario Limonciello)

   - Add extra unit test coverage for mode state machine (Mario
     Limonciello)

   - Fix an "Uninitialized variables" issue in amd-pstate (Qianqiang
     Liu)

   - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy)

   - Disable promotion to C1E on Jasper Lake and Elkhart Lake in
     intel_idle (Kai-Heng Feng)

   - Use scoped device node handling to fix missing of_node_put() and
     simplify walking OF children in the riscv-sbi cpuidle driver
     (Krzysztof Kozlowski)

   - Remove dead code from cpuidle_enter_state() (Dhruva Gole)

   - Change an error pointer to NULL to fix error handling in the
     intel_rapl power capping driver (Dan Carpenter)

   - Fix off by one in get_rpi() in the intel_rapl power capping driver
     (Dan Carpenter)

   - Add support for ArrowLake-U to the intel_rapl power capping driver
     (Sumeet Pawnikar)

   - Fix the energy-pkg event for AMD CPUs in the intel_rapl power
     capping driver (Dhananjay Ugwekar)

   - Add support for AMD family 1Ah processors to the intel_rapl power
     capping driver (Dhananjay Ugwekar)

   - Remove unused stub for saveable_highmem_page() and remove
     deprecated macros from power management documentation (Andy
     Shevchenko)

   - Use sysfs_emit() and sysfs_emit_at() in "show" functions in the PM
     sysfs interface (Xueqin Luo)

   - Update the maintainers information for the
     operating-points-v2-ti-cpu DT binding (Dhruva Gole)

   - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring)

   - Add missing MODULE_DESCRIPTION() macros to devfreq governors (Jeff
     Johnson)

   - Use devm_clk_get_enabled() in the exynos-bus devfreq driver (Anand
     Moon)

   - Use of_property_present() instead of of_get_property() in the
     imx-bus devfreq driver (Rob Herring)

   - Update directory handling and installation process in the pm-graph
     Makefile and add .gitignore to ignore sleepgraph.py artifacts to
     pm-graph (Amit Vadhavana, Yo-Jung Lin)

   - Make cpupower display residency value in idle-info (Aboorva
     Devarajan)

   - Add missing powercap_set_enabled() stub function to cpupower (John
     B. Wyatt IV)

   - Add SWIG support to cpupower (John B. Wyatt IV)"

* tag 'pm-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (62 commits)
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
  PM: hibernate: Remove unused stub for saveable_highmem_page()
  pm:cpupower: Add error warning when SWIG is not installed
  MAINTAINERS: Add Maintainers for SWIG Python bindings
  pm:cpupower: Include test_raw_pylibcpupower.py
  pm:cpupower: Add SWIG bindings files for libcpupower
  pm:cpupower: Add missing powercap_set_enabled() stub function
  ...
2024-09-16 07:47:50 +02:00
Linus Torvalds a4ebad655b Fix deadlock in SGX NUMA node search

Merge tag 'x86_sgx_for_6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Dave Hansen:
 "These fix a deadlock in the SGX NUMA allocator.

  It's probably only triggerable today on servers with buggy BIOSes, but
  it's theoretically possible it can happen on less goofy systems"

* tag 'x86_sgx_for_6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Log information when a node lacks an EPC section
  x86/sgx: Fix deadlock in SGX NUMA node search
2024-09-16 06:51:10 +02:00
Linus Torvalds 963d0d60d6 - Add CONFIG_ option for every hw CPU mitigation. The intent is to support
configurations and scenarios where the mitigations code is irrelevant
 
 - Other small fixlets and improvements

Merge tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 hw mitigation updates from Borislav Petkov:

 - Add CONFIG_ option for every hw CPU mitigation. The intent is to
   support configurations and scenarios where the mitigations code is
   irrelevant

 - Other small fixlets and improvements

* tag 'x86_bugs_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Fix handling when SRSO mitigation is disabled
  x86/bugs: Add missing NO_SSB flag
  Documentation/srso: Document a method for checking safe RET operates properly
  x86/bugs: Add a separate config for GDS
  x86/bugs: Remove GDS Force Kconfig option
  x86/bugs: Add a separate config for SSB
  x86/bugs: Add a separate config for Spectre V2
  x86/bugs: Add a separate config for SRBDS
  x86/bugs: Add a separate config for Spectre v1
  x86/bugs: Add a separate config for RETBLEED
  x86/bugs: Add a separate config for L1TF
  x86/bugs: Add a separate config for MMIO Stable Data
  x86/bugs: Add a separate config for TAA
  x86/bugs: Add a separate config for MDS
2024-09-16 06:48:38 +02:00
Linus Torvalds d580d74ea2 - Add the final conversions to the new Intel VFM CPU model matching macros
which include the vendor and finally drop the old ones which hardcode family 6

Merge tag 'x86_cpu_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cpuid updates from Borislav Petkov:

 - Add the final conversions to the new Intel VFM CPU model matching
   macros which include the vendor and finally drop the old ones which
   hardcode family 6

* tag 'x86_cpu_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/vfm: Delete all the *_FAM6_ CPU #defines
  x86/cpu/vfm: Delete X86_MATCH_INTEL_FAM6_MODEL[_STEPPING]() macros
  extcon: axp288: Switch to new Intel CPU model defines
  x86/cpu/intel: Replace PAT erratum model/family magic numbers with symbolic IFM references
2024-09-16 06:47:03 +02:00
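The idea behind the VFM conversion above, matching on vendor, family and model together instead of assuming family 6, can be sketched in plain C. This is a minimal illustration, not the kernel's macros: the `sketch_` names, the bit positions (model in bits 0-7, family in bits 8-15, vendor in bits 16-23) and the vendor ID 0 for Intel are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical field positions, chosen for this sketch only. */
#define SKETCH_MODEL_BIT    0
#define SKETCH_FAMILY_BIT   8
#define SKETCH_VENDOR_BIT   16
#define SKETCH_VENDOR_INTEL 0u   /* assumed vendor ID */

/* Pack vendor, family and model into a single comparable value. */
static inline uint32_t sketch_vfm_make(uint32_t vendor, uint32_t family,
				       uint32_t model)
{
	return (vendor << SKETCH_VENDOR_BIT) |
	       (family << SKETCH_FAMILY_BIT) |
	       (model << SKETCH_MODEL_BIT);
}

/* Extract the individual fields again. */
static inline uint32_t sketch_vfm_vendor(uint32_t vfm)
{
	return (vfm >> SKETCH_VENDOR_BIT) & 0xff;
}

static inline uint32_t sketch_vfm_family(uint32_t vfm)
{
	return (vfm >> SKETCH_FAMILY_BIT) & 0xff;
}

static inline uint32_t sketch_vfm_model(uint32_t vfm)
{
	return (vfm >> SKETCH_MODEL_BIT) & 0xff;
}
```

Because the family is part of the packed value, a table of such values can describe CPUs from any family, which is what makes the family-6-only defines unnecessary.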
Linus Torvalds d0a63f0e1a - Reorganize the struct mce populating functions so that MCA errors
reported through BIOS' BERT method can report the correct CPU number
   the error has been detected on

Merge tag 'ras_core_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - Reorganize the struct mce populating functions so that MCA errors
   reported through BIOS' BERT method can report the correct CPU number
   the error has been detected on

* tag 'ras_core_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error()
  x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
  x86/mce: Rename mce_setup() to mce_prep_record()
2024-09-16 06:43:40 +02:00
Linus Torvalds 79f1a6adef - Simplify microcode patches loading on AMD Zen and newer by using the family,
model and stepping encoded in the patch revision number
 
 - Fix a silly clang warning

Merge tag 'x86_microcode_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loading updates from Borislav Petkov:

 - Simplify microcode patches loading on AMD Zen and newer by using the
   family, model and stepping encoded in the patch revision number

 - Fix a silly clang warning

* tag 'x86_microcode_for_v6.12_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix a -Wsometimes-uninitialized clang false positive
  x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID
2024-09-16 06:41:49 +02:00
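As a rough illustration of the microcode change above, the sketch below decodes family, model and stepping from a 32-bit patch revision. The bit layout used here (extended family in bits 24-31, extended model in bits 20-23, model in bits 12-15, stepping in bits 8-11, revision in bits 0-7) and the base family of 0xf are assumptions for this example; `struct patch_ident` and `decode_patch_rev()` are hypothetical names, not kernel API.

```c
#include <stdint.h>

/* Decoded identity of a microcode patch (hypothetical helper type). */
struct patch_ident {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
	unsigned int rev;
};

/*
 * Decode a patch revision under the assumed layout:
 *   [31:24] ext_fam  [23:20] ext_model  [19:16] reserved
 *   [15:12] model    [11:8]  stepping   [7:0]   rev
 */
static struct patch_ident decode_patch_rev(uint32_t ucode_rev)
{
	struct patch_ident id;
	unsigned int ext_fam   = (ucode_rev >> 24) & 0xff;
	unsigned int ext_model = (ucode_rev >> 20) & 0xf;
	unsigned int model     = (ucode_rev >> 12) & 0xf;

	id.family   = 0xf + ext_fam;            /* base family assumed 0xf */
	id.model    = (ext_model << 4) | model; /* combine ext and base model */
	id.stepping = (ucode_rev >> 8) & 0xf;
	id.rev      = ucode_rev & 0xff;
	return id;
}
```

Under these assumptions, a revision of 0x0a201025 decodes to family 0x19, model 0x21, stepping 0.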
Juergen Gross 9221222c71 xen: allow mapping ACPI data using a different physical address
When running as a Xen PV dom0 the system needs to map ACPI data of the
host using host physical addresses, while those addresses can conflict
with the guest physical addresses of the loaded linux kernel. The same
problem might apply in case a PV guest is configured to use the host
memory map.

This conflict can be solved by mapping the ACPI data to a different
guest physical address, but mapping the data via acpi_os_ioremap()
must still be possible using the host physical address, as this
address might be generated by AML when referencing some of the ACPI
data.

When configured to support running as a Xen PV domain, have an
implementation of acpi_os_ioremap() being aware of the possibility to
need above mentioned translation of a host physical address to the
guest physical address.

This modification requires to #include linux/acpi.h in some sources
which need to include asm/acpi.h directly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12 08:25:07 +02:00
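The translation step described in the Xen commit above can be pictured as a range lookup: a host physical address that falls inside a relocated ACPI area is mapped to the matching guest physical address, and anything else passes through unchanged. The sketch below assumes a simple table of relocated areas; `struct remap_area` and `translate_host_phys()` are hypothetical names for illustration and do not reflect the actual Xen implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* One relocated area: host range and where it now lives in the guest. */
struct remap_area {
	uint64_t host_start;
	uint64_t size;
	uint64_t guest_start;
};

/*
 * Translate a host physical address to a guest physical address.
 * Addresses outside every relocated area are returned unchanged.
 */
static uint64_t translate_host_phys(const struct remap_area *areas,
				    size_t count, uint64_t phys)
{
	for (size_t i = 0; i < count; i++) {
		const struct remap_area *a = &areas[i];

		if (phys >= a->host_start && phys - a->host_start < a->size)
			return a->guest_start + (phys - a->host_start);
	}
	return phys;
}
```

An ioremap-style wrapper would call such a helper first and then map the returned address, so that addresses generated by AML keep working even though the data was moved.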
Rafael J. Wysocki 415dff1c96 Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.12-rc1:

 - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef).

 - Add support for Granite Rapids and Sierra Forest in OOB mode to the
   intel_pstate cpufreq driver (Srinivas Pandruvada).

 - Add basic support for CPU capacity scaling on x86 and make the
   intel_pstate driver set asymmetric CPU capacity on hybrid systems
   without SMT (Rafael Wysocki).

 - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq
   driver (Jeff Johnson).

 - Several OF related cleanups in cpufreq drivers (Rob Herring).

 - Enable COMPILE_TEST for ARM drivers (Rob Herring).

 - Introduce quirks for syscon failures and use socinfo to get revision
   for TI cpufreq driver (Dhruva Gole, Nishanth Menon).

 - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay
   Ugwekar).

 - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers
   (Danila Tikhonov, Huacai Chen, and Liu Jing).

 - Make amd-pstate validate return of any attempt to update EPP limits,
   which fixes the masking hardware problems (Mario Limonciello).

 - Move the calculation of the AMD boost numerator outside of amd-pstate,
   correcting acpi-cpufreq on systems with preferred cores (Mario
   Limonciello).

 - Harden preferred core detection in amd-pstate to avoid potential
   false positives (Mario Limonciello).

 - Add extra unit test coverage for mode state machine (Mario
   Limonciello).

 - Fix an "Uninitialized variables" issue in amd-pstate (Qianqiang Liu).

* pm-cpufreq: (35 commits)
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
  cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit()
  cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family
  cpufreq: Fix the cacography in powernv-cpufreq.c
  cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately
  cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request()
  cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value
  ...
2024-09-11 18:25:54 +02:00
Rafael J. Wysocki 9bcf30348f second round of amd-pstate changes for 6.12 (second try):
* Move the calculation of the AMD boost numerator outside of
   amd-pstate, correcting acpi-cpufreq on systems with preferred cores
 * Harden preferred core detection to avoid potential false positives
 * Add extra unit test coverage for mode state machine

Merge tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Merge the second round of amd-pstate changes for 6.12 from Mario
Limonciello:

"* Move the calculation of the AMD boost numerator outside of
   amd-pstate, correcting acpi-cpufreq on systems with preferred cores
 * Harden preferred core detection to avoid potential false positives
 * Add extra unit test coverage for mode state machine"

* tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
  cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue
  cpufreq/amd-pstate-ut: Add test case for mode switches
  cpufreq/amd-pstate: Export symbols for changing modes
  amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`
  cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`
  cpufreq: amd-pstate: Optimize amd_pstate_update_limits()
  cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
  x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
  x86/amd: Move amd_get_highest_perf() out of amd-pstate
  ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
  ACPI: CPPC: Drop check for non zero perf ratio
  x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
  ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB
  x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
2024-09-11 18:22:23 +02:00
Mario Limonciello ad4caad58d cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()
The special case in amd_pstate_highest_perf_set() is the value used
for calculating the boost numerator.  Merge this into
amd_get_boost_ratio_numerator() and then use that to calculate boost
ratio.

This allows dropping more special casing of the highest perf value.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Mario Limonciello 279f838a61 x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()
AMD systems that support preferred cores will use "166" as their
numerator for max frequency calculations instead of "255".

Add a function for detecting preferred cores by looking at the
highest perf value on all cores.

If preferred cores are enabled return 166 and if disabled the
value in the highest perf register. As the function will be called
multiple times, cache the values for the boost numerator and if
preferred cores will be enabled in global variables.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
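A minimal sketch of the selection logic in the commit above: return 166 when preferred cores are in use, otherwise the per-CPU highest perf value. How preferred cores are detected is an assumption here (the sketch treats differing highest perf values across CPUs as the signal), the caching mentioned in the commit is omitted, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PREFCORE_NUMERATOR 166u

/* Assumed heuristic: preferred cores exist if highest perf values differ. */
static bool sketch_detect_prefcore(const uint8_t *highest_perf, size_t ncpus)
{
	for (size_t i = 1; i < ncpus; i++) {
		if (highest_perf[i] != highest_perf[0])
			return true;
	}
	return false;
}

/* Pick the boost numerator for one CPU; the caller ensures cpu < ncpus. */
static unsigned int sketch_boost_numerator(const uint8_t *highest_perf,
					   size_t ncpus, size_t cpu)
{
	if (sketch_detect_prefcore(highest_perf, ncpus))
		return SKETCH_PREFCORE_NUMERATOR;
	return highest_perf[cpu];
}
```

A real implementation would compute the result once and cache it, since the function is called repeatedly.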
Mario Limonciello 2819bfef64 x86/amd: Move amd_get_highest_perf() out of amd-pstate
amd_pstate_get_highest_perf() is a helper used to get the highest perf
value on AMD systems.  It's used in amd-pstate as part of preferred
core handling, but applicable for acpi-cpufreq as well.

Move it out to cppc handling code as amd_get_highest_perf().

Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:23 -05:00
Mario Limonciello 21fb59ab4b ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn
If the boost ratio isn't calculated properly for the system for any
reason this can cause other problems that are non-obvious.

Raise all messages to warn instead.

Suggested-by: Perry Yuan <Perry.Yuan@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:22 -05:00
Mario Limonciello 3355ac2541 ACPI: CPPC: Drop check for non zero perf ratio
perf_ratio is a u64 and SCHED_CAPACITY_SCALE is a large number.
Shifting by one will never have a zero value.

Drop the check.

Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.sheoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:22 -05:00
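The arithmetic behind dropping the check above can be shown with a small sketch. It assumes the ratio is computed as the midpoint between the scaled boost ratio and the capacity scale, with a scale of 1024; the formula and the `SKETCH_` names are assumptions for illustration, not a copy of the kernel code.

```c
#include <stdint.h>

#define SKETCH_CAPACITY_SHIFT 10
#define SKETCH_CAPACITY_SCALE (1ULL << SKETCH_CAPACITY_SHIFT)

/*
 * Midpoint between the scaled numerator/nominal ratio and the capacity
 * scale. The caller must pass a non-zero nominal_perf.
 */
static uint64_t sketch_perf_ratio(uint64_t numerator, uint64_t nominal_perf)
{
	uint64_t ratio = (numerator << SKETCH_CAPACITY_SHIFT) / nominal_perf;

	/* ratio >= 0, so the sum is at least 1024 and the shift yields >= 512 */
	return (ratio + SKETCH_CAPACITY_SCALE) >> 1;
}
```

Even with a numerator of 0 the result is 512 under these assumptions, which is why a zero test on the final value can never trigger.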
Mario Limonciello 6c09e3b445 x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()
The function name is ambiguous because it returns an intermediate value
for calculating maximum frequency rather than the CPPC 'Highest Perf'
register.

Rename the function to clarify its use and allow the function to return
errors. Adjust the consumer in acpi-cpufreq to catch errors.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:22 -05:00
Mario Limonciello 2bcec09cc4 x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
To prepare to let amd_get_highest_perf() detect preferred cores
it will require CPPC functions. Move amd_get_highest_perf() to
cppc.c to prepare for 'preferred core detection' rework.

No functional changes intended.

Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11 10:23:22 -05:00
Thomas Gleixner 2f7eedca6c Merge branch 'linus' into timers/core
To update with the latest fixes.
2024-09-10 13:49:53 +02:00
Mark Brown 25d4054cc9 mm: make arch_get_unmapped_area() take vm_flags by default
Patch series "mm: Care about shadow stack guard gap when getting an
unmapped area", v2.

As covered in the commit log for c44357c2e7 ("x86/mm: care about shadow
stack guard gap during placement") our current mmap() implementation does
not take care to ensure that a new mapping isn't placed with existing
mappings inside it's own guard gaps.  This is particularly important for
shadow stacks since if two shadow stacks end up getting placed adjacent to
each other then they can overflow into each other which weakens the
protection offered by the feature.

On x86 there is a custom arch_get_unmapped_area() which was updated by the
above commit to cover this case by specifying a start_gap for allocations
with VM_SHADOW_STACK.  Both arm64 and RISC-V have equivalent features and
use the generic implementation of arch_get_unmapped_area() so let's make
the equivalent change there so they also don't get shadow stack pages
placed without guard pages.  The arm64 and RISC-V shadow stack
implementations are currently on the list:

   https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec94743
   https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/

Given the addition of the use of vm_flags in the generic implementation we
also simplify the set of possibilities that have to be dealt with in the
core code by making arch_get_unmapped_area() take vm_flags as standard. 
This is a bit invasive since the prototype change touches quite a few
architectures but since the parameter is ignored the change is
straightforward, the simplification for the generic code seems worth it.


This patch (of 3):

When we introduced arch_get_unmapped_area_vmflags() in 961148704a ("mm:
introduce arch_get_unmapped_area_vmflags()") we did so as part of properly
supporting guard pages for shadow stacks on x86_64, which uses a custom
arch_get_unmapped_area().  Equivalent features are also present on both
arm64 and RISC-V, both of which use the generic implementation of
arch_get_unmapped_area() and will require equivalent modification there. 
Rather than continue to deal with having two versions of the functions
let's bite the bullet and have all implementations of
arch_get_unmapped_area() take vm_flags as a parameter.

The new parameter is currently ignored by all implementations other than
x86.  The only caller that doesn't have vm_flags available is
mm_get_unmapped_area(); as with the x86 implementation and the wrapper used
on other architectures, it is modified to supply no flags.

No functional changes.

Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-0-a46b8b6dc0ed@kernel.org
Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-1-a46b8b6dc0ed@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Helge Deller <deller@gmx.de>	[parisc]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09 16:39:13 -07:00
Linus Torvalds fb92a1ffc1 hyperv-fixes for 6.11-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmbeRpsTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXsDDB/4oL6ypxiF3/yo+xR6bt8HlzIfcVeTx
 EuDR+a/hDRQdShMbNtgaF2OxovMO1W5Se2hCoNrKbVxrPRHL6gUuZASdm93l75eh
 l8I0muQif1q9rEXNbwQxe/ydE0860OgmE/ZGv944BXBtirG1fGHei1DNKkdL6VJy
 iEmmURwz7Ykg5neqwzYBY9SV7P/wwWZNR8GIRTWHhWU+ok1cYpehAs1dpQleAxsz
 WZCQLfIMXdSJBSDB/YO7JAlykZ1DkkTkI8pfbe2diReaDSw2QYsnsPXD6MVZArLO
 73kDojwb0LitLyWYEjm07ipOApkzYrEGTXjlLNdUVVF1Fx20nohu8jRd
 =k5P6
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20240908' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Add a documentation overview of Confidential Computing VM support
   (Michael Kelley)

 - Use lapic timer in a TDX VM without paravisor (Dexuan Cui)

 - Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
   (Michael Kelley)

 - Fix a kexec crash due to VP assist page corruption (Anirudh
   Rayabharam)

 - Python3 compatibility fix for lsvmbus (Anthony Nandaa)

 - Misc fixes (Rachel Menge, Roman Kisel, zhang jiao, Hongbo Li)

* tag 'hyperv-fixes-signed-20240908' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hv: vmbus: Constify struct kobj_type and struct attribute_group
  tools: hv: rm .*.cmd when make clean
  x86/hyperv: fix kexec crash due to VP assist page corruption
  Drivers: hv: vmbus: Fix the misplaced function description
  tools: hv: lsvmbus: change shebang to use python3
  x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
  Documentation: hyperv: Add overview of Confidential Computing VM support
  clocksource: hyper-v: Use lapic timer in a TDX VM without paravisor
  Drivers: hv: Remove deprecated hv_fcopy declarations
2024-09-09 09:31:55 -07:00
Anna-Maria Behnsen bd7c8ff9fe treewide: Fix wrong singular form of jiffies in comments
There are several comments all over the place which use a wrong singular
form of jiffies.

Replace 'jiffie' by 'jiffy'. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Link: https://lore.kernel.org/all/20240904-devel-anna-maria-b4-timers-flseep-v1-3-e98760256370@linutronix.de
2024-09-08 20:47:40 +02:00
Aaron Lu c8ddc99eeb x86/sgx: Log information when a node lacks an EPC section
For optimized performance, firmware typically distributes EPC sections
evenly across different NUMA nodes. However, there are scenarios where
a node may have both CPUs and memory but no EPC section configured. For
example, in an 8-socket system with a Sub-Numa-Cluster=2 setup, there
are a total of 16 nodes. Given that the maximum number of supported EPC
sections is 8, it is simply not feasible to assign one EPC section to
each node. This configuration is not incorrect - SGX will still operate
correctly; it is just not optimized from a NUMA standpoint.

For this reason, log a message when a node with both CPUs and memory
lacks an EPC section. This will provide users with a hint as to why they
might be experiencing less-than-ideal performance when running SGX
enclaves.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20240905080855.1699814-3-aaron.lu%40intel.com
2024-09-05 15:20:47 -07:00
Aaron Lu 9c93684401 x86/sgx: Fix deadlock in SGX NUMA node search
When the current node doesn't have an EPC section configured by firmware
and all other EPC sections are used up, a CPU can get stuck inside the
while loop that looks for an available EPC page from remote nodes
indefinitely, leading to a soft lockup. Note how nid_of_current will
never be equal to nid in that while loop because nid_of_current is not
set in sgx_numa_mask.

Also worth mentioning is that it's perfectly fine for the firmware not
to set up an EPC section on a node. While setting up an EPC section on
each node can enhance performance, it is not a requirement for
functionality.

Rework the loop to start and end on *a* node that has SGX memory. This
avoids the deadlock looking for the current SGX-lacking node to show up
in the loop when it never will.
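
The reworked search can be illustrated with a small userspace model. This is a Python sketch, not the kernel code: alloc_epc_page(), next_node_in() as written here, and the free_pages dictionary are hypothetical stand-ins for the allocator, the nodemask helper and the per-node EPC free lists. The point it shows is that the walk starts and ends on a node that is in the SGX mask, so it terminates even when the current node has no EPC section.

```python
def next_node_in(nid, mask):
    """Return the smallest node id in mask greater than nid, wrapping around."""
    later = [n for n in sorted(mask) if n > nid]
    return later[0] if later else min(mask)


def alloc_epc_page(current_nid, sgx_nodes, free_pages):
    """Return the node id a page was taken from, or None if nothing is free."""
    if not sgx_nodes:
        return None

    def take(nid):
        # Model of taking one page from a node's free list.
        if free_pages.get(nid, 0) > 0:
            free_pages[nid] -= 1
            return True
        return False

    # Try the local node first, but only if it has an EPC section at all.
    if current_nid in sgx_nodes and take(current_nid):
        return current_nid

    # Walk the mask, starting and ending on a node that is in the mask.
    # The loop ends when the walk wraps back to 'start', which is always
    # reachable because 'start' itself is a member of sgx_nodes.
    start = nid = next_node_in(current_nid, sgx_nodes)
    while True:
        if take(nid):
            return nid
        nid = next_node_in(nid, sgx_nodes)
        if nid == start:
            return None
```

With node 0 lacking an EPC section and nodes 1 and 2 exhausted, the model returns None instead of spinning, which is the behaviour the fix is after.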

Fixes: 901ddbb9ec ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
Reported-by: "Molina Sabido, Gerardo" <gerardo.molina.sabido@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Zhimin Luo <zhimin.luo@intel.com>
Link: https://lore.kernel.org/all/20240905080855.1699814-2-aaron.lu%40intel.com
2024-09-05 15:20:47 -07:00
David Kaplan 1dbb6b1495 x86/bugs: Fix handling when SRSO mitigation is disabled
When the SRSO mitigation is disabled, either via mitigations=off or
spec_rstack_overflow=off, the warning about the lack of IBPB-enhancing
microcode is printed anyway.

This is unnecessary since the user has turned off the mitigation.

  [ bp: Massage, drop SBPB rationale as it doesn't matter because when
    mitigations are disabled x86_pred_cmd is not being used anyway. ]

Signed-off-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20240904150711.193022-1-david.kaplan@amd.com
2024-09-05 11:20:50 +02:00
Daniel Sneddon 23e12b54ac x86/bugs: Add missing NO_SSB flag
The Moorefield and Lightning Mountain Atom processors are
missing the NO_SSB flag in the vulnerabilities whitelist.
This will cause unaffected parts to incorrectly be reported
as vulnerable. Add the missing flag.

These parts are currently out of service and were verified
internally with archived documentation that they need the
NO_SSB flag.

Closes: https://lore.kernel.org/lkml/CAEJ9NQdhh+4GxrtG1DuYgqYhvc0hi-sKZh-2niukJ-MyFLntAA@mail.gmail.com/
Reported-by: Shanavas.K.S <shanavasks@gmail.com>
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240829192437.4074196-1-daniel.sneddon@linux.intel.com
2024-09-05 10:29:31 +02:00
Anirudh Rayabharam (Microsoft) b9af641827 x86/hyperv: fix kexec crash due to VP assist page corruption
commit 9636be85cc ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when
CPUs go online/offline") introduces a new cpuhp state for hyperv
initialization.

cpuhp_setup_state() returns the state number if state is
CPUHP_AP_ONLINE_DYN or CPUHP_BP_PREPARE_DYN and 0 for all other states.
For the hyperv case, since a new cpuhp state was introduced it would
return 0. However, in hv_machine_shutdown(), the cpuhp_remove_state() call
is conditioned upon "hyperv_init_cpuhp > 0". This will never be true and
so hv_cpu_die() won't be called on all CPUs. This means the VP assist page
won't be reset. When the kexec kernel tries to set up the VP assist page
again, the hypervisor corrupts the memory region of the old VP assist page
causing a panic in case the kexec kernel is using that memory elsewhere.
This was originally fixed in commit dfe94d4086 ("x86/hyperv: Fix kexec
panic/hang issues").

Get rid of hyperv_init_cpuhp entirely since we are no longer using a
dynamic cpuhp state and use CPUHP_AP_HYPERV_ONLINE directly with
cpuhp_remove_state().
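
The return-value mismatch can be shown with a toy model. This is a Python sketch under stated assumptions, not kernel code: the numeric state values are made up, and the function below only mirrors the return convention described above (state number for the two dynamic states, 0 otherwise).

```python
# Hypothetical numeric values; only their distinctness matters here.
CPUHP_BP_PREPARE_DYN = 50
CPUHP_AP_HYPERV_ONLINE = 150
CPUHP_AP_ONLINE_DYN = 200


def cpuhp_setup_state(state):
    """Model of the success return value described in the changelog."""
    if state in (CPUHP_AP_ONLINE_DYN, CPUHP_BP_PREPARE_DYN):
        return state  # dynamic states report the state number (> 0)
    return 0          # fixed states report plain success


def old_shutdown_runs_teardown(hyperv_init_cpuhp):
    # Old check: teardown only if the saved return value is positive.
    return hyperv_init_cpuhp > 0


def new_shutdown_runs_teardown():
    # New behaviour: remove CPUHP_AP_HYPERV_ONLINE unconditionally.
    return True
```

In this model, old_shutdown_runs_teardown(cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE)) is False, so the per-CPU teardown is skipped, which corresponds to hv_cpu_die() never running.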

Cc: stable@vger.kernel.org
Fixes: 9636be85cc ("x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline")
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240828112158.3538342-1-anirudh@anirudhrb.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240828112158.3538342-1-anirudh@anirudhrb.com>
2024-09-05 07:21:37 +00:00
Rafael J. Wysocki 5a9d10145a x86/sched: Add basic support for CPU capacity scaling
In order to be able to compute the sizes of tasks consistently across all
CPUs in a hybrid system, it is necessary to provide CPU capacity scaling
information to the scheduler via arch_scale_cpu_capacity().  Moreover,
the value returned by arch_scale_freq_capacity() for the given CPU must
correspond to the arch_scale_cpu_capacity() return value for it, or
utilization computations will be inaccurate.

Add support for it through per-CPU variables holding the capacity and
maximum-to-base frequency ratio (times SCHED_CAPACITY_SCALE) that will
be returned by arch_scale_cpu_capacity() and used by scale_freq_tick()
to compute arch_freq_scale for the current CPU, respectively.

In order to avoid adding measurable overhead for non-hybrid x86 systems,
which are the vast majority in the field, whether or not the new hybrid
CPU capacity scaling will be in effect is controlled by a static key.
This static key is set by calling arch_enable_hybrid_capacity_scale()
which also allocates memory for the per-CPU data and initializes it.
Next, arch_set_cpu_capacity() is used to set the per-CPU variables
mentioned above for each CPU and arch_rebuild_sched_domains() needs
to be called for the scheduler to realize that capacity-aware
scheduling can be used going forward.
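
As a rough illustration of the two per-CPU values, here is a Python sketch of one plausible way to compute them. The helper names and the sample numbers are assumptions for illustration, not the kernel implementation; only SCHED_CAPACITY_SCALE = 1024 and the "ratio times SCHED_CAPACITY_SCALE" form come from the description above.

```python
SCHED_CAPACITY_SCALE = 1024


def scaled_capacity(cap, max_cap):
    """Capacity of one CPU relative to the biggest CPU, in fixed point."""
    return cap * SCHED_CAPACITY_SCALE // max_cap


def scaled_freq_ratio(max_freq, base_freq):
    """Maximum-to-base frequency ratio times SCHED_CAPACITY_SCALE."""
    return max_freq * SCHED_CAPACITY_SCALE // base_freq


# Hypothetical hybrid system: a big core rated 100, a small core rated 60.
big = scaled_capacity(100, 100)             # 1024
small = scaled_capacity(60, 100)            # 614
ratio = scaled_freq_ratio(3800, 2000)       # 1945
```

Integer division is used so the results stay in the same fixed-point domain as SCHED_CAPACITY_SCALE.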

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # scale invariance
Link: https://patch.msgid.link/10523497.nUPlyArG6x@rjwysocki.net
[ rjw: Added parens to function kerneldoc comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-04 13:36:40 +02:00
Dave Hansen fd82221a59 x86/cpu/intel: Replace PAT erratum model/family magic numbers with symbolic IFM references
There's an erratum that prevents the PAT from working correctly:

   https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-specification-update.pdf
   # Document 316515 Version 010

The kernel currently disables PAT support on those CPUs, but it
does it with some magic numbers.

Replace the magic numbers with the new "IFM" macros.

Make the check refer to the last affected CPU (INTEL_CORE_YONAH)
rather than the first fixed one. This makes it easier to find the
documentation of the erratum since Intel documents where it is
broken and not where it is fixed.

I don't think the Pentium Pro (or Pentium II) is actually affected.
But the old check included them, so it can't hurt to keep doing the
same.  I'm also not completely sure about the "Pentium M" CPUs
(models 0x9 and 0xd).  But, again, they were included in the
old checks and were close Pentium III derivatives, so are likely
affected.

While we're at it, revise the comment referring to the erratum name
and make sure it is a quote of the language from the actual errata
doc.  That should make it easier to find in the future when the URL
inevitably changes.

Why bother with this in the first place? It actually gets rid of one
of the very few remaining direct references to c->x86{,_model}.

No change in functionality intended.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Link: https://lore.kernel.org/r/20240829220042.1007820-1-dave.hansen@linux.intel.com
2024-09-03 11:18:58 +02:00
Maciej W. Rozycki a678164aad x86/EISA: Dereference memory directly instead of using readl()
Sparse expects an __iomem pointer, but after converting the EISA probe to
memremap() the pointer is a regular memory pointer. Access it directly
instead.

[ tglx: Converted it to fix the already applied version  ]

Fixes: 80a4da0564 ("x86/EISA: Use memremap() to probe for the EISA BIOS signature")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/alpine.DEB.2.21.2408261015270.30766@angie.orcam.me.uk
2024-08-29 15:57:09 +02:00
Peter Newman a547a5880c x86/resctrl: Fix arch_mbm_* array overrun on SNC
When using resctrl on systems with Sub-NUMA Clustering enabled, monitoring
groups may be allocated RMID values which would overrun the
arch_mbm_{local,total} arrays.

This is due to inconsistencies in whether the SNC-adjusted num_rmid value or
the unadjusted value in resctrl_arch_system_num_rmid_idx() is used. The
num_rmid value for the L3 resource is currently:

  resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache

As a simple fix, make resctrl_arch_system_num_rmid_idx() return the
SNC-adjusted, L3 num_rmid value on x86.
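
The arithmetic behind the overrun can be sketched as follows. This is a Python model with made-up numbers; which code path used which value is an assumption made for illustration, and the helper names are hypothetical.

```python
def l3_num_rmid(system_num_rmid_idx, snc_nodes_per_l3_cache):
    """SNC-adjusted RMID count, per the formula quoted above."""
    return system_num_rmid_idx // snc_nodes_per_l3_cache


def overruns(rmid, array_len):
    """True if indexing an array of array_len entries by rmid is out of range."""
    return rmid >= array_len


# Hypothetical system: 256 RMID indexes, 2 SNC nodes per L3 cache.
unadjusted = 256
adjusted = l3_num_rmid(unadjusted, 2)   # 128

# An array sized by the adjusted value cannot be indexed by an RMID that
# was handed out from the unadjusted range.
bad = overruns(200, adjusted)           # True
ok = overruns(100, adjusted)            # False
```

Making both sides agree on the adjusted value removes the class of out-of-range indexes shown here.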

Fixes: e13db55b5a ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Signed-off-by: Peter Newman <peternewman@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20240822190212.1848788-1-peternewman@google.com
2024-08-28 11:13:08 +02:00
Xin Li (Intel) fe85ee3919 x86/entry: Set FRED RSP0 on return to userspace instead of context switch
The FRED RSP0 MSR points to the top of the kernel stack for user level
event delivery. As this is the task stack it needs to be updated when a
task is scheduled in.

The update is done at context switch. That means it's also done when
switching to kernel threads, which is pointless as those never go out to
user space. For KVM threads this means there are two writes to FRED_RSP0 as
KVM has to switch to the guest value before VMENTER.

Defer the update to the exit to user space path and cache the per CPU
FRED_RSP0 value, so redundant writes can be avoided.

Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual
MSR value after returning from guest to host mode.
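
The caching idea can be modelled in a few lines. This is a Python sketch, not the kernel code: the Cpu class, its fields and method names are hypothetical, and sync_rsp0() only stands in for the role the changelog gives to fred_sync_rsp0().

```python
class Cpu:
    """Toy model of one CPU with a FRED_RSP0 MSR and a per-CPU cached copy."""

    def __init__(self):
        self.msr_rsp0 = 0      # value held by the (modelled) MSR
        self.cached_rsp0 = 0   # per-CPU cache of that value
        self.msr_writes = 0    # how many MSR writes were issued

    def _wrmsr_rsp0(self, value):
        self.msr_rsp0 = value
        self.msr_writes += 1

    def exit_to_user(self, task_stack_top):
        # Write the MSR only when the cached value is stale.
        if self.cached_rsp0 != task_stack_top:
            self._wrmsr_rsp0(task_stack_top)
            self.cached_rsp0 = task_stack_top

    def guest_run(self, guest_rsp0):
        # Hardware state changes underneath the cache while in guest mode.
        self.msr_rsp0 = guest_rsp0

    def sync_rsp0(self, value):
        # Stand-in for fred_sync_rsp0(): realign the cache with the MSR.
        self.cached_rsp0 = value
```

Two consecutive exits to user space for the same task issue a single write in this model, and after a guest run the cache is resynchronized so the next exit rewrites the host value instead of trusting a stale cache.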

[ tglx: Massage change log ]

Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240822073906.2176342-4-xin@zytor.com
2024-08-25 19:23:00 +02:00
Andrew Cooper efe508816d x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism
Per the discussion about FRED MSR writes with WRMSRNS instruction [1],
use the alternatives mechanism to choose WRMSRNS when it's available,
otherwise fall back to WRMSR.

Remove the dependency on X86_FEATURE_WRMSRNS as WRMSRNS is no longer
dependent on FRED.

[1] https://lore.kernel.org/lkml/15f56e6a-6edd-43d0-8e83-bb6430096514@citrix.com/

Use DS prefix to pad WRMSR instead of a NOP. The prefix is ignored. At
least that's the current information from the hardware folks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240822073906.2176342-3-xin@zytor.com
2024-08-25 19:23:00 +02:00