Commit Graph

1336 Commits

Author SHA1 Message Date
Andy Shevchenko 996457176b x86/early_printk: Use 'mmio32' for consistency, fix comments
First of all, using 'mmio' prevents proper implementation of 8-bit accessors.
Second, it's simply inconsistent with uart8250 set of options. Rename it to
'mmio32'. While at it, remove rather misleading comment in the documentation.
From now on mmio32 is self-explanatory and pciserial supports not only 32-bit
MMIO accessors.

Also, while at it, fix the comment for the "pciserial" case. The comment
seems to be a copy'n'paste error when mentioning "serial" instead of
"pciserial" (with double quotes). Fix this.

With that, move it upper, so we don't calculate 'buf' twice.

Fixes: 3181424aea ("x86/early_printk: Add support for MMIO-based UARTs")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Denis Mukhin <dmukhin@ford.com>
Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com
2025-04-09 12:27:08 +02:00
Linus Torvalds dd9db3bff8 more s390 updates for 6.15 merge window
- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address
 
 - Fix build failure on older linkers by conditionally adding the -no-pie
   linker option only when it is supported
 
 - Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations
 
 - Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()
 
 - Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and translation
   errors
 
 - Cleanup header dependency problems with asm-offsets.c
 
 - Add fault info for unexpected low-address protection faults in user mode
 
 - Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling
 
 - Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU
 
 - Remove unused machine_flags for the lowcore
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmfwbhwACgkQjYWKoQLX
 FBgmlwf5ASFl51yYziSg47NTOdCgkdE1un69VKCU7g9nzQBCFiK4PT+H/d69JHpt
 g9dHi4sW6aXooUWyqtrQbsDZSnGKvJd48wOkpqAKKKJCBx3On/7s1fi1sJXgJovo
 rSwC1cRD+yfFDNoQgXytSrRAkoPYvIPxf0aQei/8ziVNOrl7iF7nyPYpgNcZl2iz
 rQWt+o4oWhygs6lK4w9t+0V0QLL+CqTIN/tpf7qeQzNFN5WfRJBn0dtxQhK72enG
 WepU41LnZxDF5DTKhH4+PZFdLPP+YwwTGdqojW28leGy5T3/Eg0wrDaU1034Wpgd
 m5Fg03z94Vdul/rAEscaH1PLT3Ufng==
 =G5/g
 -----END PGP SIGNATURE-----

Merge tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

 - Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
   missing base register for relocated lowcore address

 - Fix build failure on older linkers by conditionally adding the
   -no-pie linker option only when it is supported

 - Fix inaccurate kernel messages in vfio-ap by providing descriptive
   error notifications for AP queue sharing violations

 - Fix PCI isolation logic by ensuring non-VF devices correctly return
   false in zpci_bus_is_isolated_vf()

 - Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
   proper sentinel element, preventing potential overruns and
   translation errors

 - Cleanup header dependency problems with asm-offsets.c

 - Add fault info for unexpected low-address protection faults in user
   mode

 - Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
   handling with common code handling

 - Use bitop functions to implement CPU flag helper functions to ensure
   that bits cannot get lost if modified in different contexts on a CPU

 - Remove unused machine_flags for the lowcore

* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
  s390/pci: Fix dev.dma_range_map missing sentinel element
  s390/mm: Dump fault info in case of low address protection fault
  s390/smp: Add support for HOTPLUG_SMT
  s390: Fix linker error when -no-pie option is unavailable
  s390/processor: Use bitop functions for cpu flag helper functions
  s390/asm-offsets: Remove ASM_OFFSETS_C
  s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
  s390/kvm: Split kvm_host header file
  s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
  s390/lowcore: Remove unused machine_flags
  s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation
2025-04-04 16:58:34 -07:00
Linus Torvalds 4a1d8ababd RISC-V Patches for the 6.15 Merge Window, Part 1
* The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed.
 * The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings.
 * Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling.
 * Support for relocatable !MMU kernels builds.
 * Support for hpge pfnmaps, which should improve TLB utilization.
 * Support for runtime constants, which improves the d_hash()
   performance.
 * Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm.
 * Various fixes, including:
       - We were missing a secondary mmu notifier call when flushing the
 	tlb which is required for IOMMU.
       - Fix ftrace panics by saving the registers as expected by ftrace.
       - Fix a couple of stimecmp usage related to cpu hotplug.
       - purgatory_start is now aligned as per the STVEC requirements.
       - A fix for hugetlb when calculating the size of non-present PTEs.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmfv/soTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYierZEACDwI9lJFCEbQPon3z8rAy1moTj0+AZ
 bMfZFqMphUTrJ0cMm2+Bc+XZgck12zHCyu1UljDcZVYMCHA9aOoj5C5NkBBVLCuL
 uLYrhIoQXtJaVIANiFl0SHAZmh2s2OoSgmUzrEZ8JGlHpKCF7EVX5bHEsOvzn9ir
 B2W992W6q3ISuKXHKsTpa7rmTtf7swGYg6zW3pX3l6HmY+EMEQOcQl0tAB383J/T
 lm0K4+YvLpRJdm2ARpNGWlcFXj9/UXUM5hplK3aBAHpPKQ5/83/4tMDsfRvhpEVC
 VJXNgK+H4XLD542aQ8d4ZROguyhwn9e2n6Dkv0OqfNk4lg5pUBcJUZftQ+rB7AWg
 VYB1KVpxhwcruheXJFz8S3EzjZTcS+JrcD80vvx8JmHdXkZwHTfYUgiFwe/TR7yr
 b518fEbXpVwDZiCbaAe3Cmpw0mlNnSVmU4hgNbiwt0fu9DGdPN9WQbyds68RKb7A
 TWwDmmD6kV2BTWl0mHPtu9VhX58CDG+0WYbHA7r82p2T50187766C92GYfN2UPpz
 lH0iMRDkmucclZ3fEoosJ+HsDntc4oe6Bhdzuj52Q7vBpDd/QB6t5cfrlDpEEdgU
 3qoWMN5mb5l1rbvrqENh5ZgmEpzV8K0R5F5quiXh/9wO0y1kepDslTqC2oXK/m0p
 DzsvvD6UnNMOUQ==
 =nCJo
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The sub-architecture selection Kconfig system has been cleaned up,
   the documentation has been improved, and various detections have been
   fixed

 - The vector-related extensions dependencies are now validated when
   parsing from device tree and in the DT bindings

 - Misaligned access probing can be overridden via a kernel command-line
   parameter, along with various fixes to misalign access handling

 - Support for relocatable !MMU kernels builds

 - Support for hpge pfnmaps, which should improve TLB utilization

 - Support for runtime constants, which improves the d_hash()
   performance

 - Support for bfloat16, Zicbom, Zaamo, Zalrsc, Zicntr, Zihpm

 - Various fixes, including:
      - We were missing a secondary mmu notifier call when flushing the
        tlb which is required for IOMMU
      - Fix ftrace panics by saving the registers as expected by ftrace
      - Fix a couple of stimecmp usage related to cpu hotplug
      - purgatory_start is now aligned as per the STVEC requirements
      - A fix for hugetlb when calculating the size of non-present PTEs

* tag 'riscv-for-linus-6.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (65 commits)
  riscv: Add norvc after .option arch in runtime const
  riscv: Make sure toolchain supports zba before using zba instructions
  riscv/purgatory: 4B align purgatory_start
  riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator
  selftests: riscv: fix v_exec_initval_nolibc.c
  riscv: Fix hugetlb retrieval of number of ptes in case of !present pte
  riscv: print hartid on bringup
  riscv: Add norvc after .option arch in runtime const
  riscv: Remove CONFIG_PAGE_OFFSET
  riscv: Support CONFIG_RELOCATABLE on riscv32
  asm-generic: Always define Elf_Rel and Elf_Rela
  riscv: Support CONFIG_RELOCATABLE on NOMMU
  riscv: Allow NOMMU kernels to access all of RAM
  riscv: Remove duplicate CONFIG_PAGE_OFFSET definition
  RISC-V: errata: Use medany for relocatable builds
  dt-bindings: riscv: document vector crypto requirements
  dt-bindings: riscv: add vector sub-extension dependencies
  dt-bindings: riscv: d requires f
  RISC-V: add f & d extension validation checks
  RISC-V: add vector crypto extension validation checks
  ...
2025-04-04 09:49:17 -07:00
Linus Torvalds 6cb0bd94c0 Persistent buffer cleanups and simplifications for v6.15:
It was mistaken that the physical memory returned from "reserve_mem" had to
 be vmap()'d to get to it from a virtual address. But reserve_mem already
 maps the memory to the virtual address of the kernel so a simple
 phys_to_virt() can be used to get to the virtual address from the physical
 memory returned by "reserve_mem". With this new found knowledge, the
 code can be cleaned up and simplified.
 
 - Enforce that the persistent memory is page aligned
 
   As the buffers using the persistent memory are all going to be
   mapped via pages, make sure that the memory given to the tracing
   infrastructure is page aligned. If it is not, it will print a warning
   and fail to map the buffer.
 
 - Use phys_to_virt() to get the virtual address from reserve_mem
 
   Instead of calling vmap() on the physical memory returned from
   "reserve_mem", use phys_to_virt() instead.
 
   As the memory returned by "memmap" or any other means where a physical
   address is given to the tracing infrastructure, it still needs to
   be vmap(). Since this memory can never be returned back to the buddy
   allocator nor should it ever be memmory mapped to user space, flag
   this buffer and up the ref count. The ref count will keep it from
   ever being freed, and the flag will prevent it from ever being memory
   mapped to user space.
 
 - Use vmap_page_range() for memmap virtual address mapping
 
   For the memmap buffer, instead of allocating an array of struct pages,
   assigning them to the contiguous phsycial memory and then passing that to
   vmap(), use vmap_page_range() instead
 
 - Replace flush_dcache_folio() with flush_kernel_vmap_range()
 
   Instead of calling virt_to_folio() and passing that to
   flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
   This also fixes a bug where if a subbuffer was bigger than PAGE_SIZE
   only the PAGE_SIZE portion would be flushed.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+6oZRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhq6AP481KHAgaowQCg7zrKPkMlbYBIigYoU
 7aqoAg2rSLBRSQEAl8fViHZgZ9Q+O7xdozQWiIR7/KQW8VIaTcP/V7cHkAU=
 =+5JB
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:
 "Persistent buffer cleanups and simplifications.

  It was mistaken that the physical memory returned from "reserve_mem"
  had to be vmap()'d to get to it from a virtual address. But
  reserve_mem already maps the memory to the virtual address of the
  kernel so a simple phys_to_virt() can be used to get to the virtual
  address from the physical memory returned by "reserve_mem". With this
  new found knowledge, the code can be cleaned up and simplified.

   - Enforce that the persistent memory is page aligned

     As the buffers using the persistent memory are all going to be
     mapped via pages, make sure that the memory given to the tracing
     infrastructure is page aligned. If it is not, it will print a
     warning and fail to map the buffer.

   - Use phys_to_virt() to get the virtual address from reserve_mem

     Instead of calling vmap() on the physical memory returned from
     "reserve_mem", use phys_to_virt() instead.

     As the memory returned by "memmap" or any other means where a
     physical address is given to the tracing infrastructure, it still
     needs to be vmap(). Since this memory can never be returned back to
     the buddy allocator nor should it ever be memmory mapped to user
     space, flag this buffer and up the ref count. The ref count will
     keep it from ever being freed, and the flag will prevent it from
     ever being memory mapped to user space.

   - Use vmap_page_range() for memmap virtual address mapping

     For the memmap buffer, instead of allocating an array of struct
     pages, assigning them to the contiguous phsycial memory and then
     passing that to vmap(), use vmap_page_range() instead

   - Replace flush_dcache_folio() with flush_kernel_vmap_range()

     Instead of calling virt_to_folio() and passing that to
     flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
     This also fixes a bug where if a subbuffer was bigger than
     PAGE_SIZE only the PAGE_SIZE portion would be flushed"

* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
  tracing: Use vmap_page_range() to map memmap ring buffer
  tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
  tracing: Enforce the persistent ring buffer to be page aligned
2025-04-03 16:09:29 -07:00
Steven Rostedt c44a14f216 tracing: Enforce the persistent ring buffer to be page aligned
Enforce that the address and the size of the memory used by the persistent
ring buffer is page aligned. Also update the documentation to reflect this
requirement.

Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/20250402144953.412882844@goodmis.org
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-02 11:02:26 -04:00
Linus Torvalds d6b02199cd - The 7 patch series "powerpc/crash: use generic crashkernel
reservation" from Sourabh Jain changes powerpc's kexec code to use more
   of the generic layers.
 
 - The 2 patch series "get_maintainer: report subsystem status
   separately" from Vlastimil Babka makes some long-requested improvements
   to the get_maintainer output.
 
 - The 4 patch series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the ucount
   code.
 
 - The 12 patch series "reboot: support runtime configuration of
   emergency hw_protection action" from Ahmad Fatoum improves the ability
   for a driver to perform an emergency system shutdown or reboot.
 
 - The 16 patch series "Converge on using secs_to_jiffies() part two"
   from Easwar Hariharan performs further migrations from
   msecs_to_jiffies() to secs_to_jiffies().
 
 - The 7 patch series "lib/interval_tree: add some test cases and
   cleanup" from Wei Yang permits more userspace testing of kernel library
   code, adds some more tests and performs some cleanups.
 
 - The 2 patch series "hung_task: Dump the blocking task stacktrace" from
   Masami Hiramatsu arranges for the hung_task detector to dump the stack
   of the blocking task and not just that of the blocked task.
 
 - The 4 patch series "resource: Split and use DEFINE_RES*() macros" from
   Andy Shevchenko provides some cleanups to the resource definition
   macros.
 
 - Plus the usual shower of singleton patches - please see the individual
   changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nuqwAKCRDdBJ7gKXxA
 jtNqAQDxqJpjWkzn4yN9CNSs1ivVx3fr6SqazlYCrt3u89WQvwEA1oRrGpETzUGq
 r6khQUIcQImPPcjFqEFpuiSOU0MBZA0=
 =Kii8
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
2025-04-01 10:06:52 -07:00
Linus Torvalds eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Heiko Carstens 1018424ace s390/smp: Add support for HOTPLUG_SMT
Add support for HOTPLUG_SMT. With this the s390 specific "nosmt" kernel
command line parameter handling is replaced with common code handling.

This means that just specifying "nosmt" still enables smt from an
architectural point of view, however only the primary (base) cpu can be set
online. Enabling smt during runtime via /sys/devices/system/cpu/smt/control
allows to set secondary cpus online. This way "nosmt" works like on other
architectures where enabling and disabling smt during runtime is possible.

If "nosmt=force" is specified smt is also still enabled from an
architectural point of view, but there is no way to set secondary cpus
online during runtime, also like on other architectures.

In order to disable smt from architectural point of view, which was
previously achieved with the s390 specific "nosmt" command line option,
"smt=1" can be used.

Tested-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-31 12:20:39 +02:00
Linus Torvalds 7405c0f01a Miscellaneous x86 fixes and updates:
- Fix a large number of x86 Kconfig dependency and help text accuracy
    bugs/problems, by Mateusz Jończyk and David Heideberg.
 
  - Fix a VM_PAT interaction with fork() crash. This also touches
    core kernel code.
 
  - Fix an ORC unwinder bug for interrupt entries
 
  - Fixes and cleanups.
 
  - Fix an AMD microcode loader bug that can promote verification failures
    into success.
 
  - Add early-printk support for MMIO based UARTs on an x86 board that
    had no other serial debugging facility and also experienced early
    boot crashes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnFBERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iVDxAAmiB4soT3/WbaWJJdeVyxEL7sOmUNOm04
 5kAVHJVK8QGdje0eWa6h7xmuQD3UOxafE2coCrOxHhZi2qpAAY6CPIIy6oIBRwZK
 gLgT5xn1CHojfm4UFC3YUOyecBRPUF2C5jfkajWdZHumyPP/sOObqvGanpQRAYd5
 bfPHEvrBpeEeS7WkATCdyF2j+I5xYflD4g/MDAsMmqasQHOnjBuFX5VBeVxxkysC
 dMsFkFpxqcA95MnnyOnxXzgOtRTY0UystX07D3Bk1pqhG9zor+mp8OynsTRCU87T
 ZPPbUr2qACNmCqEEXl+F1mAkgj5H66xE2gaJdYx0/jBAIbX8Nwih7mMxhJShVU07
 Lhc0tukmVrDoDaVIr2HsxqI8iokuYLszUjDAqEQmQDrgelL6usPYghN1b2bDSJ9r
 0hCO/s79024H/U9oMrC+CF52D5UH/fE98ipigrbKRIO/hOsoxiiniF3DG2NVWZM2
 n5nPnOdbperqjCEteN1nxQfr7XZkvP95Bwmuqqc90XH+tzKJdHruUkbm4ua7NEEz
 WKgsUIYFjeN5ZrHbJaNtHlQueTyvsyGmL1nlaLi/MaJbSXPsM/WfwvHsaKTh3NrE
 BFwEAhMZVLDHEfnFT0Ev7Mm1MGpW8MbHoRBR1+E5FWWNS4X0yGLKXWRp8diw25Tm
 W3ZVsn65E6U=
 =/qKX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes and updates from Ingo Molnar:

 - Fix a large number of x86 Kconfig dependency and help text accuracy
   bugs/problems, by Mateusz Jończyk and David Heideberg

 - Fix a VM_PAT interaction with fork() crash. This also touches core
   kernel code

 - Fix an ORC unwinder bug for interrupt entries

 - Fixes and cleanups

 - Fix an AMD microcode loader bug that can promote verification
   failures into success

 - Add early-printk support for MMIO based UARTs on an x86 board that
   had no other serial debugging facility and also experienced early
   boot crashes

* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
  x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
  x86/fpu: Update the outdated comment above fpstate_init_user()
  x86/early_printk: Add support for MMIO-based UARTs
  x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
  x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
  x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
  x86/Kconfig: Correct X86_X2APIC help text
  x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
  x86/Kconfig: Document release year of glibc 2.3.3
  x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
  x86/Kconfig: Document CONFIG_PCI_MMCONFIG
  x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
  x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
  x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
  x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-30 15:25:15 -07:00
Linus Torvalds 96050814a3 printk changes for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflJC4ACgkQUqAMR0iA
 lPIEMBAAsIJmQPHdZUQ49tkMGMoQPU9gnrrDZbRzN9IWwMQQymwEobSEkJkrx7O3
 MFQHc5Z29cAxDqrWDCG5hrH0P3rd994pLPgC/XZC9+UbksYIkIrudCqaPaekLVaw
 BQmBZnZRe7PltOiDpx/TxQ8WQgNlAmZpLpIRTB3ePHZpikq/mf4CGcoYXkrNij1j
 E7uzurLZssJwXasRkWOVNvaOFyVcxrjn78H0JoT7mVeh2IUkWyMALX4IyDeD6Uvh
 SCe19/6W3Uyeh/jpJ6+gjuirVy/Vam/2GXeer4yKpgSQBtsqIaGyki4Cc6Xy12gU
 71nRPTggysZXNnqZ40qw68Irb9JGAo63qFDwWf3OQI+yQQWHyGZIinxr+2xupwWF
 ZgmDJtvUsWKHT4+WmB9+MVTGW7Jy8wKq2u/Haf8xCJxs2yhI0J8iUvQi3CPAMdRH
 Q8G9UJP29Lfj0hdKVT2/wVNGxZKIZ8c6v1vR6Rd1hJ0Zqp4Oj+XGr0btE8Ah5LYv
 GI/M00LCdTNT2kBbbw8lekovUIs2stiRl6hfKoQFR11oiZwFDUePLDdJH2GiJAZm
 g9fPs6YuBERfPtzUeo9a9UbSd6oPTLgLNBEklpHUTWilzO8E/Jh+ZXyvQwj7eXyq
 Hi4OnShn1uxL68o6DJfKsF2qW8S2U5g61JrU3D1CaQ98Derrp0o=
 =lNTZ
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - New option "printk.debug_non_panic_cpus" allows to store printk
   messages from non-panic CPUs during panic. It might be useful when
   panic() fails. It is disabled by default because it increases the
   chance to see the messages printed before panic() and on the
   panic-CPU.

 - New build option "CONFIG_NULL_TTY_DEFAULT_CONSOLE" allows to build
   kernel without the virtual terminal support which prefers ttynull
   over serial console.

 - Do not unblank suspended consoles.

 - Some code clean up.

* tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk/panic: Add option to allow non-panic CPUs to write to the ring buffer.
  printk: Add an option to allow ttynull to be a default console device
  printk: Check CON_SUSPEND when unblanking a console
  printk: Rename console_start to console_resume
  printk: Rename console_stop to console_suspend
  printk: Rename resume_console to console_resume_all
  printk: Rename suspend_console to console_suspend_all
2025-03-27 19:22:24 -07:00
Linus Torvalds 744fab2d9f tracing updates for v6.15:
- Add option traceoff_after_boot
 
   In order to debug kernel boot, it sometimes is helpful to enable tracing
   via the kernel command line. Unfortunately, by the time the login prompt
   appears, the trace is overwritten by the init process and other user space
   start up applications. Adding a "traceoff_after_boot" will disable tracing
   when the kernel passes control to init which will allow developers to be
   able to see the traces that occurred during boot.
 
 - Clean up the mmflags macros that display the GFP flags in trace events
 
   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in the
   process it also adds some flags that were missed before.
 
 - Removed some dead code and scripts/draw_functrace.py
 
   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.
 
 - Constify struct event_trigger_ops
 
   The event_trigger_ops is just a structure that has function pointers that
   are assigned when the variables are created. These variables should all be
   constants.
 
 - Other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+V9IhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4RAP9JhE3n69pGuOVaJTN/LGLr2Axl59n4
 KqZSZS1nUM76/gD6AxYpR7nxyxgJ7VjNkLptS9tSjJVdPDxGAl0v3eO04w4=
 =SU30
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Add option traceoff_after_boot

   In order to debug kernel boot, it sometimes is helpful to enable
   tracing via the kernel command line. Unfortunately, by the time the
   login prompt appears, the trace is overwritten by the init process
   and other user space start up applications.

   Adding a "traceoff_after_boot" will disable tracing when the kernel
   passes control to init which will allow developers to be able to see
   the traces that occurred during boot.

 - Clean up the mmflags macros that display the GFP flags in trace
   events

   The macros to print the GFP flags for trace events had a bit of
   duplication. The code was restructured to remove duplication and in
   the process it also adds some flags that were missed before.

 - Removed some dead code and scripts/draw_functrace.py

   draw_functrace.py hasn't worked in years and as nobody complained
   about it, remove it.

 - Constify struct event_trigger_ops

   The event_trigger_ops is just a structure that has function pointers
   that are assigned when the variables are created. These variables
   should all be constants.

 - Other minor clean ups and fixes

* tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Replace strncpy with memcpy for fixed-length substring copy
  tracing: Fix synth event printk format for str fields
  tracing: Do not use PERF enums when perf is not defined
  tracing: Ensure module defining synth event cannot be unloaded while tracing
  tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
  tracing/osnoise: Fix possible recursive locking for cpus_read_lock()
  tracing: Align synth event print fmt
  tracing: gfp: vsprintf: Do not print "none" when using %pGg printf format
  tracepoint: Print the function symbol when tracepoint_debug is set
  tracing: Constify struct event_trigger_ops
  scripts/tracing: Remove scripts/tracing/draw_functrace.py
  tracing: Update MAINTAINERS file to include tracepoint.c
  tracing/user_events: Slightly simplify user_seq_show()
  tracing/user_events: Don't use %pK through printk
  tracing: gfp: Remove duplication of recording GFP flags
  tracing: Remove orphaned event_trace_printk
  ring-buffer: Fix typo in comment about header page pointer
  tracing: Add traceoff_after_boot option
2025-03-27 16:22:12 -07:00
Petr Mladek f49040c7aa Merge branch 'for-6.15-console-suspend-api-cleanup' into for-linus 2025-03-27 11:09:34 +01:00
Linus Torvalds 22093997ac ata changes for 6.15
- Add 'external' to the libata.force module parameter, in order to
    allow a user to workaround broken firmware (me)
 
  - Use the str_up_down() helper in the sata_via driver (Salah Triki)
 
  - Convert the Freescale PowerQUICC SATA device tree binding to YAML
    (J. Neuschäfer)
 
  - Do not use ATAPI DMA for a device that only supports PIO (me)
 
  - Add Marvell 88SE9215 PCI device ID to the ahci driver.
    Since the controller has quirks, it cannot rely on the generic
    AHCI PCI class code entry (Daniel Kral)
 
  - Improve the return value of atapi_check_dma() (Huacai Chen)
 
  - Fix the NCQ Non-Data log not supported print to actually reference
    the correct log (me)
 
  - Make Marvel 88SE9215 prefer DMA for ATAPI devices (Huacai Chen)
 
  - Simplify the AHCI IRQ vector allocations by performing the IRQ
    vector allocations in the same function, regardless of IRQ type
    (Tomas Henzl)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRN+ES/c4tHlMch3DzJZDGjmcZNcgUCZ+MDSwAKCRDJZDGjmcZN
 ctr2AQCu7c9GPgL6VbVIzgoTdHqN39f7YuyajWxsNKi0832CTAEAkkk2exqa1tMy
 3gFmHcKqm1PoWD2VhjU6odzHy0UtQQY=
 =61mg
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata updates from Niklas Cassel:

 - Add 'external' to the libata.force module parameter, in order to
   allow a user to workaround broken firmware (me)

 - Use the str_up_down() helper in the sata_via driver (Salah Triki)

 - Convert the Freescale PowerQUICC SATA device tree binding to YAML
   (J. Neuschäfer)

 - Do not use ATAPI DMA for a device that only supports PIO (me)

 - Add Marvell 88SE9215 PCI device ID to the ahci driver. Since the
   controller has quirks, it cannot rely on the generic AHCI PCI class
   code entry (Daniel Kral)

 - Improve the return value of atapi_check_dma() (Huacai Chen)

 - Fix the NCQ Non-Data log not supported print to actually reference
   the correct log (me)

 - Make Marvel 88SE9215 prefer DMA for ATAPI devices (Huacai Chen)

 - Simplify the AHCI IRQ vector allocations by performing the IRQ vector
   allocations in the same function, regardless of IRQ type (Tomas
   Henzl)

* tag 'ata-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: ahci: simplify init function
  ahci: Marvell 88SE9215 controllers prefer DMA for ATAPI
  ata: libata: Fix NCQ Non-Data log not supported print
  ata: libata: Improve return value of atapi_check_dma()
  ahci: add PCI ID for Marvell 88SE9215 SATA Controller
  ata: libata-eh: Do not use ATAPI DMA for a device limited to PIO mode
  dt-bindings: ata: Convert fsl,pq-sata to YAML
  ata: sata_via: Use str_up_down() helper in vt6420_prereset()
  ata: libata-core: Add 'external' to the libata.force kernel parameter
2025-03-26 19:49:02 -07:00
Linus Torvalds 7d20aa5c32 Power management updates for 6.15-rc1
- Manage sysfs attributes and boost frequencies efficiently from
    cpufreq core to reduce boilerplate code in drivers (Viresh Kumar).
 
  - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
    Dhananjay Ugwekar, Imran Shaik, zuoqian).
 
  - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
    Bai).
 
  - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
 
  - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng).
 
  - Optimize the amd-pstate driver to avoid cases where call paths end
    up calling the same writes multiple times and needlessly caching
    variables through code reorganization, locking overhaul and tracing
    adjustments (Mario Limonciello, Dhananjay Ugwekar).
 
  - Make it possible to avoid enabling capacity-aware scheduling (CAS) in
    the intel_pstate driver and relocate a check for out-of-band (OOB)
    platform handling in it to make it detect OOB before checking HWP
    availability (Rafael Wysocki).
 
  - Fix dbs_update() to avoid inadvertent conversions of negative integer
    values to unsigned int which causes CPU frequency selection to be
    inaccurate in some cases when the "conservative" cpufreq governor is
    in use (Jie Zhan).
 
  - Update the handling of the most recent idle intervals in the menu
    cpuidle governor to prevent useful information from being discarded
    by it in some cases and improve the prediction accuracy (Rafael
    Wysocki).
 
  - Make it possible to tell the intel_idle driver to ignore its built-in
    table of idle states for the given processor, clean up the handling
    of auto-demotion disabling on Baytrail and Cherrytrail chips in it,
    and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy,
    Rafael Wysocki).
 
  - Make some cpuidle drivers use for_each_present_cpu() instead of
    for_each_possible_cpu() during initialization to avoid issues
    occurring when nosmp or maxcpus=0 are used (Jacky Bai).
 
  - Clean up the Energy Model handling code somewhat (Rafael Wysocki).
 
  - Use kfree_rcu() to simplify the handling of runtime Energy Model
    updates (Li RongQing).
 
  - Add an entry for the Energy Model framework to MAINTAINERS as
    properly maintained (Lukasz Luba).
 
  - Address RCU-related sparse warnings in the Energy Model code (Rafael
    Wysocki).
 
  - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
    when DEVFREQ is set without CPUFREQ so it can be used on a wider
    range of systems (Jeson Gao).
 
  - Unify error handling during runtime suspend and runtime resume in the
    core to help drivers to implement more consistent runtime PM error
    handling (Rafael Wysocki).
 
  - Drop a redundant check from pm_runtime_force_resume() and rearrange
    documentation related to __pm_runtime_disable() (Rafael Wysocki).
 
  - Rework the handling of the "smart suspend" driver flag in the PM core
    to avoid issues hat may occur when drivers using it depend on some
    other drivers and clean up the related PM core code (Rafael Wysocki,
    Colin Ian King).
 
  - Fix the handling of devices with the power.direct_complete flag set
    if device_suspend() returns an error for at least one device to avoid
    situations in which some of them may not be resumed (Rafael Wysocki).
 
  - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
    possible deadlock that may occur if the "compressor" hibernation
    module parameter is accessed during the registration of a new
    ieee80211 device (Lizhi Xu).
 
  - Suppress sleeping parent warning in device_pm_add() in the case when
    new children are added under a device with the power.direct_complete
    set after it has been processed by device_resume() (Xu Yang).
 
  - Remove needless return in three void functions related to system
    wakeup (Zijun Hu).
 
  - Replace deprecated kmap_atomic() with kmap_local_page() in the
    hibernation core code (David Reaver).
 
  - Remove unused helper functions related to system sleep (David Alan
    Gilbert).
 
  - Clean up s2idle_enter() so it does not lock and unlock CPU offline
    in vain and update comments in it (Ulf Hansson).
 
  - Clean up broken white space in dpm_wait_for_children() (Geert
    Uytterhoeven).
 
  - Update the cpupower utility to fix lib version-ing in it and memory
    leaks in error legs, remove hard-coded values, and implement CPU
    physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
    Khan, Yiwei Lin, Zhongqiu Han).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA
 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q
 TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s
 HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS
 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay
 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI=
 =LqVy
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are dominated by cpufreq updates which in turn are dominated by
  updates related to boost support in the core and drivers and
  amd-pstate driver optimizations.

  Apart from the above, there are some cpuidle updates including a
  rework of the most recent idle intervals handling in the venerable
  menu governor that leads to significant improvements in some
  performance benchmarks, as the governor is now more likely to predict
  a shorter idle duration in some cases, and there are updates of the
  core device power management code, mostly related to system suspend
  and resume, that should help to avoid potential issues arising when
  the drivers of devices depending on one another want to use different
  optimizations.

  There is also a usual collection of assorted fixes and cleanups,
  including removal of some unused code.

  Specifics:

   - Manage sysfs attributes and boost frequencies efficiently from
     cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

   - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
     Dhananjay Ugwekar, Imran Shaik, zuoqian)

   - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
     Bai)

   - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

   - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

   - Optimize the amd-pstate driver to avoid cases where call paths end
     up calling the same writes multiple times and needlessly caching
     variables through code reorganization, locking overhaul and tracing
     adjustments (Mario Limonciello, Dhananjay Ugwekar)

   - Make it possible to avoid enabling capacity-aware scheduling (CAS)
     in the intel_pstate driver and relocate a check for out-of-band
     (OOB) platform handling in it to make it detect OOB before checking
     HWP availability (Rafael Wysocki)

   - Fix dbs_update() to avoid inadvertent conversions of negative
     integer values to unsigned int which causes CPU frequency selection
     to be inaccurate in some cases when the "conservative" cpufreq
     governor is in use (Jie Zhan)

   - Update the handling of the most recent idle intervals in the menu
     cpuidle governor to prevent useful information from being discarded
     by it in some cases and improve the prediction accuracy (Rafael
     Wysocki)

   - Make it possible to tell the intel_idle driver to ignore its
     built-in table of idle states for the given processor, clean up the
     handling of auto-demotion disabling on Baytrail and Cherrytrail
     chips in it, and update its MAINTAINERS entry (David Arcari, Artem
     Bityutskiy, Rafael Wysocki)

   - Make some cpuidle drivers use for_each_present_cpu() instead of
     for_each_possible_cpu() during initialization to avoid issues
     occurring when nosmp or maxcpus=0 are used (Jacky Bai)

   - Clean up the Energy Model handling code somewhat (Rafael Wysocki)

   - Use kfree_rcu() to simplify the handling of runtime Energy Model
     updates (Li RongQing)

   - Add an entry for the Energy Model framework to MAINTAINERS as
     properly maintained (Lukasz Luba)

   - Address RCU-related sparse warnings in the Energy Model code
     (Rafael Wysocki)

   - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
     when DEVFREQ is set without CPUFREQ so it can be used on a wider
     range of systems (Jeson Gao)

   - Unify error handling during runtime suspend and runtime resume in
     the core to help drivers to implement more consistent runtime PM
     error handling (Rafael Wysocki)

   - Drop a redundant check from pm_runtime_force_resume() and rearrange
     documentation related to __pm_runtime_disable() (Rafael Wysocki)

   - Rework the handling of the "smart suspend" driver flag in the PM
     core to avoid issues hat may occur when drivers using it depend on
     some other drivers and clean up the related PM core code (Rafael
     Wysocki, Colin Ian King)

   - Fix the handling of devices with the power.direct_complete flag set
     if device_suspend() returns an error for at least one device to
     avoid situations in which some of them may not be resumed (Rafael
     Wysocki)

   - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
     possible deadlock that may occur if the "compressor" hibernation
     module parameter is accessed during the registration of a new
     ieee80211 device (Lizhi Xu)

   - Suppress sleeping parent warning in device_pm_add() in the case
     when new children are added under a device with the
     power.direct_complete set after it has been processed by
     device_resume() (Xu Yang)

   - Remove needless return in three void functions related to system
     wakeup (Zijun Hu)

   - Replace deprecated kmap_atomic() with kmap_local_page() in the
     hibernation core code (David Reaver)

   - Remove unused helper functions related to system sleep (David Alan
     Gilbert)

   - Clean up s2idle_enter() so it does not lock and unlock CPU offline
     in vain and update comments in it (Ulf Hansson)

   - Clean up broken white space in dpm_wait_for_children() (Geert
     Uytterhoeven)

   - Update the cpupower utility to fix lib version-ing in it and memory
     leaks in error legs, remove hard-coded values, and implement CPU
     physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
     Khan, Yiwei Lin, Zhongqiu Han)"

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
  PM: sleep: Fix bit masking operation
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  PM: sleep: Fix handling devices with direct_complete set on errors
  cpuidle: Init cpuidle only for present CPUs
  PM: clk: Remove unused pm_clk_remove()
  PM: sleep: core: Fix indentation in dpm_wait_for_children()
  PM: s2idle: Extend comment in s2idle_enter()
  PM: s2idle: Drop redundant locks when entering s2idle
  PM: sleep: Remove unused pm_generic_ wrappers
  cpufreq: tegra186: Share policy per cluster
  cpupower: Make lib versioning scheme more obvious and fix version link
  PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
  PM: EM: Address RCU-related sparse warnings
  cpupower: Implement CPU physical core querying
  pm: cpupower: remove hard-coded topology depth values
  pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
  ...
2025-03-25 15:00:18 -07:00
Linus Torvalds 906174776c - Some preparatory work to convert the mitigations machinery to mitigating
attack vectors instead of single vulnerabilities
 
 - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag
 
 - Add support for a Zen5-specific SRSO mitigation
 
 - Cleanups and minor improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmfixS0ACgkQEsHwGGHe
 VUpi1xAAgvH2u8Eo8ibT5dABQpD65w3oQiykO+9aDpObG9w9beDVGlld8DJE61Rz
 6tcE0Clp2H/tMcCbn8zXIJ92TQ3wIX/85uZwLi1VEM1Tx7A6VtAbPv8WKfZE3FCX
 9v92HRKnK3ql+A2ZR+oyy+/8RedUmia7y7/bXH1H7Zf2uozoKkmq5cQnwfq5iU4A
 qNiKuvSlQwjZ8Zz6Ax1ugHUkE4R7mlKh8rccLXl4+mVr63/lkPHSY3OFTjcYf4HW
 Ir92N86Spfo0/l0vsOOsWoYKmoaiVP7ouJh7YbKR3B0BGN0pt2MT476mehkEs427
 m4J6XhRKhIrsYmzEkLvvpsg12zO4/PKk8BEYNS7YPYlRaOwjV4ivyFS2aY6e55rh
 yUHyo9s+16f/Mp+/fNFXll3mdMxYBioPWh3M191nJkdfyKMrtf0MdKPRibaJB8wH
 yMF4D1gMx+hFbs0/VOS6dtqD9DKW7VgPg0LW+RysfhnLTuFFb5iBcH6Of7l7Z/Ca
 vVK+JxrhB1EDVI1+MKnESKPF9c6j3DRa2xrQHi/XYje1TGqnQ1v4CmsEObYBuJDN
 9M9t4QLzNuA/DA5tS7cxxtQ3YUthuJjPLcO4EVHOCvnqCAxkzp0i3dVMUr+YISl+
 2yFqaZdTt8s8FjTI21LOyuloCo30ZLlzaorFa0lp2cIyYup+1vg=
 =btX/
 -----END PGP SIGNATURE-----

Merge tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 speculation mitigation updates from Borislav Petkov:

 - Some preparatory work to convert the mitigations machinery to
   mitigating attack vectors instead of single vulnerabilities

 - Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag

 - Add support for a Zen5-specific SRSO mitigation

 - Cleanups and minor improvements

* tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
  x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code
  x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds
  x86/bugs: Relocate mds/taa/mmio/rfds defines
  x86/bugs: Add X86_BUG_SPECTRE_V2_USER
  x86/bugs: Remove X86_FEATURE_USE_IBPB
  KVM: nVMX: Always use IBPB to properly virtualize IBRS
  x86/bugs: Use a static branch to guard IBPB on vCPU switch
  x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set()
  x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation()
  x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers
  x86/bugs: KVM: Add support for SRSO_MSR_FIX
2025-03-25 13:30:18 -07:00
Denis Mukhin 3181424aea x86/early_printk: Add support for MMIO-based UARTs
During the bring-up of an x86 board, the kernel was crashing before
reaching the platform's console driver because of a bug in the firmware,
leaving no trace of the boot progress.

The only available method to debug the kernel boot process was via the
platform's MMIO-based UART, as the board lacked an I/O port-based UART,
PCI UART, or functional video output.

Then it turned out that earlyprintk= does not have a knob to configure
the MMIO-mapped UART.

Extend the early printk facility to support platform MMIO-based UARTs
on x86 systems, enabling debugging during the system bring-up phase.

The command line syntax to enable platform MMIO-based UART is:

  earlyprintk=mmio,membase[,{nocfg|baudrate}][,keep]

Note, the change does not integrate MMIO-based UART support to:

  arch/x86/boot/early_serial_console.c

Also, update kernel parameters documentation with the new syntax and
add the missing 'nocfg' setting to the PCI serial cards description.

Signed-off-by: Denis Mukhin <dmukhin@ford.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250324-earlyprintk-v3-1-aee7421dc469@ford.com
2025-03-25 08:35:38 +01:00
Linus Torvalds e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Linus Torvalds 3ba7dfb8da RCU pull request for v6.15
This pull request contains the following branches:
 
 docs.2025.02.04a:
  - Add broken-timing possibility to stallwarn.rst.
  - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr().
  - Document self-propagating callbacks.
  - Point call_srcu() to call_rcu() for detailed memory ordering.
  - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header.
  - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text.
  - Remove references to old grace-period-wait primitives.
 
 srcu.2025.02.05a:
  - Introduce srcu_read_{un,}lock_fast(), which is similar to
    srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock at the
    cost of calling synchronize_rcu() in synchronize_srcu(). Moreover, by
    returning the percpu offset of the counter at srcu_read_lock_fast()
    time, srcu_read_unlock_fast() can save extra pointer dereferencing,
    which makes it faster than srcu_read_{un,}lock_lite().
    srcu_read_{un,}lock_fast() are intended to replace
    rcu_read_{un,}lock_trace() if possible.
 
 torture.2025.02.05a:
  - Add get_torture_init_jiffies() to return the start time of the test.
  - Add a test_boost_holdoff module parameter to allow delaying boosting
    tests when building rcutorture as built-in.
  - Add grace period sequence number logging at the beginning and end of
    failure/close-call results.
  - Switch to hexadecimal for the expedited grace period sequence number
    in the rcu_exp_grace_period trace point.
  - Make cur_ops->format_gp_seqs take buffer length.
  - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool.
  - Complain when invalid SRCU reader_flavor is specified.
  - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
    uses atomics even when percpu ops are NMI safe, and use the Kconfig
    for SRCU lockdep testing.
 
 misc.2025.03.04a:
  - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing.
  - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes().
  - Fix get_state_synchronize_rcu_full() GP-start detection.
  - Move RCU Tasks self-tests to core_initcall().
  - Print segment lengths in show_rcu_nocb_gp_state().
  - Make RCU watch ct_kernel_exit_state() warning.
  - Flush console log from kernel_power_off().
  - rcutorture: Allow a negative value for nfakewriters.
  - rcu: Update TREE05.boot to test normal synchronize_rcu().
  - rcu: Use _full() API to debug synchronize_rcu().
 
 lazypreempt.2025.03.04a: Make RCU handle PREEMPT_LAZY better:
  - Fix header guard for rcu_all_qs().
  - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY.
  - Update __cond_resched comment about RCU quiescent states.
  - Handle unstable rdp in rcu_read_unlock_strict().
  - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y.
  - osnoise: Provide quiescent states.
  - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
    combination.
  - Limit PREEMPT_RCU configurations.
  - Make rcutorture senario TREE07 and senario TREE10 use PREEMPT_LAZY=y.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAmfeBLQACgkQSXnow7UH
 +rh11Qf/Rt6IZJ/YT/V9Sd+8hMx4O0BMh779pr9cD6mbAG+FDk2Yeva1m8vIdFOb
 qId6oc8K/ef2JfFjSn0oHMzQP2D3XUyiJWPNbBDHv/D8Os8GZgjzu8dkxVkSbdbY
 OxtvIflbcqFN1JDJfGKZnTEW0/YxGqfnS9b6R7iyyA7SOGQ/WubGOE5qNCqPufc9
 zJiP+qTUFYQzCIiPlEJul39o9KboPogbt3QAAQjWmi3utd77ehJnm/15FvAjyau4
 uhC2cnGfMY535rQaiaQeBQ/IHIowKripCq0JQFvcUNdyArZM3HOI2x79+2II6ft7
 mjHskNODOIJHfW2o1RzQ0yRYAywFIg==
 =J+mH
 -----END PGP SIGNATURE-----

Merge tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Boqun Feng:
 "Documentation:
   - Add broken-timing possibility to stallwarn.rst
   - Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
   - Document self-propagating callbacks
   - Point call_srcu() to call_rcu() for detailed memory ordering
   - Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
   - Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
   - Remove references to old grace-period-wait primitives

  srcu:
   - Introduce srcu_read_{un,}lock_fast(), which is similar to
     srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
     at the cost of calling synchronize_rcu() in synchronize_srcu()

     Moreover, by returning the percpu offset of the counter at
     srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
     extra pointer dereferencing, which makes it faster than
     srcu_read_{un,}lock_lite()

     srcu_read_{un,}lock_fast() are intended to replace
     rcu_read_{un,}lock_trace() if possible

  RCU torture:
   - Add get_torture_init_jiffies() to return the start time of the test
   - Add a test_boost_holdoff module parameter to allow delaying
     boosting tests when building rcutorture as built-in
   - Add grace period sequence number logging at the beginning and end
     of failure/close-call results
   - Switch to hexadecimal for the expedited grace period sequence
     number in the rcu_exp_grace_period trace point
   - Make cur_ops->format_gp_seqs take buffer length
   - Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
   - Complain when invalid SRCU reader_flavor is specified
   - Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
     uses atomics even when percpu ops are NMI safe, and use the Kconfig
     for SRCU lockdep testing

  Misc:
   - Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
   - Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
   - Fix get_state_synchronize_rcu_full() GP-start detection
   - Move RCU Tasks self-tests to core_initcall()
   - Print segment lengths in show_rcu_nocb_gp_state()
   - Make RCU watch ct_kernel_exit_state() warning
   - Flush console log from kernel_power_off()
   - rcutorture: Allow a negative value for nfakewriters
   - rcu: Update TREE05.boot to test normal synchronize_rcu()
   - rcu: Use _full() API to debug synchronize_rcu()

  Make RCU handle PREEMPT_LAZY better:
   - Fix header guard for rcu_all_qs()
   - rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
   - Update __cond_resched comment about RCU quiescent states
   - Handle unstable rdp in rcu_read_unlock_strict()
   - Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
   - osnoise: Provide quiescent states
   - Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
     combination
   - Limit PREEMPT_RCU configurations
   - Make rcutorture senario TREE07 and senario TREE10 use
     PREEMPT_LAZY=y"

* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
  rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
  rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
  rcu: limit PREEMPT_RCU configurations
  rcutorture: Update ->extendables check for lazy preemption
  rcutorture: Update rcutorture_one_extend_check() for lazy preemption
  osnoise: provide quiescent states
  rcu: Use _full() API to debug synchronize_rcu()
  rcu: Update TREE05.boot to test normal synchronize_rcu()
  rcutorture: Allow a negative value for nfakewriters
  Flush console log from kernel_power_off()
  context_tracking: Make RCU watch ct_kernel_exit_state() warning
  rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
  rcu-tasks: Move RCU Tasks self-tests to core_initcall()
  rcu: Fix get_state_synchronize_rcu_full() GP-start detection
  torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
  srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
  rcutorture: Complain when invalid SRCU reader_flavor is specified
  rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
  rcutorture: Make cur_ops->format_gp_seqs take buffer length
  rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
  ...
2025-03-24 19:41:37 -07:00
Linus Torvalds f81c2b8150 It has been a reasonably busy cycle for docs...
- Significant changes throughout the tree to bring Python code up to
   current standards and raise the minimum Python required to 3.9.  Much of
   this is preparatory to replacing the ancient Perl scripts/kernel-doc
   horror with a slightly less horrifying Python implementation, expected
   for 6.16.
 
 - Update the minimum Sphinx required to 3.4.3, allowing us to remove a
   bunch of older compatibility code.
 
 - Rework and improve the generation of the ABI documentation.
 
   (All of the above done by Mauro)
 
 - Lots of translation updates.  Alex Shi and Yanteng Si are taking on
   responsibility for the Chinese translations going forward; that work will
   still get to you via docs-next
 
 - Try to standardize the format for indicating a developer's affiliation in
   commit tags.
 
 - Clarify the TAB's role in CoC enforcement actions.
 
 - Try to spell out the rules for when a commit tag can name another
   developer without their explicit permission.
 
 Plus lots of other typo fixes and updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmfccxwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoYcH/jL/nS8YAiJ3awF5PH5tR3m5ddt9l+fKXWJx
 PB3KcHtDORbWltTA+Tvo2aP1jxGY9wqsIIvl+nvjJyUcfd72g4HNfTDUDXwP3OFU
 wTkaEAQp3n/hqnLXtJ2AzV3Ir5cIfEL2d7F6QsN1Gnof8iu2OuMk5iMeb0iexUX6
 FYjJq+jknh30VdAp2hxHy8q17R7h7PySh5OsjeAYJJroLv60n3DwQgnzHjXC/FT2
 Qq1UuEzlSpRoso2o2NwVTND6OVW081umo6YrioqD7ZC2G2fhRgLFJJtJGXDNcyUl
 gQv9xLSaTD97V4zaWPm28ObNBpY/GnAd4hMjB17wAH5xUfVS5Aw=
 =Gvdp
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.15' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a reasonably busy cycle for docs...

   - Significant changes throughout the tree to bring Python code up to
     current standards and raise the minimum Python required to 3.9

     Much of this is preparatory to replacing the ancient Perl
     scripts/kernel-doc horror with a slightly less horrifying Python
     implementation, expected for 6.16

   - Update the minimum Sphinx required to 3.4.3, allowing us to remove
     a bunch of older compatibility code

   - Rework and improve the generation of the ABI documentation

  (All of the above done by Mauro)

   - Lots of translation updates. Alex Shi and Yanteng Si are taking on
     responsibility for the Chinese translations going forward; that
     work will still get to you via docs-next

   - Try to standardize the format for indicating a developer's
     affiliation in commit tags

   - Clarify the TAB's role in CoC enforcement actions

   - Try to spell out the rules for when a commit tag can name another
     developer without their explicit permission

  Plus lots of other typo fixes and updates"

* tag 'docs-6.15' of git://git.lwn.net/linux: (98 commits)
  docs/zh_CN: fix spelling mistake
  docs/Chinese: change the disclaimer words
  docs/zh_CN: Add snp-tdx-threat-model index Chinese translation
  docs: driver-api: firmware: clarify userspace requirements
  docs: clarify rules wrt tagging other people
  docs: Remove outdated highuid.rst documentation
  Documentation: dma-buf: heaps: Add heap name definitions
  docs/.../submit-checklist: Use Documentation/admin-guide/abi.rst for cross-ref of README
  docs: Correct installation instruction
  Documentation: kcsan: fix "Plain Accesses and Data Races" URL in kcsan.rst
  Documentation/CoC: Spell out the TAB role in enforcement decisions
  Documentation: ocxl.rst: Update consortium site
  scripts: get_feat.pl: substitute s390x with s390
  scripts/kernel-doc: drop dead code for Wcontents_before_sections
  scripts/kernel-doc: don't add not needed new lines
  docs: driver-api/infiniband.rst: fix Kerneldoc markup
  drivers: firewire: firewire-cdev.h: fix identation on a kernel-doc markup
  drivers: media: intel-ipu3.h: fix identation on a kernel-doc markup
  include/asm-generic/io.h: fix kerneldoc markup
  Docs/arch/arm64: Fix spelling in amu.rst
  ...
2025-03-24 18:42:27 -07:00
Donghyeok Choe c1aa3daa51 printk/panic: Add option to allow non-panic CPUs to write to the ring buffer.
Commit 779dbc2e78 ("printk: Avoid non-panic CPUs writing to ringbuffer")
aimed to isolate panic-related messages. However, when panic() itself
malfunctions, messages from non-panic CPUs become crucial for debugging.

While commit bcc954c6ca ("printk/panic: Allow cpu backtraces to
be written into ringbuffer during panic") enables non-panic CPU
backtraces, it may not provide sufficient diagnostic information.

Introduce the "debug_non_panic_cpus" command-line option, enabling
non-panic CPU messages to be stored in the ring buffer during a panic.
This also prevents discarding non-finalized messages from non-panic CPUs
during console flushing, providing a more comprehensive view of system
state during critical failures.

Link: https://lore.kernel.org/all/Z8cLEkqLL2IOyNIj@pathway/
Signed-off-by: Donghyeok Choe <d7271.choe@samsung.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20250318022320.2428155-1-d7271.choe@samsung.com
[pmladek@suse.com: Added documentation, added module_parameter, removed printk_ prefix.]
Tested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-03-20 15:48:36 +01:00
Andrew Jones 9fe58530a8
Documentation/kernel-parameters: Add riscv unaligned speed parameters
Document riscv parameters used to select scalar and vector unaligned
access speeds.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250304120014.143628-18-ajones@ventanamicro.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-19 14:23:31 +00:00
Thomas Prescher 71f7456889 mm: hugetlb: add hugetlb_alloc_threads cmdline option
Add a command line option that enables control of how many threads should
be used to allocate huge pages.

[akpm@linux-foundation.org: tidy up a comment]
Link: https://lkml.kernel.org/r/20250227-hugepage-parameter-v2-2-7db8c6dc0453@cyberus-technology.de
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17 00:05:36 -07:00
Ahmad Fatoum e016173f65 reboot: add support for configuring emergency hardware protection action
We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.

This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.

In the general case, however, the driver detecting the issue can't know
what the appropriate course of action is and shouldn't be dictating the
policy of dealing with it.

Therefore, add a global hw_protection toggle that allows the user to
specify whether shutdown or reboot should be the default action when the
driver doesn't set policy.

This introduces no functional change yet as hw_protection_trigger() has no
callers, but these will be added in subsequent commits.

[arnd@arndb.de: hide unused hw_protection_attr]
  Link: https://lkml.kernel.org/r/20250224141849.1546019-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-7-e1c09b090c0c@pengutronix.de
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Matteo Croce <teknoraver@meta.com>
Cc: Matti Vaittinen <mazziesaccount@gmail.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Sascha Hauer <kernel@pengutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 23:24:14 -07:00
Frank van der Linden f866cfcec2 mm/hugetlb: add hugetlb_cma_only cmdline option
Add an option to force hugetlb gigantic pages to be allocated using CMA
only (if hugetlb_cma is enabled).  This avoids a fallback to allocation
from the rest of system memory if the CMA allocation fails.  This makes
the size of hugetlb_cma a hard upper boundary for gigantic hugetlb page
allocations.

This is useful because, with a large CMA area, the kernel's unmovable
allocations will have less room to work with and it is undesirable for new
hugetlb gigantic page allocations to be done from that remaining area.  It
will eat in to the space available for unmovable allocations, leading to
unwanted system behavior (OOMs because the kernel fails to do unmovable
allocations).

So, with this enabled, an administrator can force a hard upper bound for
runtime gigantic page allocations, and have more predictable system
behavior.

Link: https://lkml.kernel.org/r/20250228182928.2645936-26-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:31 -07:00
Frank van der Linden 5b47c02967 mm/hugetlb: convert cmdline parameters from setup to early
Convert the cmdline parameters (hugepagesz, hugepages, default_hugepagesz
and hugetlb_free_vmemmap) to early parameters.

Since parse_early_param might run before MMU setups on some platforms
(powerpc), validation of huge page sizes as specified in command line
parameters would fail.  So instead, for the hstate-related values, just
record the them and parse them on demand, from hugetlb_bootmem_alloc.

The allocation of hugetlb bootmem pages is now done in
hugetlb_bootmem_alloc, which is called explicitly at the start of
mm_core_init().  core_initcall would be too late, as that happens with
memblock already torn down.

This change will allow earlier allocation and initialization of bootmem
hugetlb pages later on.

No functional change intended.

Link: https://lkml.kernel.org/r/20250228182928.2645936-8-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:26 -07:00
Breno Leitao 98fdaeb296 x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
Change the default value of spectre v2 in user mode to respect the
CONFIG_MITIGATION_SPECTRE_V2 config option.

Currently, user mode spectre v2 is set to auto
(SPECTRE_V2_USER_CMD_AUTO) by default, even if
CONFIG_MITIGATION_SPECTRE_V2 is disabled.

Set the spectre_v2 value to auto (SPECTRE_V2_USER_CMD_AUTO) if the
Spectre v2 config (CONFIG_MITIGATION_SPECTRE_V2) is enabled, otherwise
set the value to none (SPECTRE_V2_USER_CMD_NONE).

Important to say the command line argument "spectre_v2_user" overwrites
the default value in both cases.

When CONFIG_MITIGATION_SPECTRE_V2 is not set, users have the flexibility
to opt-in for specific mitigations independently. In this scenario,
setting spectre_v2= will not enable spectre_v2_user=, and command line
options spectre_v2_user and spectre_v2 are independent when
CONFIG_MITIGATION_SPECTRE_V2=n.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Kaplan <David.Kaplan@amd.com>
Link: https://lore.kernel.org/r/20241031-x86_bugs_last_v2-v2-2-b7ff1dab840e@debian.org
2025-03-03 12:48:41 +01:00
Mel Gorman d2132f453e mm: security: Allow default HARDENED_USERCOPY to be set at compile time
HARDENED_USERCOPY defaults to on if enabled at compile time. Allow
hardened_usercopy= default to be set at compile time similar to
init_on_alloc= and init_on_free=. The intent is that hardening
options that can be disabled at runtime can set their default at
build time.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20250123221115.19722-3-mgorman@techsingularity.net
Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-28 11:51:31 -08:00
Arnd Bergmann 0081fdeccb x86/mm: Drop support for CONFIG_HIGHPTE
With the maximum amount of RAM now 4GB, there is very little point
to still have PTE pages in highmem. Drop this for simplification.

The only other architecture supporting HIGHPTE is 32-bit arm, and
once that feature is removed as well, the highpte logic can be
dropped from common code as well.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-8-arnd@kernel.org
2025-02-27 11:22:06 +01:00
Arnd Bergmann 0abf508675 x86/smp: Drop 32-bit "bigsmp" machine support
The x86-32 kernel used to support multiple platforms with more than eight
logical CPUs, from the 1999-2003 timeframe: Sequent NUMA-Q, IBM Summit,
Unisys ES7000 and HP F8. Support for all except the latter was dropped
back in 2014, leaving only the F8 based DL740 and DL760 G2 machines in
this catery, with up to eight single-core Socket-603 Xeon-MP processors
with hyperthreading.

Like the already removed machines, the HP F8 servers at the time cost
upwards of $100k in typical configurations, but were quickly obsoleted
by their 64-bit Socket-604 cousins and the AMD Opteron.

Earlier servers with up to 8 Pentium Pro or Xeon processors remain
fully supported as they had no hyperthreading. Similarly, the more
common 4-socket Xeon-MP machines with hyperthreading using Intel
or ServerWorks chipsets continue to work without this, and all the
multi-core Xeon processors also run 64-bit kernels.

While the "bigsmp" support can also be used to run on later 64-bit
machines (including VM guests), it seems best to discourage that
and get any remaining users to update their kernels to 64-bit builds
on these. As a side-effect of this, there is also no more need to
support NUMA configurations on 32-bit x86, as all true 32-bit
NUMA platforms are already gone.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250226213714.4040853-3-arnd@kernel.org
2025-02-27 11:19:05 +01:00
Steven Rostedt 937fbf111a tracing: Add traceoff_after_boot option
Sometimes tracing is used to debug issues during the boot process. Since
the trace buffer has a limited amount of storage, it may be prudent to
disable tracing after the boot is finished, otherwise the critical
information may be overwritten.  With this option, the main tracing buffer
will be turned off at the end of the boot process.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lore.kernel.org/20250208103017.48a7ec83@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-02-25 13:46:40 -05:00
Mike Rapoport (Microsoft) 8b2ee518fc Documentation/kernel-parameters: fix typo in description of reserve_mem
The format description of reserve_mem uses [KNG] as units, rather than
[KMG].

Fix it.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250218070845.3769520-1-rppt@kernel.org
2025-02-18 13:25:06 -07:00
Rafael J. Wysocki 7802fce7dc cpufreq: intel_pstate: Make it possible to avoid enabling CAS
Capacity-aware scheduling (CAS) is enabled by default by intel_pstate on
hybrid systems without SMT, but in some usage scenarios it may be more
attractive to place tasks for maximum CPU performance regardless of the
extra cost in terms of energy, which is the case on such systems when
CAS is not enabled, so introduce a command line option to forbid
intel_pstate to enable CAS.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by:Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2781262.mvXUDI8C0e@rjwysocki.net
2025-02-18 20:47:33 +01:00
Paul E. McKenney b8726c5aa6 rcutorture: Add a test_boost_holdoff module parameter
This commit adds a test_boost_holdoff module parameter that tells the RCU
priority-boosting tests to wait for the specified number of seconds past
the start of the rcutorture test.  This can be useful when rcutorture
is built into the kernel (as opposed to being modprobed), especially on
large systems where early start of RCU priority boosting can delay the
boot sequence, which adds a full CPU's worth of load onto the system.
This can in turn result in pointless stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05 07:14:39 -08:00
Niklas Cassel deca423213 ata: libata-core: Add 'external' to the libata.force kernel parameter
Commit ae1f3db006 ("ata: ahci: do not enable LPM on external ports")
changed so that LPM is not enabled on external ports (hotplug-capable or
eSATA ports).

This is because hotplug and LPM are mutually exclusive, see 7.3.1 Hot Plug
Removal Detection and Power Management Interaction in AHCI 1.3.1.

This does require that firmware has set the appropate bits (HPCP or ESP)
in PxCMD (which is a per port register in the AHCI controller).

If the firmware has failed to mark a port as hotplug-capable or eSATA in
PxCMD, then there is currently not much a user can do.

If LPM is enabled on the port, hotplug insertions and removals will not be
detected on that port.

In order to allow a user to fix up broken firmware, add 'external' to the
libata.force kernel parameter.

libata.force can be specified either on the kernel command line, or as a
kernel module parameter.

For more information, see Documentation/admin-guide/kernel-parameters.txt.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20250130133544.219297-4-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-02-03 13:09:24 +01:00
Linus Torvalds e2ee2e9b15 KVM/arm64 updates for 6.14
* New features:
 
   - Support for non-protected guest in protected mode, achieving near
     feature parity with the non-protected mode
 
   - Support for the EL2 timers as part of the ongoing NV support
 
   - Allow control of hardware tracing for nVHE/hVHE
 
 * Improvements, fixes and cleanups:
 
   - Massive cleanup of the debug infrastructure, making it a bit less
     awkward and definitely easier to maintain. This should pave the
     way for further optimisations
 
   - Complete rewrite of pKVM's fixed-feature infrastructure, aligning
     it with the rest of KVM and making the code easier to follow
 
   - Large simplification of pKVM's memory protection infrastructure
 
   - Better handling of RES0/RES1 fields for memory-backed system
     registers
 
   - Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
     from a pretty nasty timer bug
 
   - Small collection of cleanups and low-impact fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeYqJcQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNLUhCACxUTMVQXhfW3qbh0UQxPd7XXvjI+Hm7SPS
 wDuVTle4jrFVGHxuZqtgWLmx8hD7bqO965qmFgbevKlwsRY33onH2nbH4i4AcwbA
 jcdM4yMHZI4+Qmnb4G5ZJ89IwjAhHPZTBOV5KRhyHQ/qtRciHHtOgJde7II9fd68
 uIESg4SSSyUzI47YSEHmGVmiBIhdQhq2qust0m6NPFalEGYstPbpluPQ6R1CsDqK
 v14TIAW7t0vSPucBeODxhA5gEa2JsvNi+sqA+DF/ELH2ZqpkuR7rofgMGblaXCSD
 JXa5xamRB9dI5zi8vatwfOzYlog+/gzmPqMh/9JXpiDGHxJe0vlz
 =tQ8F
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull KVM/arm64 updates from Will Deacon:
 "New features:

   - Support for non-protected guest in protected mode, achieving near
     feature parity with the non-protected mode

   - Support for the EL2 timers as part of the ongoing NV support

   - Allow control of hardware tracing for nVHE/hVHE

  Improvements, fixes and cleanups:

   - Massive cleanup of the debug infrastructure, making it a bit less
     awkward and definitely easier to maintain. This should pave the way
     for further optimisations

   - Complete rewrite of pKVM's fixed-feature infrastructure, aligning
     it with the rest of KVM and making the code easier to follow

   - Large simplification of pKVM's memory protection infrastructure

   - Better handling of RES0/RES1 fields for memory-backed system
     registers

   - Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
     from a pretty nasty timer bug

   - Small collection of cleanups and low-impact fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
  arm64/sysreg: Get rid of TRFCR_ELx SysregFields
  KVM: arm64: nv: Fix doc header layout for timers
  KVM: arm64: nv: Apply RESx settings to sysreg reset values
  KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
  KVM: arm64: Fix selftests after sysreg field name update
  coresight: Pass guest TRFCR value to KVM
  KVM: arm64: Support trace filtering for guests
  KVM: arm64: coresight: Give TRBE enabled state to KVM
  coresight: trbe: Remove redundant disable call
  arm64/sysreg/tools: Move TRFCR definitions to sysreg
  tools: arm64: Update sysreg.h header files
  KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
  KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
  KVM: arm64: Drop pkvm_mem_transition for FF-A
  KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
  KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
  arm64: kvm: Introduce nvhe stack size constants
  KVM: arm64: Fix nVHE stacktrace VA bits mask
  KVM: arm64: Fix FEAT_MTE in pKVM
  Documentation: Update the behaviour of "kvm-arm.mode"
  ...
2025-01-28 09:01:36 -08:00
Linus Torvalds 9c5968db9e The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
 
 - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the
   page allocator so we end up with the ability to allocate and free
   zero-refcount pages.  So that callers (ie, slab) can avoid a refcount
   inc & dec.
 
 - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use
   large folios other than PMD-sized ones.
 
 - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and
   fixes for this small built-in kernel selftest.
 
 - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of
   the mapletree code.
 
 - "mm: fix format issues and param types" from Keren Sun implements a
   few minor code cleanups.
 
 - "simplify split calculation" from Wei Yang provides a few fixes and a
   test for the mapletree code.
 
 - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes
   continues the work of moving vma-related code into the (relatively) new
   mm/vma.c.
 
 - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
   Hildenbrand cleans up and rationalizes handling of gfp flags in the page
   allocator.
 
 - "readahead: Reintroduce fix for improper RA window sizing" from Jan
   Kara is a second attempt at fixing a readahead window sizing issue.  It
   should reduce the amount of unnecessary reading.
 
 - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
   addresses an issue where "huge" amounts of pte pagetables are
   accumulated
   (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/).
   Qi's series addresses this windup by synchronously freeing PTE memory
   within the context of madvise(MADV_DONTNEED).
 
 - "selftest/mm: Remove warnings found by adding compiler flags" from
   Muhammad Usama Anjum fixes some build warnings in the selftests code
   when optional compiler warnings are enabled.
 
 - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David
   Hildenbrand tightens the allocator's observance of __GFP_HARDWALL.
 
 - "pkeys kselftests improvements" from Kevin Brodsky implements various
   fixes and cleanups in the MM selftests code, mainly pertaining to the
   pkeys tests.
 
 - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
   estimate application working set size.
 
 - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
   provides some cleanups to memcg's hugetlb charging logic.
 
 - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
   removes the global swap cgroup lock.  A speedup of 10% for a tmpfs-based
   kernel build was demonstrated.
 
 - "zram: split page type read/write handling" from Sergey Senozhatsky
   has several fixes and cleaups for zram in the area of zram_write_page().
   A watchdog softlockup warning was eliminated.
 
 - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky
   cleans up the pagetable destructor implementations.  A rare
   use-after-free race is fixed.
 
 - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
   simplifies and cleans up the debugging code in the VMA merging logic.
 
 - "Account page tables at all levels" from Kevin Brodsky cleans up and
   regularizes the pagetable ctor/dtor handling.  This results in
   improvements in accounting accuracy.
 
 - "mm/damon: replace most damon_callback usages in sysfs with new core
   functions" from SeongJae Park cleans up and generalizes DAMON's sysfs
   file interface logic.
 
 - "mm/damon: enable page level properties based monitoring" from
   SeongJae Park increases the amount of information which is presented in
   response to DAMOS actions.
 
 - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes
   DAMON's long-deprecated debugfs interfaces.  Thus the migration to sysfs
   is completed.
 
 - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter
   Xu cleans up and generalizes the hugetlb reservation accounting.
 
 - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
   removes a never-used feature of the alloc_pages_bulk() interface.
 
 - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
   extends DAMOS filters to support not only exclusion (rejecting), but
   also inclusion (allowing) behavior.
 
 - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
   "introduces a new memory descriptor for zswap.zpool that currently
   overlaps with struct page for now.  This is part of the effort to reduce
   the size of struct page and to enable dynamic allocation of memory
   descriptors."
 
 - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and
   simplifies the swap allocator locking.  A speedup of 400% was
   demonstrated for one workload.  As was a 35% reduction for kernel build
   time with swap-on-zram.
 
 - "mm: update mips to use do_mmap(), make mmap_region() internal" from
   Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
   mmap_region() can be made MM-internal.
 
 - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU
   regressions and otherwise improves MGLRU performance.
 
 - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park
   updates DAMON documentation.
 
 - "Cleanup for memfd_create()" from Isaac Manjarres does that thing.
 
 - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand
   provides various cleanups in the areas of hugetlb folios, THP folios and
   migration.
 
 - "Uncached buffered IO" from Jens Axboe implements the new
   RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache
   reading and writing.  To permite userspace to address issues with
   massive buildup of useless pagecache when reading/writing fast devices.
 
 - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
   Weißschuh fixes and optimizes some of the MM selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA
 jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz
 uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE=
 =Ib2s
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "The various patchsets are summarized below. Plus of course many
  indivudual patches which are described in their changelogs.

   - "Allocate and free frozen pages" from Matthew Wilcox reorganizes
     the page allocator so we end up with the ability to allocate and
     free zero-refcount pages. So that callers (ie, slab) can avoid a
     refcount inc & dec

   - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
     use large folios other than PMD-sized ones

   - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
     and fixes for this small built-in kernel selftest

   - "mas_anode_descend() related cleanup" from Wei Yang tidies up part
     of the mapletree code

   - "mm: fix format issues and param types" from Keren Sun implements a
     few minor code cleanups

   - "simplify split calculation" from Wei Yang provides a few fixes and
     a test for the mapletree code

   - "mm/vma: make more mmap logic userland testable" from Lorenzo
     Stoakes continues the work of moving vma-related code into the
     (relatively) new mm/vma.c

   - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
     Hildenbrand cleans up and rationalizes handling of gfp flags in the
     page allocator

   - "readahead: Reintroduce fix for improper RA window sizing" from Jan
     Kara is a second attempt at fixing a readahead window sizing issue.
     It should reduce the amount of unnecessary reading

   - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
     addresses an issue where "huge" amounts of pte pagetables are
     accumulated:

       https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/

     Qi's series addresses this windup by synchronously freeing PTE
     memory within the context of madvise(MADV_DONTNEED)

   - "selftest/mm: Remove warnings found by adding compiler flags" from
     Muhammad Usama Anjum fixes some build warnings in the selftests
     code when optional compiler warnings are enabled

   - "mm: don't use __GFP_HARDWALL when migrating remote pages" from
     David Hildenbrand tightens the allocator's observance of
     __GFP_HARDWALL

   - "pkeys kselftests improvements" from Kevin Brodsky implements
     various fixes and cleanups in the MM selftests code, mainly
     pertaining to the pkeys tests

   - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
     estimate application working set size

   - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
     provides some cleanups to memcg's hugetlb charging logic

   - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
     removes the global swap cgroup lock. A speedup of 10% for a
     tmpfs-based kernel build was demonstrated

   - "zram: split page type read/write handling" from Sergey Senozhatsky
     has several fixes and cleaups for zram in the area of
     zram_write_page(). A watchdog softlockup warning was eliminated

   - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
     Brodsky cleans up the pagetable destructor implementations. A rare
     use-after-free race is fixed

   - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
     simplifies and cleans up the debugging code in the VMA merging
     logic

   - "Account page tables at all levels" from Kevin Brodsky cleans up
     and regularizes the pagetable ctor/dtor handling. This results in
     improvements in accounting accuracy

   - "mm/damon: replace most damon_callback usages in sysfs with new
     core functions" from SeongJae Park cleans up and generalizes
     DAMON's sysfs file interface logic

   - "mm/damon: enable page level properties based monitoring" from
     SeongJae Park increases the amount of information which is
     presented in response to DAMOS actions

   - "mm/damon: remove DAMON debugfs interface" from SeongJae Park
     removes DAMON's long-deprecated debugfs interfaces. Thus the
     migration to sysfs is completed

   - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
     Peter Xu cleans up and generalizes the hugetlb reservation
     accounting

   - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
     removes a never-used feature of the alloc_pages_bulk() interface

   - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
     extends DAMOS filters to support not only exclusion (rejecting),
     but also inclusion (allowing) behavior

   - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
     introduces a new memory descriptor for zswap.zpool that currently
     overlaps with struct page for now. This is part of the effort to
     reduce the size of struct page and to enable dynamic allocation of
     memory descriptors

   - "mm, swap: rework of swap allocator locks" from Kairui Song redoes
     and simplifies the swap allocator locking. A speedup of 400% was
     demonstrated for one workload. As was a 35% reduction for kernel
     build time with swap-on-zram

   - "mm: update mips to use do_mmap(), make mmap_region() internal"
     from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
     mmap_region() can be made MM-internal

   - "mm/mglru: performance optimizations" from Yu Zhao fixes a few
     MGLRU regressions and otherwise improves MGLRU performance

   - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
     Park updates DAMON documentation

   - "Cleanup for memfd_create()" from Isaac Manjarres does that thing

   - "mm: hugetlb+THP folio and migration cleanups" from David
     Hildenbrand provides various cleanups in the areas of hugetlb
     folios, THP folios and migration

   - "Uncached buffered IO" from Jens Axboe implements the new
     RWF_DONTCACHE flag which provides synchronous dropbehind for
     pagecache reading and writing. To permite userspace to address
     issues with massive buildup of useless pagecache when
     reading/writing fast devices

   - "selftests/mm: virtual_address_range: Reduce memory" from Thomas
     Weißschuh fixes and optimizes some of the MM selftests"

* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
2025-01-26 18:36:23 -08:00
Gregory Price 44d46b76c3 mm: add build-time option for hotplug memory default online type
Memory hotplug presently auto-onlines memory into a zone the kernel deems
appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.

The memhp_default_state boot param enables runtime config, but it's not
possible to do this at build-time.

Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with
CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.

Selections:
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE
    => mhp_default_online_type = "offline"
       Memory will not be onlined automatically.

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
    => mhp_default_online_type = "online"
       Memory will be onlined automatically in a zone deemed.
       appropriate by the kernel.

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL
    => mhp_default_online_type = "online_kernel"
       Memory will be onlined automatically.
       The zone may allow kernel data (e.g. ZONE_NORMAL).

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE
    => mhp_default_online_type = "online_movable"
       Memory will be onlined automatically.
       The zone will be ZONE_MOVABLE.

Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing
default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior.

Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.

[gourry@gourry.net: update KConfig comments]
  Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 20:22:21 -08:00
Linus Torvalds 647d69605c pci-v6.14-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmeTr8wUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxrMw//TJXH+U6x5LhYvBPD/KZ20ecGHqaA
 eGXrbHAasYbU1CfW7HM0onR8NffOIGoYvQrtefjQAln0w6rTvyFO0xJKLP15vMfN
 hnj+y1WWtKwAkSpu10Cl9nTj8uYRNNSQeoy5kS+1diwuXdby/DlgQONO2APSe9zd
 KMPXJcqSfDJlM5zHrcqqtlxauO9KHInLCc/iutd85AKjvcjOoNHNeZE0pTC0C3gE
 sXYHDqJiS3zdEG6X6mWFo3OzI/Q/7NGlHJ2j0CQaObsgQ9yA7eWkez25ifwZcugc
 TPtjm8DhaDo9/zx0NV9c2dPauHRC6NYUjAflMPK7Aye/41BE1Ag5Ka+tMDgC2i/N
 TbfBxSeArhjnjY+eZwRhrJNNC58TtHTUs69TO7Dbmuwr7cp99MIEDAYI5V6LFAdk
 plKqn1h8FztW5QKRPCgmzy6KTE+WPytiGAGAQFxzIGYkV/QqyvFaVs8FIyOJUIFM
 aDSa6Xy5WLGxmPZ9hPapzEm4ws/HTRpFjNgi/d4rRG5RWMwAxZZa44s9eldhN1D/
 ZwEmF2rJ+U8S7Q+mXPHDlwcsHe5APCbiaTEp4X+e3LNe0i9oxhhaUWG6LrDDmTlQ
 tU5j5daHiBa0nTDL1lfaayJlYX/oJ+IYQrIYzGbnivZv4ZVdPnuWSsOsMOEiLhEt
 4QqCoanqf0mCn2A=
 =O62O
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Batch sizing of multiple BARs while memory decoding is disabled
     instead of disabling/enabling decoding for each BAR individually;
     this optimizes virtualized environments where toggling decoding
     enable is expensive (Alex Williamson)

   - Add host bridge .enable_device() and .disable_device() hooks for
     bridges that need to configure things like Requester ID to StreamID
     mapping when enabling devices (Frank Li)

   - Extend struct pci_ecam_ops with .enable_device() and
     .disable_device() hooks so drivers that use pci_host_common_probe()
     instead of their own .probe() have a way to set the
     .enable_device() callbacks (Marc Zyngier)

   - Drop 'No bus range found' message so we don't complain when DTs
     don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)

   - Rename the drivers/pci/of_property.c struct of_pci_range to
     of_pci_range_entry to avoid confusion with the global of_pci_range
     in include/linux/of_address.h (Bjorn Helgaas)

  Driver binding:

   - Update resource request API documentation to encourage callers to
     supply a driver name when requesting resources (Philipp Stanner)

   - Export pci_intx_unmanaged() and pcim_intx() (always managed) so
     callers of pci_intx() (which is sometimes managed) can explicitly
     choose the one they need (Philipp Stanner)

   - Convert drivers from pci_intx() to always-managed pcim_intx() or
     never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
     pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
     ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)

   - Remove pci_intx_unmanaged() since pci_intx() is now always
     unmanaged and pcim_intx() is always managed (Philipp Stanner)

  Error handling:

   - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
     logging rather than building their own (Ilpo Järvinen)

   - Move TLP Log handling to its own file (Ilpo Järvinen)

   - Store number of supported End-End TLP Prefixes always so we can
     read the correct number of DWORDs from the TLP Prefix Log (Ilpo
     Järvinen)

   - Read TLP Prefixes in addition to the Header Log in
     pcie_read_tlp_log() (Ilpo Järvinen)

   - Add pcie_print_tlp_log() to consolidate printing of TLP Header and
     Prefix Log (Ilpo Järvinen)

   - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
     BIOSes that don't configure it correctly (Takashi Iwai)

  ASPM:

   - Save parent L1 PM Substates config so when we restore it along with
     an endpoint's config, the parent info isn't junk (Jian-Hong Pan)

  Power management:

   - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
     the system can't wake up from suspend (Werner Sembach)

  Endpoint framework:

   - Destroy the EPC device in devm_pci_epc_destroy(), which previously
     didn't call devres_release() (Zijun Hu)

   - Finish virtual EP removal in pci_epf_remove_vepf(), which
     previously caused a subsequent pci_epf_add_vepf() to fail with
     -EBUSY (Zijun Hu)

   - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
     don't depend on the BAR_MASK reset value being larger than the
     requested BAR size (Niklas Cassel)

   - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
     reads from bypassing the iATU if we reduced the BAR size (Niklas
     Cassel)

   - Verify address alignment when programming iATU so we don't attempt
     to write bits that are read-only because of the BAR size, which
     could lead to directing accesses to the wrong address (Niklas
     Cassel)

   - Implement artpec6 pci_epc_features so we can rely on all drivers
     supporting it so we can use it in EPC core code (Niklas Cassel)

   - Check for BARs of fixed size to prevent endpoint drivers from
     trying to change their size (Niklas Cassel)

   - Verify that requested BAR size is a power of two when endpoint
     driver sets the BAR (Niklas Cassel)

  Endpoint framework tests:

   - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
     dma_chan_rx (Mohamed Khalfella)

   - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
     supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)

   - Add pci-epf-test and pci_endpoint_test support for capabilities
     (Niklas Cassel)

   - Add Endpoint test for consecutive BARs (Niklas Cassel)

   - Remove redundant comparison from Endpoint BAR test because a > 1MB
     BAR can always be exactly covered by iterating with a 1MB buffer
     (Hans Zhang)

   - Move and convert PCI Endpoint tests from tools/pci to Kselftests
     (Manivannan Sadhasivam)

  Apple PCIe controller driver:

   - Convert StreamID mapping configuration from a bus notifier to the
     .enable_device() and .disable_device() callbacks (Marc Zyngier)

  Freescale i.MX6 PCIe controller driver:

   - Add Requester ID to StreamID mapping configuration when enabling
     devices (Frank Li)

   - Use DWC core suspend/resume functions for imx6 (Frank Li)

   - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
     Zhu)

   - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
     i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
     Li)

   - Add DT binding for optional i.MX95 Refclk and driver support to
     enable it if the platform hasn't enabled it (Richard Zhu)

   - Configure PHY based on controller being in Root Complex or Endpoint
     mode (Frank Li)

   - Rely on dbi2 and iATU base addresses from DT via
     dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)

   - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
     asserted in imx_pcie_assert_core_reset() (Richard Zhu)

   - Add missing reference clock enable or disable logic for IMX6SX,
     IMX7D, IMX8MM (Richard Zhu)

   - Remove redundant imx7d_pcie_init_phy() since
     imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)

  Freescale Layerscape PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead
     of syscon_regmap_lookup_by_phandle() followed by
     of_property_read_u32_array() (Krzysztof Kozlowski)

  Marvell MVEBU PCIe controller driver:

   - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)

  MediaTek PCIe Gen3 controller driver:

   - Use clk_bulk_prepare_enable() instead of separate
     clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)

   - Rearrange reset assert/deassert so they're both done in the
     *_power_up() callbacks (Lorenzo Bianconi)

   - Document that Airoha EN7581 requires PHY init and power-on before
     PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
     Bianconi)

   - Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
     method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)

   - Sleep instead of delay during Airoha EN7581 power-up, since this is
     a non-atomic context (Lorenzo Bianconi)

   - Skip PERST# assertion on Airoha EN7581 during probe and
     suspend/resume to avoid a hardware defect (Lorenzo Bianconi)

   - Enable async probe to reduce system startup time (Douglas Anderson)

  Microchip PolarFlare PCIe controller driver:

   - Set up the inbound address translation based on whether the
     platform allows coherent or non-coherent DMA (Daire McNamara)

   - Update DT binding such that platforms are DMA-coherent by default
     and must specify 'dma-noncoherent' if needed (Conor Dooley)

  Mobiveil PCIe controller driver:

   - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
     and 'reg-names' (Frank Li)

  Qualcomm PCIe controller driver:

   - Add DT SM8550 and SM8650 optional 'global' interrupt for link
     events (Neil Armstrong)

   - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
     Mylavarapu)

   - If 'global' IRQ is supported for detection of Link Up events, tell
     DWC core not to wait for link up (Krishna chaitanya chundru)

  Renesas R-Car PCIe controller driver:

   - Avoid passing stack buffer as resource name (King Dix)

  Rockchip PCIe controller driver:

   - Simplify clock and reset handling by using bulk interfaces (Anand
     Moon)

   - Pass typed rockchip_pcie (not void) pointer to
     rockchip_pcie_disable_clocks() (Anand Moon)

   - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
     (Dan Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Use dll_link_up IRQ to detect Link Up and enumerate devices so
     users don't have to manually rescan (Niklas Cassel)

   - Tell DWC core not to wait for link up since the 'sys' interrupt is
     required and detects Link Up events (Niklas Cassel)

  Synopsys DesignWare PCIe controller driver:

   - Don't wait for link up in DWC core if driver can detect Link Up
     event (Krishna chaitanya chundru)

   - Update ICC and OPP votes after Link Up events (Krishna chaitanya
     chundru)

   - Always stop link in dw_pcie_suspend_noirq(), which is required at
     least for i.MX8QM to re-establish link on resume (Richard Zhu)

   - Drop racy and unnecessary LTSSM state check before sending
     PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)

   - Add struct of_pci_range.parent_bus_addr for devices that need their
     immediate parent bus address, not the CPU address, e.g., to program
     an internal Address Translation Unit (iATU) (Frank Li)

  TI DRA7xx PCIe controller driver:

   - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
     syscon_regmap_lookup_by_phandle() followed by
     of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
     (Krzysztof Kozlowski)

  Xilinx Versal CPM PCIe controller driver:

   - Add DT binding and driver support for Xilinx Versal CPM5
     (Thippeswamy Havalige)

  MicroSemi Switchtec management driver:

   - Add Microchip PCI100X device IDs (Rakesh Babu Saladi)

  Miscellaneous:

   - Move reset related sysfs code from pci.c to pci-sysfs.c where other
     similar code lives (Ilpo Järvinen)

   - Simplify reset_method_store() memory management by using __free()
     instead of explicit kfree() cleanup (Ilpo Järvinen)

   - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
     ACPI hotplug driver (Thomas Weißschuh)

   - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
     Zhang)

   - Correct documentation of the 'config_acs=' kernel parameter
     (Akihiko Odaki)"

* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
  PCI: Batch BAR sizing operations
  dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
  PCI: microchip: Set inbound address translation for coherent or non-coherent mode
  Documentation: Fix pci=config_acs= example
  PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
  PCI: Don't include 'pm_wakeup.h' directly
  selftests: pci_endpoint: Migrate to Kselftest framework
  selftests: Move PCI Endpoint tests from tools/pci to Kselftests
  misc: pci_endpoint_test: Fix IOCTL return value
  dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
  dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
  dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
  PCI: switchtec: Add Microchip PCI100X device IDs
  misc: pci_endpoint_test: Remove redundant 'remainder' test
  misc: pci_endpoint_test: Add consecutive BAR test
  misc: pci_endpoint_test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Add support for capabilities
  PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
  PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
  PCI: dwc: Simplify config resource lookup
  ...
2025-01-25 16:03:40 -08:00
Linus Torvalds e8744fbc83 tracing updates for v6.14:
- Cleanup with guard() and free() helpers
 
   There were several places in the code that had a lot of "goto out" in the
   error paths to either unlock a lock or free some memory that was
   allocated. But this is error prone. Convert the code over to use the
   guard() and free() helpers that let the compiler unlock locks or free
   memory when the function exits.
 
 - Update the Rust tracepoint code to use the C code too
 
   There was some duplication of the tracepoint code for Rust that did the
   same logic as the C code. Add a helper that makes it possible for both
   algorithms to use the same logic in one place.
 
 - Add poll to trace event hist files
 
   It is useful to know when an event is triggered, or even with some
   filtering. Since hist files of events get updated when active and the
   event is triggered, allow applications to poll the hist file and wake up
   when an event is triggered. This will let the application know that the
   event it is waiting for happened.
 
 - Add :mod: command to enable events for current or future modules
 
   The function tracer already has a way to enable functions to be traced in
   modules by writing ":mod:<module>" into set_ftrace_filter. That will
   enable either all the functions for the module if it is loaded, or if it
   is not, it will cache that command, and when the module is loaded that
   matches <module>, its functions will be enabled. This also allows init
   functions to be traced. But currently events do not have that feature.
 
   Add the command where if ':mod:<module>' is written into set_event, then
   either all the modules events are enabled if it is loaded, or cache it so
   that the module's events are enabled when it is loaded. This also works
   from the kernel command line, where "trace_event=:mod:<module>", when the
   module is loaded at boot up, its events will be enabled then.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ5EbMxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkZsAP9Amgx9frSbR1pn1t0I3wVnQx7khgOu
 s/b8Ro+vjTx1/QD/RN2AA7f+HK4F27w3Aqfrs0nKXAPtXWsJ9Epp8raG5w8=
 =Pg+4
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Cleanup with guard() and free() helpers

   There were several places in the code that had a lot of "goto out" in
   the error paths to either unlock a lock or free some memory that was
   allocated. But this is error prone. Convert the code over to use the
   guard() and free() helpers that let the compiler unlock locks or free
   memory when the function exits.

 - Update the Rust tracepoint code to use the C code too

   There was some duplication of the tracepoint code for Rust that did
   the same logic as the C code. Add a helper that makes it possible for
   both algorithms to use the same logic in one place.

 - Add poll to trace event hist files

   It is useful to know when an event is triggered, or even with some
   filtering. Since hist files of events get updated when active and the
   event is triggered, allow applications to poll the hist file and wake
   up when an event is triggered. This will let the application know
   that the event it is waiting for happened.

 - Add :mod: command to enable events for current or future modules

   The function tracer already has a way to enable functions to be
   traced in modules by writing ":mod:<module>" into set_ftrace_filter.
   That will enable either all the functions for the module if it is
   loaded, or if it is not, it will cache that command, and when the
   module is loaded that matches <module>, its functions will be
   enabled. This also allows init functions to be traced. But currently
   events do not have that feature.

   Add the command where if ':mod:<module>' is written into set_event,
   then either all the modules events are enabled if it is loaded, or
   cache it so that the module's events are enabled when it is loaded.
   This also works from the kernel command line, where
   "trace_event=:mod:<module>", when the module is loaded at boot up,
   its events will be enabled then.

* tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
  tracing: Fix output of set_event for some cached module events
  tracing: Fix allocation of printing set_event file content
  tracing: Rename update_cache() to update_mod_cache()
  tracing: Fix #if CONFIG_MODULES to #ifdef CONFIG_MODULES
  selftests/ftrace: Add test that tests event :mod: commands
  tracing: Cache ":mod:" events for modules not loaded yet
  tracing: Add :mod: command to enabled module events
  selftests/tracing: Add hist poll() support test
  tracing/hist: Support POLLPRI event for poll on histogram
  tracing/hist: Add poll(POLLIN) support on hist file
  tracing: Fix using ret variable in tracing_set_tracer()
  tracepoint: Reduce duplication of __DO_TRACE_CALL
  tracing/string: Create and use __free(argv_free) in trace_dynevent.c
  tracing: Switch trace_stat.c code over to use guard()
  tracing: Switch trace_stack.c code over to use guard()
  tracing: Switch trace_osnoise.c code over to use guard() and __free()
  tracing: Switch trace_events_synth.c code over to use guard()
  tracing: Switch trace_events_filter.c code over to use guard()
  tracing: Switch trace_events_trigger.c code over to use guard()
  tracing: Switch trace_events_hist.c code over to use guard()
  ...
2025-01-23 17:51:16 -08:00
Linus Torvalds d0f93ac2c3 Documentation changes this time around include:
- Quite a bit of Chinese and Spanish translation work.
 
 - Clarifying that Git commit IDs >12chars are OK
 
 - A new nvme-multipath document
 
 - A reorganization of the admin-guide top-level page to make it readable
 
 - Clarification of the role of Acked-by and maintainer discretion on their
   acceptance.
 
 - Some reorganization of debugging-oriented docs.
 
 ...and typo fixes, documentation updates, etc. as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmeOp8EPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YipUH/iffvlVYuqoVdPUFWdmsiNjwOCRE2MIfp8qO
 tPTRRHJAny+NlFT0IWlGUbLNoNXtvpN47YlkaeAjdrsjASerfpwzje7t4Z1B+jWT
 0YwGBCvDIGasfRCx7D14+w5aqkEEynfsy+QurwcuDxcHMQGwt7ZCuTNOVO6BULEr
 L++BMwqapUr5IemP4ItQqDVVF9sp6bWEhaOnTTJCLU6oG23uUSSA/59sJmwDJUk7
 6J3VGO1An4Jte9WX7qkVrSBNO5cOOhaFiFXIeNxfOioOPctBwxKiHDJnzVud8ipz
 R+tnUI/8hEvyJ7GZFezyZxmMnFs0P2DEYAkaN+hBs/nUjx0dKUg=
 =YxaS
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.14' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:

 - Quite a bit of Chinese and Spanish translation work

 - Clarifying that Git commit IDs >12chars are OK

 - A new nvme-multipath document

 - A reorganization of the admin-guide top-level page to make it
   readable

 - Clarification of the role of Acked-by and maintainer discretion on
   their acceptance

 - Some reorganization of debugging-oriented docs

... and typo fixes, documentation updates, etc as usual

* tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits)
  Documentation: Fix x86_64 UEFI outdated references to elilo
  Documentation/sysctl: Add timer_migration to kernel.rst
  docs/mm: Physical memory: Remove zone_t
  docs: submitting-patches: clarify that signers may use their discretion on tags
  docs: submitting-patches: clarify difference between Acked-by and Reviewed-by
  docs: submitting-patches: clarify Acked-by and introduce "# Suffix"
  Documentation: bug-hunting.rst: remove odd contact information
  docs/zh_CN: Add sak index Chinese translation
  doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes
  doc: module: Fix documented type of namespace
  Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
  docs/zh_CN: Add landlock index Chinese translation
  Documentation: Fix typo localmodonfig -> localmodconfig
  overlayfs.rst: Fix and improve grammar
  docs/zh_CN: Add siphash index Chinese translation
  docs/zh_CN: Add security IMA-templates Chinese translation
  docs/zh_CN: Add security digsig Chinese translation
  Align git commit ID abbreviation guidelines and checks
  docs: process: submitting-patches: split canonical patch format section
  docs/zh_CN: Add security lsm Chinese translation
  ...
2025-01-21 18:00:00 -08:00
Akihiko Odaki b388face5f Documentation: Fix pci=config_acs= example
The documentation currently says:

  config_acs=
                  Format:
                  <ACS flags>@<pci_dev>[; ...]
                  Specify one or more PCI devices (in the format
                  specified above) optionally prepended with flags
                  and separated by semicolons. The respective
                  capabilities will be enabled, disabled or
                  unchanged based on what is specified in
                  flags.
           (...)
                  For example,
                    pci=config_acs=10x
                  would configure all devices that support
                  ACS to enable P2P Request Redirect, disable
                  Translation Blocking, and leave Source
                  Validation unchanged from whatever power-up
                  or firmware set it to.

See the complete documentation at:

  https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

However, a flag specification always needs to be suffixed with "@" and
a PCI valid device address, which is missing in this example. Also, to
configure all devices that support ACS, the flag needs to be suffixed
with "@pci:0:0", for the ACS support to be enabled.

Fix the documentation so the example is correct.

Link: https://lore.kernel.org/r/20240915-acs-v1-1-b9ee536ee9bd@daynix.com
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-21 17:29:11 -06:00
Linus Torvalds 9f3ee94e70 RCU pull request for v6.14
This pull request contains the following branches:
 
 fixes.2024.12.14a: Misc fixes, check if IRQs are disabled in rcu_exp_need_qs(),
     instrument KCSAN exclusive-writer assertions, add extra WARN_ON_ONCE() check,
     set the cpu_no_qs.b.exp under lock, warn if callback enqueued on offline CPU.
 
 rcutorture.2024.12.14a: Torture-test updates, add rcutorture.preempt_duration kernel
     module parameter, make the TREE03 scenario do preemption, improve pooling timeouts
     for rcu_torture_writer(), improve output of "Failure/close-call rcutorture reader
     segments", add some reader-state debugging checks, update doc of polled APIs, add
     extra diagnostics for per-reader-segment preemption.
 
 srcu.2024.12.14a: SRCU updates, improve doc for srcu_read_lock() in terms of return
     value, fix typo in comments, remove redundant GP sequence checks in the
     srcu_funnel_gp_start.
 
 torture-test.2024.12.14a: Add an extra test for sched_clock(), improve testing
     on unresponsive systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEu6QRe/mAUYNn5U0PBYqkjnKWLM8FAmeGprwACgkQBYqkjnKW
 LM+QUwv/VVwYKI3f9eH4AcjijIFufmsP3I5REfY3s7+a6BItwZrulwhGK+mWWE+9
 nKjsrrjw3sv3dEvdaUfZxwiLQBfJmWdUUGTZ748qyCpidlo4wB7OW/BR+Pn4ZiB/
 rWkn28fHdDxUJV+nQGbGC82EiGrLC9XYlbTbnh9VzGEjcyKIIU3Dw5tGXzEVMn5w
 Tc6H6jWg8+fXxIdmhdEkjpH+rS9H160Lt1bGeGadI3LMdmMj89x5u+i6gheT83WL
 FBBwgNVITWPTwfQFyK4wuRcKzi/UIrRdQIU+2xqJKs6NeWhwqhFDfW8FP5brBI2o
 f7fFQA+CbP/oRCqXCaZKmB3i/xGeJUsJ/IJ992jq61TCLNoc3LDxovYdaHfUJgph
 W/8KHUc8oZMQU4CjGJkj30jnpBLPwZeZuJvuTfQZl5QuBeUivIMg89cXLpE9Vnny
 yof3pm++Fru8wJ0rooq3ef2A5vblpoBQcnYelQV2EJisMCOkd+P5PNVA2viTrG8F
 QGfmDzm1
 =EwPt
 -----END PGP SIGNATURE-----

Merge tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Uladzislau Rezki:
 "Misc fixes:
   - check if IRQs are disabled in rcu_exp_need_qs()
   - instrument KCSAN exclusive-writer assertions
   - add extra WARN_ON_ONCE() check
   - set the cpu_no_qs.b.exp under lock
   - warn if callback enqueued on offline CPU

  Torture-test updates:
   - add rcutorture.preempt_duration kernel module parameter
   - make the TREE03 scenario do preemption
   - improve pooling timeouts for rcu_torture_writer()
   - improve output of "Failure/close-call rcutorture reader segments"
   - add some reader-state debugging checks
   - update doc of polled APIs
   - add extra diagnostics for per-reader-segment preemption
   - add an extra test for sched_clock()
   - improve testing on unresponsive systems

  SRCU updates:
   - improve doc for srcu_read_lock() in terms of return value
   - fix typo in comments
   - remove redundant GP sequence checks in the srcu_funnel_gp_start"

* tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
  srcu: Remove redundant GP sequence checks in srcu_funnel_gp_start
  srcu: Fix typo s/srcu_check_read_flavor()/__srcu_check_read_flavor()/
  srcu: Guarantee non-negative return value from srcu_read_lock()
  MAINTAINERS: Update RCU git tree
  rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()
  rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
  rcu: Make preemptible rcu_exp_handler() check idempotency
  rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call
  rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock
  rcu: Make rcu_report_exp_cpu_mult() caller acquire lock
  rcu: Report callbacks enqueued on offline CPU blind spot
  rcutorture: Use symbols for SRCU reader flavors
  rcutorture: Add per-reader-segment preemption diagnostics
  rcutorture: Read CPU ID for decoration protected by both reader types
  rcutorture: Add preempt_count() to rcutorture_one_extend_check() diagnostics
  rcutorture: Add parameters to control polled/conditional wait interval
  rcutorture: Add documentation for recent conditional and polled APIs
  rcutorture: Ignore attempts to test preemption and forward progress
  rcutorture: Make rcutorture_one_extend() check reader state
  rcutorture: Pretty-print rcutorture reader segments
  ...
2025-01-21 14:39:21 -08:00
Linus Torvalds 62de6e1685 Scheduler enhancements for v6.14:
- Fair scheduler (SCHED_FAIR) enhancements:
 
    - Behavioral improvements:
      - Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)
 
    - Delayed-dequeue enhancements & fixes: (Vincent Guittot)
 
      - Rename h_nr_running into h_nr_queued
      - Add new cfs_rq.h_nr_runnable
      - Use the new cfs_rq.h_nr_runnable
      - Removed unsued cfs_rq.h_nr_delayed
      - Rename cfs_rq.idle_h_nr_running into h_nr_idle
      - Remove unused cfs_rq.idle_nr_running
      - Rename cfs_rq.nr_running into nr_queued
      - Do not try to migrate delayed dequeue task
      - Fix variable declaration position
      - Encapsulate set custom slice in a __setparam_fair() function
 
    - Fixes:
      - Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
      - Fix CPU bandwidth limit bypass during CPU hotplug (Vishal Chourasia)
 
    - Cleanups:
      - Clean up in migrate_degrades_locality() to improve
        readability (Peter Zijlstra)
      - Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
      - Update comments after sched_tick() rename (Sebastian Andrzej Siewior)
      - Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
        (Valentin Schneider)
 
  - Deadline scheduler (SCHED_DL) enhancements:
 
    - Restore dl_server bandwidth on non-destructive root domain
      changes (Juri Lelli)
 
    - Correctly account for allocated bandwidth during
      hotplug (Juri Lelli)
 
    - Check bandwidth overflow earlier for hotplug (Juri Lelli)
 
    - Clean up goto label in pick_earliest_pushable_dl_task()
      (John Stultz)
 
    - Consolidate timer cancellation (Wander Lairson Costa)
 
  - Load-balancer enhancements:
 
    - Improve performance by prioritizing migrating eligible
      tasks in sched_balance_rq() (Hao Jia)
 
    - Do not compute NUMA Balancing stats unnecessarily during
      load-balancing (K Prateek Nayak)
 
    - Do not compute overloaded status unnecessarily during
      load-balancing (K Prateek Nayak)
 
  - Generic scheduling code enhancements:
 
    - Use READ_ONCE() in task_on_rq_queued(), to consistently use
      the WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)
 
  - Isolated CPUs support enhancements: (Waiman Long)
 
    - Make "isolcpus=nohz" equivalent to "nohz_full"
    - Consolidate housekeeping cpumasks that are always identical
    - Remove HK_TYPE_SCHED
    - Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE
 
  - RSEQ enhancements:
 
    - Validate read-only fields under DEBUG_RSEQ config
      (Mathieu Desnoyers)
 
  - PSI enhancements:
 
    - Fix race when task wakes up before psi_sched_switch()
      adjusts flags (Chengming Zhou)
 
  - IRQ time accounting performance enhancements: (Yafang Shao)
 
    - Define sched_clock_irqtime as static key
    - Don't account irq time if sched_clock_irqtime is disabled
 
  - Virtual machine scheduling enhancements:
 
    - Don't try to catch up excess steal time (Suleiman Souhlal)
 
  - Heterogenous x86 CPU scheduling enhancements: (K Prateek Nayak)
 
    - Convert "sysctl_sched_itmt_enabled" to boolean
    - Use guard() for itmt_update_mutex
    - Move the "sched_itmt_enabled" sysctl to debugfs
    - Remove x86_smt_flags and use cpu_smt_flags directly
    - Use x86_sched_itmt_flags for PKG domain unconditionally
 
  - Debugging code & instrumentation enhancements:
 
    - Change need_resched warnings to pr_err() (David Rientjes)
    - Print domain name in /proc/schedstat (K Prateek Nayak)
    - Fix value reported by hot tasks pulled in /proc/schedstat (Peter Zijlstra)
    - Report the different kinds of imbalances in /proc/schedstat (Swapnil Sapkal)
    - Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
    - Update Schedstat version to 17 (Swapnil Sapkal)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmePSRcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hrdBAAjYiLl5Q8SHM0xnl+kbvuUkCTgEB/gSgA
 mfrZtHRUgRZuA89NZ9NljlCkQSlsLTOjnpNuaeFzs529GMg9iemc99dbnz3BP5F3
 V5qpYvWe7yIkJ3hd0TOGLmYEPMNQaAW57YBOrxcPjWNLJ4cr9iMdccVA1OQtcmqD
 ZUh3nibv81QI8HDmT2G+figxEIqH3yBV1+SmEIxbrdkQpIJ5702Ng6+0KQK5TShN
 xwjFELWZUl2TfkoCc4nkIpkImV6cI1DvXSw1xK6gbb1xEVOrsmFW3TYFw4trKHBu
 2RBG4wtmzNjh+12GmSdIBJHogPNcay+JIJW9EG/unT7jirqzkkeP1X2eJEbh+X1L
 CMa7GsD9Vy72jCzeJDMuiy7bKfG/MiKUtDXrAZQDo2atbw7H88QOzMuTE5a5WSV+
 tRxXGI/dgFVOk+JQUfctfJbYeXjmG8GAflawvXtGDAfDZsja6M+65fH8p0AOgW1E
 HHmXUzAe2E2xQBiSok/DYHPQeCDBAjoJvU93YhGiXv8UScb2UaD4BAfzfmc8P+Zs
 Eox6444ah5U0jiXmZ3HU707n1zO+Ql4qKoyyMJzSyP+oYHE/Do7NYTElw2QovVdN
 FX/9Uae8T4ttA/5lFe7FNoXgKvSxXDKYyKLZcysjVrWJF866Ui/TWtmxA6w8Osn7
 sfucuLawLPM=
 =5ZNW
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Fair scheduler (SCHED_FAIR) enhancements:

   - Behavioral improvements:
      - Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)

   - Delayed-dequeue enhancements & fixes: (Vincent Guittot)
      - Rename h_nr_running into h_nr_queued
      - Add new cfs_rq.h_nr_runnable
      - Use the new cfs_rq.h_nr_runnable
      - Removed unsued cfs_rq.h_nr_delayed
      - Rename cfs_rq.idle_h_nr_running into h_nr_idle
      - Remove unused cfs_rq.idle_nr_running
      - Rename cfs_rq.nr_running into nr_queued
      - Do not try to migrate delayed dequeue task
      - Fix variable declaration position
      - Encapsulate set custom slice in a __setparam_fair() function

   - Fixes:
      - Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
      - Fix CPU bandwidth limit bypass during CPU hotplug (Vishal
        Chourasia)

   - Cleanups:
      - Clean up in migrate_degrades_locality() to improve readability
        (Peter Zijlstra)
      - Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
      - Update comments after sched_tick() rename (Sebastian Andrzej
        Siewior)
      - Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
        (Valentin Schneider)

  Deadline scheduler (SCHED_DL) enhancements:

   - Restore dl_server bandwidth on non-destructive root domain changes
     (Juri Lelli)

   - Correctly account for allocated bandwidth during hotplug (Juri
     Lelli)

   - Check bandwidth overflow earlier for hotplug (Juri Lelli)

   - Clean up goto label in pick_earliest_pushable_dl_task() (John
     Stultz)

   - Consolidate timer cancellation (Wander Lairson Costa)

  Load-balancer enhancements:

   - Improve performance by prioritizing migrating eligible tasks in
     sched_balance_rq() (Hao Jia)

   - Do not compute NUMA Balancing stats unnecessarily during
     load-balancing (K Prateek Nayak)

   - Do not compute overloaded status unnecessarily during
     load-balancing (K Prateek Nayak)

  Generic scheduling code enhancements:

   - Use READ_ONCE() in task_on_rq_queued(), to consistently use the
     WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)

  Isolated CPUs support enhancements: (Waiman Long)

   - Make "isolcpus=nohz" equivalent to "nohz_full"
   - Consolidate housekeeping cpumasks that are always identical
   - Remove HK_TYPE_SCHED
   - Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE

  RSEQ enhancements:

   - Validate read-only fields under DEBUG_RSEQ config (Mathieu
     Desnoyers)

  PSI enhancements:

   - Fix race when task wakes up before psi_sched_switch() adjusts flags
     (Chengming Zhou)

  IRQ time accounting performance enhancements: (Yafang Shao)

   - Define sched_clock_irqtime as static key
   - Don't account irq time if sched_clock_irqtime is disabled

  Virtual machine scheduling enhancements:

   - Don't try to catch up excess steal time (Suleiman Souhlal)

  Heterogenous x86 CPU scheduling enhancements: (K Prateek Nayak)

   - Convert "sysctl_sched_itmt_enabled" to boolean
   - Use guard() for itmt_update_mutex
   - Move the "sched_itmt_enabled" sysctl to debugfs
   - Remove x86_smt_flags and use cpu_smt_flags directly
   - Use x86_sched_itmt_flags for PKG domain unconditionally

  Debugging code & instrumentation enhancements:

   - Change need_resched warnings to pr_err() (David Rientjes)
   - Print domain name in /proc/schedstat (K Prateek Nayak)
   - Fix value reported by hot tasks pulled in /proc/schedstat (Peter
     Zijlstra)
   - Report the different kinds of imbalances in /proc/schedstat
     (Swapnil Sapkal)
   - Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
   - Update Schedstat version to 17 (Swapnil Sapkal)"

* tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  rseq: Fix rseq unregistration regression
  psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
  sched, psi: Don't account irq time if sched_clock_irqtime is disabled
  sched: Don't account irq time if sched_clock_irqtime is disabled
  sched: Define sched_clock_irqtime as static key
  sched/fair: Do not compute overloaded status unnecessarily during lb
  sched/fair: Do not compute NUMA Balancing stats unnecessarily during lb
  x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
  x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
  x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
  x86/itmt: Use guard() for itmt_update_mutex
  x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
  sched/core: Prioritize migrating eligible tasks in sched_balance_rq()
  sched/debug: Change need_resched warnings to pr_err
  sched/fair: Encapsulate set custom slice in a __setparam_fair() function
  sched: Fix race between yield_to() and try_to_wake_up()
  docs: Update Schedstat version to 17
  sched/stats: Print domain name in /proc/schedstat
  sched: Move sched domain name out of CONFIG_SCHED_DEBUG
  sched: Report the different kinds of imbalances in /proc/schedstat
  ...
2025-01-21 11:32:36 -08:00
Steven Rostedt b355247df1 tracing: Cache ":mod:" events for modules not loaded yet
When the :mod: command is written into /sys/kernel/tracing/set_event (or
that file within an instance), if the module specified after the ":mod:"
is not yet loaded, it will store that string internally. When the module
is loaded, it will enable the events as if the module was loaded when the
string was written into the set_event file.

This can also be useful to enable events that are in the init section of
the module, as the events are enabled before the init section is executed.

This also works on the kernel command line:

 trace_event=:mod:<module>

Will enable the events for <module> when it is loaded.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250116143533.514730995@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-16 09:41:08 -05:00
Baolin Wang d635ccdb43 mm: shmem: add a kernel command line to change the default huge policy for tmpfs
Now the tmpfs can allow to allocate any sized large folios, and the default
huge policy is still preferred to be 'never'. Due to tmpfs not behaving like
other file systems in some cases as previously explained by David[1]:

: I think I raised this in the past, but tmpfs/shmem is just like any
: other file system .. except it sometimes really isn't and behaves much
: more like (swappable) anonymous memory. (or mlocked files)
: 
: There are many systems out there that run without swap enabled, or with
: extremely minimal swap (IIRC until recently kubernetes was completely
: incompatible with swapping). Swap can even be disabled today for shmem
: using a mount option.
: 
: That's a big difference to all other file systems where you are
: guaranteed to have backend storage where you can simply evict under
: memory pressure (might temporarily fail, of course).
: 
: I *think* that's the reason why we have the "huge=" parameter that also
: controls the THP allocations during page faults (IOW possible memory
: over-allocation). Maybe also because it was a new feature, and we only
: had a single THP size.

Thus adding a new command line to change the default huge policy will be
helpful to use the large folios for tmpfs, which is similar to the
'transparent_hugepage_shmem' cmdline for shmem.

[1] https://lore.kernel.org/all/cbadd5fe-69d5-4c21-8eb8-3344ed36c721@redhat.com/

Link: https://lkml.kernel.org/r/ff390b2656f0d39649547f8f2cbb30fcb7e7be2d.1732779148.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13 22:40:37 -08:00
Lubomir Rintel 769b83735d Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
A very minor oversight that dates all the way back to rst migration in
commit 9d85025b04 ("docs-rst: create an user's manual book").

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241231190240.417446-1-lkundrak@v3.sk
2025-01-09 12:34:36 -07:00
Mostafa Saleh e8440c1e2d Documentation: Update the behaviour of "kvm-arm.mode"
Commit 5053c3f051 ("KVM: arm64: Use hVHE in pKVM by default on CPUs with
VHE support") modified the behaviour of "kvm-arm.mode=protected" without
the updating the kernel parameters doc.

Update it to match the current implementation.

Also, update required architecture version for nested virtualization as
suggested by Marc.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20241025093259.2216093-1-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-08 10:17:13 +00:00
Borislav Petkov (AMD) 1146f7429f Documentation/kernel-parameters: Fix a typo in kvm.enable_virt_at_load text
s/lode/load/

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241202120416.6054-5-bp@kernel.org
2024-12-30 17:58:58 +01:00
Paul E. McKenney 282e06cc8f rcutorture: Add parameters to control polled/conditional wait interval
This commit adds rcutorture module parameters gp_cond_wi, gp_cond_wi_exp,
gp_poll_wi, and gp_poll_wi_exp to control the wait interval for
conditional, conditional expedited, polled, and polled expedited grace
periods, respectively.  When rcu_torture_writer() is testing these types
of grace periods, hrtimers are used to randomly wait up to the specified
number of microseconds, but with nanosecond granularity.

In the case of conditional grace periods (get_state_synchronize_rcu()
and cond_synchronize_rcu(), for example) there is just one
wait.  For polled grace periods (start_poll_synchronize_rcu() and
poll_state_synchronize_rcu(), for example), there is a repeated series
of waits until the grace period ends.

For normal grace periods, the default is 16 jiffies (for example, 16,000
microseconds on a HZ=1000 system) and for expedited grace periods the
default is 128 microseconds.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-12-14 17:05:27 +01:00
Paul E. McKenney cae7f6319e rcutorture: Add documentation for recent conditional and polled APIs
This commit adds kernel-parameters.txt documentation for rcutorture's
(relatively) new gp_cond_exp, gp_cond_full, gp_cond_exp, gp_poll,
gp_poll_exp, gp_poll_full, and gp_poll_exp module parameters.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
2024-12-14 17:05:17 +01:00