Commit Graph

70 Commits

Author SHA1 Message Date
Linus Torvalds 7203ca412f Significant patch series in this merge are as follows:
- The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from
   Uladzislau Rezki reworks the vmalloc() code to support non-blocking
   allocations (GFP_ATOIC, GFP_NOWAIT).
 
 - The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes
   a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited
   across fork/exec.
 
 - The 4 patch series "mm/zswap: misc cleanup of code and documentations"
   from SeongJae Park does some light maintenance work on the zswap code.
 
 - The 5 patch series "mm/page_owner: add debugfs files 'show_handles'
   and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the
   /sys/kernel/debug/page_owner debug feature.  It adds unique identifiers
   to differentiate the various stack traces so that userspace monitoring
   tools can better match stack traces over time.
 
 - The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua
   Hahn makes some minor alterations to the page allocator's per-cpu-pages
   feature.
 
 - The 2 patch series "Improve UFFDIO_MOVE scalability by removing
   anon_vma lock" from Lokesh Gidra addresses a scalability issue in
   userfaultfd's UFFDIO_MOVE operation.
 
 - The 2 patch series "kasan: cleanups for kasan_enabled() checks" from
   Sabyrzhan Tasbolatov performs some cleanup in the KASAN code.
 
 - The 2 patch series "drivers/base/node: fold node register and
   unregister functions" from Donet Tom cleans up the NUMA node handling
   code a little.
 
 - The 4 patch series "mm: some optimizations for prot numa" from Kefeng
   Wang provides some cleanups and small optimizations to the NUMA
   allocation hinting code.
 
 - The 5 patch series "mm/page_alloc: Batch callers of
   free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at
   boot on large machines.  These were causing (harmless) softlockup
   warnings.
 
 - The 2 patch series "optimize the logic for handling dirty file folios
   during reclaim" from Baolin Wang removes some now-unnecessary work from
   page reclaim.
 
 - The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg
   per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning
   feature.
 
 - The 2 patch series "mm/damon: fixes for address alignment issues in
   DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT
   and DAMON_RECLAIM with certain userspace configuration.
 
 - The 15 patch series "expand mmap_prepare functionality, port more
   users" from Lorenzo Stoakes enhances the new(ish)
   file_operations.mmap_prepare() method and ports additional callsites
   from the old ->mmap() over to ->mmap_prepare().
 
 - The 8 patch series "Fix stale IOTLB entries for kernel address space"
   from Lu Baolu fixes a bug (and possible security issue on non-x86) in
   the IOMMU code.  In some situations the IOMMU could be left hanging onto
   a stale kernel pagetable entry.
 
 - The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()"
   from Wei Yang cleans up and optimizes the folio splitting code.
 
 - The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui
   Song implements some cleanups and a minor fix in the swap discard code.
 
 - The 8 patch series "mm/damon: misc documentation fixups" from SeongJae
   Park does as advertised.
 
 - The 9 patch series "mm/damon: support pin-point targets removal" from
   SeongJae Park permits userspace to remove a specific monitoring target
   in the middle of the current targets list.
 
 - The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h"
   from Harry Yoo implements a couple of cleanups related to mm header file
   inclusion.
 
 - The 2 patch series "mm/swapfile.c: select swap devices of default
   priority round robin" from Baoquan He improves the selection of swap
   devices for NUMA machines.
 
 - The 3 patch series "mm: Convert memory block states (MEM_*) macros to
   enums" from Israel Batista changes the memory block labels from macros
   to enums so they will appear in kernel debug info.
 
 - The 3 patch series "ksm: perform a range-walk to jump over holes in
   break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM
   unmerges an address range.
 
 - The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests"
   from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON
   userspace unit tests.
 
 - The 2 patch series "some cleanups for pageout()" from Baolin Wang
   cleans up a couple of minor things in the page scanner's
   writeback-for-eviction code.
 
 - The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from
   Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file.
 
 - The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from
   Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps
   and improves the mergeability of guarded VMAs.
 
 - The 2 patch series "mm: perform guard region install/remove under VMA
   lock" from Lorenzo Stoakes reduces mmap lock contention for callers
   performing VMA guard region operations.
 
 - The 2 patch series "vma_start_write_killable" from Matthew Wilcox
   starts work in permitting applications to be killed when they are
   waiting on a read_lock on the VMA lock.
 
 - The 11 patch series "mm/damon/tests: add more tests for online
   parameters commit" from SeongJae Park adds additional userspace testing
   of DAMON's "commit" feature.
 
 - The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does
   that.
 
 - The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo
   Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when
   that VMA is merged with another.
 
 - The 16 patch series "mm: support device-private THP" from Balbir Singh
   introduces support for Transparent Huge Page (THP) migration in zone
   device-private memory.
 
 - The 3 patch series "Optimize folio split in memory failure" from Zi
   Yan optimizes folio split operations in the memory failure code.
 
 - The 2 patch series "mm/huge_memory: Define split_type and consolidate
   split support checks" from Wei Yang provides some more cleanups in the
   folio splitting code.
 
 - The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap
   entries, introduce leaf entries" from Lorenzo Stoakes cleans up our
   handling of pagetable leaf entries by introducing the concept of
   'software leaf entries', of type softleaf_t.
 
 - The 4 patch series "reparent the THP split queue" from Muchun Song
   reparents the THP split queue to its parent memcg.  This is in
   preparation for addressing the long-standing "dying memcg" problem,
   wherein dead memcg's linger for too long, consuming memory resources.
 
 - The 3 patch series "unify PMD scan results and remove redundant
   cleanup" from Wei Yang does a little cleanup in the hugepage collapse
   code.
 
 - The 6 patch series "zram: introduce writeback bio batching" from
   Sergey Senozhatsky improves zram writeback efficiency by introducing
   batched bio writeback support.
 
 - The 4 patch series "memcg: cleanup the memcg stats interfaces" from
   Shakeel Butt cleans up our handling of the interrupt safety of some
   memcg stats.
 
 - The 4 patch series "make vmalloc gfp flags usage more apparent" from
   Vishal Moola cleans up vmalloc's handling of incoming GFP flags.
 
 - The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V"
   from Chunyan Zhang teches soft dirty and userfaultfd write protect
   tracking to use RISC-V's Svrsw60t59b extension.
 
 - The 5 patch series "mm: swap: small fixes and comment cleanups" from
   Youngjun Park fixes a small bug and cleans up some of the swap code.
 
 - The 4 patch series "initial work on making VMA flags a bitmap" from
   Lorenzo Stoakes starts work on converting the vma struct's flags to a
   bitmap, so we stop running out of them, especially on 32-bit.
 
 - The 2 patch series "mm/swapfile: fix and cleanup swap list iterations"
   from Youngjun Park addresses a possible bug in the swap discard code and
   cleans things up a little.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA
 jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng
 o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4=
 =IUzS
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

  "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
     Rework the vmalloc() code to support non-blocking allocations
     (GFP_ATOIC, GFP_NOWAIT)

  "ksm: fix exec/fork inheritance" (xu xin)
     Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
     inherited across fork/exec

  "mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
     Some light maintenance work on the zswap code

  "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
     Enhance the /sys/kernel/debug/page_owner debug feature by adding
     unique identifiers to differentiate the various stack traces so
     that userspace monitoring tools can better match stack traces over
     time

  "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
     Minor alterations to the page allocator's per-cpu-pages feature

  "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
     Address a scalability issue in userfaultfd's UFFDIO_MOVE operation

  "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)

  "drivers/base/node: fold node register and unregister functions" (Donet Tom)
     Clean up the NUMA node handling code a little

  "mm: some optimizations for prot numa" (Kefeng Wang)
     Cleanups and small optimizations to the NUMA allocation hinting
     code

  "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
     Address long lock hold times at boot on large machines. These were
     causing (harmless) softlockup warnings

  "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
     Remove some now-unnecessary work from page reclaim

  "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
     Enhance the DAMOS auto-tuning feature

  "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
     Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
     configuration

  "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
     Enhance the new(ish) file_operations.mmap_prepare() method and port
     additional callsites from the old ->mmap() over to ->mmap_prepare()

  "Fix stale IOTLB entries for kernel address space" (Lu Baolu)
     Fix a bug (and possible security issue on non-x86) in the IOMMU
     code. In some situations the IOMMU could be left hanging onto a
     stale kernel pagetable entry

  "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
     Clean up and optimize the folio splitting code

  "mm, swap: misc cleanup and bugfix" (Kairui Song)
     Some cleanups and a minor fix in the swap discard code

  "mm/damon: misc documentation fixups" (SeongJae Park)

  "mm/damon: support pin-point targets removal" (SeongJae Park)
     Permit userspace to remove a specific monitoring target in the
     middle of the current targets list

  "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
     A couple of cleanups related to mm header file inclusion

  "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
     improve the selection of swap devices for NUMA machines

  "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
     Change the memory block labels from macros to enums so they will
     appear in kernel debug info

  "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
     Address an inefficiency when KSM unmerges an address range

  "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
     Fix leaks and unhandled malloc() failures in DAMON userspace unit
     tests

  "some cleanups for pageout()" (Baolin Wang)
     Clean up a couple of minor things in the page scanner's
     writeback-for-eviction code

  "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
     Move hugetlb's sysfs/sysctl handling code into a new file

  "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
     Make the VMA guard regions available in /proc/pid/smaps and
     improves the mergeability of guarded VMAs

  "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
     Reduce mmap lock contention for callers performing VMA guard region
     operations

  "vma_start_write_killable" (Matthew Wilcox)
     Start work on permitting applications to be killed when they are
     waiting on a read_lock on the VMA lock

  "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
     Add additional userspace testing of DAMON's "commit" feature

  "mm/damon: misc cleanups" (SeongJae Park)

  "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
     Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
     VMA is merged with another

  "mm: support device-private THP" (Balbir Singh)
     Introduce support for Transparent Huge Page (THP) migration in zone
     device-private memory

  "Optimize folio split in memory failure" (Zi Yan)

  "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
     Some more cleanups in the folio splitting code

  "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
     Clean up our handling of pagetable leaf entries by introducing the
     concept of 'software leaf entries', of type softleaf_t

  "reparent the THP split queue" (Muchun Song)
     Reparent the THP split queue to its parent memcg. This is in
     preparation for addressing the long-standing "dying memcg" problem,
     wherein dead memcg's linger for too long, consuming memory
     resources

  "unify PMD scan results and remove redundant cleanup" (Wei Yang)
     A little cleanup in the hugepage collapse code

  "zram: introduce writeback bio batching" (Sergey Senozhatsky)
     Improve zram writeback efficiency by introducing batched bio
     writeback support

  "memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
     Clean up our handling of the interrupt safety of some memcg stats

  "make vmalloc gfp flags usage more apparent" (Vishal Moola)
     Clean up vmalloc's handling of incoming GFP flags

  "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
     Teach soft dirty and userfaultfd write protect tracking to use
     RISC-V's Svrsw60t59b extension

  "mm: swap: small fixes and comment cleanups" (Youngjun Park)
     Fix a small bug and clean up some of the swap code

  "initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
     Start work on converting the vma struct's flags to a bitmap, so we
     stop running out of them, especially on 32-bit

  "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
     Address a possible bug in the swap discard code and clean things
     up a little

[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
  register device memory for poison handling") because it looks
  broken to me, I've asked for clarification   - Linus ]

* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm: fix vma_start_write_killable() signal handling
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  mm/swapfile: fix list iteration when next node is removed during discard
  fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
  mm/kfence: add reboot notifier to disable KFENCE on shutdown
  memcg: remove inc/dec_lruvec_kmem_state helpers
  selftests/mm/uffd: initialize char variable to Null
  mm: fix DEBUG_RODATA_TEST indentation in Kconfig
  mm: introduce VMA flags bitmap type
  tools/testing/vma: eliminate dependency on vma->__vm_flags
  mm: simplify and rename mm flags function for clarity
  mm: declare VMA flags by bit
  zram: fix a spelling mistake
  mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
  mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
  pagemap: update BUDDY flag documentation
  mm: swap: remove scan_swap_map_slots() references from comments
  mm: swap: change swap_alloc_slow() to void
  mm, swap: remove redundant comment for read_swap_cache_async
  mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
  ...
2025-12-05 13:52:43 -08:00
Linus Torvalds ce5cfb0fa2 IOMMU Updates for Linux v6.19
Including:
 
 	- Introduction of the generic IO page-table framework with support for
 	  Intel and AMD IOMMU formats from Jason. This has good potential for
 	  unifying more IO page-table implementations and making future
 	  enhancements more easy. But this also needed quite some fixes during
 	  development. All known issues have been fixed, but my feeling is that
 	  there is a higher potential than usual that more might be needed.
 
 	- Intel VT-d updates:
 	  - Use right invalidation hint in qi_desc_iotlb().
 
 	  - Reduce the scope of INTEL_IOMMU_FLOPPY_WA.
 
 	- ARM-SMMU updates:
 	  - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
 	    and a new clock for the TBU.
 
 	  - Fix error handling if level 1 CD table allocation fails.
 
 	  - Permit more than the architectural maximum number of SMRs for funky
 	    Qualcomm mis-implementations of SMMUv2.
 
 	- Mediatek driver:
 	  - MT8189 iommu support.
 
 	- Move ARM IO-pgtable selftests to kunit.
 
 	- Device leak fixes for a couple of drivers.
 
 	- Random smaller fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmktjWgACgkQK/BELZcB
 GuMQgA//YljqMVbMmYpFQF/9nXsyTvkpzaaVqj3CscjfnLJQ7YNIrWHLMw4xcZP7
 c2zsSDMPc5LAe2nkyNPvMeFufsbDOYE1CcbqFfZhNqwKGWIVs7J1j73aqJXKfXB4
 RaVv5GA1NFeeIRRKPx/ZpD9W1WiR9PiUQXBxNTAJLzbBNhcTJzo+3YuSpJ6ckOak
 aG67Aox6Dwq/u0gHy8gWnuA2XL6Eit+nQbod7TfchHoRu+TUdbv8qWL+sUChj+u/
 IlyBt1YL/do3bJC0G1G2E81J1KGPU/OZRfB34STQKlopEdXX17ax3b2X0bt3Hz/h
 9Yk3BLDtGMBQ0aVZzAZcOLLlBlEpPiMKBVuJQj29kK9KJSYfmr2iolOK0cGyt+kv
 DfQ8+nv6HRFMbwjetfmhGYf6WemPcggvX44Hm/rgR2qbN3P+Q8/klyyH8MLuQeqO
 ttoQIwDd9DYKJelmWzbLgpb2vGE3O0EAFhiTcCKOk643PaudfengWYKZpJVIIqtF
 nEUEpk17HlpgFkYrtmIE7CMONqUGaQYO84R3j7DXcYXYAvqQJkhR3uJejlWQeh8x
 uMc9y04jpg3p5vC5c7LfkQ3at3p/jf7jzz4GuNoZP5bdVIUkwXXolir0ct3/oby3
 /bXXNA1pSaRuUADm7pYBBhKYAFKC7vCSa8LVaDR3CB95aNZnvS0=
 =KG7j
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:

 - Introduction of the generic IO page-table framework with support for
   Intel and AMD IOMMU formats from Jason.

   This has good potential for unifying more IO page-table
   implementations and making future enhancements more easy. But this
   also needed quite some fixes during development. All known issues
   have been fixed, but my feeling is that there is a higher potential
   than usual that more might be needed.

 - Intel VT-d updates:
    - Use right invalidation hint in qi_desc_iotlb()
    - Reduce the scope of INTEL_IOMMU_FLOPPY_WA

 - ARM-SMMU updates:
    - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs
      and a new clock for the TBU.
    - Fix error handling if level 1 CD table allocation fails.
    - Permit more than the architectural maximum number of SMRs for
      funky Qualcomm mis-implementations of SMMUv2.

 - Mediatek driver:
    - MT8189 iommu support

 - Move ARM IO-pgtable selftests to kunit

 - Device leak fixes for a couple of drivers

 - Random smaller fixes and improvements

* tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits)
  iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
  iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
  powerpc/pseries/svm: Make mem_encrypt.h self contained
  genpt: Make GENERIC_PT invisible
  iommupt: Avoid a compiler bug with sw_bit
  iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal
  iommupt: Fix unlikely flows in increase_top()
  iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga()
  MAINTAINERS: Update my email address
  iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables
  dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock
  iommu/vt-d: Restore previous domain::aperture_end calculation
  iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb
  iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD
  iommu/tegra: fix device leak on probe_device()
  iommu/sun50i: fix device leak on of_xlate()
  iommu/omap: simplify probe_device() error handling
  iommu/omap: fix device leaks on probe_device()
  iommu/mediatek-v1: add missing larb count sanity check
  iommu/mediatek-v1: fix device leaks on probe()
  ...
2025-12-04 18:05:06 -08:00
SeongJae Park 8b02baf373 mm/damon: rename damos core filter helpers to have word core
Patch series "mm/damon: misc cleanups".

Yet another batch of misc cleanups and refactoring for DAMON code, tests,
and documents.

First two patches (1and 2) rename DAMOS core filters related code for
readability.

Three following patches (3-5) refactor page table walk callback functions
in DAMON, as suggested by Hugh and David, and I promised.

Next two patches (6 and 7) refactor DAMON core layer kunit test and sysfs
interface selftest to be simple and deduplicated.

Final two patches (8 and 9) fix up sphinx and grammatical errors on
documents.


This patch (of 9):

DAMOS filters handled by the core layer are called core filters, while
those handled by the ops layer are called ops filters.  They share the
same type but are managed in different places since core filters are
evaluated before the ops filters.  They also have different helper
functions that depend on their managed places.

The helper functions for ops filters have '_ops_' keyword on their name,
so it is easy to know they are for ops filters.  Meanwhile, the helper
functions for core filters are not having the 'core' keyword on their
name.  This makes it easy to be mistakenly used for ops filters.  Actually
there was such a bug.

To avoid future mistakes from similar confusions, rename DAMOS core
filters helper functions to have a keyword 'core' on their names.

Link: https://lkml.kernel.org/r/20251112154114.66053-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20251112154114.66053-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:44:00 -08:00
Jason Gunthorpe 7c5b184db7 genpt: Generic Page Table base API
The generic API is intended to be separated from the implementation of
page table algorithms. It contains only accessors for walking and
manipulating the table and helpers that are useful for building an
implementation. Memory management is not in the generic API, but part of
the implementation.

Using a multi-compilation approach the implementation module would include
headers in this order:

  common.h
  defs_FMT.h
  pt_defs.h
  FMT.h
  pt_common.h
  IMPLEMENTATION.h

Where each compilation unit would have a combination of FMT and
IMPLEMENTATION to produce a per-format per-implementation module.

The API is designed so that the format headers have minimal logic, and
default implementations are provided if the format doesn't include one.

Generally formats provide their code via an inline function using the
pattern:

  static inline FMTpt_XX(..) {}
  #define pt_XX FMTpt_XX

The common code then enforces a function signature so that there is no
drift in function arguments, or accidental polymorphic functions (as has
been slightly troublesome in mm). Use of function-like #defines are
avoided in the format even though many of the functions are small enough.

Provide kdocs for the API surface.

This is enough to implement the 8 initial format variations with all of
their features:
 * Entries comprised of contiguous blocks of IO PTEs for larger page
   sizes (AMDv1, ARMv8)
 * Multi-level tables, up to 6 levels. Runtime selected top level
 * The size of the top table level can be selected at runtime (ARM's
   concatenated tables)
 * The number of levels in the table can optionally increase dynamically
   during map (AMDv1)
 * Optional leaf entries at any level
 * 32 bit/64 bit virtual and output addresses, using every bit
 * Sign extended addressing (x86)
 * Dirty tracking

A basic simple format takes about 200 lines to declare the require inline
functions.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05 09:07:04 +01:00
Simona Vetter 6200442de0 Merge tag 'drm-misc-next-2025-10-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19:

UAPI Changes:

Cross-subsystem Changes:
-  fbcon cleanups.
- Make drivers depend on FB_TILEBLITTING instead of selecting it,
  and hide FB_MODE_HELPERS.

Core Changes:
- More preparations for rust.
- Throttle dirty worker with vblank
- Use drm_for_each_bridge_in_chain_scoped in drm's bridge code and
  assorted fixes.
- Ensure drm_client_modeset tests are enabled in UML.
- Rename ttm_bo_put to ttm_bo_fini, as a further step in removing the
  TTM bo refcount.
- Add POST_LT_ADJ_REQ training sequence.
- Show list of removed but still allocated bridges.
- Add a simulated vblank interrupt for hardware without it,
  and add some helpers to use them in vkms and hypervdrm.

Driver Changes:
- Assorted small fixes, cleanups and updates to host1x, tegra,
  panthor,   amdxdna, gud, vc4, ssd130x, ivpu, panfrost, panthor,
  sysfb, bridge/sn65dsi86, solomon, ast, tidss.
- Convert drivers from using .round_rate() to .determine_rate()
- Add support for KD116N3730A07/A12, chromebook mt8189, JT101TM023,
  LQ079L1SX01, raspberrypi 5" panels.
- Improve reclocking on tegra186+ with nouveau.
- Improve runtime pm in amdxdna.
- Add support for HTX_PAI in imx.
- Use a helper to calculate dumb buffer sizes in most drivers.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/b412fb91-8545-466a-8102-d89c0f2758a7@linux.intel.com
2025-10-21 10:16:34 +02:00
Luca Ceresoli 2f08387a44 drm/bridge: remove drm_for_each_bridge_in_chain()
All users have been replaced by drm_for_each_bridge_in_chain_scoped().

Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20250808-drm-bridge-alloc-getput-for_each_bridge-v2-7-edb6ee81edf1@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2025-09-16 15:02:48 +02:00
Luca Ceresoli e46efc6a7d drm/bridge: add drm_for_each_bridge_in_chain_scoped()
drm_for_each_bridge_in_chain() iterates ofer the bridges in an encoder
chain without protecting the lifetime of the bridges using
drm_bridge_get/put(). This creates a risk window where the bridge could be
freed while iterating on it. Users of drm_for_each_bridge_in_chain() cannot
solve this reliably.

Add variant of drm_for_each_bridge_in_chain() that gets/puts the bridge
reference at the beginning/end of each iteration, and puts it if breaking
ot of the loop.

Note that this requires adding a new drm_bridge_get_next_bridge_and_put()
function because, unlike similar functions as __of_get_next_child(),
drm_bridge_get_next_bridge() gets the "next" pointer but does not put the
"prev" pointer. Unfortunately drm_bridge_get_next_bridge() cannot be
modified to put the "prev" pointer because some of its users rely on
this, such as drm_atomic_bridge_propagate_bus_flags().

Also deprecate drm_for_each_bridge_in_chain(), in preparation for removing
it after converting all users to the scoped version.

Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/r/20250808-drm-bridge-alloc-getput-for_each_bridge-v2-3-edb6ee81edf1@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2025-09-16 15:02:13 +02:00
Mike Rapoport (Microsoft) e68f150bc1 memblock: drop for_each_free_mem_pfn_range_in_zone_from()
for_each_free_mem_pfn_range_in_zone_from() and its "backend" implementation
__next_mem_pfn_range_in_zone() were only used by deferred initialization of
the memory map.

Remove them as they are not used anymore.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2025-09-14 08:49:03 +03:00
Dave Airlie 5e0c679981 Linux 6.15-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgX1CgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGxiIH/A7LHlVatGEQgRFi
 0JALDgcuGTMtMU1qD43rv8Z1GXqTpCAlaBt9D1C9cUH/86MGyBTVRWgVy0wkaU2U
 8QSfFWQIbrdaIzelHtzmAv5IDtb+KrcX1iYGLcMb6ZYaWkv8/CMzMX1nkgxEr1QT
 37Xo3/F17yJumAdNQxdRhVLGy2d3X5rScecpufwh97sMwoddllMCDs2LIoeSAYpG
 376/wzni09G2fADa8MEKqcaMue4qcf0FOo/gOkT8YwFGSZLKa6uumlBLg04QoCt0
 foK2vfcci1q4H4ZbCu3uQESYGLQHY0f2ICDCwC3m25VF9a81TmlbC3MLum3vhmKe
 RtLDcXg=
 =xyaI
 -----END PGP SIGNATURE-----

BackMerge tag 'v6.15-rc5' into drm-next

Linux 6.15-rc5, requested by tzimmerman for fixes required in drm-next.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-06 16:39:25 +10:00
Ingo Molnar d833dc597f clang-format: Update the ForEachMacros list for v6.15-rc1
One of my 'git grep' searches tripped on this file listing
an already removed <linux/list.h> primitive.

Refresh it.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-13 11:03:59 +02:00
José Expósito 2c7aafc05c
drm/vkms: Allow to attach connectors and encoders
Add a list of possible encoders to the connector configuration and
helpers to attach and detach them.

Now that the default configuration has its connector and encoder
correctly, configure the output following the configuration.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-15-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:28 +01:00
José Expósito da38c72018
drm/vkms: Allow to configure multiple connectors
Add a list of connectors to vkms_config and helper functions to add and
remove as many connectors as wanted.

For backwards compatibility, add one enabled connector to the default
configuration.

A future patch will allow to attach connectors and encoders, but for the
moment there are no changes in the way the output is configured.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-14-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:28 +01:00
José Expósito b8776fc9b2
drm/vkms: Allow to attach encoders and CRTCs
Add a list of possible CRTCs to the encoder configuration and helpers to
attach and detach them.

Now that the default configuration has its encoder and CRTC correctly
attached, configure the output following the configuration.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-13-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:27 +01:00
José Expósito f60a183dc9
drm/vkms: Allow to configure multiple encoders
Add a list of encoders to vkms_config and helper functions to add and
remove as many encoders as wanted.

For backwards compatibility, add one encoder to the default
configuration.

A future patch will allow to attach encoders and CRTCs, but for the
moment there are no changes in the way the output is configured.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-12-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:26 +01:00
José Expósito c204bf652a
drm/vkms: Allow to attach planes and CRTCs
Add a list of possible CRTCs to the plane configuration and helpers to
attach, detach and get the primary and cursor planes attached to a CRTC.

Now that the default configuration has its planes and CRTC correctly
attached, configure the output following the configuration.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-11-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:26 +01:00
José Expósito 600df32dac
drm/vkms: Allow to configure multiple CRTCs
Add a list of CRTCs to vkms_config and helper functions to add and
remove as many CRTCs as wanted.

For backwards compatibility, add one CRTC to the default configuration.

A future patch will allow to attach planes and CRTCs, but for the
moment there are no changes in the way the output is configured.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-10-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:24 +01:00
José Expósito bc5b0d5dcc
drm/vkms: Allow to configure multiple planes
Add a list of planes to vkms_config and create as many planes as
configured during output initialization.

For backwards compatibility, add one primary plane and, if configured,
one cursor plane and NUM_OVERLAY_PLANES planes to the default
configuration.

Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Co-developed-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218101214.5790-9-jose.exposito89@gmail.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-03-07 10:58:24 +01:00
Javier Carrasco c147f663b6 clang-format: Update with v6.11-rc1's `for_each` macro list
Re-run the shell fragment that generated the original list.

Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://lore.kernel.org/r/20240730-clang-format-for-each-macro-update-v2-1-254fca862c97@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-02 13:20:31 +02:00
SeongJae Park e3b10a02ca Docs: Move clang-format from process/ to dev-tools/
'clang-format' is on 'Other material' section of 'process/index', but it
may fit more under 'dev-tools/' directory.  Move it.

Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240624185312.94537-5-sj@kernel.org
2024-06-26 16:36:00 -06:00
Miguel Ojeda 5a205c6a9f clang-format: Update with v6.7-rc4's `for_each` macro list
Re-run the shell fragment that generated the original list.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-12-08 23:54:38 +01:00
Elliot Berman 2a0b726b04 clang-format: Add maple tree's for_each macros
Add maple tree's for_each macros so clang-format operates correctly on
{mt,mas}_for_each.

Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Link: https://lore.kernel.org/r/20231208-clang-format-mt-for-each-v1-1-b4b73186b886@quicinc.com
[ Sorted properly. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-12-08 23:53:49 +01:00
Jason Gunthorpe 3006b15b36 iommu: Add for_each_group_device()
Convenience macro to iterate over every struct group_device in the group.

Replace all open coded list_for_each_entry's with this macro.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-05-23 08:15:51 +02:00
Linus Torvalds 7acc137211 cxl for v6.4
- Refactor the DOE infrastructure (Data Object Exchange PCI-config-cycle
   mailbox) to be a facility of the PCI core rather than the CXL core.
   This is foundational for upcoming support for PCI device-attestation and
   PCIe / CXL link encryption.
 
 - Add support for retrieving and injecting poison for CXL memory
   expanders. This enabling uses trace-events to convey CXL media error
   records to user tooling. It includes translation of device-local
   addresses (DPA) to system physical addresses (SPA) and their
   corresponding CXL region.
 
 - Fixes for decoder enumeration that missed v6.3-final
 
 - Miscellaneous fixups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZE2nNwAKCRDfioYZHlFs
 Z5c2AQCTWebok6CD+HN01xnIx+CBWAUQe0QIGR40dT2P6/WGEgEA8wMae0w/FDlc
 lQDvSoIyPvy1hGO7Ppb0K2AT6jrQAgU=
 =blcC
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull compute express link updates from Dan Williams:
 "DOE support is promoted from drivers/cxl/ to drivers/pci/ with Bjorn's
  blessing, and the CXL core continues to mature its media management
  capabilities with support for listing and injecting media errors. Some
  late fixes that missed v6.3-final are also included:

   - Refactor the DOE infrastructure (Data Object Exchange
     PCI-config-cycle mailbox) to be a facility of the PCI core rather
     than the CXL core.

     This is foundational for upcoming support for PCI
     device-attestation and PCIe / CXL link encryption.

   - Add support for retrieving and injecting poison for CXL memory
     expanders.

     This enabling uses trace-events to convey CXL media error records
     to user tooling. It includes translation of device-local addresses
     (DPA) to system physical addresses (SPA) and their corresponding
     CXL region.

   - Fixes for decoder enumeration that missed v6.3-final

   - Miscellaneous fixups"

* tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (38 commits)
  cxl/test: Add mock test for set_timestamp
  cxl/mbox: Update CMD_RC_TABLE
  tools/testing/cxl: Require CONFIG_DEBUG_FS
  tools/testing/cxl: Add a sysfs attr to test poison inject limits
  tools/testing/cxl: Use injected poison for get poison list
  tools/testing/cxl: Mock the Clear Poison mailbox command
  tools/testing/cxl: Mock the Inject Poison mailbox command
  cxl/mem: Add debugfs attributes for poison inject and clear
  cxl/memdev: Trace inject and clear poison as cxl_poison events
  cxl/memdev: Warn of poison inject or clear to a mapped region
  cxl/memdev: Add support for the Clear Poison mailbox command
  cxl/memdev: Add support for the Inject Poison mailbox command
  tools/testing/cxl: Mock support for Get Poison List
  cxl/trace: Add an HPA to cxl_poison trace events
  cxl/region: Provide region info to the cxl_poison trace event
  cxl/memdev: Add trigger_poison_list sysfs attribute
  cxl/trace: Add TRACE support for CXL media-error records
  cxl/mbox: Add GET_POISON_LIST mailbox command
  cxl/mbox: Initialize the poison state
  cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all
  ...
2023-04-30 11:51:51 -07:00
Lukas Wunner 74e491e5d1 PCI/DOE: Make mailbox creation API private
The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration.  CXL (the only in-tree DOE
user so far) has been migrated to use those mailboxes instead of
creating its own.

That leaves pcim_doe_create_mb() and pci_doe_for_each_off() without any
callers, so drop them.

pci_doe_supports_prot() is now only used internally, so declare it
static.

pci_doe_destroy_mb() is no longer used as callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.

Because pci_doe_create_mb() is only called on device enumeration, i.e.
before driver binding, the workqueue name never contains a driver name.
So replace dev_driver_string() with dev_bus_name() when generating the
workqueue name.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/64f614b6584982986c55d2c6229b4ee2b276dd59.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-18 10:36:58 -07:00
Mika Westerberg 09cc900632 PCI: Introduce pci_dev_for_each_resource()
Instead of open-coding it everywhere introduce a tiny helper that can be
used to iterate over each resource of a PCI device, and convert the most
obvious users into it.

While at it drop doubled empty line before pdev_sort_resources().

No functional changes intended.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230330162434.35055-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
2023-04-04 10:43:52 -05:00
Linus Torvalds 596ff4a09b cpumask: re-introduce constant-sized cpumask optimizations
Commit aa47a7c215 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.

The optimization was then later added back in a limited form by commit
6f9c07be9d ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.

Instead, just re-introduce the optimization, with some changes.

Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":

 - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.

   This is used for situations where we should use the exact size.

 - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
   fits in a single word and the bitmap operations thus end up able
   to trigger the "small_const_nbits()" optimizations.

   This is used for the operations that have optimized single-word
   cases that get inlined, notably the bit find and scanning functions.

 - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
   is an sufficiently small constant that makes simple "copy" and
   "clear" operations more efficient.

   This is arbitrarily set at four words or less.

As a an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like

        movl    nr_cpu_ids(%rip), %edx
        addq    $63, %rdx
        shrq    $3, %rdx
        andl    $-8, %edx
        callq   memset@PLT

on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.

In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
reasonable value to use), the above becomes a single

	movq $0,cpumask

instruction instead, because instead of caring to figure out exactly how
many CPU's the system has, it just knows that the cpumask will be a
single word and can just clear it all.

Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.

But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.

In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'.  Better remove them now than have somebody introduce use
of them later.

Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless.  Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-05 14:30:34 -08:00
Jacopo Mondi 837f92f070 media: subdev: Add for_each_active_route() macro
Add a for_each_active_route() macro to replace the repeated pattern
of iterating on the active routes of a routing table.

Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-01-22 09:35:57 +01:00
Linus Torvalds 08cdc21579 iommufd for 6.2
iommufd is the user API to control the IOMMU subsystem as it relates to
 managing IO page tables that point at user space memory.
 
 It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
 container) which is the VFIO specific interface for a similar idea.
 
 We see a broad need for extended features, some being highly IOMMU device
 specific:
  - Binding iommu_domain's to PASID/SSID
  - Userspace IO page tables, for ARM, x86 and S390
  - Kernel bypassed invalidation of user page tables
  - Re-use of the KVM page table in the IOMMU
  - Dirty page tracking in the IOMMU
  - Runtime Increase/Decrease of IOPTE size
  - PRI support with faults resolved in userspace
 
 Many of these HW features exist to support VM use cases - for instance the
 combination of PASID, PRI and Userspace IO Page Tables allows an
 implementation of DMA Shared Virtual Addressing (vSVA) within a
 guest. Dirty tracking enables VM live migration with SRIOV devices and
 PASID support allow creating "scalable IOV" devices, among other things.
 
 As these features are fundamental to a VM platform they need to be
 uniformly exposed to all the driver families that do DMA into VMs, which
 is currently VFIO and VDPA.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
 YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
 s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
 =jJkd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd implementation from Jason Gunthorpe:
 "iommufd is the user API to control the IOMMU subsystem as it relates
  to managing IO page tables that point at user space memory.

  It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
  container) which is the VFIO specific interface for a similar idea.

  We see a broad need for extended features, some being highly IOMMU
  device specific:
   - Binding iommu_domain's to PASID/SSID
   - Userspace IO page tables, for ARM, x86 and S390
   - Kernel bypassed invalidation of user page tables
   - Re-use of the KVM page table in the IOMMU
   - Dirty page tracking in the IOMMU
   - Runtime Increase/Decrease of IOPTE size
   - PRI support with faults resolved in userspace

  Many of these HW features exist to support VM use cases - for instance
  the combination of PASID, PRI and Userspace IO Page Tables allows an
  implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
  Dirty tracking enables VM live migration with SRIOV devices and PASID
  support allow creating "scalable IOV" devices, among other things.

  As these features are fundamental to a VM platform they need to be
  uniformly exposed to all the driver families that do DMA into VMs,
  which is currently VFIO and VDPA"

For more background, see the extended explanations in Jason's pull request:

  https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
  iommufd: Change the order of MSI setup
  iommufd: Improve a few unclear bits of code
  iommufd: Fix comment typos
  vfio: Move vfio group specific code into group.c
  vfio: Refactor dma APIs for emulated devices
  vfio: Wrap vfio group module init/clean code into helpers
  vfio: Refactor vfio_device open and close
  vfio: Make vfio_device_open() truly device specific
  vfio: Swap order of vfio_device_container_register() and open_device()
  vfio: Set device->group in helper function
  vfio: Create wrappers for group register/unregister
  vfio: Move the sanity check of the group to vfio_create_group()
  vfio: Simplify vfio_create_group()
  iommufd: Allow iommufd to supply /dev/vfio/vfio
  vfio: Make vfio_container optionally compiled
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  ...
2022-12-14 09:15:43 -08:00
Linus Torvalds 98d0052d0d printk changes for 6.2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmORzikACgkQUqAMR0iA
 lPKF/g/7Bmcao3rJkZjEagsYY+s7rGhaFaSbML8FDdyE3UzeXLJOnNxBLrD0JIe9
 XFW7+DMqr2uRxsab5C7APy0mrIWp/zCGyJ8CmBILnrPDNcAQ27OhFzxv6WlMUmEc
 xEjGHrk5dFV96s63gyHGLkKGOZMd/cfcpy/QDOyg0vfF8EZCiPywWMbQQ2Ij8E50
 N6UL70ExkoLjT9tzb8NXQiaDqHxqNRvd15aIomDjRrce7eeaL4TaZIT7fKnEcULz
 0Lmdo8RUknonCI7Y00RWdVXMqqPD2JsKz3+fh0vBnXEN+aItwyxis/YajtN+m6l7
 jhPGt7hNhCKG17auK0/6XVJ3717QwjI3+xLXCvayA8jyewMK14PgzX70hCws0eXM
 +5M+IeXI4ze5qsq+ln9Dt8zfC+5HGmwXODUtaYTBWhB4nVWdL/CZ+nTv349zt+Uc
 VIi/QcPQ4vq6EfsxUZR2r6Y12+sSH40iLIROUfqSchtujbLo7qxSNF5x7x9+rtff
 nWuXo5OsjGE7TZDwn3kr0zSuJ+w/pkWMYQ7jch+A2WqUMYyGC86sL3At7ocL+Esq
 34uvzwEgWnNySV8cLiMh34kBmgBwhAP34RhV0RS9iCv8kev2DV7pLQTs9V3QAjw9
 EZnFDHATUdikgugaFKCeDV86R3wFgnRWWOdlRrRi6aAzFDqNcYk=
 =1PTZ
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of
   this_cpu_inc() on strong load-store architectures.

 - Introduce new console_list_lock to synchronize a manipulation of the
   list of registered consoles and their flags.

   This is a first step in removing the big-kernel-lock-like behavior of
   console_lock(). This semaphore still serializes console->write()
   calbacks against:

      - each other. It primary prevents potential races between early
        and proper console drivers using the same device.

      - suspend()/resume() callbacks and init() operations in some
        drivers.

      - various other operations in the tty/vt and framebufer
        susbsystems. It is likely that console_lock() serializes even
        operations that are not directly conflicting with the
        console->write() callbacks here. This is the most complicated
        big-kernel-lock aspect of the console_lock() that will be hard
        to untangle.

 - Introduce new console_srcu lock that is used to safely iterate and
   access the registered console drivers under SRCU read lock.

   This is a prerequisite for introducing atomic console drivers and
   console kthreads. It will reduce the complexity of serialization
   against normal consoles and console_lock(). Also it should remove the
   risk of deadlock during critical situations, like Oops or panic, when
   only atomic consoles are registered.

 - Check whether the console is registered instead of enabled on many
   locations. It was a historical leftover.

 - Cleanly force a preferred console in xenfb code instead of a dirty
   hack.

 - A lot of code and comment clean ups and improvements.

* tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits)
  printk: htmldocs: add missing description
  tty: serial: sh-sci: use setup() callback for early console
  printk: relieve console_lock of list synchronization duties
  tty: serial: kgdboc: use console_list_lock to trap exit
  tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()
  tty: serial: kgdboc: use console_list_lock for list traversal
  tty: serial: kgdboc: use srcu console list iterator
  proc: consoles: use console_list_lock for list iteration
  tty: tty_io: use console_list_lock for list synchronization
  printk, xen: fbfront: create/use safe function for forcing preferred
  netconsole: avoid CON_ENABLED misuse to track registration
  usb: early: xhci-dbc: use console_is_registered()
  tty: serial: xilinx_uartps: use console_is_registered()
  tty: serial: samsung_tty: use console_is_registered()
  tty: serial: pic32_uart: use console_is_registered()
  tty: serial: earlycon: use console_is_registered()
  tty: hvc: use console_is_registered()
  efi: earlycon: use console_is_registered()
  tty: nfcon: use console_is_registered()
  serial_core: replace uart_console_enabled() with uart_console_registered()
  ...
2022-12-12 09:01:36 -08:00
John Ogness 6c4afa7914 printk: Prepare for SRCU console list protection
Provide an NMI-safe SRCU protected variant to walk the console list.

Note that all console fields are now set before adding the console
to the list to avoid the console becoming visible by SCRU readers
before being fully initialized.

This is a preparatory change for a new console infrastructure which
operates independent of the console BKL.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20221116162152.193147-4-john.ogness@linutronix.de
2022-12-02 11:24:59 +01:00
Florian Westphal c25b7a7a56 inet: ping: use hlist_nulls rcu iterator during lookup
ping_lookup() does not acquire the table spinlock, so iteration should
use hlist_nulls_for_each_entry_rcu().

Spotted during code review.

Fixes: dbca1596bb ("ping: convert to RCU lookups, get rid of rwlock")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221129140644.28525-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-01 12:42:46 +01:00
Jason Gunthorpe 51fe6141f0 iommufd: Data structure to provide IOVA to PFN mapping
This is the remainder of the IOAS data structure. Provide an object called
an io_pagetable that is composed of iopt_areas pointing at iopt_pages,
along with a list of iommu_domains that mirror the IOVA to PFN map.

At the top this is a simple interval tree of iopt_areas indicating the map
of IOVA to iopt_pages. An xarray keeps track of a list of domains. Based
on the attached domains there is a minimum alignment for areas (which may
be smaller than PAGE_SIZE), an interval tree of reserved IOVA that can't
be mapped and an IOVA of allowed IOVA that can always be mappable.

The concept of an 'access' refers to something like a VFIO mdev that is
accessing the IOVA and using a 'struct page *' for CPU based access.

Externally an API is provided that matches the requirements of the IOCTL
interface for map/unmap and domain attachment.

The API provides a 'copy' primitive to establish a new IOVA map in a
different IOAS from an existing mapping by re-using the iopt_pages. This
is the basic mechanism to provide single pinning.

This is designed to support a pre-registration flow where userspace would
setup an dummy IOAS with no domains, map in memory and then establish an
access to pin all PFNs into the xarray.

Copy can then be used to create new IOVA mappings in a different IOAS,
with iommu_domains attached. Upon copy the PFNs will be read out of the
xarray and mapped into the iommu_domains, avoiding any pin_user_pages()
overheads.

Link: https://lore.kernel.org/r/10-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30 20:16:49 -04:00
Jason Gunthorpe f394576eb1 iommufd: PFN handling for iopt_pages
The top of the data structure provides an IO Address Space (IOAS) that is
similar to a VFIO container. The IOAS allows map/unmap of memory into
ranges of IOVA called iopt_areas. Multiple IOMMU domains (IO page tables)
and in-kernel accesses (like VFIO mdevs) can be attached to the IOAS to
access the PFNs that those IOVA areas cover.

The IO Address Space (IOAS) datastructure is composed of:
 - struct io_pagetable holding the IOVA map
 - struct iopt_areas representing populated portions of IOVA
 - struct iopt_pages representing the storage of PFNs
 - struct iommu_domain representing each IO page table in the system IOMMU
 - struct iopt_pages_access representing in-kernel accesses of PFNs (ie
   VFIO mdevs)
 - struct xarray pinned_pfns holding a list of pages pinned by in-kernel
   accesses

This patch introduces the lowest part of the datastructure - the movement
of PFNs in a tiered storage scheme:
 1) iopt_pages::pinned_pfns xarray
 2) Multiple iommu_domains
 3) The origin of the PFNs, i.e. the userspace pointer

PFN have to be copied between all combinations of tiers, depending on the
configuration.

The interface is an iterator called a 'pfn_reader' which determines which
tier each PFN is stored and loads it into a list of PFNs held in a struct
pfn_batch.

Each step of the iterator will fill up the pfn_batch, then the caller can
use the pfn_batch to send the PFNs to the required destination. Repeating
this loop will read all the PFNs in an IOVA range.

The pfn_reader and pfn_batch also keep track of the pinned page accounting.

While PFNs are always stored and accessed as full PAGE_SIZE units the
iommu_domain tier can store with a sub-page offset/length to support
IOMMUs with a smaller IOPTE size than PAGE_SIZE.

Link: https://lore.kernel.org/r/8-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-30 20:16:49 -04:00
Jason Gunthorpe 5fe937862c interval-tree: Add a utility to iterate over spans in an interval tree
The span iterator travels over the indexes of the interval_tree, not the
nodes, and classifies spans of indexes as either 'used' or 'hole'.

'used' spans are fully covered by nodes in the tree and 'hole' spans have
no node intersecting the span.

This is done greedily such that spans are maximally sized and every
iteration step switches between used/hole.

As an example a trivial allocator can be written as:

	for (interval_tree_span_iter_first(&span, itree, 0, ULONG_MAX);
	     !interval_tree_span_iter_done(&span);
	     interval_tree_span_iter_next(&span))
		if (span.is_hole &&
		    span.last_hole - span.start_hole >= allocation_size - 1)
			return span.start_hole;

With all the tricky boundary conditions handled by the library code.

The following iommufd patches have several algorithms for its overlapping
node interval trees that are significantly simplified with this kind of
iteration primitive. As it seems generally useful, put it into lib/.

Link: https://lore.kernel.org/r/3-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-29 16:34:15 -04:00
Jonathan Cameron 9d24322e88 PCI/DOE: Add DOE mailbox support functions
Introduced in a PCIe r6.0, sec 6.30, DOE provides a config space based
mailbox with standard protocol discovery.  Each mailbox is accessed
through a DOE Extended Capability.

Each DOE mailbox must support the DOE discovery protocol in addition to
any number of additional protocols.

Define core PCIe functionality to manage a single PCIe DOE mailbox at a
defined config space offset.  Functionality includes iterating,
creating, query of supported protocol, and task submission.  Destruction
of the mailboxes is device managed.

Cc: "Li, Ming" <ming4.li@intel.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-developed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-4-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-19 15:38:04 -07:00
Brian Norris 781121a7f6 clang-format: Fix space after for_each macros
Set SpaceBeforeParens to ControlStatementsExceptForEachMacros to not add
space between a for_each macro and the following parenthesis.  This
option is available since clang-format-11 [1] and is in line with the
checkpatch.pl rules [2].

I found that this patch has also been sent by Brian Norris some weeks
ago [3].

Link: https://clang.llvm.org/docs/ClangFormatStyleOptions.html [1]
Link: https://lore.kernel.org/r/8b6b252b-47a6-9d52-f0bd-10d3bc4ad244@digikod.net [2]
Link: https://lore.kernel.org/lkml/YmHuZjmP9MxkgJ0R@google.com/ [3]
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220506160106.522341-6-mic@digikod.net
[Adjusted authorship as agreed]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 19:27:16 +02:00
Mickaël Salaün d7f6604341 clang-format: Fix goto labels indentation
Thanks to IndentGotoLabels introduced with clang-format-10 [1], we can
avoid goto labels identation.  This follows the current coding style and
it is then in line with the checkpatch.pl rules [2].

Link: https://clang.llvm.org/docs/ClangFormatStyleOptions.html [1]
Link: https://lore.kernel.org/r/8b6b252b-47a6-9d52-f0bd-10d3bc4ad244@digikod.net [2]
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220506160106.522341-4-mic@digikod.net
[Updated header comment to >= 10]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 19:22:55 +02:00
Mickaël Salaün 96232c7d4f clang-format: Update to clang-format >= 6
We get new interesting formating with clang-format greater or equal to 6
as stated in the removed comments.  Miguel Ojeda suggested to even move
the minimal clang-format version to 11, which is the minimum LLVM
supported at the moment [1].

Automatically updated with:
sed -i 's/^\(\s*\)#\(\S*\s\+\S*\) # Unknown to clang-format.*/\1\2/' .clang-format

Link: https://lore.kernel.org/r/CANiq72nLOfmEt-CZBmm2ouEB_x6Jm9ggDVFCVJxYxKw7O0LTzQ@mail.gmail.com [1]
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220506160106.522341-3-mic@digikod.net
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 14:12:28 +02:00
Mickaël Salaün 49bb63a261 clang-format: Extend the for_each list with tools/
Add tools/ to the shell fragment generating the for_each list and update
it.  This is useful to format files in the tools directory (e.g.
selftests) with the same coding style as the kernel.

Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20220506160106.522341-2-mic@digikod.net
[Reworded and rebased on top of previous commits]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 14:10:54 +02:00
Miguel Ojeda 72e14aa9f8 clang-format: Simplify command with `sort -u`
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 14:06:51 +02:00
Miguel Ojeda 4312087919 clang-format: Use POSIX locale for `sort`
This avoids differences when different people run the command,
which is relevant for our use case, e.g.:

    $ LC_ALL=en_US.UTF-8 sort test
    ata_for_each_link
    __ata_qc_for_each
    ata_qc_for_each

    $ LC_ALL=C sort test
    __ata_qc_for_each
    ata_for_each_link
    ata_qc_for_each

Link: https://lore.kernel.org/lkml/CANiq72=7=ZpAObWRmposOmnyZ8XR_eNHCBtA3bu3fusmcPUwDA@mail.gmail.com/
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 14:02:06 +02:00
Miguel Ojeda 882178947b clang-format: Update with v5.18-rc7's `for_each` macro list
Re-run the shell fragment that generated the original list.

This brings it up to date, so that the next patches that tweak it
further are more clear on what they change.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2022-05-20 13:47:25 +02:00
Thomas Gleixner ef8dd01538 genirq/msi: Make interrupt allocation less convoluted
There is no real reason to do several loops over the MSI descriptors
instead of just doing one loop. In case of an error everything is undone
anyway so it does not matter whether it's a partial or a full rollback.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20211206210749.010234767@linutronix.de
2021-12-16 22:22:20 +01:00
Miguel Ojeda 4792f9dd12 clang-format: Update with the latest for_each macro list
Re-run the shell fragment that generated the original list.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2021-05-12 23:32:39 +02:00
Linus Torvalds 825d150875 cxl for 5.12
Introduce an initial driver for CXL 2.0 Type-3 Memory Devices. CXL is
 Compute Express Link which released the 2.0 specification in November.
 The Linux relevant changes in CXL 2.0 are support for an OS to
 dynamically assign address space to memory devices, support for
 switches, persistent memory, and hotplug. A Type-3 Memory Device is a
 PCI enumerated device presenting the CXL Memory Device Class Code and
 implementing the CXL.mem protocol. CXL.mem allows device to advertise
 CPU and I/O coherent memory to the system, i.e. typical "System RAM" and
 "Persistent Memory" in Linux /proc/iomem terms.
 
 In addition to the CXL.mem fast path there is an administrative command
 hardware mailbox interface for maintenance and provisioning. It is this
 command interface that is the focus of the initial driver. With this
 driver a CXL device that is mapped by the BIOS can be administered by
 Linux. Linux support for CXL PMEM and dynamic CXL address space
 management are to be implemented post v5.12.
 
 4cdadfd5e0 cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
 Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 
 8adaf747c9 cxl/mem: Find device capabilities
 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
 
 b39cb1052a cxl/mem: Register CXL memX devices
 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
 
 13237183c7 cxl/mem: Add a "RAW" send command
 Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 
 472b1ce6e9 cxl/mem: Enable commands via CEL
 Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 
 57ee605b97 cxl/mem: Add set of informational commands
 Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAmA1xV0ACgkQHtKRamZ9
 iALEMQ/8Ce45LCh0oWh8FsSZ50i1KRwKGwpYNCiutTYLBArpBZXJdE1ZRFFCKgi9
 ahMs29KSsj/60vG/DYuOwZBKClUiqOQHmtCRUQbb5wGxb7q8f2AKQSPOJ+Nn0nJE
 kgstMnkqe/neAlNDeMRdZcoBku2++hWjVVnz8QqE5Py3v3T+uEU5Au3fIhnCyvk5
 usXcH8Y6R7Lb3BxT4z3DKumaRfoxIsQlH5XFbnUbgwlkE7KHoyAagZluJqh3cZpo
 sZrCpwG5Onw8rKqfLl//CZ8FfBjE2XfSqJkEPCCMfZUhI78sGGdmHL3ElM9/MNIB
 neGs3dQ5lkTaiw0nCqFtZMvDZEUsIgXPLiBByG22TM3/aIMmLqbJzeYG6UHENwC+
 hLZDV/WJNLRfeUVppt+6PgcOgjTUjNV45SdVryf10Kh3NPZh7m6OPeqG/QTKHMv9
 EgbFGihZF3NcSwvf5mdQNIMlnEL0WxOl/I+bSszYPXP6l38btegHR75gUXu7UGwl
 9LQhkVEQL8UmfRKX2HaG6h8hyTUOf1kQiXgvchLxYLKHXSc0J/wAwCa0w3jw1m5r
 bdcVQx3JcBv2S1tUHp+wMqHDbLSGpeE5nF3emWabttsjSUmlb1LuAQxgrdyBtQi9
 o5v6dDLOTmAH4sAt96HWKDzpUIMix3YmO3YSghYtNrwWUylQLuA=
 =3SAQ
 -----END PGP SIGNATURE-----

Merge tag 'cxl-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull initial support for CXL (Compute Express Link) from Dan Williams:
 "Introduce an initial driver for CXL 2.0 Type-3 Memory Devices.

  CXL is Compute Express Link which released the 2.0 specification in
  November. The Linux relevant changes in CXL 2.0 are support for an OS
  to dynamically assign address space to memory devices, support for
  switches, persistent memory, and hotplug.

  A Type-3 Memory Device is a PCI enumerated device presenting the CXL
  Memory Device Class Code and implementing the CXL.mem protocol.
  CXL.mem allows device to advertise CPU and I/O coherent memory to the
  system, i.e. typical "System RAM" and "Persistent Memory" in Linux
  /proc/iomem terms.

  In addition to the CXL.mem fast path there is an administrative
  command hardware mailbox interface for maintenance and provisioning.
  It is this command interface that is the focus of the initial driver.
  With this driver a CXL device that is mapped by the BIOS can be
  administered by Linux.

  Linux support for CXL PMEM and dynamic CXL address space management
  are to be implemented post v5.12"

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  4cdadfd5e0 ("cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints")
  13237183c7 ("cxl/mem: Add a "RAW" send command")
  472b1ce6e9 ("cxl/mem: Enable commands via CEL")
  57ee605b97 ("cxl/mem: Add set of informational commands")

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
  8adaf747c9 ("cxl/mem: Find device capabilities")
  b39cb1052a ("cxl/mem: Register CXL memX devices")

* tag 'cxl-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  cxl/mem: Fix potential memory leak
  cxl/mem: Return -EFAULT if copy_to_user() fails
  MAINTAINERS: Add maintainers of the CXL driver
  cxl/mem: Add set of informational commands
  cxl/mem: Enable commands via CEL
  cxl/mem: Add a "RAW" send command
  cxl/mem: Add basic IOCTL interface
  cxl/mem: Register CXL memX devices
  cxl/mem: Find device capabilities
  cxl/mem: Introduce a driver for CXL-2.0-Type-3 endpoints
2021-02-24 09:38:36 -08:00
Ben Widawsky 583fa5e71c cxl/mem: Add basic IOCTL interface
Add a straightforward IOCTL that provides a mechanism for userspace to
query the supported memory device commands. CXL commands as they appear
to userspace are described as part of the UAPI kerneldoc. The command
list returned via this IOCTL will contain the full set of commands that
the driver supports, however, some of those commands may not be
available for use by userspace.

Memory device commands first appear in the CXL 2.0 specification. They
are submitted through a mailbox mechanism specified in the CXL 2.0
specification.

The send command allows userspace to issue mailbox commands directly to
the hardware. The list of available commands to send are the output of
the query command. The driver verifies basic properties of the command
and possibly inspect the input (or output) payload to determine whether
or not the command is allowed (or might taint the kernel).

Reported-by: kernel test robot <lkp@intel.com> # bug in earlier revision
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2)
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20210217040958.1354670-5-ben.widawsky@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-16 20:36:38 -08:00
Miguel Ojeda 1074f8ec28 clang-format: Update with the latest for_each macro list
Re-run the shell fragment that generated the original list.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2021-01-29 15:00:23 +01:00
Linus Torvalds a1e16bc7d5 RDMA 5.10 pull request
The typical set of driver updates across the subsystem:
 
  - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
    usnic, qib, qedr, cxgb4, hns, bnxt_re
 
  - Various rtrs fixes and updates
 
  - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
    wasn't working right
 
  - Use tracepoints instead of pr_debug in the CM code
 
  - Scrub the locking in ucma and cma to close more syzkaller bugs
 
  - Use tasklet_setup in the subsystem
 
  - Revert the idea that 'destroy' operations are not allowed to fail at
    the driver level. This proved unworkable from a HW perspective.
 
  - Revise how the umem API works so drivers make fewer mistakes using it
 
  - XRC support for qedr
 
  - Convert uverbs objects RWQ and MW to new the allocation scheme
 
  - Large queue entry sizes for hns
 
  - Use hmm_range_fault() for mlx5 On Demand Paging
 
  - uverbs APIs to inspect the GID table instead of sysfs
 
  - Move some of the RDMA code for building large page SGLs into
    lib/scatterlist
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
 mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
 l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
 5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
 46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
 oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
 YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
 /b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
 p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
 ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
 fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
 16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
 =LKpl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Mike Rapoport cc6de16805 memblock: use separate iterators for memory and reserved regions
for_each_memblock() is used to iterate over memblock.memory in a few
places that use data from memblock_region rather than the memory ranges.

Introduce separate for_each_mem_region() and
for_each_reserved_mem_region() to improve encapsulation of memblock
internals from its users.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>			[x86]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>	[MIPS]
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:35 -07:00
Mike Rapoport 9f3d5eaa3c memblock: implement for_each_reserved_mem_region() using __next_mem_region()
Iteration over memblock.reserved with for_each_reserved_mem_region() used
__next_reserved_mem_region() that implemented a subset of
__next_mem_region().

Use __for_each_mem_range() and, essentially, __next_mem_region() with
appropriate parameters to reduce code duplication.

While on it, rename for_each_reserved_mem_region() to
for_each_reserved_mem_range() for consistency.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-17-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:35 -07:00