mirror of https://github.com/torvalds/linux.git
54 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
2b6a3f061f |
mm: declare VMA flags by bit
Patch series "initial work on making VMA flags a bitmap", v3.
We are in the rather silly situation that we are running out of VMA flags
as they are currently limited to a system word in size.
This leads to absurd situations where we limit features to 64-bit
architectures only because we simply do not have the ability to add a flag
for 32-bit ones.
This is very constraining and leads to hacks or, in the worst case, simply
an inability to implement features we want for entirely arbitrary reasons.
This also of course gives us something of a Y2K type situation in mm where
we might eventually exhaust all of the VMA flags even on 64-bit systems.
This series lays the groundwork for getting away from this limitation by
establishing VMA flags as a bitmap whose size we can increase in future
beyond 64 bits if required.
This is necessarily a highly iterative process given the extensive use of
VMA flags throughout the kernel, so we start by performing basic steps.
Firstly, we declare VMA flags by bit number rather than by value,
retaining the VM_xxx definitions but expressing them in terms of these
newly introduced VMA_xxx_BIT values.
While we are here, we use sparse annotations to ensure that, when dealing
with VMA bit number parameters, we cannot be passed values which are not
declared as such - providing some useful type safety.
We then introduce an opaque VMA flag type, much like the opaque mm_struct
flag type introduced in commit
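As a rough illustration of the declare-by-bit pattern described above - names such as VMA_READ_BIT here are illustrative stand-ins, not the kernel's actual definitions - a minimal, self-contained C sketch:

```c
#include <stdio.h>

/* Declare each flag by its bit number... */
enum {
        VMA_READ_BIT  = 0,
        VMA_WRITE_BIT = 1,
        VMA_EXEC_BIT  = 2,
};

/* ...and derive the familiar value-style constants from those bit numbers,
 * so the storage can later grow into a bitmap wider than one system word. */
#define VM_READ  (1UL << VMA_READ_BIT)
#define VM_WRITE (1UL << VMA_WRITE_BIT)
#define VM_EXEC  (1UL << VMA_EXEC_BIT)

int main(void)
{
        unsigned long vm_flags = VM_READ | VM_WRITE;

        printf("flags=%#lx, exec set: %d\n", vm_flags, !!(vm_flags & VM_EXEC));
        return 0;
}
```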
|
|
|
|
ccf9eb326b |
tools/testing/vma: add missing stub
vm_flags_reset() is not available in the userland VMA tests, so add a stub which const-casts vma->vm_flags, avoiding use of the vma->__vm_flags field ahead of its upcoming removal. Link: https://lkml.kernel.org/r/4aff8bf7-d367-4ba3-90ad-13eef7a063fa@lucifer.local Fixes: c5c67c1de357 ("tools/testing/vma: eliminate dependency on vma->__vm_flags") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
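A stub of the kind described might look roughly like the following sketch (the struct here is a hypothetical minimal stand-in, not the actual test-harness definition):

```c
typedef unsigned long vm_flags_t;

struct vm_area_struct {
        const vm_flags_t vm_flags;      /* normally written only via helpers */
};

/* Userland test stub: write vma->vm_flags directly by casting away const,
 * so the test no longer needs the soon-to-be-removed __vm_flags alias. */
static inline void vm_flags_reset(struct vm_area_struct *vma, vm_flags_t flags)
{
        *(vm_flags_t *)&vma->vm_flags = flags;
}
```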
|
|
|
277a1ae387 |
mm: softdirty: add pgtable_supports_soft_dirty()
Patch series "mm: Add soft-dirty and uffd-wp support for RISC-V", v15. This patchset adds support for Svrsw60t59b [1] extension which is ratified now, also add soft dirty and userfaultfd write protect tracking for RISC-V. The patches 1 and 2 add macros to allow architectures to define their own checks if the soft-dirty / uffd_wp PTE bits are available, in other words for RISC-V, the Svrsw60t59b extension is supported on which device the kernel is running. Also patch1-2 are removing "ifdef CONFIG_MEM_SOFT_DIRTY" "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and "ifdef CONFIG_PTE_MARKER_UFFD_WP" in favor of checks which if not overridden by the architecture, no change in behavior is expected. This patchset has been tested with kselftest mm suite in which soft-dirty, madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests run and pass, and no regressions are observed in any of the other tests. This patch (of 6): Some platforms can customize the PTE PMD entry soft-dirty bit making it unavailable even if the architecture provides the resource. Add an API which architectures can define their specific implementations to detect if soft-dirty bit is available on which device the kernel is running. This patch is removing "ifdef CONFIG_MEM_SOFT_DIRTY" in favor of pgtable_supports_soft_dirty() checks that defaults to IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), if not overridden by the architecture, no change in behavior is expected. We make sure to never set VM_SOFTDIRTY if !pgtable_supports_soft_dirty(), so we will never run into VM_SOFTDIRTY checks. [lorenzo.stoakes@oracle.com: fix VMA selftests] Link: https://lkml.kernel.org/r/dac6ddfe-773a-43d5-8f69-021b9ca4d24b@lucifer.local Link: https://lkml.kernel.org/r/20251113072806.795029-1-zhangchunyan@iscas.ac.cn Link: https://lkml.kernel.org/r/20251113072806.795029-2-zhangchunyan@iscas.ac.cn Link: https://github.com/riscv-non-isa/riscv-iommu/pull/543 [1] Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Acked-by: David Hildenbrand <david@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Conor Dooley <conor@kernel.org> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
6707915e03 |
mm: propagate VM_SOFTDIRTY on merge
Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2. Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by establishing a new VMA, or via merge) as implemented in __mmap_complete() and do_brk_flags(). However, when performing a merge of existing mappings such as when performing mprotect(), we may lose the VM_SOFTDIRTY flag. Now we have the concept of making VMA flags 'sticky', that is that they both don't prevent merge and, importantly, are propagated to merged VMAs, this seems a sensible alternative to the existing special-casing of VM_SOFTDIRTY. We additionally add a self-test that demonstrates that this logic behaves as expected. This patch (of 2): Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by establishing a new VMA, or via merge) as implemented in __mmap_complete() and do_brk_flags(). However, when performing a merge of existing mappings such as when performing mprotect(), we may lose the VM_SOFTDIRTY flag. This is because currently we simply ignore VM_SOFTDIRTY for the purposes of merge, so one VMA may possess the flag and another not, and whichever happens to be the target VMA will be the one upon which the merge is performed which may or may not have VM_SOFTDIRTY set. Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one which solves this issue. Additionally update VMA userland tests to propagate changes. [akpm@linux-foundation.org: update comments, per Lorenzo] Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local Link: https://lkml.kernel.org/r/cover.1763399675.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Andrey Vagin <avagin@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
2197bb60f8 |
mm: add vma_start_write_killable()
Patch series "vma_start_write_killable"", v2. When we added the VMA lock, we made a major oversight in not adding a killable variant. That can run us into trouble where a thread takes the VMA lock for read (eg handling a page fault) and then goes out to lunch for an hour (eg doing reclaim). Another thread tries to modify the VMA, taking the mmap_lock for write, then attempts to lock the VMA for write. That blocks on the first thread, and ensures that every other page fault now tries to take the mmap_lock for read. Because everything's in an uninterruptible sleep, we can't kill the task, which makes me angry. This patchset just adds vma_start_write_killable() and converts one caller to use it. Most users are somewhat tricky to convert, so expect follow-up individual patches per call-site which need careful analysis to make sure we've done proper cleanup. This patch (of 2): The vma can be held read-locked for a substantial period of time, eg if memory allocation needs to go into reclaim. It's useful to be able to send fatal signals to threads which are waiting for the write lock. Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Chris Li <chriscli@google.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
ab04b530e7 |
mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one
Gather all the VMA flags whose presence implies that page tables must be copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this rather than specifying individual flags in vma_needs_copy(). We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies that there may be metadata contained in the page tables (that is, guard markers) which will not and cannot be propagated upon fork. This was already being done manually previously in vma_needs_copy(), but this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and VM_UFFD_WP, all of which imply the same. Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too - because equally a flag being VM_STICKY indicates that the VMA contains metadata that is not propagated by being faulted in - i.e. that the VMA metadata does not fully describe the VMA alone, and thus we must propagate whatever metadata there is on a fork. However, for maximum flexibility, we do not make this necessarily the case here. Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
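A self-contained conceptual sketch of the change described above (bit values and the struct are stand-ins; the kernel's real vma_needs_copy() takes more arguments and performs further checks):

```c
#include <stdbool.h>

typedef unsigned long vm_flags_t;

#define VM_PFNMAP       (1UL << 0)      /* illustrative bit positions only */
#define VM_MIXEDMAP     (1UL << 1)
#define VM_UFFD_WP      (1UL << 2)
#define VM_MAYBE_GUARD  (1UL << 3)

/* One combined mask replaces the previous per-flag checks. */
#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_GUARD)

struct vm_area_struct { vm_flags_t vm_flags; };

static bool vma_needs_copy(const struct vm_area_struct *src_vma)
{
        /* Page tables may hold metadata (PFN mappings, uffd-wp or guard
         * markers) that cannot be recreated by faulting, so copy them. */
        if (src_vma->vm_flags & VM_COPY_ON_FORK)
                return true;

        /* ... remaining checks (e.g. anon_vma presence) elided ... */
        return false;
}
```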
|
|
|
64212ba02e |
mm: implement sticky VMA flags
It is useful to be able to designate that certain flags are 'sticky', that is, if two VMAs are merged one with a flag of this nature and one without, the merged VMA sets this flag. As a result we ignore these flags for the purposes of determining VMA flag differences between VMAs being considered for merge. This patch therefore updates the VMA merge logic to perform this action, with flags possessing this property being described in the VM_STICKY bitmap. Those flags which ought to be ignored for the purposes of VMA merge are described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also updated to use. As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it already had this behaviour, alongside VM_STICKY as sticky flags by implication must not disallow merge. Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its own right, but this change is out of scope for this series. The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result of this change, once the VMA flag is set upon guard region installation, VMAs with guard ranges will now not have their merge behaviour impacted as a result and can be freely merged with other VMAs without VM_MAYBE_GUARD set. Also update the comments for vma_modify_flags() to directly reference sticky flags now we have established the concept. We also update the VMA userland tests to account for the changes. Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
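The merge rule described above can be sketched as follows (bitmap names come from the message; bit values are illustrative and the kernel's actual merge helpers are more involved):

```c
#include <stdbool.h>

typedef unsigned long vm_flags_t;

#define VM_SOFTDIRTY    (1UL << 0)      /* illustrative values only */
#define VM_MAYBE_GUARD  (1UL << 1)

#define VM_STICKY       (VM_MAYBE_GUARD)
#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)

/* Sticky/ignored flags must not prevent two VMAs from merging... */
static inline bool vm_flags_mergeable(vm_flags_t a, vm_flags_t b)
{
        return (a & ~VM_IGNORE_MERGE) == (b & ~VM_IGNORE_MERGE);
}

/* ...and a sticky flag set on either side survives into the merged VMA. */
static inline vm_flags_t vm_flags_after_merge(vm_flags_t target, vm_flags_t other)
{
        return target | ((target | other) & VM_STICKY);
}
```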
|
|
|
5dba5cc2e0 |
mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4. Currently, guard regions are not visible to users except through /proc/$pid/pagemap, with no explicit visibility at the VMA level. This makes the feature less useful, as it isn't entirely apparent which VMAs may have these entries present, especially when performing actions which walk through memory regions such as those performed by CRIU. This series addresses this issue by introducing the VM_MAYBE_GUARD flag which fulfils this role, updating the smaps logic to display an entry for these. The semantics of this flag are that a guard region MAY be present if set (we cannot be sure, as we can't efficiently track whether an MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if not set the VMA definitely does NOT have any guard regions present. It's problematic to establish this flag without further action, because that means that VMAs with guard regions in them become non-mergeable with adjacent VMAs for no especially good reason. To work around this, this series also introduces the concept of 'sticky' VMA flags - that is flags which: a. if set in one VMA and not in another still permit those VMAs to be merged (if otherwise compatible). b. When they are merged, the resultant VMA must have the flag set. The VMA logic is updated to propagate these flags correctly. Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve an issue with file-backed guard regions - previously these established an anon_vma object for file-backed mappings solely to have vma_needs_copy() correctly propagate guard region mappings to child processes. We introduce a new flag alias VM_COPY_ON_FORK (which currently only specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly for this flag and to copy page tables if it is present, which resolves this issue. Additionally, we add the ability for allow-listed VMA flags to be atomically writable with only mmap/VMA read locks held. The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure does not cause any races by being allowed to do so. This allows us to maintain guard region installation as a read-locked operation and not endure the overhead of obtaining a write lock here. Finally we introduce extensive VMA userland tests to assert that the sticky VMA logic behaves correctly as well as guard region self tests to assert that smaps visibility is correctly implemented. This patch (of 9): Currently, if a user needs to determine if guard regions are present in a range, they have to scan all VMAs (or have knowledge of which ones might have guard regions). Since commit |
|
|
|
340b59816b |
mm: kill mm_wr_locked from unmap_vmas() and unmap_single_vma()
Kill mm_wr_locked since commit
|
|
|
|
ac0a3fc9c0 |
mm: add ability to take further action in vm_area_desc
Some drivers/filesystems need to perform additional tasks after the VMA is set up. This is typically in the form of pre-population. The forms of pre-population most likely to be performed are a PFN remap or the insertion of normal folios and PFNs into a mixed map. We start by implementing the PFN remap functionality, ensuring that we perform the appropriate actions at the appropriate time - that is setting flags at the point of .mmap_prepare, and performing the actual remap at the point at which the VMA is fully established. This prevents the driver from doing anything too crazy with a VMA at any stage, and we retain complete control over how the mm functionality is applied. Unfortunately callers still do often require some kind of custom action, so we add an optional success/error _hook to allow the caller to do something after the action has succeeded or failed. This is done at the point when the VMA has already been established, so the harm that can be done is limited. The error hook can be used to filter errors if necessary. There may be cases in which the caller absolutely must hold the file rmap lock until the operation is entirely complete. It is an edge case, but certainly the hugetlbfs mmap hook requires it. To accommodate this, we add the hide_from_rmap_until_complete flag to the mmap_action type. In this case, if a new VMA is allocated, we will hold the file rmap lock until the operation is entirely completed (including any success/error hooks). Note that we do not need to update __compat_vma_mmap() to accommodate this flag, as this function will be invoked from an .mmap handler whose VMA is not yet visible, so we implicitly hide it from the rmap. If any error arises on these final actions, we simply unmap the VMA altogether. Also update the stacked filesystem compatibility layer to utilise the action behaviour, and update the VMA tests accordingly. While we're here, rename __compat_vma_mmap_prepare() to __compat_vma_mmap() as we are now performing actions invoked by the mmap_prepare in addition to just the mmap_prepare hook. Link: https://lkml.kernel.org/r/2601199a7b2eaeadfcd8ab6e199c6d1706650c94.1760959442.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Chatre, Reinette <reinette.chatre@intel.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Martin <dave.martin@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David S. 
Miller <davem@davemloft.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Morse <james.morse@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
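An entirely hypothetical driver hook sketching the flow described above - only the general idea (set flags in .mmap_prepare, defer the PFN remap as an action applied once the VMA is established) comes from the message; the field and helper names below are illustrative stand-ins, not the kernel's actual API:

```c
static int mydrv_mmap_prepare(struct vm_area_desc *desc)
{
        struct mydrv_data *data = desc->file->private_data;     /* hypothetical */

        /* Only VMA-metadata decisions happen at this point... */
        desc->vm_flags |= VM_IO | VM_DONTEXPAND;

        /* ...while the remap itself is recorded as an action for the mm to
         * perform after the VMA is fully established (names illustrative). */
        mmap_action_remap_pfn(&desc->action, data->base_pfn, desc->page_prot);

        return 0;
}
```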
|
|
|
8804d970fa |
Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation. - The 4 patch series "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs. - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters. - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps. - The 2 patch series "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code. - The 11 patch series "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code. - The 5 patch series "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero. - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature. - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs. - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code. - The 7 patch series "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code. - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations. - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc. - The 3 patch series "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path. - The 5 patch series "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code. - The 2 patch series "test that rmap behaves as expected" from Wei Yang adds some rmap selftests. - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers. - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues. - The 3 patch series "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks. - The 2 patch series "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code. - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem. 
- The 4 patch series "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/. - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c. - The 2 patch series "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation. - The 3 patch series "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc). - The 2 patch series "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code. - The 37 patch series "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function. - The 2 patch series "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only. - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code. - The 12 patch series "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy. - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages(). - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver. - The 2 patch series "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code. - The 14 patch series "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations. - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little. - The 3 patch series "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code. - The 2 patch series "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature. - The 3 patch series "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work. - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem. 
- The 2 patch series "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code. - The 10 patch series "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources. - The 5 patch series "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON. - The 7 patch series "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix. - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information. - The 2 patch series "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma. - The 2 patch series "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems. - The 6 patch series "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate. - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters. - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling. 
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA= =4mhY -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". 
Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. 
This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. 
/proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ... |
|
|
|
c4fb7f0a79 |
tools/testing: Add support for changes to slab for sheaves
The slab changes for sheaves require more effort in the testing code. Unite all the kmem_cache work into the tools/include slab header for both the vma and maple tree testing. The vma test code also requires importing more #defines to allow for seamless use of the shared kmem_cache code. This adds the pthread header to the slab header in the tools directory to allow for the pthread_mutex in linux.c. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
|
|
|
d09a61a3aa |
tools/testing/vma: Implement vm_refcnt reset
Add the reset of the ref count in vma_lock_init(). This is needed if the vma memory is not zeroed on allocation. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
|
|
|
da577f1fcb |
tools/testing/vma: clean up stubs in vma_internal.h
We do not need to reference arguments just to avoid compiler warnings; the warning in question does not arise here, so remove all of the instances of '(void)xxx' introduced purely to avoid it. As reported by WangYuli in the referenced mail, GCC 8.3 and before will have issues compiling this file if parameter names are not provided, so ensure these are always provided. Finally, perform a trivial fix-up of kmem_cache_alloc(), which technically has parameters in the incorrect order (as reported by Vlastimil Babka off-list). Link: https://lkml.kernel.org/r/20250826102824.22730-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: WangYuli <wangyuli@uniontech.com> Closes: https://lore.kernel.org/linux-mm/EFCEBE7E301589DE+20250729084700.208767-1-wangyuli@uniontech.com/ Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: WangYuli <wangyuli@uniontech.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> |
|
|
|
f7a741c53b |
mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()
In commit
|
|
|
|
af6703838e |
mm: specify separate file and vm_file params in vm_area_desc
Patch series "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()", v2. As part of the efforts to eliminate the problematic f_op->mmap callback, a new callback - f_op->mmap_prepare was provided. While we are converting these callbacks, we must deal with 'stacked' filesystems and drivers - those which in their own f_op->mmap callback invoke an inner f_op->mmap callback. To accomodate for this, a compatibility layer is provided that, via vfs_mmap(), detects if f_op->mmap_prepare is provided and if so, generates a vm_area_desc containing the VMA's metadata and invokes the call. So far, we have provided desc->file equal to vma->vm_file. However this is not necessarily valid, especially in the case of stacked drivers which wish to assign a new file after the inner hook is invoked. To account for this, we adjust vm_area_desc to have both file and vm_file fields. The .vm_file field is strictly set to vma->vm_file (or in the case of a new mapping, what will become vma->vm_file). However, .file is set to whichever file vfs_mmap() is invoked with when using the compatibilty layer. Therefore, if the VMA's file needs to be updated in .mmap_prepare, desc->vm_file should be assigned, whilst desc->file should be read. No current f_op->mmap_prepare users assign desc->file so this is safe to do. This makes the .mmap_prepare callback in the context of a stacked filesystem or driver completely consistent with the existing .mmap implementations. While we're here, we do a few small cleanups, and ensure that we const-ify things correctly in the vm_area_desc struct to avoid hooks accidentally trying to assign fields they should not. This patch (of 2): Stacked filesystems and drivers may invoke mmap hooks with a struct file pointer that differs from the overlying file. We will make this functionality possible in a subsequent patch. In order to prepare for this, let's update vm_area_struct to separately provide desc->file and desc->vm_file parameters. The desc->file parameter is the file that the hook is expected to operate upon, and is not assignable (though the hok may wish to e.g. update the file's accessed time for instance). The desc->vm_file defaults to what will become vma->vm_file and is what the hook must reassign should it wish to change the VMA"s vma->vm_file. For now we keep desc->file, vm_file the same to remain consistent. No f_op->mmap_prepare() callback sets a new vma->vm_file currently, so this is safe to change. While we're here, make the mm_struct desc->mm pointers at immutable as well as the desc->mm field itself. As part of this change, also update the single hook which this would otherwise break - mlock_future_ok(), invoked by secretmem_mmap_prepare()). We additionally update set_vma_from_desc() to compare fields in a more logical fashion, checking the (possibly) user-modified fields as the first operand against the existing value as the second one. Additionally, update VMA tests to accommodate changes. 
Link: https://lkml.kernel.org/r/cover.1756920635.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/3fa15a861bb7419f033d22970598aa61850ea267.1756920635.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pedro Falcato <pfalcato@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
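A hypothetical .mmap_prepare hook sketching the split described above - the desc->file / desc->vm_file roles come from the message, while everything else is an illustrative stand-in:

```c
static int example_mmap_prepare(struct vm_area_desc *desc)
{
        /* desc->file: the file this hook was invoked with - read it, but
         * never reassign it. */
        if (!S_ISREG(file_inode(desc->file)->i_mode))
                return -EINVAL;

        /* desc->vm_file: what will become vma->vm_file - reassign this if
         * the mapping should be backed by a different file (reference
         * handling elided; example_backing_file() is hypothetical). */
        desc->vm_file = example_backing_file(desc->file);

        return 0;
}
```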
|
|
|
c66ae64401 |
tools: testing: use existing atomic.h for vma/maple tests
The shared userspace logic used for unit-testing maple tree and VMA code currently has its own replacements for atomics helpers. This is not needed as the necessary APIs already have userspace implementations in the tools tree. Switching over to that allows deleting a bit of code. Note that the implementation is different; while the version being deleted here is implemented using liburcu, the existing version in tools uses either x86 asm or compiler builtins. It's assumed that both are equally likely to be correct. The tools tree's version of atomic_t is a struct type while the version being deleted was just a typedef of an integer. This means it's no longer valid to call __sync_bool_compare_and_swap() directly on it. One option would be to just peek into the struct and call it on the field, but it seems a little cleaner to just use the corresponding atomic.h API whic has been added recently. Now the fake mapping_map_writable() is copied from the real one. Link: https://lkml.kernel.org/r/20250828-b4-vma-no-atomic-h-v2-4-02d146a58ed2@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
8166353fb8 |
mm: replace mm->flags with bitmap entirely and set to 64 bits
Now we have updated all users of mm->flags to use the bitmap accessors, repalce it with the bitmap version entirely. We are then able to move to having 64 bits of mm->flags on both 32-bit and 64-bit architectures. We also update the VMA userland tests to ensure that everything remains functional there. No functional changes intended, other than there now being 64 bits of available mm_struct flags. Link: https://lkml.kernel.org/r/e1f6654e016d36c43959764b01355736c5cbcdf8.1755012943.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Ben Segall <bsegall@google.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dev Jain <dev.jain@arm.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Mariano Pache <npache@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Namhyung kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
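A minimal user-space illustration of the end state described above - a fixed 64-bit bitmap with set/test helpers, loosely analogous to (but not the same as) the kernel's mm->flags accessors:

```c
#include <stdbool.h>
#include <stdio.h>

#define MM_FLAGS_NR_BITS 64
#define BITS_PER_LONG    (8 * sizeof(unsigned long))
#define MM_FLAGS_WORDS   ((MM_FLAGS_NR_BITS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* 64 flag bits regardless of whether unsigned long is 32 or 64 bits wide. */
struct mm_flags { unsigned long bits[MM_FLAGS_WORDS]; };

static void mm_flags_set(struct mm_flags *f, unsigned int bit)
{
        f->bits[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

static bool mm_flags_test(const struct mm_flags *f, unsigned int bit)
{
        return f->bits[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG));
}

int main(void)
{
        struct mm_flags flags = { { 0 } };

        mm_flags_set(&flags, 42);
        printf("bit 42: %d, bit 7: %d\n",
               mm_flags_test(&flags, 42), mm_flags_test(&flags, 7));
        return 0;
}
```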
|
|
|
4c89792ea0 |
mm: rename vm_ops->find_special_page() to vm_ops->find_normal_page()
... and hide it behind a kconfig option. There is really no need for any
!xen code to perform this check.
The naming is a bit off: we want to find the "normal" page when a PTE was
marked "special". So it's really not "finding a special" page.
Improve the documentation, and add a comment in the code where XEN ends up
performing the pte_mkspecial() through a hypercall. More details can be
found in commit
|
|
|
|
da23ea194d |
Significant patch series in this pull request:
- The 4 patch series "mseal cleanups" from Lorenzo Stoakes erforms some
mseal cleaning with no intended functional change.
- The 3 patch series "Optimizations for khugepaged" from David
Hildenbrand improves khugepaged throughput by batching PTE operations
for large folios. This gain is mainly for arm64.
- The 8 patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and
kprobes" from Mike Rapoport provides a bugfix, additional debug code and
cleanups to the execmem code.
- The 7 patch series "mm/shmem, swap: bugfix and improvement of mTHP
swap in" from Kairui Song provides bugfixes, cleanups and performance
improvements to the mTHP swapin code.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaI+6HQAKCRDdBJ7gKXxA
jv7lAQCAKE5dUhdZ0pOYbhBKTlDapQh2KqHrlV3QFcxXgknEoQD/c3gG01rY3fLh
Cnf5l9+cdyfKxFniO48sUPx6IpriRg8=
=HT5/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "mseal cleanups" (Lorenzo Stoakes)
Some mseal cleaning with no intended functional change.
- "Optimizations for khugepaged" (David Hildenbrand)
Improve khugepaged throughput by batching PTE operations for large
folios. This gain is mainly for arm64.
- "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)
A bugfix, additional debug code and cleanups to the execmem code.
- "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)
Bugfixes, cleanups and performance improvements to the mTHP swapin
code"
* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
mm: mempool: fix crash in mempool_free() for zero-minimum pools
mm: correct type for vmalloc vm_flags fields
mm/shmem, swap: fix major fault counting
mm/shmem, swap: rework swap entry and index calculation for large swapin
mm/shmem, swap: simplify swapin path and result handling
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
mm/shmem, swap: tidy up swap entry splitting
mm/shmem, swap: tidy up THP swapin checks
mm/shmem, swap: avoid redundant Xarray lookup during swapin
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
execmem: drop writable parameter from execmem_fill_trapping_insns()
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
execmem: move execmem_force_rw() and execmem_restore_rox() before use
execmem: rework execmem_cache_free()
execmem: introduce execmem_alloc_rw()
execmem: drop unused execmem_update_copy()
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
mm/rmap: add anon_vma lifetime debug check
mm: remove mm/io-mapping.c
...
|
|
|
|
f225b34f1e |
mm/mseal: always define VM_SEALED
Patch series "mseal cleanups", v4. Perform a number of cleanups to the mseal logic. Firstly, VM_SEALED is treated differently from every other VMA flag, it really doesn't make sense to do this, so we start by making this consistent with everything else. Next we place the madvise logic where it belongs - in mm/madvise.c. It really makes no sense to abstract this elsewhere. In doing so, we go to great lengths to explain very clearly the previously very confusing logic as to what sealed mappings are impacted here. In doing so, we retain existing logic regarding treatment of madvise() discard operations for a sealed, read-only MAP_PRIVATE file-backed mapping. This is something we likely need to revisit. We then abstract out and explain the 'are there are any gaps in this range in the mm?' check being performed as a prerequisite to mseal being performed. Finally, we simplify the actual mseal logic which is really quite straightforward. No functional change is intended. This patch (of 4): There is no reason to treat VM_SEALED in a special way, in each other case in which a VMA flag is unavailable due to configuration, we simply assign that flag to VM_NONE, so make VM_SEALED consistent with all other VMA flags in this respect. Additionally, use the next available bit for VM_SEALED, 42, rather than arbitrarily putting it at 63 and update the declaration to match all other VMA flags. No functional change intended. Link: https://lkml.kernel.org/r/cover.1753431105.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/aeb398a77029b6e7377cd944328bc9bbc3c90537.1753431105.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kees Cook <kees@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
beace86e61 |
Summary of significant series in this pull request:
- The 4 patch series "mm: ksm: prevent KSM from breaking merging of new
VMAs" from Lorenzo Stoakes addresses an issue with KSM's
PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for
merging with existing adjacent VMAs.
- The 4 patch series "mm/damon: introduce DAMON_STAT for simple and
practical access monitoring" from SeongJae Park adds a new kernel module
which simplifies the setup and usage of DAMON in production
environments.
- The 6 patch series "stop passing a writeback_control to swap/shmem
writeout" from Christoph Hellwig is a cleanup to the writeback code
which removes a couple of pointers from struct writeback_control.
- The 7 patch series "drivers/base/node.c: optimization and cleanups"
from Donet Tom contains largely uncorrelated cleanups to the NUMA node
setup and management code.
- The 4 patch series "mm: userfaultfd: assorted fixes and cleanups" from
Tal Zussman does some maintenance work on the userfaultfd code.
- The 5 patch series "Readahead tweaks for larger folios" from Ryan
Roberts implements some tuneups for pagecache readahead when it is
reading into order>0 folios.
- The 4 patch series "selftests/mm: Tweaks to the cow test" from Mark
Brown provides some cleanups and consistency improvements to the
selftests code.
- The 4 patch series "Optimize mremap() for large folios" from Dev Jain
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
- The 5 patch series "Remove zero_user()" from Matthew Wilcox expunges
zero_user() in favor of the more modern memzero_page().
- The 3 patch series "mm/huge_memory: vmf_insert_folio_*() and
vmf_insert_pfn_pud() fixes" from David Hildenbrand addresses some warts
which David noticed in the huge page code. These were not known to be
causing any issues at this time.
- The 3 patch series "mm/damon: use alloc_migrate_target() for
DAMOS_MIGRATE_{HOT,COLD" from SeongJae Park provides some cleanup and
consolidation work in DAMON.
- The 3 patch series "use vm_flags_t consistently" from Lorenzo Stoakes
uses vm_flags_t in places where we were inappropriately using other
types.
- The 3 patch series "mm/memfd: Reserve hugetlb folios before
allocation" from Vivek Kasireddy increases the reliability of large page
allocation in the memfd code.
- The 14 patch series "mm: Remove pXX_devmap page table bit and pfn_t
type" from Alistair Popple removes several now-unneeded PFN_* flags.
- The 5 patch series "mm/damon: decouple sysfs from core" from SeongJae
Park implements some cleanup and maintainability work in the DAMON
sysfs layer.
- The 5 patch series "madvise cleanup" from Lorenzo Stoakes does quite a
lot of cleanup/maintenance work in the madvise() code.
- The 4 patch series "madvise anon_name cleanups" from Vlastimil Babka
provides additional cleanups on top of Lorenzo's effort.
- The 11 patch series "Implement numa node notifier" from Oscar Salvador
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory on/offline
notifier.
- The 6 patch series "Make MIGRATE_ISOLATE a standalone bit" from Zi Yan
cleans up the pageblock isolation code and fixes a potential issue which
doesn't seem to cause any problems in practice.
- The 5 patch series "selftests/damon: add python and drgn based DAMON
sysfs functionality tests" from SeongJae Park adds additional drgn- and
python-based DAMON selftests which are more comprehensive than the
existing selftest suite.
- The 5 patch series "Misc rework on hugetlb faulting path" from Oscar
Salvador fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
- The 3 patch series "cma: factor out allocation logic from
__cma_declare_contiguous_nid" from Mike Rapoport rationalizes and cleans
up the highmem-specific code in the CMA allocator.
- The 28 patch series "mm/migration: rework movable_ops page migration
(part 1)" from David Hildenbrand provides cleanups and
future-preparedness to the migration code.
- The 2 patch series "mm/damon: add trace events for auto-tuned
monitoring intervals and DAMOS quota" from SeongJae Park adds some
tracepoints to some DAMON auto-tuning code.
- The 6 patch series "mm/damon: fix misc bugs in DAMON modules" from
SeongJae Park does that.
- The 6 patch series "mm/damon: misc cleanups" from SeongJae Park also
does what it claims.
- The 4 patch series "mm: folio_pte_batch() improvements" from David
Hildenbrand cleans up the large folio PTE batching code.
- The 13 patch series "mm/damon/vaddr: Allow interleaving in
migrate_{hot,cold} actions" from SeongJae Park facilitates dynamic
alteration of DAMON's inter-node allocation policy.
- The 3 patch series "Remove unmap_and_put_page()" from Vishal Moola
provides a couple of page->folio conversions.
- The 4 patch series "mm: per-node proactive reclaim" from Davidlohr
Bueso implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
- The 14 patch series "mm/damon: remove damon_callback" from SeongJae
Park replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
- The 10 patch series "mm/mremap: permit mremap() move of multiple VMAs"
from Lorenzo Stoakes implements a number of mremap cleanups (of course)
in preparation for adding new mremap() functionality: newly permit the
remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It
still excludes some specialized situations where this cannot be
performed reliably.
- The 3 patch series "drop hugetlb_free_pgd_range()" from Anthony Yznaga
switches some sparc hugetlb code over to the generic version and removes
the thus-unneeded hugetlb_free_pgd_range().
- The 4 patch series "mm/damon/sysfs: support periodic and automated
stats update" from SeongJae Park augments the present
userspace-requested update of DAMON sysfs monitoring files. Automatic
update is now provided, along with a tunable to control the update
interval.
- The 4 patch series "Some randome fixes and cleanups to swapfile" from
Kemeng Shi does what it claims.
- The 4 patch series "mm: introduce snapshot_page" from Luiz Capitulino
and David Hildenbrand provides (and uses) a means by which debug-style
functions can grab a copy of a pageframe and inspect it locklessly
without tripping over the races inherent in operating on the live
pageframe directly.
- The 6 patch series "use per-vma locks for /proc/pid/maps reads" from
Suren Baghdasaryan addresses the large contention issues which can be
triggered by reads from that procfs file. Latencies are reduced by more
than half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
- The 6 patch series "__folio_split() clean up" from Zi Yan cleans up
__folio_split()!
- The 7 patch series "Optimize mprotect() for large folios" from Dev
Jain provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
- The 2 patch series "selftests/mm: reuse FORCE_READ to replace "asm
volatile("" : "+r" (XXX));" and some cleanup" from wang lian does some
cleanup work in the selftests code.
- The 3 patch series "tools/testing: expand mremap testing" from Lorenzo
Stoakes extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
- The 22 patch series "selftests/damon/sysfs.py: test all parameters"
from SeongJae Park extends the DAMON sysfs interface selftest so that it
tests all possible user-requested parameters, rather than the present
minimal subset.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaIqcCgAKCRDdBJ7gKXxA
jkVBAQCCn9DR1QP0CRk961ot0cKzOgioSc0aA03DPb2KXRt2kQEAzDAz0ARurFhL
8BzbvI0c+4tntHLXvIlrC33n9KWAOQM=
=XsFy
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"As usual, many cleanups. The below blurbiage describes 42 patchsets.
21 of those are partially or fully cleanup work. "cleans up",
"cleanup", "maintainability", "rationalizes", etc.
I never knew the MM code was so dirty.
"mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
mapped VMAs were not eligible for merging with existing adjacent
VMAs.
"mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
adds a new kernel module which simplifies the setup and usage of
DAMON in production environments.
"stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
is a cleanup to the writeback code which removes a couple of
pointers from struct writeback_control.
"drivers/base/node.c: optimization and cleanups" (Donet Tom)
contains largely uncorrelated cleanups to the NUMA node setup and
management code.
"mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
does some maintenance work on the userfaultfd code.
"Readahead tweaks for larger folios" (Ryan Roberts)
implements some tuneups for pagecache readahead when it is reading
into order>0 folios.
"selftests/mm: Tweaks to the cow test" (Mark Brown)
provides some cleanups and consistency improvements to the
selftests code.
"Optimize mremap() for large folios" (Dev Jain)
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
"Remove zero_user()" (Matthew Wilcox)
expunges zero_user() in favor of the more modern memzero_page().
"mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
addresses some warts which David noticed in the huge page code.
These were not known to be causing any issues at this time.
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
provides some cleanup and consolidation work in DAMON.
"use vm_flags_t consistently" (Lorenzo Stoakes)
uses vm_flags_t in places where we were inappropriately using other
types.
"mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
increases the reliability of large page allocation in the memfd
code.
"mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
removes several now-unneeded PFN_* flags.
"mm/damon: decouple sysfs from core" (SeongJae Park)
implements some cleanup and maintainability work in the DAMON
sysfs layer.
"madvise cleanup" (Lorenzo Stoakes)
does quite a lot of cleanup/maintenance work in the madvise() code.
"madvise anon_name cleanups" (Vlastimil Babka)
provides additional cleanups on top of Lorenzo's effort.
"Implement numa node notifier" (Oscar Salvador)
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory
on/offline notifier.
"Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
cleans up the pageblock isolation code and fixes a potential issue
which doesn't seem to cause any problems in practice.
"selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
adds additional drgn- and python-based DAMON selftests which are
more comprehensive than the existing selftest suite.
"Misc rework on hugetlb faulting path" (Oscar Salvador)
fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
"cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
rationalizes and cleans up the highmem-specific code in the CMA
allocator.
"mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
provides cleanups and future-preparedness to the migration code.
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
adds some tracepoints to some DAMON auto-tuning code.
"mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
does that.
"mm/damon: misc cleanups" (SeongJae Park)
also does what it claims.
"mm: folio_pte_batch() improvements" (David Hildenbrand)
cleans up the large folio PTE batching code.
"mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
facilitates dynamic alteration of DAMON's inter-node allocation
policy.
"Remove unmap_and_put_page()" (Vishal Moola)
provides a couple of page->folio conversions.
"mm: per-node proactive reclaim" (Davidlohr Bueso)
implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
"mm/damon: remove damon_callback" (SeongJae Park)
replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
"mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
implements a number of mremap cleanups (of course) in preparation
for adding new mremap() functionality: newly permit the remapping
of multiple VMAs when the user is specifying MREMAP_FIXED. It still
excludes some specialized situations where this cannot be performed
reliably.
"drop hugetlb_free_pgd_range()" (Anthony Yznaga)
switches some sparc hugetlb code over to the generic version and
removes the thus-unneeded hugetlb_free_pgd_range().
"mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
augments the present userspace-requested update of DAMON sysfs
monitoring files. Automatic update is now provided, along with a
tunable to control the update interval.
"Some randome fixes and cleanups to swapfile" (Kemeng Shi)
does what it claims.
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
provides (and uses) a means by which debug-style functions can grab
a copy of a pageframe and inspect it locklessly without tripping
over the races inherent in operating on the live pageframe
directly.
"use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
addresses the large contention issues which can be triggered by
reads from that procfs file. Latencies are reduced by more than
half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
"__folio_split() clean up" (Zi Yan)
cleans up __folio_split()!
"Optimize mprotect() for large folios" (Dev Jain)
provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
"selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
does some cleanup work in the selftests code.
"tools/testing: expand mremap testing" (Lorenzo Stoakes)
extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
"selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
extends the DAMON sysfs interface selftest so that it tests all
possible user-requested parameters, rather than the present minimal
subset"
* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
MAINTAINERS: add missing headers to mempory policy & migration section
MAINTAINERS: add missing file to cgroup section
MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
MAINTAINERS: add missing zsmalloc file
MAINTAINERS: add missing files to page alloc section
MAINTAINERS: add missing shrinker files
MAINTAINERS: move memremap.[ch] to hotplug section
MAINTAINERS: add missing mm_slot.h file THP section
MAINTAINERS: add missing interval_tree.c to memory mapping section
MAINTAINERS: add missing percpu-internal.h file to per-cpu section
mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
selftests/damon: introduce _common.sh to host shared function
selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
selftests/damon/sysfs.py: test non-default parameters runtime commit
selftests/damon/sysfs.py: generalize DAMON context commit assertion
selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
selftests/damon/sysfs.py: test DAMOS filters commitment
selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
selftests/damon/sysfs.py: test DAMOS destinations commitment
...
|
|
|
|
d75fa3c947 |
mm: update architecture and driver code to use vm_flags_t
In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
bfbe71109f |
mm: update core kernel code to use vm_flags_t consistently
The core kernel code is currently very inconsistent in its use of vm_flags_t vs. unsigned long. This prevents us from changing the type of vm_flags_t in the future and is simply not correct, so correct this. While this results in rather a lot of churn, it is a critical pre-requisite for a future planned change to VMA flag type. Additionally, update VMA userland tests to account for the changes. To make review easier and to break things into smaller parts, driver and architecture-specific changes is left for a subsequent commit. The code has been adjusted to cascade the changes across all calling code as far as is needed. We will adjust architecture-specific and driver code in a subsequent patch. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/d1588e7bb96d1ea3fe7b9df2c699d5b4592d901d.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Kees Cook <kees@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
78ddaa358e |
mm: change vm_get_page_prot() to accept vm_flags_t argument
Patch series "use vm_flags_t consistently". The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this is exactly equivalent to unsigned long, but it should not be assumed to be. Much code that references vma->vm_flags already correctly uses vm_flags_t, but a fairly large chunk of code simply uses unsigned long and assumes that the two are equivalent. This series corrects that and has us use vm_flags_t consistently. This series is motivated by the desire to, in a future series, adjust vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or 64-bit in order to deal with the VMA flag exhaustion issue and avoid all the various problems that arise from it (being unable to use certain features in 32-bit, being unable to add new flags except for 64-bit, etc.) This is therefore a critical first step towards that goal. At any rate, using the correct type is of value regardless. We additionally take the opportunity to refer to VMA flags as vm_flags where possible to make clear what we're referring to. Overall, this series does not introduce any functional change. This patch (of 3): We abstract the type of the VMA flags to vm_flags_t, however in may places it is simply assumed this is unsigned long, which is simply incorrect. At the moment this is simply an incongruity, however in future we plan to change this type and therefore this change is a critical requirement for doing so. Overall, this patch does not introduce any functional change. [lorenzo.stoakes@oracle.com: add missing vm_get_page_prot() instance, remove include] Link: https://lkml.kernel.org/r/552f88e1-2df8-4e95-92b8-812f7c8db829@lucifer.local Link: https://lkml.kernel.org/r/cover.1750274467.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
cf7e7a3503 |
mm: prevent KSM from breaking VMA merging for new VMAs
If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.
This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
(in order to indicate they are KSM mergeable), as well as setting this
flag for all existing VMAs and propagating this across fork/exec.
However it also breaks VMA merging for new VMAs, both in the process and
all forked (and fork/exec'd) child processes.
This is because when a new mapping is proposed, the flags specified will
never have VM_MERGEABLE set. However all adjacent VMAs will already have
VM_MERGEABLE set, rendering VMAs unmergeable by default.
To work around this, we try to set the VM_MERGEABLE flag prior to
attempting a merge. In the case of brk() this can always be done.
However on mmap() things are more complicated - while KSM is not supported
for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
file-backed mappings.
These mappings may have deprecated .mmap() callbacks specified which
could, in theory, adjust flags and thus KSM eligibility.
So we check to determine whether this is possible. Where the callback cannot
adjust flags, we set VM_MERGEABLE prior to the merge attempt on mmap();
otherwise we retain the previous behaviour.
This fixes VMA merging for all new anonymous mappings, which covers the
majority of real-world cases, so we should see a significant improvement
in VMA mergeability.
For MAP_PRIVATE file-backed mappings, those which implement the
.mmap_prepare() hook and shmem are both known to be safe, so we allow
these, disallowing all other cases.
Also add stubs for newly introduced function invocations to VMA userland
testing.
[lorenzo.stoakes@oracle.com: correctly invoke late KSM check after mmap hook]
Link: https://lkml.kernel.org/r/5861f8f6-cf5a-4d82-a062-139fb3f9cddb@lucifer.local
Link: https://lkml.kernel.org/r/3ba660af716d87a18ca5b4e635f2101edeb56340.1748537921.git.lorenzo.stoakes@oracle.com
Fixes:
|
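The eligibility rule described above can be summarised with a small self-contained model (plain C; the struct and function names are illustrative, not the kernel's internal helpers):
/* Model of the rule stated above: anonymous mappings may always get
 * VM_MERGEABLE before the merge attempt; MAP_PRIVATE file-backed mappings
 * only when the backing file is shmem or uses the .mmap_prepare() style
 * hook, since a legacy .mmap() callback could still change the flags. */
#include <stdbool.h>
struct file_model {
        bool is_shmem;
        bool has_mmap_prepare;  /* modern hook: flags are settled early */
};
bool can_set_mergeable_early(const struct file_model *file, bool map_shared)
{
        if (!file)
                return true;    /* anonymous mapping: always safe */
        if (map_shared)
                return false;   /* KSM unsupported for MAP_SHARED file mappings */
        return file->is_shmem || file->has_mmap_prepare;
}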
|
|
|
b013ed4031 |
fs: consistently use can_mmap_file() helper
Since commit |
|
|
|
20ca475d98 |
mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
The call_mmap() function violates the existing convention in include/linux/fs.h whereby invocations of virtual file system hooks is performed by functions prefixed with vfs_xxx(). Correct this by renaming call_mmap() to vfs_mmap(). This also avoids confusion as to the fact that f_op->mmap_prepare may be invoked here. Also rename __call_mmap_prepare() function to vfs_mmap_prepare() and adjust to accept a file parameter, this is useful later for nested file systems. Finally, fix up the VMA userland tests and ensure the mmap_prepare -> mmap shim is implemented there. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/8d389f4994fa736aa8f9172bef8533c10a9e9011.1750099179.git.lorenzo.stoakes@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
bb666b7c27 |
mm: add mmap_prepare() compatibility layer for nested file systems
Nested file systems, that is those which invoke call_mmap() within their own f_op->mmap() handlers, may encounter underlying file systems which provide the f_op->mmap_prepare() hook introduced by commit |
|
|
|
081056dc00 |
mm/hugetlb: unshare page tables during VMA split, not before
Currently, __split_vma() triggers hugetlb page table unsharing through
vm_ops->may_split(). This happens before the VMA lock and rmap locks are
taken - which is too early, it allows racing VMA-locked page faults in our
process and racing rmap walks from other processes to cause page tables to
be shared again before we actually perform the split.
Fix it by explicitly calling into the hugetlb unshare logic from
__split_vma() in the same place where THP splitting also happens. At that
point, both the VMA and the rmap(s) are write-locked.
An annoying detail is that we can now call into the helper
hugetlb_unshare_pmds() from two different locking contexts:
1. from hugetlb_split(), holding:
- mmap lock (exclusively)
- VMA lock
- file rmap lock (exclusively)
2. hugetlb_unshare_all_pmds(), which I think is designed to be able to
call us with only the mmap lock held (in shared mode), but currently
only runs while holding mmap lock (exclusively) and VMA lock
Backporting note:
This commit fixes a racy protection that was introduced in commit
|
|
|
|
918850c136 |
tools/testing/vma: add missing function stub
The hugetlb fix introduced in commit |
|
|
|
c84bf6dd2b |
mm: introduce new .mmap_prepare() file callback
Patch series "eliminate mmap() retry merge, add .mmap_prepare hook", v2. During the mmap() of a file-backed mapping, we invoke the underlying driver file's mmap() callback in order to perform driver/file system initialisation of the underlying VMA. This has been a source of issues in the past, including a significant security concern relating to unwinding of error state discovered by Jann Horn, as fixed in commit |
|
|
|
3e43e260f1 |
mm: perform VMA allocation, freeing, duplication in mm
Right now these are performed in kernel/fork.c which is odd and a violation of separation of concerns, as well as preventing us from integrating this and related logic into userland VMA testing going forward. There is a fly in the ointment - nommu - mmap.c is not compiled if CONFIG_MMU not set, and neither is vma.c. To square the circle, let's add a new file - vma_init.c. This will be compiled for both CONFIG_MMU and nommu builds, and will also form part of the VMA userland testing. This allows us to de-duplicate code, while maintaining separation of concerns and the ability for us to userland test this logic. Update the VMA userland tests accordingly, additionally adding a detach_free_vma() helper function to correctly detach VMAs before freeing them in test code, as this change was triggering the assert for this. [akpm@linux-foundation.org: remove stray newline, per Liam] Link: https://lkml.kernel.org/r/f97b3a85a6da0196b28070df331b99e22b263be8.1745853549.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
dd7a6246f4 |
mm: abstract initial stack setup to mm subsystem
There are peculiarities within the kernel where what is very clearly mm code is performed elsewhere arbitrarily. This violates separation of concerns and makes it harder to refactor code to make changes to how fundamental initialisation and operation of mm logic is performed. One such case is the creation of the VMA containing the initial stack upon execve()'ing a new process. This is currently performed in __bprm_mm_init() in fs/exec.c. Abstract this operation to create_init_stack_vma(). This allows us to limit use of vma allocation and free code to fork and mm only. We previously did the same for the step at which we relocate the initial stack VMA downwards via relocate_vma_down(), now we move the initial VMA establishment too. Take the opportunity to also move insert_vm_struct() to mm/vma.c as it's no longer needed anywhere outside of mm. Link: https://lkml.kernel.org/r/118c950ef7a8dd19ab20a23a68c3603751acd30e.1745853549.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
6c36ac1e12 |
mm: establish mm/vma_exec.c for shared exec/mm VMA functionality
Patch series "move all VMA allocation, freeing and duplication logic to mm", v3. Currently VMA allocation, freeing and duplication exist in kernel/fork.c, which is a violation of separation of concerns, and leaves these functions exposed to the rest of the kernel when they are in fact internal implementation details. Resolve this by moving this logic to mm, and making it internal to vma.c, vma.h. This also allows us, in future, to provide userland testing around this functionality. We additionally abstract dup_mmap() to mm, being careful to ensure kernel/fork.c acceses this via the mm internal header so it is not exposed elsewhere in the kernel. As part of this change, also abstract initial stack allocation performed in __bprm_mm_init() out of fs code into mm via the create_init_stack_vma(), as this code uses vm_area_alloc() and vm_area_free(). In order to do so sensibly, we introduce a new mm/vma_exec.c file, which contains the code that is shared by mm and exec. This file is added to both memory mapping and exec sections in MAINTAINERS so both sets of maintainers can maintain oversight. As part of this change, we also move relocate_vma_down() to mm/vma_exec.c so all shared mm/exec functionality is kept in one place. We add code shared between nommu and mmu-enabled configurations in order to share VMA allocation, freeing and duplication code correctly while also keeping these functions available in userland VMA testing. This is achieved by adding a mm/vma_init.c file which is also compiled by the userland tests. This patch (of 4): There is functionality that overlaps the exec and memory mapping subsystems. While it properly belongs in mm, it is important that exec maintainers maintain oversight of this functionality correctly. We can establish both goals by adding a new mm/vma_exec.c file which contains these 'glue' functions, and have fs/exec.c import them. As a part of this change, to ensure that proper oversight is achieved, add the file to both the MEMORY MAPPING and EXEC & BINFMT API, ELF sections. scripts/get_maintainer.pl can correctly handle files in multiple entries and this neatly handles the cross-over. [akpm@linux-foundation.org: fix comment typo] Link: https://lkml.kernel.org/r/80f0d0c6-0b68-47f9-ab78-0ab7f74677fc@lucifer.local Link: https://lkml.kernel.org/r/cover.1745853549.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/91f2cee8f17d65214a9d83abb7011aa15f1ea690.1745853549.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
3104138517 |
mm: make vma cache SLAB_TYPESAFE_BY_RCU
To enable SLAB_TYPESAFE_BY_RCU for vma cache we need to ensure that object reuse before RCU grace period is over will be detected by lock_vma_under_rcu(). Current checks are sufficient as long as vma is detached before it is freed. The only place this is not currently happening is in exit_mmap(). Add the missing vma_mark_detached() in exit_mmap(). Another issue which might trick lock_vma_under_rcu() during vma reuse is vm_area_dup(), which copies the entire content of the vma into a new one, overriding new vma's vm_refcnt and temporarily making it appear as attached. This might trick a racing lock_vma_under_rcu() to operate on a reused vma if it found the vma before it got reused. To prevent this situation, we should ensure that vm_refcnt stays at detached state (0) when it is copied and advances to attached state only after it is added into the vma tree. Introduce vm_area_init_from() which preserves new vma's vm_refcnt and use it in vm_area_dup(). Since all vmas are in detached state with no current readers when they are freed, lock_vma_under_rcu() will not be able to take vm_refcnt after vma got detached even if vma is reused. vma_mark_attached() in modified to include a release fence to ensure all stores to the vma happen before vm_refcnt gets initialized. Finally, make vm_area_cachep SLAB_TYPESAFE_BY_RCU. This will facilitate vm_area_struct reuse and will minimize the number of call_rcu() calls. [surenb@google.com: remove atomic_set_release() usage in tools/] Link: https://lkml.kernel.org/r/20250217054351.2973666-1-surenb@google.com Link: https://lkml.kernel.org/r/20250213224655.1680278-18-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
6bef4c2f97 |
mm: move lesser used vma_area_struct members into the last cacheline
Move several vm_area_struct members which are rarely or never used during
page fault handling into the last cacheline to better pack vm_area_struct.
As a result vm_area_struct will fit into 3 as opposed to 4 cachelines.
New typical vm_area_struct layout:
struct vm_area_struct {
union {
struct {
long unsigned int vm_start; /* 0 8 */
long unsigned int vm_end; /* 8 8 */
}; /* 0 16 */
freeptr_t vm_freeptr; /* 0 8 */
}; /* 0 16 */
struct mm_struct * vm_mm; /* 16 8 */
pgprot_t vm_page_prot; /* 24 8 */
union {
const vm_flags_t vm_flags; /* 32 8 */
vm_flags_t __vm_flags; /* 32 8 */
}; /* 32 8 */
unsigned int vm_lock_seq; /* 40 4 */
/* XXX 4 bytes hole, try to pack */
struct list_head anon_vma_chain; /* 48 16 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct anon_vma * anon_vma; /* 64 8 */
const struct vm_operations_struct * vm_ops; /* 72 8 */
long unsigned int vm_pgoff; /* 80 8 */
struct file * vm_file; /* 88 8 */
void * vm_private_data; /* 96 8 */
atomic_long_t swap_readahead_info; /* 104 8 */
struct mempolicy * vm_policy; /* 112 8 */
struct vma_numab_state * numab_state; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
refcount_t vm_refcnt (__aligned__(64)); /* 128 4 */
/* XXX 4 bytes hole, try to pack */
struct {
struct rb_node rb (__aligned__(8)); /* 136 24 */
long unsigned int rb_subtree_last; /* 160 8 */
} __attribute__((__aligned__(8))) shared; /* 136 32 */
struct anon_vma_name * anon_name; /* 168 8 */
struct vm_userfaultfd_ctx vm_userfaultfd_ctx; /* 176 8 */
/* size: 192, cachelines: 3, members: 18 */
/* sum members: 176, holes: 2, sum holes: 8 */
/* padding: 8 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
} __attribute__((__aligned__(64)));
Memory consumption per 1000 VMAs becomes 48 pages:
slabinfo after vm_area_struct changes:
<name> ... <objsize> <objperslab> <pagesperslab> : ...
vm_area_struct ... 192 42 2 : ...
Link: https://lkml.kernel.org/r/20250213224655.1680278-14-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
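The "48 pages per 1000 VMAs" figure quoted above follows directly from the slabinfo numbers; a quick back-of-envelope check (ordinary userspace C, assuming 4KiB pages):
#include <stdio.h>
int main(void)
{
        unsigned int vmas = 1000;
        unsigned int objs_per_slab = 42;        /* from the slabinfo line above */
        unsigned int pages_per_slab = 2;
        unsigned int slabs = (vmas + objs_per_slab - 1) / objs_per_slab;
        /* 24 slabs * 2 pages = 48 pages, matching the commit message. */
        printf("%u slabs -> %u pages\n", slabs, slabs * pages_per_slab);
        return 0;
}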
|
|
|
f35ab95ca0 |
mm: replace vm_lock and detached flag with a reference count
rw_semaphore is a sizable structure of 40 bytes and consumes considerable space for each vm_area_struct. However vma_lock has two important specifics which can be used to replace rw_semaphore with a simpler structure: 1. Readers never wait. They try to take the vma_lock and fall back to mmap_lock if that fails. 2. Only one writer at a time will ever try to write-lock a vma_lock because writers first take mmap_lock in write mode. Because of these requirements, full rw_semaphore functionality is not needed and we can replace rw_semaphore and the vma->detached flag with a refcount (vm_refcnt). When vma is in detached state, vm_refcnt is 0 and only a call to vma_mark_attached() can take it out of this state. Note that unlike before, now we enforce both vma_mark_attached() and vma_mark_detached() to be done only after vma has been write-locked. vma_mark_attached() changes vm_refcnt to 1 to indicate that it has been attached to the vma tree. When a reader takes read lock, it increments vm_refcnt, unless the top usable bit of vm_refcnt (0x40000000) is set, indicating presence of a writer. When writer takes write lock, it sets the top usable bit to indicate its presence. If there are readers, writer will wait using newly introduced mm->vma_writer_wait. Since all writers take mmap_lock in write mode first, there can be only one writer at a time. The last reader to release the lock will signal the writer to wake up. refcount might overflow if there are many competing readers, in which case read-locking will fail. Readers are expected to handle such failures. In summary: 1. all readers increment the vm_refcnt; 2. writer sets top usable (writer) bit of vm_refcnt; 3. readers cannot increment the vm_refcnt if the writer bit is set; 4. in the presence of readers, writer must wait for the vm_refcnt to drop to 1 (plus the VMA_LOCK_OFFSET writer bit), indicating an attached vma with no readers; 5. vm_refcnt overflow is handled by the readers. While this vm_lock replacement does not yet result in a smaller vm_area_struct (it stays at 256 bytes due to cacheline alignment), it allows for further size optimization by structure member regrouping to bring the size of vm_area_struct below 192 bytes. [surenb@google.com: fix a crash due to vma_end_read() that should have been removed] Link: https://lkml.kernel.org/r/20250220200208.323769-1-surenb@google.com Link: https://lkml.kernel.org/r/20250213224655.1680278-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Matthew Wilcox <willy@infradead.org> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . 
McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
55e50223bf |
mm: introduce vma_iter_store_attached() to use with attached vmas
vma_iter_store() functions can be used both when adding a new vma and when updating an existing one. However for existing ones we do not need to mark them attached as they are already marked that way. With vma->detached being a separate flag, double-marking a vmas as attached or detached is not an issue because the flag will simply be overwritten with the same value. However once we fold this flag into the refcount later in this series, re-attaching or re-detaching a vma becomes an issue since these operations will be incrementing/decrementing a refcount. Introduce vma_iter_store_new() and vma_iter_store_overwrite() to replace vma_iter_store() and avoid re-attaching a vma during vma update. Add assertions in vma_mark_attached()/vma_mark_detached() to catch invalid usage. Update vma tests to check for vma detached state correctness. Link: https://lkml.kernel.org/r/20250213224655.1680278-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
8ef95d8f15 |
mm: mark vma as detached until it's added into vma tree
Current implementation does not set detached flag when a VMA is first allocated. This does not represent the real state of the VMA, which is detached until it is added into mm's VMA tree. Fix this by marking new VMAs as detached and resetting detached flag only after VMA is added into a tree. Introduce vma_mark_attached() to make the API more readable and to simplify possible future cleanup when vma->vm_mm might be used to indicate detached vma and vma_mark_attached() will need an additional mm parameter. Link: https://lkml.kernel.org/r/20250213224655.1680278-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sourav Panda <souravpanda@google.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
7b6218ae12 |
mm: move per-vma lock into vm_area_struct
Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing. Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].
Splitting single logical structure into multiple ones leads to more
complicated management, extra pointer dereferences and overall less
maintainable code. When that split-away part is a lock, it complicates
things even further. With no performance benefits, there are no reasons
for this split. Merging the vm_lock back into vm_area_struct also allows
vm_area_struct to use SLAB_TYPESAFE_BY_RCU later in this patchset. Move
vm_lock back into vm_area_struct, aligning it at the cacheline boundary
and changing the cache to be cacheline-aligned as well. With kernel
compiled using defconfig, this causes VMA memory consumption to grow from
160 (vm_area_struct) + 40 (vm_lock) bytes to 256 bytes:
slabinfo before:
<name> ... <objsize> <objperslab> <pagesperslab> : ...
vma_lock ... 40 102 1 : ...
vm_area_struct ... 160 51 2 : ...
slabinfo after moving vm_lock:
<name> ... <objsize> <objperslab> <pagesperslab> : ...
vm_area_struct ... 256 32 2 : ...
Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
which is 5.5MB per 100000 VMAs. Note that the size of this structure is
dependent on the kernel configuration and typically the original size is
higher than 160 bytes. Therefore these calculations are close to the
worst case scenario. A more realistic vm_area_struct usage before this
change is:
<name> ... <objsize> <objperslab> <pagesperslab> : ...
vma_lock ... 40 102 1 : ...
vm_area_struct ... 176 46 2 : ...
Aggregate VMA memory consumption per 1000 VMAs grows from 54 to 64 pages,
which is 3.9MB per 100000 VMAs. This memory consumption growth can be
addressed later by optimizing the vm_lock.
[1] https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/
[2] https://lore.kernel.org/all/ZsQyI%2F087V34JoIt@xsang-OptiPlex-9020/
[3] https://lore.kernel.org/all/CAJuCfpEisU8Lfe96AYJDZ+OM4NoPmnw9bP53cT_kbfP_pR+-2g@mail.gmail.com/
Link: https://lkml.kernel.org/r/20250213224655.1680278-3-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
c372473a54 |
mm: completely abstract unnecessary adj_start calculation
The adj_start calculation has been a constant source of confusion in the
VMA merge code.
There are two cases to consider, one where we adjust the start of the
vmg->middle VMA (i.e. the vmg->__adjust_middle_start merge flag is set),
in which case adj_start is calculated as:
(1) adj_start = vmg->end - vmg->middle->vm_start
And the case where we adjust the start of the vmg->next VMA (i.e. the
vmg->__adjust_next_start merge flag is set), in which case adj_start is
calculated as:
(2) adj_start = -(vmg->middle->vm_end - vmg->end)
We apply (1) thusly:
vmg->middle->vm_start =
vmg->middle->vm_start + vmg->end - vmg->middle->vm_start
Which simplifies to:
vmg->middle->vm_start = vmg->end
Similarly, we apply (2) as:
vmg->next->vm_start =
vmg->next->vm_start + -(vmg->middle->vm_end - vmg->end)
Noting that for these VMAs to be mergeable vmg->middle->vm_end ==
vmg->next->vm_start and so this simplifies to:
vmg->next->vm_start =
vmg->next->vm_start + -(vmg->next->vm_start - vmg->end)
Which simplifies to:
vmg->next->vm_start = vmg->end
Therefore in each case, we simply need to adjust the start of the VMA to
vmg->end (!) and can do away with this adj_start calculation. The only
caveat is that we must ensure we update the vm_pgoff field correctly.
We therefore abstract this entire calculation to a new function
vmg_adjust_set_range() which performs this calculation and sets the
adjusted VMA's new range using the general vma_set_range() function.
We also must update vma_adjust_trans_huge() which expects the
now-abstracted adj_start parameter. It turns out this is wholly
unnecessary.
In vma_adjust_trans_huge() the relevant code is:
if (adjust_next > 0) {
struct vm_area_struct *next = find_vma(vma->vm_mm, vma->vm_end);
unsigned long nstart = next->vm_start;
nstart += adjust_next;
split_huge_pmd_if_needed(next, nstart);
}
The only case where this is relevant is when vmg->__adjust_middle_start is
specified (in which case adj_next would have been positive), i.e. the one
in which the vma specified is vmg->prev and thus the sought 'next' VMA
would be vmg->middle.
We can therefore eliminate the find_vma() invocation altogether and simply
provide the vmg->middle VMA in this instance, or NULL otherwise.
Again we have an adj_next offset calculation:
next->vm_start + vmg->end - vmg->middle->vm_start
Where next == vmg->middle this simplifies to vmg->end as previously
demonstrated.
Therefore nstart is equal to vmg->end, which is already passed to
vma_adjust_trans_huge() via the 'end' parameter and so this code (rather
delightfully) simplifies to:
if (next)
split_huge_pmd_if_needed(next, end);
With these changes in place, it becomes silly for commit_merge() to return
vmg->target, as it is always the same and threaded through vmg, so we
finally change commit_merge() to return an error value once again.
This patch has no change in functional behaviour.
Link: https://lkml.kernel.org/r/7bce2cd4b5afb56211822835d145471280c3dccc.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
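A tiny sanity check of the two simplifications derived above, with made-up addresses (plain C, signed arithmetic for readability):
#include <assert.h>
int main(void)
{
        long middle_start = 0x1000, middle_end = 0x5000;
        long next_start = middle_end;   /* mergeable => next abuts middle */
        long end = 0x3000;              /* vmg->end */
        /* Case (1): middle->vm_start + adj_start collapses to vmg->end. */
        assert(middle_start + (end - middle_start) == end);
        /* Case (2): next->vm_start + adj_start also collapses to vmg->end. */
        assert(next_start + -(middle_end - end) == end);
        return 0;
}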
|
|
|
cf929a2863 |
tools: add VM_WARN_ON_VMG definition
vma tests compilation yields the following error: vma.c:732:9: error: implicit declaration of function ‘VM_WARN_ON_VMG’ Fix it by adding missing VM_WARN_ON_VMG() definition. Link: https://lkml.kernel.org/r/20250116181538.759469-1-surenb@google.com Fixes: e3a7ae85f87c ("mm/debug: prefer VM_WARN_ON_VMG() to report VMG debug warnings") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
f8d4a6cabb |
mm: make mmap_region() internal
Now that we have removed the one user of mmap_region() outside of mm, make it internal and add it to vma.c so it can be userland tested. This ensures that all external memory mappings are performed using the appropriate interfaces and allows us to modify memory mapping logic as we see fit. Additionally expand test stubs to allow for the mmap_region() code to compile and be userland testable. Link: https://lkml.kernel.org/r/de5a3c574d35c26237edf20a1d8652d7305709c9.1735819274.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
7e8c8fd348 |
tools: testing: add simple __mmap_region() userland test
Introduce demonstrative, basic, __mmap_region() test upon which we can base further work upon moving forwards. This simply asserts that mappings can be made and merges occur as expected. As part of this change, fix the security_vm_enough_memory_mm() stub which was previously incorrectly implemented. Link: https://lkml.kernel.org/r/20241213162409.41498-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
e5e7fb278e |
mm: convert mm_lock_seq to a proper seqcount
Convert mm_lock_seq to be seqcount_t and change all mmap_write_lock variants to increment it, in-line with the usual seqcount usage pattern. This lets us check whether the mmap_lock is write-locked by checking mm_lock_seq.sequence counter (odd=locked, even=unlocked). This will be used when implementing mmap_lock speculation functions. As a result vm_lock_seq is also change to be unsigned to match the type of mm_lock_seq.sequence. Link: https://lkml.kernel.org/r/20241122174416.1367052-2-surenb@google.com Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
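The odd/even convention mentioned above can be shown in a few lines of ordinary C (a model of the idea, not the kernel's seqcount_t API):
/* The sequence is bumped once when the write lock is taken and once when it
 * is released, so an odd value means "currently write-locked". */
#include <stdbool.h>
#include <stdio.h>
struct mm_model {
        unsigned int lock_seq;
};
void model_write_lock(struct mm_model *mm)   { mm->lock_seq++; }  /* now odd  */
void model_write_unlock(struct mm_model *mm) { mm->lock_seq++; }  /* now even */
bool model_is_write_locked(const struct mm_model *mm)
{
        return mm->lock_seq & 1;
}
int main(void)
{
        struct mm_model mm = { 0 };
        model_write_lock(&mm);
        printf("locked: %d\n", model_is_write_locked(&mm));    /* 1 */
        model_write_unlock(&mm);
        printf("locked: %d\n", model_is_write_locked(&mm));    /* 0 */
        return 0;
}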
|
|
|
bef5418d1f |
mm/vma: move __vm_munmap() to mm/vma.c
This was arbitrarily left in mmap.c it makes no sense being there, move it to vma.c to render it testable. Link: https://lkml.kernel.org/r/5e5e81807c54dfbe363edb2d431eb3d7a37fcdba.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
a9d1f3f2d7 |
mm/vma: move stack expansion logic to mm/vma.c
We build on previous work making expand_downwards() an entirely internal function. This logic is subtle and so it is highly useful to get it into vma.c so we can then userland unit test. We must additionally move acct_stack_growth() to vma.c as it is a helper function used by both expand_downwards() and expand_upwards(). We are also then able to mark anon_vma_interval_tree_pre_update_vma() and anon_vma_interval_tree_post_update_vma() static as these are no longer used by anything else. Link: https://lkml.kernel.org/r/0feb104eff85922019d4fb29280f3afb130c5204.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
c7c643d985 |
mm/vma: move unmapped_area() internals to mm/vma.c
We want to be able to unit test the unmapped area logic, so move it to mm/vma.c. The wrappers which invoke this remain in place in mm/mmap.c. In addition, naturally, update the existing test code to enable this to be compiled in userland. Link: https://lkml.kernel.org/r/53a57a52a64ea54e9d129d2e2abca3a538022379.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
7d344babac |
mm/vma: move brk() internals to mm/vma.c
Patch series "mm/vma: make more mmap logic userland testable".
This series carries on the work started in previous series and
continued in commit
|