mirror of https://github.com/torvalds/linux.git
96 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
617a814f14 |
ALong with the usual shower of singleton patches, notable patch series in
this pull request are:
"Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
"Some cleanups for shmem" from Baolin Wang. No functional changes - mode
code reuse, better function naming, logic simplifications.
"mm: some small page fault cleanups" from Josef Bacik. No functional
changes - code cleanups only.
"Various memory tiering fixes" from Zi Yan. A small fix and a little
cleanup.
"mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
"Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This
is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at all
used 16k. Useful for some system tuning things, but partivularly useful
for "the dynamic kernel stack project".
"kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
Teaches kmemleak to detect leaksage of percpu memory.
"mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
"mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work
correctly by design rather than by accident.
"mm: remove arch_make_page_accessible()" from David Hildenbrand. Some
folio conversions which make arch_make_page_accessible() unneeded.
"mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel.
Cleans up and fixes our handling of the resetting of the cgroup/process
peak-memory-use detector.
"Make core VMA operations internal and testable" from Lorenzo Stoakes.
Rationalizaion and encapsulation of the VMA manipulation APIs. With a
view to better enable testing of the VMA functions, even from a
userspace-only harness.
"mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in
the zswap global shrinker, resulting in improved performance.
"mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in
some missing info in /proc/zoneinfo.
"mm: replace follow_page() by folio_walk" from David Hildenbrand. Code
cleanups and rationalizations (conversion to folio_walk()) resulting in
the removal of follow_page().
"improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some
tuning to improve zswap's dynamic shrinker. Significant reductions in
swapin and improvements in performance are shown.
"mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
Improvements to the new unaccepted memory feature,
"mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX
PUDs. This was missing, although nobody seems to have notied yet.
"Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
Cleanups and modest performance improvements for the maple tree library
code.
"memcg: further decouple v1 code from v2" from Shakeel Butt. Move more
cgroup v1 remnants away from the v2 memcg code.
"memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds
various warnings telling users that memcg v1 features are deprecated.
"mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
Greatly improves the success rate of the mTHP swap allocation.
"mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate
per-arch implementations of numa_memblk code into generic code.
"mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
"support large folio swap-out and swap-in for shmem" from Baolin Wang.
With this series we no longer split shmem large folios into simgle-page
folios when swapping out shmem.
"mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance
improvements and code reductions for gigantic folios.
"support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
"mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
"Increase the number of bits available in page_type" from Matthew Wilcox.
Increases the number of bits available in page_type!
"Simplify the page flags a little" from Matthew Wilcox. Many legacy page
flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
"mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An
optimization which permits us to avoid writing/reading zero-filled zswap
pages to backing store.
"Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window
which occurs when a MAP_FIXED operqtion is occurring during an unrelated
vma tree walk.
"mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the
vma_merge() functionality, making ot cleaner, more testable and better
tested.
"misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor
fixups of DAMON selftests and kunit tests.
"mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code
cleanups and folio conversions.
"Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups
for shmem controls and stats.
"mm: count the number of anonymous THPs per size" from Barry Song. Expose
additional anon THP stats to userspace for improved tuning.
"mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
conversions and removal of now-unused page-based APIs.
"replace per-quota region priorities histogram buffer with per-context
one" from SeongJae Park. DAMON histogram rationalization.
"Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
Park. DAMON documentation updates.
"mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
related doc and warn" from Jason Wang: fixes usage of page allocator
__GFP_NOFAIL and GFP_ATOMIC flags.
"mm: split underused THPs" from Yu Zhao. Improve THP=always policy - this
was overprovisioning THPs in sparsely accessed memory areas.
"zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add
support for zram run-time compression algorithm tuning.
"mm: Care about shadow stack guard gap when getting an unmapped area" from
Mark Brown. Fix up the various arch_get_unmapped_area() implementations
to better respect guard areas.
"Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of
mem_cgroup_iter() and various code cleanups.
"mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
"resource: Fix region_intersects() vs add_memory_driver_managed()" from
Huang Ying. Fix a bug in region_intersects() for systems with CXL memory.
"mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a
couple more code paths to correctly recover from the encountering of
poisoned memry.
"mm: enable large folios swap-in support" from Barry Song. Support the
swapin of mTHP memory into appropriately-sized folios, rather than into
single-page folios.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
=s0T+
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
|
|
40b88644dd |
mm: remove arch_unmap()
Now that powerpc no longer uses arch_unmap() to handle VDSO unmapping, there are no meaningful implementions left. Drop support for it entirely, and update comments which refer to it. Link: https://lkml.kernel.org/r/20240812082605.743814-3-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
5463bafab4 |
powerpc/mm: handle VDSO unmapping via close() rather than arch_unmap()
Add a close() callback to the VDSO special mapping to handle unmapping of the VDSO. That will make it possible to remove the arch_unmap() hook entirely in a subsequent patch. Link: https://lkml.kernel.org/r/20240812082605.743814-2-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
600d6a7e63 |
powerpc: Remove obsoleted declarations for use_cop and drop_cop
The use_cop() and drop_cop() have been removed since
commit
|
|
|
|
f74b2a6c01 |
powerpc/64s: Use dec_mm_active_cpus helper
Avoid open-coded atomic_dec on mm->context.active_cpus and use the function made for it. Add CONFIG_DEBUG_VM underflow checking on the counter. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230524060821.148015-3-npiggin@gmail.com |
|
|
|
0f0a0a6091 |
cxl: Use radix__flush_all_mm instead of generic flush_all_mm
The generic implementation of this function isn't really generic (Hash is not implemented). Unfortunately, the runtime warnings cannot be replaced with BUILD_BUG's, so it seems safer not to provide a stub in the first place. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221109045112.187069-6-bgray@linux.ibm.com |
|
|
|
d24b8f01fe |
powerpc/mm: remove orphan declarations from mmu_context.h
Remove the following orphan declarations from mmu_context.h: 1. switch_cop() and drop_cop() have been removed since commit |
|
|
|
cad32d9d42 |
KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the KVM in the real mode (with MMU off). The problem with the real mode is some memory is not available and some API usage crashed the host but enabling MMU was an expensive operation. The problems with the real mode handlers are: 1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared; 2. The real mode handlers have to be linked into vmlinux to work; 3. An exception in real mode immediately reboots the machine. If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE) which virtual mode handler is even closer to its real mode version. On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs. So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demostrates above 800Mbit/s of actual throughput. This removes the real mode handlers from KVM and related code from the powernv platform. This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed. This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru |
|
|
|
d1d8a3b4d0 |
mm: Turn isolate_lru_page() into folio_isolate_lru()
Add isolate_lru_page() as a wrapper around isolate_lru_folio(). TestClearPageLRU() would have always failed on a tail page, so returning -EBUSY is the same behaviour. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> |
|
|
|
c13f2b2bb5 |
powerpc/mm: Add __init attribute to eligible functions
Some functions defined in 'arch/powerpc/mm' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-4-nick.child@ibm.com |
|
|
|
387e220a2e |
powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
Fix radix_enabled() use in setup_initial_memory_limit(). Add some
stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
|
|
|
|
a736143afd |
Merge branch 'topic/ppc-kvm' into next
Pull in some more ppc KVM patches we are keeping in our topic branch. In particular this brings in the series to add H_RPT_INVALIDATE. |
|
|
|
f0c6fbbb90 |
KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
H_RPT_INVALIDATE does two types of TLB invalidations: 1. Process-scoped invalidations for guests when LPCR[GTSE]=0. This is currently not used in KVM as GTSE is not usually disabled in KVM. 2. Partition-scoped invalidations that an L1 hypervisor does on behalf of an L2 guest. This is currently handled by H_TLB_INVALIDATE hcall and this new replaces the old that. This commit enables process-scoped invalidations for L1 guests. Support for process-scoped and partition-scoped invalidations from/for nested guests will be added separately. Process scoped tlbie invalidations from L1 and nested guests need RS register for TLBIE instruction to contain both PID and LPID. This patch introduces primitives that execute tlbie instruction with both PID and LPID set in prepartion for H_RPT_INVALIDATE hcall. A description of H_RPT_INVALIDATE follows: int64 /* H_Success: Return code on successful completion */ /* H_Busy - repeat the call with the same */ /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */ hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */ uint64 id, /* PID/LPID to invalidate */ uint64 target, /* Invalidation target */ uint64 type, /* Type of lookaside information */ uint64 pg_sizes, /* Page sizes */ uint64 start, /* Start of Effective Address (EA) range (inclusive) */ uint64 end) /* End of EA range (exclusive) */ Invalidation targets (target) ----------------------------- Core MMU 0x01 /* All virtual processors in the partition */ Core local MMU 0x02 /* Current virtual processor */ Nest MMU 0x04 /* All nest/accelerator agents in use by the partition */ A combination of the above can be specified, except core and core local. Type of translation to invalidate (type) --------------------------------------- NESTED 0x0001 /* invalidate nested guest partition-scope */ TLB 0x0002 /* Invalidate TLB */ PWC 0x0004 /* Invalidate Page Walk Cache */ PRT 0x0008 /* Invalidate caching of Process Table Entries if NESTED is clear */ PAT 0x0008 /* Invalidate caching of Partition Table Entries if NESTED is set */ A combination of the above can be specified. Page size mask (pages) ---------------------- 4K 0x01 64K 0x02 2M 0x04 1G 0x08 All sizes (-1UL) A combination of the above can be specified. All page sizes can be selected with -1. Semantics: Invalidate radix tree lookaside information matching the parameters given. * Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are different from the defined values. * Return H_PARAMETER if NESTED is set and pid is not a valid nested LPID allocated to this partition * Return H_P5 if (start, end) doesn't form a valid range. Start and end should be a valid Quadrant address and end > start. * Return H_NotSupported if the partition is not in running in radix translation mode. * May invalidate more translation information than requested. * If start = 0 and end = -1, set the range to cover all valid addresses. Else start and end should be aligned to 4kB (lower 11 bits clear). * If NESTED is clear, then invalidate process scoped lookaside information. Else pid specifies a nested LPID, and the invalidation is performed on nested guest partition table and nested guest partition scope real addresses. * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and quadrant 0 spaces, Else valid addresses are quadrant 0. * Pages which are fully covered by the range are to be invalidated. Those which are partially covered are considered outside invalidation range, which allows a caller to optimally invalidate ranges that may contain mixed page sizes. * Return H_SUCCESS on success. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com |
|
|
|
3c53642324 |
Merge branch 'topic/ppc-kvm' into next
Merge some powerpc KVM patches from our topic branch. In particular this brings in Nick's big series rewriting parts of the guest entry/exit path in C. Conflicts: arch/powerpc/kernel/security.c arch/powerpc/kvm/book3s_hv_rmhandlers.S |
|
|
|
a56ab7c729 |
powerpc/nohash: Convert set_context() to C
ppc8xx already has set_context() in C. Other ones have it in assembly. The only thing it does is to write the context id into SPRN_PID. Do it in C. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a5d0759064f3831c6b88af49ef5d3b05ba1c4dad.1622712515.git.christophe.leroy@csgroup.eu |
|
|
|
2e1ae9cd56 |
KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU
Rather than partition the guest PID space + flush a rogue guest PID to work around this problem, instead fix it by always disabling the MMU when switching in or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivally implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com |
|
|
|
1c4bce6753 |
powerpc/vdso: Separate vvar vma from vdso
Since commit |
|
|
|
266d8f7586 |
powerpc/pkeys: Remove unused code
This removes arch_supports_pkeys(), arch_usable_pkeys() and thread_pkey_regs_*() which are remnants from the following: commit |
|
|
|
8a5be36b93 |
powerpc updates for 5.11
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
setup/handling code.
- Support for KUAP (Kernel User Access Prevention) on systems using the hashed
page table MMU, using memory protection keys.
- Better handling of PowerVM SMT8 systems where all threads of a core do not
share an L2, allowing the scheduler to make better scheduling decisions.
- Further improvements to our machine check handling.
- Show registers when unwinding interrupt frames during stack traces.
- Improvements to our pseries (PowerVM) partition migration code.
- Several series from Christophe refactoring and cleaning up various parts of
the 32-bit code.
- Other smaller features, fixes & cleanups.
Thanks to:
Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard
Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater,
Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David
Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert
Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan
Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh
Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov,
Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior ,
Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König,
Vincent Stehlé, Youling Tang, Zhang Xiaoxu.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/bURITHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEzBEAC1Vwibcog2P9rkJPb0q3UGWSYSx25V
h/LwwxtM9Tm14j/LZsSgkOgIsfMaWEBIw/8D4efQ7AX9aFo+R0c2DdQMx1MG5MXz
gZk58+l3LwId6h9+OrwurpEW+ZmURLAtGMSyFdkeiZ3/XTnkbf1XnewC0QWQe56a
EGLmjx1MFl45jspoy7UIUXsXoNJIfflEKhrgUzSUh8X2eLmvB9ws6A4BXxbVzyZl
lZv3+uWimU2pFgdkB9jOCxoG4zFEr2o5ovLHG7zCCVo5JoXmTPQ5cMVBraH206ms
+5vCmu4qI8uP5UlZW/mZfhrtDiMdHdQqqFOaQwVlOmoUbU6L6E6rxm3iVnov2Bbi
iUgxoeJDxAb2cM2EWFK6oWVgr7+NkwvXM1IG8xtprhHrCdnC9r+psQr/dswb3LSg
MJ7u/RCq3uixy2kWP8E0NEHw7ngQZ/ZKPqzfnmIWOC7tYUxgaL02I8Ff9/ZXAI2J
CnmqFYOjrimHkcwXGOtKkXNvfU0DiL97qpK2AQNWElE8+bUUmpw+ltUrsdSycYmv
Afc4WIcVrTA+a9laSLgjdZbolbNSa3p+cMIYdPrVx9g+xqygbxIKv+EDGNv1WHfD
GU1gmohMY+ZkMOvFRMi8LAsEm0DH/etWE0py/8uyxDYKnGyD1Ur6452DStkmgGb2
azmcaOyLdb+HXA==
=Ga3K
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
setup/handling code.
- Support for KUAP (Kernel User Access Prevention) on systems using the
hashed page table MMU, using memory protection keys.
- Better handling of PowerVM SMT8 systems where all threads of a core
do not share an L2, allowing the scheduler to make better scheduling
decisions.
- Further improvements to our machine check handling.
- Show registers when unwinding interrupt frames during stack traces.
- Improvements to our pseries (PowerVM) partition migration code.
- Several series from Christophe refactoring and cleaning up various
parts of the 32-bit code.
- Other smaller features, fixes & cleanups.
Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.
* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
powerpc: Add config fragment for disabling -Werror
powerpc/configs: Add ppc64le_allnoconfig target
powerpc/powernv: Rate limit opal-elog read failure message
powerpc/pseries/memhotplug: Quieten some DLPAR operations
powerpc/ps3: use dma_mapping_error()
powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
KVM: PPC: fix comparison to bool warning
KVM: PPC: Book3S: Assign boolean values to a bool variable
powerpc: Inline setup_kup()
powerpc/64s: Mark the kuap/kuep functions non __init
KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
powerpc/xive: Improve error reporting of OPAL calls
powerpc/xive: Simplify xive_do_source_eoi()
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
...
|
|
|
|
d94b827e89 |
powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation
This patch updates kernel hash page table entries to use storage key 3 for its mapping. This implies all kernel access will now use key 3 to control READ/WRITE. The patch also prevents the allocation of key 3 from userspace and UAMOR value is updated such that userspace cannot modify key 3. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-9-aneesh.kumar@linux.ibm.com |
|
|
|
511157ab64 |
powerpc/vdso: Move vdso datapage up front
Move the vdso datapage in front of the VDSO area, before vdso test. This will allow to remove the __kernel_datapage_offset symbol and simplify __get_datapage() in following patches. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b68c99b6e8ee0b1d99bfa4c7e34c359fc1bc1000.1601197618.git.christophe.leroy@csgroup.eu |
|
|
|
c102f07667 |
powerpc/vdso: Replace vdso_base by vdso
All other architectures but s390 use a void pointer named 'vdso' to reference the VDSO mapping. In a following patch, the VDSO data page will be put in front of text, vdso_base will then not anymore point to VDSO text. To avoid confusion between vdso_base and VDSO text, rename vdso_base into vdso and make it a void __user *. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8e6cefe474aa4ceba028abb729485cd46c140990.1601197618.git.christophe.leroy@csgroup.eu |
|
|
|
f4b90e37e3 |
powerpc: use asm-generic/mmu_context.h for no-op implementations
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> |
|
|
|
66acd46080 |
powerpc: select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
powerpc uses IPIs in some situations to switch a kernel thread away from a lazy tlb mm, which is subject to the TLB flushing race described in the changelog introducing ARCH_WANT_IRQS_OFF_ACTIVATE_MM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200914045219.3736466-3-npiggin@gmail.com |
|
|
|
a3f3f8aa1f |
powerpc: Remove unneeded inline functions
Both of those functions are only called from 64-bit only code, so the stubs should not be needed at all. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200717112714.19304-1-yuehaibing@huawei.com |
|
|
|
c420644c0a |
powerpc: Use mm_context vas_windows counter to issue CP_ABORT
set_thread_uses_vas() sets used_vas flag for a process that opened VAS
window and issue CP_ABORT during context switch for only that process.
In multi-thread application, windows can be shared. For example Thread
A can open a window and Thread B can run COPY/PASTE instructions to
send NX request which may cause corruption or snooping or a covert
channel Also once this flag is set, continue to run CP_ABORT even the
VAS window is closed.
So define vas-windows counter in process mm_context, increment this
counter for each window open and decrement it for window close. If
vas-windows is set, issue CP_ABORT during context switch. It means
clear the foreign real address mapping only if the process / thread
uses COPY/PASTE. Then disable it for that process if windows are not
open.
Moved set_thread_uses_vas() code to vas_tx_win_open() as this
functionality is needed only for userspace open windows. We are adding
VAS userspace support along with this fix. So no need to include this
fix in stable releases.
Fixes:
|
|
|
|
42222eae17 |
mm: remove arch_bprm_mm_init() hook
From: Dave Hansen <dave.hansen@linux.intel.com> MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc). arch_bprm_mm_init() is used at execve() time. The only non-stub implementation is on x86 for MPX. Remove the hook entirely from all architectures and generic code. Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: x86@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> |
|
|
|
1335d9a1fb |
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar: "This fixes a particularly thorny munmap() bug with MPX, plus fixes a host build environment assumption in objtool" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Allow AR to be overridden with HOSTAR x86/mpx, mm/core: Fix recursive munmap() corruption |
|
|
|
5a28fc94c9 |
x86/mpx, mm/core: Fix recursive munmap() corruption
This is a bit of a mess, to put it mildly. But, it's a bug
that only seems to have showed up in 4.20 but wasn't noticed
until now, because nobody uses MPX.
MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.
See the "long story" further below for the gory details.
To fix this, call arch_unmap() before __do_unmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
== UML / unicore32 impact ==
Remove unused 'vma' argument to arch_unmap(). No functional
change.
I compile tested this on UML but not unicore32.
== powerpc impact ==
powerpc uses arch_unmap() well to watch for munmap() on the
VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
arch_unmap() makes this happen earlier in __do_munmap(). But,
'vdso_base' seems to only be used in perf and in the signal
delivery that happens near the return to userspace. I can not
find any likely impact to powerpc, other than the zeroing
happening a little earlier.
powerpc does not use the 'vma' argument and is unaffected by
its removal.
I compile-tested a 64-bit powerpc defconfig.
== x86 impact ==
For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.
Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s)
2. Split the start/end one(s) if neceesary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fix up some of the accounting (like fput()) and actually
free the VMA itself.
This specific ordering was actually introduced by:
|
|
|
|
93f2cd8137 |
powerpc/mm: define an empty mm_iommu_init()
To avoid ifdefs, define a empty static inline mm_iommu_init() function when CONFIG_SPAPR_TCE_IOMMU is not selected. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
737b434d3d |
powerpc/mm: convert Book3E 64 to pte_fragment
Book3E 64 is the only subarch not using pte_fragment. In order to allow refactorisation, this patch converts it to pte_fragment. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
c10c21efa4 |
powerpc/vfio/iommu/kvm: Do not pin device memory
This new memory does not have page structs as it is not plugged to the host so gup() will fail anyway. This adds 2 helpers: - mm_iommu_newdev() to preregister the "memory device" memory so the rest of API can still be used; - mm_iommu_is_devmem() to know if the physical address is one of thise new regions which we must avoid unpinning of. This adds @mm to tce_page_is_contained() and iommu_tce_xchg() to test if the memory is device memory to avoid pfn_to_page(). This adds a check for device memory in mm_iommu_ua_mark_dirty_rm() which does delayed pages dirtying. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
e0bf78b0f9 |
powerpc/mm/iommu/vfio_spapr_tce: Change mm_iommu_get to reference a region
Normally mm_iommu_get() should add a reference and mm_iommu_put() should remove it. However historically mm_iommu_find() does the referencing and mm_iommu_get() is doing allocation and referencing. We are going to add another helper to preregister device memory so instead of having mm_iommu_new() (which pre-registers the normal memory and references the region), we need separate helpers for pre-registering and referencing. This renames: - mm_iommu_get to mm_iommu_new; - mm_iommu_find to mm_iommu_get. This changes mm_iommu_get() to reference the region so the name now reflects what it does. This removes the check for exact match from mm_iommu_new() as we want it to fail on existing regions; mm_iommu_get() should be used instead. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
2cd4bd192e |
powerpc/pkeys: Fix handling of pkey state across fork()
Protection key tracking information is not copied over to the
mm_struct of the child during fork(). This can cause the child to
erroneously allocate keys that were already allocated. Any allocated
execute-only key is lost aswell.
Add code; called by dup_mmap(), to copy the pkey state from parent to
child explicitly.
This problem was originally found by Dave Hansen on x86, which turns
out to be a problem on powerpc aswell.
Fixes:
|
|
|
|
32ea4c1499 |
powerpc/mm: Extend pte_fragment functionality to PPC32
In order to allow the 8xx to handle pte_fragments, this patch extends the use of pte_fragments to PPC32 platforms. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
685f7e4f16 |
powerpc updates for 4.20
Notable changes:
- A large series to rewrite our SLB miss handling, replacing a lot of fairly
complicated asm with much fewer lines of C.
- Following on from that, we now maintain a cache of SLB entries for each
process and preload them on context switch. Leading to a 27% speedup for our
context switch benchmark on Power9.
- Improvements to our handling of SLB multi-hit errors. We now print more debug
information when they occur, and try to continue running by flushing the SLB
and reloading, rather than treating them as fatal.
- Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9).
- Add support for physical memory up to 2PB in the linear mapping on 64-bit
Book3S. We only support up to 512TB as regular system memory, otherwise the
percpu allocator runs out of vmalloc space.
- Add stack protector support for 32 and 64-bit, with a per-task canary.
- Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.
- Support recognising "big cores" on Power9, where two SMT4 cores are presented
to us as a single SMT8 core.
- A large series to cleanup some of our ioremap handling and PTE flags.
- Add a driver for the PAPR SCM (storage class memory) interface, allowing
guests to operate on SCM devices (acked by Dan).
- Changes to our ftrace code to handle very large kernels, where we need to use
a trampoline to get to ftrace_caller().
Many other smaller enhancements and cleanups.
Thanks to:
Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton Blanchard, Aravinda
Prasad, Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Breno Leitao,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Dan Carpenter, Daniel
Axtens, Finn Thain, Gautham R. Shenoy, Gustavo Romero, Haren Myneni, Hari
Bathini, Jia Hongtao, Joel Stanley, John Allen, Laurent Dufour, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael
Bringmann, Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan
Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran,
Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab, Rob Herring, Sam
Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan Johnson, Stephen Rothwell,
Stewart Smith, Suraj Jitindar Singh, Tyrel Datwyler, Vaibhav Jain, Vasant
Hegde, YueHaibing, zhong jiang,
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJb01vTAAoJEFHr6jzI4aWADsEP/jqL3+2qxs098ra80tpXCpXJ
tgXCosEs4b35sGtyHeUWZZZfWXeisaPAIlP8zTx1n50HACZduDYRAl0Ew9XB7Xdw
enDHRVccD21FsmHBOx/Ii1rVJlovWlj6EQCWHKeZmNjeRoFuClVZ7CYmf+mBifKR
sw2Db2fKA/59wMTq2zIMy5pqYgqlAs4jTWS6uN5hKPoBmO/82ARnNG+qgLuloD3Z
O8zSDM9QQ7PpuyDgTjO9SAo2YjmEfXlEG6cOCCejsU3DMctaEAK5PUZ+blsHYHBH
BYZYKs/x4pcw0SO41GtTh0M2YqDYBVuBIpRw8lLZap97Xo9ucSkAm5WD3rGxk4CY
YeZKEPUql6MHN3+DKl8mx2F0V+Et/tio2HNqc9KReR1tfoolZAbe+SFZHfgmc/Rq
RD9nnG8KRd4K2K1BTqpkTmI1EtE7jPtPJPSV8gMGhgL/N5vPmH3mql/qyOtYx48E
6/hPzWESgs16VRZ/opLh8VvjlY1HBDODQhehhhl+o23/Vb8qEgRf8Uqhq50rQW1H
EeOqyyYQ90txSU31Sgy1kQkvOgIFAsBObWT1ZCJ3RbfGbB4/tdEAvZqTZRlXo2OY
7P0Sqcw/9Le5eJkHIlLtBv0TF7y1OYemCbLgRQzFlcRP+UKtYyg8eFnFjqbPEEmP
ulwhn/BfFVSgaYKQ503u
=I0pj
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- A large series to rewrite our SLB miss handling, replacing a lot of
fairly complicated asm with much fewer lines of C.
- Following on from that, we now maintain a cache of SLB entries for
each process and preload them on context switch. Leading to a 27%
speedup for our context switch benchmark on Power9.
- Improvements to our handling of SLB multi-hit errors. We now print
more debug information when they occur, and try to continue running
by flushing the SLB and reloading, rather than treating them as
fatal.
- Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9).
- Add support for physical memory up to 2PB in the linear mapping on
64-bit Book3S. We only support up to 512TB as regular system
memory, otherwise the percpu allocator runs out of vmalloc space.
- Add stack protector support for 32 and 64-bit, with a per-task
canary.
- Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.
- Support recognising "big cores" on Power9, where two SMT4 cores are
presented to us as a single SMT8 core.
- A large series to cleanup some of our ioremap handling and PTE
flags.
- Add a driver for the PAPR SCM (storage class memory) interface,
allowing guests to operate on SCM devices (acked by Dan).
- Changes to our ftrace code to handle very large kernels, where we
need to use a trampoline to get to ftrace_caller().
And many other smaller enhancements and cleanups.
Thanks to: Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton
Blanchard, Aravinda Prasad, Bartlomiej Zolnierkiewicz, Benjamin
Herrenschmidt, Breno Leitao, Cédric Le Goater, Christophe Leroy,
Christophe Lombard, Dan Carpenter, Daniel Axtens, Finn Thain, Gautham
R. Shenoy, Gustavo Romero, Haren Myneni, Hari Bathini, Jia Hongtao,
Joel Stanley, John Allen, Laurent Dufour, Madhavan Srinivasan, Mahesh
Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael Bringmann,
Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan
Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver
O'Halloran, Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab,
Rob Herring, Sam Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan
Johnson, Stephen Rothwell, Stewart Smith, Suraj Jitindar Singh, Tyrel
Datwyler, Vaibhav Jain, Vasant Hegde, YueHaibing, zhong jiang"
* tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (221 commits)
Revert "selftests/powerpc: Fix out-of-tree build errors"
powerpc/msi: Fix compile error on mpc83xx
powerpc: Fix stack protector crashes on CPU hotplug
powerpc/traps: restore recoverability of machine_check interrupts
powerpc/64/module: REL32 relocation range check
powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd
selftests/powerpc: Add a test of wild bctr
powerpc/mm: Fix page table dump to work on Radix
powerpc/mm/radix: Display if mappings are exec or not
powerpc/mm/radix: Simplify split mapping logic
powerpc/mm/radix: Remove the retry in the split mapping logic
powerpc/mm/radix: Fix small page at boundary when splitting
powerpc/mm/radix: Fix overuse of small pages in splitting logic
powerpc/mm/radix: Fix off-by-one in split mapping logic
powerpc/ftrace: Handle large kernel configs
powerpc/mm: Fix WARN_ON with THP NUMA migration
selftests/powerpc: Fix out-of-tree build errors
powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected
powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64
powerpc/time: isolate scaled cputime accounting in dedicated functions.
...
|
|
|
|
c9f80734cd |
powerpc/mm/hash: Rename get_ea_context to get_user_context
We will be adding get_kernel_context later. Update function name to indicate this handle context allocation user space address. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
425333bf3a |
KVM: PPC: Avoid marking DMA-mapped pages dirty in real mode
At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm() which in turn reads the old TCE and if it was a valid entry, marks the physical page dirty if it was mapped for writing. Since it is in real mode, realmode_pfn_to_page() is used instead of pfn_to_page() to get the page struct. However SetPageDirty() itself reads the compound page head and returns a virtual address for the head page struct and setting dirty bit for that kills the system. This adds additional dirty bit tracking into the MM/IOMMU API for use in the real mode. Note that this does not change how VFIO and KVM (in virtual mode) set this bit. The KVM (real mode) changes include: - use the lowest bit of the cached host phys address to carry the dirty bit; - mark pages dirty when they are unpinned which happens when the preregistered memory is released which always happens in virtual mode; - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit; - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use in the new mm_iommu_ua_mark_dirty_rm() helper; - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only caller anyway) to reduce the real mode KVM and IOMMU knowledge across different subsystems. This removes realmode_pfn_to_page() as it is not used anymore. While we at it, remove some EXPORT_SYMBOL_GPL() as that code is for the real mode only and modules cannot call it anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> |
|
|
|
cca19f0b68 |
powerpc/64s/radix: Fix missing global invalidations when removing copro
With the optimizations for TLB invalidation from commit |
|
|
|
76fa4975f3 |
KVM: PPC: Check if IOMMU page is contained in the pinned physical page
A VM which has:
- a DMA capable device passed through to it (eg. network card);
- running a malicious kernel that ignores H_PUT_TCE failure;
- capability of using IOMMU pages bigger that physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.
The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.
The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.
We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet as the very first time when the mapping happens
we do not have tbl::it_userspace allocated yet and fall back to
the userspace which in turn calls VFIO IOMMU driver, this fails and
the guest does not retry,
This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.
This calculates maximum page size as a minimum of the natural region
alignment and compound page size. For the page shift this uses the shift
returned by find_linux_pte() which indicates how the page is mapped to
the current userspace - if the page is huge and this is not a zero, then
it is a leaf pte and the page is mapped within the range.
Fixes:
|
|
|
|
dbec10e58d |
mm/pkeys, powerpc, x86: Provide an empty vma_pkey() in linux/pkeys.h
Consolidate the pkey handling by providing a common empty definition of vma_pkey() in pkeys.h when CONFIG_ARCH_HAS_PKEYS=n. This also removes another entanglement of pkeys.h and asm/mmu_context.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> |
|
|
|
f384796c40 |
powerpc/mm: Add support for handling > 512TB address in SLB miss
For addresses above 512TB we allocate additional mmu contexts. To make
it all easy, addresses above 512TB are handled with IR/DR=1 and with
stack frame setup.
The mmu_context_t is also updated to track the new extended_ids. To
support upto 4PB we need a total 8 contexts.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Minor formatting tweaks and comment wording, switch BUG to WARN
in get_ea_context().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
|
|
aff6f8cb3e |
powerpc/mm: Add tracking of the number of coprocessors using a context
Currently, when using coprocessors (which use the Nest MMU), we
simply increment the active_cpu count to force all TLB invalidations
to be come broadcast.
Unfortunately, due to an errata in POWER9, we will need to know
more specifically that coprocessors are in use.
This maintains a separate copros counter in the MMU context for
that purpose.
NB. The commit mentioned in the fixes tag below is not at fault for
the bug we're fixing in this commit and the next, but this fix applies
on top the infrastructure it introduced.
Fixes:
|
|
|
|
03f51d4efa |
powerpc updates for 4.16
Highlights:
- Enable support for memory protection keys aka "pkeys" on Power7/8/9 when
using the hash table MMU.
- Extend our interrupt soft masking to support masking PMU interrupts as well
as "normal" interrupts, and then use that to implement local_t for a ~4x
speedup vs the current atomics-based implementation.
- A new driver "ocxl" for "Open Coherent Accelerator Processor Interface
(OpenCAPI)" devices.
- Support for new device tree properties on PowerVM to describe hotpluggable
memory and devices.
- Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO.
- Freescale updates from Scott:
"Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a
minor cleanup patch."
As well as quite a lot of other changes all over the place, and small fixes and
cleanups as always.
Thanks to:
Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas
Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman
Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater,
Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes
do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G.
Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim
Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright,
Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre,
Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai,
Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart
Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl
Gomonovych.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJadF6wExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYA2
nBAAnguCEyAIYpc+ffE3WU9xJEWxa6bKuVufHcUFVntGiGD+igmMS+SHp4ay3Aos
HcA4WFrpzNb2KZ++kmFWtAKWnMfCiW9xuYJNicjr7X5ZiVBEhLWN/mQCwBKs3p6L
5+HhvytcdkKVbEcyVjEGvRL40AyxXNOI02o6Co9X8vanHsmWB4q0eWe4PHstZqlg
6K6kazMp+NTvEFYwKNXDOvuHouKSL57l14SLROH7CpJkNTOQ9s+W59/LmnuCjRlu
o70b7iWOAEbF9tvMma1ksDZVNj7mSyaymLYCyOXu4CkuuleJacZYJ9oQGNddoIbC
wk7l93vPT/yze7DYg8x3uXpKcaDEvEepPuQ/ubz+UXFQWuJtl5ej6Cv+0eOmyZIs
+bjWhGHKdNttnsiPlTRCX/gWD13RE1dB6xXJlfOJ7Oz9OnXXK8ZKc1NTREbQXRWM
8tClAwf9upWpm86GHPVnyrgYbgZo5b1os4SoS8e3kESzakrQVQP7J376u2DtccRq
2AGqjJ+tl5tYPnhm8zG1cNrpqHHpgkNGqLS7DvWRg3EPmEKVQcltN1b/0aKaAjHA
aTRofjrVo+jJ4MX1uyEo59yNCEQPfjkmHRQdLwm+xjWTzEPfIMzpWyXm14tawDQf
OjcAe90W/qQ18brw4z+2BI14J76XziOSX/QcunOn1u/sqaM=
=3rYn
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Enable support for memory protection keys aka "pkeys" on Power7/8/9
when using the hash table MMU.
- Extend our interrupt soft masking to support masking PMU interrupts
as well as "normal" interrupts, and then use that to implement
local_t for a ~4x speedup vs the current atomics-based
implementation.
- A new driver "ocxl" for "Open Coherent Accelerator Processor
Interface (OpenCAPI)" devices.
- Support for new device tree properties on PowerVM to describe
hotpluggable memory and devices.
- Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit
VDSO.
- Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI
erratum workaround, plus a minor cleanup patch.
As well as quite a lot of other changes all over the place, and small
fixes and cleanups as always.
Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy,
Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V,
Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann,
Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G.
Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur,
David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic
Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva,
Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh
Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal,
Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael
Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud,
Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee,
Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann,
Vaibhav Jain, Vasyl Gomonovych"
* tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits)
powerpc/mm/radix: Fix build error when RADIX_MMU=n
macintosh/ams-input: Use true and false for boolean values
macintosh: change some data types from int to bool
powerpc/watchdog: Print the NIP in soft_nmi_interrupt()
powerpc/watchdog: regs can't be null in soft_nmi_interrupt()
powerpc/watchdog: Tweak watchdog printks
powerpc/cell: Remove axonram driver
rtc-opal: Fix handling of firmware error codes, prevent busy loops
powerpc/mpc52xx_gpt: make use of raw_spinlock variants
macintosh/adb: Properly mark continued kernel messages
powerpc/pseries: Fix cpu hotplug crash with memoryless nodes
powerpc/numa: Ensure nodes initialized for hotplug
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
powerpc/kernel: Block interrupts when updating TIDR
powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
powerpc/mm/nohash: do not flush the entire mm when range is a single page
powerpc/pseries: Add Initialization of VF Bars
powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
powerpc/eeh: Add EEH notify resume sysfs
powerpc/eeh: Add EEH operations to notify resume
...
|
|
|
|
1137573acf |
powerpc: implementation for arch_vma_access_permitted()
This patch provides the implementation for arch_vma_access_permitted(). Returns true if the requested access is allowed by pkey associated with the vma. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
a6590ca55f |
powerpc: Program HPTE key protection bits
Map the PTE protection key bits to the HPTE key protection bits, while creating HPTE entries. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
87bbabbed8 |
powerpc: implementation for arch_override_mprotect_pkey()
arch independent code calls arch_override_mprotect_pkey() to return a pkey that best matches the requested protection. This patch provides the implementation. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
06bb53b338 |
powerpc: store and restore the pkey state across context switches
Store and restore the AMR, IAMR and UAMOR register state of the task before scheduling out and after scheduling in, respectively. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
4fb158f65a |
powerpc: track allocation status of all pkeys
Total 32 keys are available on power7 and above. However pkey 0,1 are reserved. So effectively we have 30 pkeys. On 4K kernels, we do not have 5 bits in the PTE to represent all the keys; we only have 3bits. Two of those keys are reserved; pkey 0 and pkey 1. So effectively we have 6 pkeys. This patch keeps track of reserved keys, allocated keys and keys that are currently free. Also it adds skeletal functions and macros, that the architecture-independent code expects to be available. Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
|
|
|
92e3da3cf1 |
powerpc: initial pkey plumbing
Basic plumbing to initialize the pkey system. Nothing is enabled yet. A later patch will enable it once all the infrastructure is in place. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Rework copyrights to use SPDX tags] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |