mirror of https://github.com/torvalds/linux.git
607 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
af133562d5 |
swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
Allow a buffer pre-padding of up to alloc_align_mask, even if it requires
allocating additional IO TLB slots.
If the allocation alignment is bigger than IO_TLB_SIZE and min_align_mask
covers any non-zero bits in the original address between IO_TLB_SIZE and
alloc_align_mask, these bits are not preserved in the swiotlb buffer
address.
To fix this case, increase the allocation size and use a larger offset
within the allocated buffer. As a result, extra padding slots may be
allocated before the mapping start address.
Leave orig_addr in these padding slots initialized to INVALID_PHYS_ADDR.
These slots do not correspond to any CPU buffer, so attempts to sync the
data should be ignored.
The padding slots should be automatically released when the buffer is
unmapped. However, swiotlb_tbl_unmap_single() takes only the address of the
DMA buffer slot, not the first padding slot. Save the number of padding
slots in struct io_tlb_slot and use it to adjust the slot index in
swiotlb_release_slots(), so all allocated slots are properly freed.
Fixes: 2fd4fa5d3fb5 ("swiotlb: Fix alignment checks when both allocation and DMA masks are present")
Link: https://lore.kernel.org/linux-iommu/20240311210507.217daf8b@meshulam.tesarici.cz/
Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
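A rough sketch of the computation described above (hedged: the helper names and reduced signatures are illustrative, not the exact upstream code):

```c
#define IO_TLB_SHIFT	11			/* 2 KiB slots */
#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)

/*
 * Low bits of orig_addr that must be reproduced in the bounce buffer
 * address: those covered by min_align_mask, up to alloc_align_mask
 * (bits below IO_TLB_SIZE always fit within a single slot).
 */
static unsigned long bounce_offset(phys_addr_t orig_addr,
				   unsigned long alloc_align_mask,
				   unsigned long min_align_mask)
{
	return orig_addr & min_align_mask &
	       (alloc_align_mask | (IO_TLB_SIZE - 1));
}

/*
 * Whole slots consumed by that offset. They precede the slot returned
 * to the caller, keep orig_addr == INVALID_PHYS_ADDR so syncs skip
 * them, and the count is recorded so swiotlb_release_slots() can wind
 * the slot index back and free them together with the mapping.
 */
static unsigned int bounce_pad_slots(unsigned long offset)
{
	return offset >> IO_TLB_SHIFT;
}
```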
|
|
|
|
864ad046c1 |
dma-mapping fixes for Linux 6.9
Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"This has a set of swiotlb alignment fixes for sometimes very long standing bugs from Will. We've been discussing them for a while and they should be solid now"
* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
swiotlb: Fix alignment checks when both allocation and DMA masks are present
swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
swiotlb: Enforce page alignment in swiotlb_alloc()
swiotlb: Fix double-allocation of slots due to broken alignment handling |
|
|
|
902861e34c |
Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process our selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This is a step along the path to implementation of the
merging of large anonymous folios. The series is named "Enable >0
order folio memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
|
|
|
14cebf689a |
swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
For swiotlb allocations >= PAGE_SIZE, the slab search historically
adjusted the stride to avoid checking unaligned slots. This had the
side-effect of aligning large mapping requests to PAGE_SIZE, but that
was broken by
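The reinstated behaviour amounts to widening the allocation alignment for large mappings; a hedged one-line reconstruction, not necessarily the exact upstream diff:

```c
/* In the mapping path: force page-aligned slots for large requests. */
if (alloc_size >= PAGE_SIZE)
	alloc_align_mask |= ~PAGE_MASK;		/* i.e. PAGE_SIZE - 1 */
```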
|
|
|
|
51b30ecb73 |
swiotlb: Fix alignment checks when both allocation and DMA masks are present
Nicolin reports that swiotlb buffer allocations fail for an NVME device
behind an IOMMU using 64KiB pages. This is because we end up with a
minimum allocation alignment of 64KiB (for the IOMMU to map the buffer
safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME
page (i.e. preserving the 4KiB page offset from the original allocation).
If the original address is not 4KiB-aligned, the allocation will fail
because swiotlb_search_pool_area() erroneously compares these unmasked
bits with the 64KiB-aligned candidate allocation.
Tweak swiotlb_search_pool_area() so that the DMA alignment mask is
reduced based on the required alignment of the allocation.
Fixes:
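Sketched, the tweak masks out of the DMA-alignment comparison any bits that the allocation alignment already guarantees (a hedged reconstruction of the logic described above):

```c
/* The allocation is at least slot-aligned... */
alloc_align_mask |= (IO_TLB_SIZE - 1);
/* ...so bits it covers cannot conflict with the original address. */
iotlb_align_mask &= ~alloc_align_mask;
```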
|
|
|
|
cbf53074a5 |
swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
core-api/dma-api-howto.rst states the following properties of
dma_alloc_coherent():
| The CPU virtual address and the DMA address are both guaranteed to
| be aligned to the smallest PAGE_SIZE order which is greater than or
| equal to the requested size.
However, swiotlb_alloc() passes zero for the 'alloc_align_mask'
parameter of swiotlb_find_slots() and so this property is not upheld.
Instead, allocations larger than a page are aligned to PAGE_SIZE.
Calculate the mask corresponding to the page order suitable for holding
the allocation and pass that to swiotlb_find_slots().
Fixes:
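A sketch of the described calculation (hedged; the parameter order of swiotlb_find_slots() follows the text, not necessarily the exact source):

```c
/* Mask for the smallest page order that holds the whole allocation. */
unsigned long alloc_align_mask = (PAGE_SIZE << get_order(size)) - 1;

index = swiotlb_find_slots(dev, 0, size, alloc_align_mask, &pool);
```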
|
|
|
|
823353b7cf |
swiotlb: Enforce page alignment in swiotlb_alloc()
When allocating pages from a restricted DMA pool in swiotlb_alloc(), the
buffer address is blindly converted to a 'struct page *' that is returned
to the caller. In the unlikely event of an allocation bug, page-unaligned
addresses are not detected and slots can silently be double-allocated.
Add a simple check of the buffer alignment in swiotlb_alloc() to make
debugging a little easier if something has gone wonky.
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
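The added check can be sketched like this (hedged; the warning text is illustrative):

```c
/* Catch allocator bugs instead of silently returning a bogus page. */
if (unlikely(!PAGE_ALIGNED(tlb_addr))) {
	dev_WARN_ONCE(dev, 1, "non page-aligned swiotlb addr: %pa\n",
		      &tlb_addr);
	return NULL;
}
return pfn_to_page(PFN_DOWN(tlb_addr));
```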
|
|
|
04867a7a33 |
swiotlb: Fix double-allocation of slots due to broken alignment handling
Commit |
|
|
|
b9fa16949d |
dma-direct: Leak pages on dma_set_decrypted() failure
On TDX it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
DMA could free decrypted/shared pages if dma_set_decrypted() fails. This
should be a rare case. Just leak the pages in this case instead of
freeing them.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
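A minimal sketch of the error path (hedged: the reduced signature is illustrative, not the upstream function):

```c
static void *dma_direct_alloc_sketch(struct device *dev, size_t size,
				     struct page *page, void *ret)
{
	/*
	 * If decryption fails, the pages may still be shared with the
	 * untrusted host: leak them rather than hand possibly shared
	 * memory back to the page allocator.
	 */
	if (dma_set_decrypted(dev, ret, size))
		return NULL;	/* deliberately no __free_pages(page, ...) */
	return ret;
}
```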
|
|
|
02e7656970 |
swiotlb: add debugfs to track swiotlb transient pool usage
Introduce a new debugfs interface io_tlb_transient_nslabs. The device
driver can create a new swiotlb transient memory pool once the default
memory pool is full. Exporting the swiotlb transient memory pool usage
via debugfs helps the user estimate the size of the transient swiotlb
memory pool or analyze device driver memory leak issues.
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
fe58582c0e |
mm/cma: drop CONFIG_CMA_DEBUG
All pr_debug() prints in (mm/cma.c) could be enabled via the standard
Makefile-based method. Besides, cma_debug_show_areas() should always be
called in the cma_alloc() failure path. This seemingly redundant config,
CONFIG_CMA_DEBUG, can be dropped without any problem.
[lukas.bulwahn@gmail.com: remove debug code to removed CONFIG_CMA_DEBUG]
Link: https://lkml.kernel.org/r/20240207143825.986-1-lukas.bulwahn@gmail.com
Link: https://lkml.kernel.org/r/20240205031647.283510-1-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
17e232b6d2 |
dma-mapping fixes for Linux 6.8
Merge tag 'dma-mapping-6.8-2024-01-18' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- fix kerneldoc warnings (Randy Dunlap)
- better bounds checking in swiotlb (ZhangPeng)
* tag 'dma-mapping-6.8-2024-01-18' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: fix kernel-doc warnings
swiotlb: check alloc_size before the allocation of a new memory pool |
|
|
|
296455ade1 |
Char/Misc and other Driver changes for 6.8-rc1
Merge tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver updates from Greg KH:
"Here is the big set of char/misc and other driver subsystem changes
for 6.8-rc1.
Other than lots of binder driver changes (as you can see by the merge
conflicts) included in here are:
- lots of iio driver updates and additions
- spmi driver updates
- eeprom driver updates
- firmware driver updates
- ocxl driver updates
- mhi driver updates
- w1 driver updates
- nvmem driver updates
- coresight driver updates
- platform driver remove callback api changes
- tags.sh script updates
- bus_type constant marking cleanups
- lots of other small driver updates
All of these have been in linux-next for a while with no reported
issues"
(The tag message additionally notes a merge conflict with the -mm tree
in drivers/android/binder_alloc.c, with resolutions posted at
https://lore.kernel.org/r/20231207134213.25631ae9@canb.auug.org.au and
https://lore.kernel.org/r/ZXHzooF07LfQQYiE@google.com.)
* tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (341 commits)
android: removed duplicate linux/errno
uio: Fix use-after-free in uio_open
drivers: soc: xilinx: add check for platform
firmware: xilinx: Export function to use in other module
scripts/tags.sh: remove find_sources
scripts/tags.sh: use -n to test archinclude
scripts/tags.sh: add local annotation
scripts/tags.sh: use more portable -path instead of -wholename
scripts/tags.sh: Update comment (addition of gtags)
firmware: zynqmp: Convert to platform remove callback returning void
firmware: turris-mox-rwtm: Convert to platform remove callback returning void
firmware: stratix10-svc: Convert to platform remove callback returning void
firmware: stratix10-rsu: Convert to platform remove callback returning void
firmware: raspberrypi: Convert to platform remove callback returning void
firmware: qemu_fw_cfg: Convert to platform remove callback returning void
firmware: mtk-adsp-ipc: Convert to platform remove callback returning void
firmware: imx-dsp: Convert to platform remove callback returning void
firmware: coreboot_table: Convert to platform remove callback returning void
firmware: arm_scpi: Convert to platform remove callback returning void
firmware: arm_scmi: Convert to platform remove callback returning void
... |
|
|
|
7c65aa3cc0 |
dma-debug: fix kernel-doc warnings
Update the kernel-doc comments to catch up with the code changes and
fix the kernel-doc warnings:
debug.c:83: warning: Excess struct member 'stacktrace' description in 'dma_debug_entry'
debug.c:83: warning: Function parameter or struct member 'stack_len' not described in 'dma_debug_entry'
debug.c:83: warning: Function parameter or struct member 'stack_entries' not described in 'dma_debug_entry'
Fixes:
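The fix is a documentation-only change: the kernel-doc block is brought back in sync with the struct members named in the warnings. A hedged sketch of the corrected comment:

```c
/**
 * struct dma_debug_entry - track a dma_map* or dma_alloc_coherent mapping
 * ...
 * @stack_len: number of backtrace entries in @stack_entries
 * @stack_entries: stack of backtrace history
 */
```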
|
|
|
|
893e2f9eac |
dma-mapping updates for Linux 6.8
Merge tag 'dma-mapping-6.8-2024-01-08' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- reduce area lock contention for non-primary IO TLB pools (Petr
Tesarik)
- don't store redundant offsets in the dma_ranges structures (Robin
Murphy)
- clear dev->dma_mem when freeing per-device pools (Joakim Zhang)
* tag 'dma-mapping-6.8-2024-01-08' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: clear dev->dma_mem to NULL after freeing it
swiotlb: reduce area lock contention for non-primary IO TLB pools
dma-mapping: don't store redundant offsets
|
|
|
|
3dc2f20920 |
swiotlb: check alloc_size before the allocation of a new memory pool
The allocation request for swiotlb contiguous memory greater than
128*2KB cannot be fulfilled because it exceeds the maximum contiguous
memory limit. If the swiotlb memory we allocate is larger than 128*2KB,
swiotlb_find_slots() will still schedule the allocation of a new memory
pool, which will increase memory overhead.
Fix it by adding a check that alloc_size is no more than 128*2KB before
scheduling the allocation of a new memory pool in swiotlb_find_slots().
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
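Sketched, the guard refuses to schedule background growth for a request that no single pool could ever satisfy (a hedged reconstruction):

```c
/* One segment is the contiguous-allocation limit: 128 x 2 KiB slots. */
if (alloc_size > IO_TLB_SEGSIZE * IO_TLB_SIZE)
	return -1;			/* growing cannot help */
schedule_work(&mem->dyn_alloc);		/* otherwise grow in the background */
```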
|
|
|
5e0a760b44 |
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit
|
|
|
|
86438841e4 |
dma-debug: make dma_debug_add_bus take a const pointer
The driver core now can handle a const struct bus_type pointer, and the
dma_debug_add_bus() call just passes on the pointer given to it to the
driver core, so make this pointer const as well to allow everyone to use
read-only struct bus_type pointers going forward.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: <iommu@lists.linux.dev>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/2023121941-dejected-nugget-681e@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
b07bc23476 |
dma-mapping: clear dev->dma_mem to NULL after freeing it
Reproduced with the sequence below:
dma_declare_coherent_memory()->dma_release_coherent_memory()
->dma_declare_coherent_memory()->"return -EBUSY" error
The second dma_declare_coherent_memory() returns -EBUSY from
dma_assign_coherent_memory() because the dev->dma_mem pointer was not
set to NULL after it was freed.
Fixes:
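The fix itself is one line; a sketch close to the described change:

```c
void dma_release_coherent_memory(struct device *dev)
{
	if (dev) {
		_dma_release_coherent_memory(dev->dma_mem);
		dev->dma_mem = NULL;	/* allow a later re-declaration */
	}
}
```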
|
|
|
|
55c543865b |
swiotlb: reduce area lock contention for non-primary IO TLB pools
If multiple areas and multiple IO TLB pools exist, first iterate the
current CPU specific area in all pools. Then move to the next area index.
This is best illustrated by a diagram:
| | area 0 | area 1 | ... | area M |
|---|---|---|---|---|
| pool 0 | A | B | ... | C |
| pool 1 | D | E | | |
| ... | | | | |
| pool N | F | G | ... | H |
Currently, each pool is searched before moving on to the next pool,
i.e. the search order is A, B ... C, D, E ... F, G ... H. With this patch,
each area is searched in all pools before moving on to the next area,
i.e. the search order is A, D ... F, B, E ... G ... C ... H.
Note that preemption is not disabled, and raw_smp_processor_id() may not
return a stable result, but it is called only once to determine the initial
area index. The search will iterate over all areas eventually, even if the
current task is preempted.
Next, some pools may have fewer (but not more) areas than default_nareas.
Skip such pools if the distance from the initial area index is greater than
pool->nareas. This logic ensures that for every pool the search starts in
the initial CPU's own area and never tries any area twice.
To verify performance impact, I booted the kernel with a minimum pool
size ("swiotlb=512,4,force"), so multiple pools get allocated, and I ran
these benchmarks:
- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,
- 4way: 4-way parallel I/O of 4 KiB blocks.
The "var" column in the tables below is the coefficient of variance over 5
runs of the test, the "diff" column is the relative difference against base
in read-write I/O bandwidth (MiB/s).
Tested on an x86 VM against a QEMU virtio SATA driver backed by a RAM-based
block device on the host:
| | base var | patched var | diff |
|---|---|---|---|
| small | 0.69% | 0.62% | +25.4% |
| big | 2.14% | 2.27% | +25.7% |
| 4way | 2.65% | 1.70% | +23.6% |
Tested on a Raspberry Pi against a class-10 A1 microSD card:
| | base var | patched var | diff |
|---|---|---|---|
| small | 0.53% | 1.96% | -0.3% |
| big | 0.02% | 0.57% | +0.8% |
| 4way | 6.17% | 0.40% | +0.3% |
These results confirm that there is significant performance boost in the
software IO TLB slot allocation itself. Where performance is dominated by
actual hardware, there is no measurable change.
Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Reviewed-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Christoph Hellwig <hch@lst.de>
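The reordered search can be sketched as follows (hedged: names and signatures are simplified relative to the upstream code):

```c
static int search_all_pools(struct device *dev, struct io_tlb_mem *mem,
			    phys_addr_t orig_addr, size_t alloc_size)
{
	int start = raw_smp_processor_id();	/* read exactly once */
	struct io_tlb_pool *pool;
	int i, index = -1;

	rcu_read_lock();
	/* Areas in the outer loop, pools in the inner loop: the order
	 * becomes A, D ... F, B, E ... G ... C ... H in the diagram. */
	for (i = 0; i < default_nareas && index < 0; i++) {
		list_for_each_entry_rcu(pool, &mem->pools, node) {
			if (i >= pool->nareas)
				continue;  /* smaller pool: skip, never retry an area */
			index = swiotlb_search_pool_area(dev, pool,
					(start + i) & (pool->nareas - 1),
					orig_addr, alloc_size);
			if (index >= 0)
				break;
		}
	}
	rcu_read_unlock();
	return index;
}
```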
|
|
|
|
4ad4c1f394 |
dma-mapping: don't store redundant offsets
A bus_dma_region necessarily stores both CPU and DMA base addresses for
a range, so there's no need to also store the difference between them.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
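After the cleanup the per-range descriptor keeps only the two bases and the size, and the offset is recomputed where needed; a sketch (the translate helper is hypothetical):

```c
struct bus_dma_region {
	phys_addr_t	cpu_start;
	dma_addr_t	dma_start;
	u64		size;
	/* no cached 'offset': it is always dma_start - cpu_start */
};

static inline dma_addr_t region_translate(const struct bus_dma_region *r,
					  phys_addr_t paddr)
{
	return (dma_addr_t)paddr - r->cpu_start + r->dma_start;
}
```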
|
|
|
53c87e846e |
swiotlb: fix out-of-bounds TLB allocations with CONFIG_SWIOTLB_DYNAMIC
Limit the free list length to the size of the IO TLB. Transient pool can be
smaller than IO_TLB_SEGSIZE, but the free list is initialized with the
assumption that the total number of slots is a multiple of IO_TLB_SEGSIZE.
As a result, swiotlb_area_find_slots() may allocate slots past the end of
a transient IO TLB buffer.
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Closes: https://lore.kernel.org/linux-iommu/104a8c8fedffd1ff8a2890983e2ec1c26bff6810.camel@linux.ibm.com/
Fixes:
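Sketched, the fix caps each initial free-run at the remaining pool size (a hedged reconstruction of the touched initialization):

```c
/* During pool init: a free run may not extend past a short pool. */
for (i = 0; i < mem->nslabs; i++)
	mem->slots[i].list = min(IO_TLB_SEGSIZE - io_tlb_offset(i),
				 mem->nslabs - i);
```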
|
|
|
|
a409d96009 |
dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM
There is an unusual case in which the range map covers right up to the
top of system RAM but leaves a hole somewhere lower down. DMA mapping
for the nvme device then fails in the phys_to_dma() checking path,
causing hangs at boot.
E.g. On an Armv8 Ampere server, the dsdt ACPI table is:
Method (_DMA, 0, Serialized) // _DMA: Direct Memory Access
{
Name (RBUF, ResourceTemplate ()
{
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000000000000000, // Range Minimum
0x00000000FFFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x0000000100000000, // Length
,, , AddressRangeMemory, TypeStatic)
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000006010200000, // Range Minimum
0x000000602FFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x000000001FE00000, // Length
,, , AddressRangeMemory, TypeStatic)
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x00000060F0000000, // Range Minimum
0x00000060FFFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x0000000010000000, // Length
,, , AddressRangeMemory, TypeStatic)
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000007000000000, // Range Minimum
0x000003FFFFFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x0000039000000000, // Length
,, , AddressRangeMemory, TypeStatic)
})
But the System RAM ranges are:
cat /proc/iomem |grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some RAM ranges are out of dma_range_map.
Fix it by checking whether each of the system RAM resources can be
properly encompassed within the dma_range_map.
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
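A sketch of the described check (hedged: the callback name follows the text's intent and the bounds handling is simplified):

```c
static int check_ram_in_range_map(unsigned long start_pfn,
				  unsigned long nr_pages, void *data)
{
	phys_addr_t start = PFN_PHYS(start_pfn);
	phys_addr_t end = PFN_PHYS(start_pfn + nr_pages);
	struct device *dev = data;
	const struct bus_dma_region *m;

	for (m = dev->dma_range_map; m->size; m++)
		if (start >= m->cpu_start && end <= m->cpu_start + m->size)
			return 0;	/* this RAM resource is covered */
	return 1;			/* hole found: stop and report */
}

static bool all_ram_in_dma_range_map(struct device *dev)
{
	return !walk_system_ram_range(0, PFN_DOWN(ULONG_MAX) + 1, dev,
				      check_ram_in_range_map);
}
```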
|
|
|
|
8ae0e97031 |
dma-mapping: move dma_addressing_limited() out of line
This patch moves dma_addressing_limited() out of line, serving as a
preliminary step to prevent the introduction of a new publicly
accessible low-level helper when validating whether all system RAM is
mapped within the DMA mapping range.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
a5e3b12745 |
swiotlb: do not free decrypted pages if dynamic
Fix these two error paths:
1. When set_memory_decrypted() fails, pages may be left fully or partially
decrypted.
2. Decrypted pages may be freed if swiotlb_alloc_tlb() determines that the
physical address is too high.
To fix the first issue, call set_memory_encrypted() on the allocated region
after a failed decryption attempt. If that also fails, leak the pages.
To fix the second issue, check that the TLB physical address is below the
requested limit before decrypting.
Let the caller differentiate between an unsuitable physical address
(=> retry from a lower zone) and an allocation failure (=> no point in
retrying).
Cc: stable@vger.kernel.org
Fixes:
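Both fixes together, sketched with a reduced signature (hedged; not the exact upstream code):

```c
static struct page *alloc_dyn_tlb(struct device *dev, gfp_t gfp,
				  unsigned int order, u64 phys_limit)
{
	struct page *page = alloc_pages(gfp, order);

	if (!page)
		return NULL;	/* plain failure: no point in retrying */

	/* Fix 2: check the limit BEFORE decrypting anything. */
	if (page_to_phys(page) + (PAGE_SIZE << order) - 1 > phys_limit) {
		__free_pages(page, order);	/* still encrypted: safe */
		return ERR_PTR(-EAGAIN);	/* caller retries lower zone */
	}

	/* Fix 1: on failure, try to re-encrypt; if that fails, leak. */
	if (set_memory_decrypted((unsigned long)page_address(page),
				 1 << order)) {
		if (!set_memory_encrypted((unsigned long)page_address(page),
					  1 << order))
			__free_pages(page, order);
		return NULL;
	}
	return page;
}
```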
|
|
|
|
009fbfc97b |
dma-mapping updates for Linux 6.7
Merge tag 'dma-mapping-6.7-2023-10-30' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- get rid of the fake support for coherent DMA allocation on coldfire
with caches (Christoph Hellwig)
- add a few Kconfig dependencies so that Kconfig catches the use of
invalid configurations (Christoph Hellwig)
- fix a typo in dma-debug output (Chuck Lever)
- rewrite a comment in swiotlb (Sean Christopherson)
* tag 'dma-mapping-6.7-2023-10-30' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: Fix a typo in a debugging eye-catcher
swiotlb: rewrite comment explaining why the source is preserved on DMA_FROM_DEVICE
m68k: remove unused includes from dma.c
m68k: don't provide arch_dma_alloc for nommu/coldfire
net: fec: use dma_alloc_noncoherent for data cache enabled coldfire
m68k: use the coherent DMA code for coldfire without data cache
dma-direct: warn when coherent allocations aren't supported
dma-direct: simplify the use atomic pool logic in dma_direct_alloc
dma-direct: add a CONFIG_ARCH_HAS_DMA_ALLOC symbol
dma-direct: add dependencies to CONFIG_DMA_GLOBAL_POOL
|
|
|
|
d5090484b0 |
swiotlb: do not try to allocate a TLB bigger than MAX_ORDER pages
When allocating a new pool at runtime, reduce the number of slabs so
that the allocation order is at most MAX_ORDER. This avoids a kernel
warning in __alloc_pages().
The warning is relatively benign, because the pool size is subsequently
reduced when allocation fails, but it is silly to start with a request
that is known to fail, especially since this is the default behavior if
the kernel is built with CONFIG_SWIOTLB_DYNAMIC=y and booted without any
swiotlb= parameter.
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/4f173dd2-324a-0240-ff8d-abf5c191be18@candelatech.com/
Fixes:
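The clamp can be sketched as follows (hedged; the constants and the recomputation are illustrative, not the exact upstream diff):

```c
/* When sizing a runtime pool: never ask __alloc_pages() for more
 * than a MAX_ORDER allocation. */
unsigned int order = get_order(nslabs << IO_TLB_SHIFT);

if (order > MAX_ORDER) {
	order = MAX_ORDER;
	nslabs = (PAGE_SIZE << order) >> IO_TLB_SHIFT;
}
```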
|
|
|
|
36d91e8515 |
dma-debug: Fix a typo in a debugging eye-catcher
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
1132a1dc05 |
swiotlb: rewrite comment explaining why the source is preserved on DMA_FROM_DEVICE
Rewrite the comment explaining why swiotlb copies the original buffer to
the TLB buffer before initiating DMA *from* the device, i.e. before the
device DMAs into the TLB buffer.
The existing comment's argument that preserving the original data can
prevent a kernel memory leak is bogus. If the driver that triggered the
mapping _knows_ that the device will overwrite the entire mapping, or
the driver will consume only the written parts, then copying from the
original memory is completely pointless.
If neither of the above holds true, then copying from the original adds
value only if preserving the data is necessary for functional
correctness, or the driver explicitly initialized the original memory.
If the driver didn't initialize the memory, then copying the original
buffer to the TLB buffer simply changes what kernel data is leaked to
user space.
Writing the entire TLB buffer _does_ prevent leaking stale TLB buffer
data from a previous bounce, but that can be achieved by simply zeroing
the TLB buffer when grabbing a slot.
The real reason swiotlb ended up initializing the TLB buffer with the
original buffer is that it's necessary to make swiotlb operate as
transparently as possible, i.e. to behave as closely as possible to
hardware, and to avoid corrupting the original buffer, e.g. if the
driver knows the device will do partial writes and is relying on the
unwritten data to be preserved.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/all/ZN5elYQ5szQndN8n@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
63f067e33c |
dma-direct: warn when coherent allocations aren't supported
Log a warning once when dma_alloc_coherent fails because the platform
does not support coherent allocations at all.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Tested-by: Greg Ungerer <gerg@linux-m68k.org> |
|
|
|
b1da46d70e |
dma-direct: simplify the use atomic pool logic in dma_direct_alloc
The logic in dma_direct_alloc for when to use the atomic pool vs.
remapping grew a bit unreadable. Consolidate it into a single check,
and clean up the set_uncached vs. remap logic a bit as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Tested-by: Greg Ungerer <gerg@linux-m68k.org> |
|
|
|
2c8ed1b960 |
dma-direct: add a CONFIG_ARCH_HAS_DMA_ALLOC symbol
Instead of using arch_dma_alloc if none of the generic coherent
allocators are used, require the architectures to explicitly opt into
providing it. This will be used to deal with the case of m68knommu and
coldfire, where we can't do any coherent allocations whatsoever, and
also makes it clear that arch_dma_alloc is a last resort.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Tested-by: Greg Ungerer <gerg@linux-m68k.org> |
|
|
|
da323d4640 |
dma-direct: add dependencies to CONFIG_DMA_GLOBAL_POOL
CONFIG_DMA_GLOBAL_POOL can't be combined with other DMA coherent
allocators. Add dependencies to Kconfig to document this, and make
kconfig complain about unmet dependencies if someone tries.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Tested-by: Greg Ungerer <gerg@linux-m68k.org> |
|
|
|
2d5780bbef |
swiotlb: fix the check whether a device has used software IO TLB
When CONFIG_SWIOTLB_DYNAMIC=y, devices which do not use the software IO TLB can avoid swiotlb lookup. A flag is added by commit |
|
|
|
a6a241764f |
swiotlb: use the calculated number of areas
Commit |
|
|
|
f875db4f20 |
Revert "dma-contiguous: check for memory region overlap"
This reverts commit
|
|
|
|
765aa6b3a4 |
dma-pool: remove a __maybe_unused label in atomic_pool_expand
Move the #endif a line so that the free_page label is only seen by the
compile pass when actually used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
Reviewed-by: Robin Murphy <roin.murphy@arm.com> |
|
|
|
2dcdf8c18d |
dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
It makes no sense to expose CONFIG_DMA_NUMA_CMA if CONFIG_NUMA is not
enabled, and random config options shouldn't be default unless there is
a good reason. Replace the "default NUMA" with a "depends on" to fix
both issues.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <roin.murphy@arm.com> |
|
|
|
fb5a431559 |
dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
__dma_entry_alloc_check_leak() calls into printk -> serial console
output (qcom geni) and grabs port->lock under free_entries_lock
spin lock, which is a reverse locking dependency chain as qcom_geni
IRQ handler can call into dma-debug code and grab free_entries_lock
under port->lock.
Move __dma_entry_alloc_check_leak() call out of free_entries_lock
scope so that we don't acquire serial console's port->lock under it.
Trimmed-down lockdep splat:
The existing dependency chain (in reverse order) is:
-> #2 (free_entries_lock){-.-.}-{2:2}:
_raw_spin_lock_irqsave+0x60/0x80
dma_entry_alloc+0x38/0x110
debug_dma_map_page+0x60/0xf8
dma_map_page_attrs+0x1e0/0x230
dma_map_single_attrs.constprop.0+0x6c/0xc8
geni_se_rx_dma_prep+0x40/0xcc
qcom_geni_serial_isr+0x310/0x510
__handle_irq_event_percpu+0x110/0x244
handle_irq_event_percpu+0x20/0x54
handle_irq_event+0x50/0x88
handle_fasteoi_irq+0xa4/0xcc
handle_irq_desc+0x28/0x40
generic_handle_domain_irq+0x24/0x30
gic_handle_irq+0xc4/0x148
do_interrupt_handler+0xa4/0xb0
el1_interrupt+0x34/0x64
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x64/0x68
arch_local_irq_enable+0x4/0x8
____do_softirq+0x18/0x24
...
-> #1 (&port_lock_key){-.-.}-{2:2}:
_raw_spin_lock_irqsave+0x60/0x80
qcom_geni_serial_console_write+0x184/0x1dc
console_flush_all+0x344/0x454
console_unlock+0x94/0xf0
vprintk_emit+0x238/0x24c
vprintk_default+0x3c/0x48
vprintk+0xb4/0xbc
_printk+0x68/0x90
register_console+0x230/0x38c
uart_add_one_port+0x338/0x494
qcom_geni_serial_probe+0x390/0x424
platform_probe+0x70/0xc0
really_probe+0x148/0x280
__driver_probe_device+0xfc/0x114
driver_probe_device+0x44/0x100
__device_attach_driver+0x64/0xdc
bus_for_each_drv+0xb0/0xd8
__device_attach+0xe4/0x140
device_initial_probe+0x1c/0x28
bus_probe_device+0x44/0xb0
device_add+0x538/0x668
of_device_add+0x44/0x50
of_platform_device_create_pdata+0x94/0xc8
of_platform_bus_create+0x270/0x304
of_platform_populate+0xac/0xc4
devm_of_platform_populate+0x60/0xac
geni_se_probe+0x154/0x160
platform_probe+0x70/0xc0
...
-> #0 (console_owner){-...}-{0:0}:
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
console_flush_all+0x330/0x454
console_unlock+0x94/0xf0
vprintk_emit+0x238/0x24c
vprintk_default+0x3c/0x48
vprintk+0xb4/0xbc
_printk+0x68/0x90
dma_entry_alloc+0xb4/0x110
debug_dma_map_sg+0xdc/0x2f8
__dma_map_sg_attrs+0xac/0xe4
dma_map_sgtable+0x30/0x4c
get_pages+0x1d4/0x1e4 [msm]
msm_gem_pin_pages_locked+0x38/0xac [msm]
msm_gem_pin_vma_locked+0x58/0x88 [msm]
msm_ioctl_gem_submit+0xde4/0x13ac [msm]
drm_ioctl_kernel+0xe0/0x15c
drm_ioctl+0x2e8/0x3f4
vfs_ioctl+0x30/0x50
...
Chain exists of:
console_owner --> &port_lock_key --> free_entries_lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(free_entries_lock);
                               lock(&port_lock_key);
                               lock(free_entries_lock);
  lock(console_owner);
*** DEADLOCK ***
Call trace:
dump_backtrace+0xb4/0xf0
show_stack+0x20/0x30
dump_stack_lvl+0x60/0x84
dump_stack+0x18/0x24
print_circular_bug+0x1cc/0x234
check_noncircular+0x78/0xac
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
console_flush_all+0x330/0x454
console_unlock+0x94/0xf0
vprintk_emit+0x238/0x24c
vprintk_default+0x3c/0x48
vprintk+0xb4/0xbc
_printk+0x68/0x90
dma_entry_alloc+0xb4/0x110
debug_dma_map_sg+0xdc/0x2f8
__dma_map_sg_attrs+0xac/0xe4
dma_map_sgtable+0x30/0x4c
get_pages+0x1d4/0x1e4 [msm]
msm_gem_pin_pages_locked+0x38/0xac [msm]
msm_gem_pin_vma_locked+0x58/0x88 [msm]
msm_ioctl_gem_submit+0xde4/0x13ac [msm]
drm_ioctl_kernel+0xe0/0x15c
drm_ioctl+0x2e8/0x3f4
vfs_ioctl+0x30/0x50
...
Reported-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
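Sketched, the fix snapshots the counter under the lock and prints only after dropping it (a hedged reconstruction):

```c
static struct dma_debug_entry *dma_entry_alloc(void)
{
	struct dma_debug_entry *entry;
	unsigned long flags;
	u32 nr_entries;

	spin_lock_irqsave(&free_entries_lock, flags);
	entry = __dma_entry_alloc();
	nr_entries = nr_total_entries;		/* snapshot under the lock */
	spin_unlock_irqrestore(&free_entries_lock, flags);

	/* printk may take the serial console's port->lock: call unlocked. */
	__dma_entry_alloc_check_leak(nr_entries);
	return entry;
}
```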
|
|
|
|
d069ed288a |
swiotlb: optimize get_max_slots()
Use a simple logical shift and increment to calculate the number of
slots taken by the DMA segment boundary. At least GCC-13 is not able to
optimize the expression, producing this horrible assembly code on x86:
cmpq $-1, %rcx
je .L364
addq $2048, %rcx
shrq $11, %rcx
movq %rcx, %r13
.L331:
// rest of the function here...
// after function epilogue and return:
.L364:
movabsq $9007199254740992, %r13
jmp .L331
After the optimization, the code looks more reasonable:
shrq $11, %r11
leaq 1(%r11), %rbx
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
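The helper after the change matches the shift-and-increment described above (a sketch; treat surrounding details as illustrative):

```c
static unsigned long get_max_slots(unsigned long boundary_mask)
{
	return (boundary_mask >> IO_TLB_SHIFT) + 1;
}
```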
|
|
|
f94cb36e76 |
swiotlb: move slot allocation explanation comment where it belongs
Move the comment down in front of the loop that actually sets the list
member of struct io_tlb_slot to zero.
Fixes:
|
|
|
|
1395706a14 |
swiotlb: search the software IO TLB only if the device makes use of it
Skip searching the software IO TLB if a device has never used it,
making sure these devices are not affected by the introduction of
multiple IO TLB memory pools.
An additional memory barrier is required to ensure that the new value
of the flag is visible to other CPUs after mapping a new bounce buffer.
For efficiency, the flag check should be inlined, and then the memory
barrier must be moved to is_swiotlb_buffer(). However, it can replace
the existing barrier in swiotlb_find_pool(), because all callers use
is_swiotlb_buffer() first to verify that the buffer address belongs to
the software IO TLB.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
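A sketch of the resulting fast path (hedged; simplified from the description above):

```c
static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	if (!mem || !READ_ONCE(dev->dma_uses_io_tlb))
		return false;	/* device never bounced: skip the search */
	/* Pairs with smp_wmb() issued after the first bounce mapping. */
	smp_rmb();
	return swiotlb_find_pool(dev, paddr) != NULL;
}
```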
|
|
|
1aaa736815 |
swiotlb: allocate a new memory pool when existing pools are full
When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the
pool immediately, because this code may run in interrupt context, which
is not suitable for large memory allocations. This means that the
memory pool will be available too late for the currently requested
mapping, but the stress on the software IO TLB allocator is likely to
continue, and subsequent allocations will benefit from the additional
pool eventually.
Keep all memory pools for an allocator in an RCU list to avoid locking
on the read side. For modifications, add a new spinlock to struct
io_tlb_mem.
The spinlock also protects updates to the total number of slabs (nslabs
in struct io_tlb_mem), but not reads of the value. Readers may
therefore encounter a stale value, but this is not an issue:
- swiotlb_tbl_map_single() and is_swiotlb_active() only check for a
non-zero value. This is ensured by the existence of the default memory
pool, allocated at boot.
- The exact value is used only for non-critical purposes (debugfs,
kernel messages).
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
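The read and grow sides can be sketched like this (hedged: pool_find_slots() is a stand-in name, not an upstream function):

```c
static int find_slots_or_grow(struct device *dev, struct io_tlb_mem *mem,
			      phys_addr_t orig_addr, size_t alloc_size)
{
	struct io_tlb_pool *pool;
	int index = -1;

	/* Read side: lockless traversal of the RCU pool list. */
	rcu_read_lock();
	list_for_each_entry_rcu(pool, &mem->pools, node) {
		index = pool_find_slots(dev, pool, orig_addr, alloc_size);
		if (index >= 0)
			break;
	}
	rcu_read_unlock();

	/* Growth is too heavy for interrupt context: defer to a work
	 * item. Too late for this mapping, but relieves the next ones. */
	if (index < 0)
		schedule_work(&mem->dyn_alloc);
	return index;
}
```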
|
|
|
ad96ce3252 |
swiotlb: determine potential physical address limit
The value returned by default_swiotlb_limit() should be constant,
because it is used to decide whether DMA can be used. To allow
allocating memory pools on the fly, use the maximum possible physical
address rather than the highest address used by the default pool.
For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address
if the initialization flags include SWIOTLB_ANY. For
swiotlb_init_late(), the highest address is determined by the GFP
flags.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
79636caad3 |
swiotlb: if swiotlb is full, fall back to a transient memory pool
Try to allocate a transient memory pool if no suitable slots can be
found and the respective SWIOTLB is allowed to grow. The transient pool
is just big enough for this one bounce buffer. It is inserted into a
per-device list of transient memory pools, and it is freed again when
the bounce buffer is unmapped.
Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a
transient buffer must be immediately recognized as belonging to the
SWIOTLB, even if it is passed to another CPU.
Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the
corresponding stale entry in the RCU list never matches. If the memory
range gets allocated again, then it happens only after an RCU quiescent
state.
Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory
pool is used. Add swiotlb_find_pool() to find the memory pool
corresponding to an address. This function is now also used by
is_swiotlb_buffer(), because a simple boundary check is no longer
sufficient.
The logic in swiotlb_alloc_tlb() is taken from
__dma_direct_alloc_pages(), simplified and enhanced to use coherent
memory pools if needed. Note that this is not the most efficient way to
provide a bounce buffer, but when a DMA buffer can't be mapped,
something may (and will) actually break. At that point it is better to
make an allocation, even if it may be an expensive operation.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
62708b2ba4 |
swiotlb: add a flag whether SWIOTLB is allowed to grow
Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable
dynamic allocation of additional bounce buffers.
If this option is set, mark the default SWIOTLB as able to grow and
restricted DMA pools as unable. However, if the address of the default
memory pool is explicitly queried, make the default SWIOTLB also unable
to grow. This is currently used to set up PCI BAR movable regions on
some Octeon MIPS boards which may not be able to use a SWIOTLB pool
elsewhere in physical memory. See octeon_pci_setup() for more details.
If a remap function is specified, it must be also called on any
dynamically allocated pools, but there are some issues:
- The remap function may block, so it should not be called from an
atomic context.
- There is no corresponding unremap() function if the memory pool is
freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the
number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.
Keep it simple for now and disable growing the SWIOTLB if a remap
function was specified.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
158dbe9c9a |
swiotlb: separate memory pool data from other allocator data
Carve out memory pool specific fields from struct io_tlb_mem. The original struct now contains shared data for the whole allocator, while the new struct io_tlb_pool contains data that is specific to one memory pool of (potentially) many. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
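The rough shape of the split, sketched below; the field lists are abbreviated approximations of the two structures, not a copy of the kernel's definitions.

```c
#include <linux/types.h>

struct io_tlb_slot;	/* per-slot bookkeeping, unchanged by the split */

/* One memory pool of (potentially) many. */
struct io_tlb_pool {
	phys_addr_t start;		/* physical start of this pool */
	phys_addr_t end;		/* physical end of this pool */
	unsigned long nslabs;		/* number of slots in this pool */
	struct io_tlb_slot *slots;	/* slot array for this pool */
};

/* Allocator-wide state, shared by all pools of one allocator. */
struct io_tlb_mem {
	struct io_tlb_pool defpool;	/* the default pool is embedded */
	bool force_bounce;		/* policy, not per-pool data */
};
```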
fea18777a7 |
swiotlb: add documentation and rename swiotlb_do_find_slots()
Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
|
|
|
|
05ee774122 |
swiotlb: make io_tlb_default_mem local to swiotlb.c
SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow changes to be made to the implementation without modifying non-swiotlb code. To avoid breaking existing users, provide helper functions for the few required fields. As a bonus, using a helper function to initialize struct device makes it possible to get rid of an #ifdef in driver core. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
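The pattern in miniature, as a hedged sketch: the pool becomes file-local to swiotlb.c and the outside world goes through small accessors. swiotlb_dev_init() is the helper that removes the driver-core #ifdef; the second helper below is illustrative of the accessor style, not an exact copy of the kernel's helper set.

```c
#include <linux/device.h>

static struct io_tlb_mem io_tlb_default_mem;	/* now local to swiotlb.c */

/* Called from driver core instead of poking at swiotlb internals. */
void swiotlb_dev_init(struct device *dev)
{
	dev->dma_io_tlb_mem = &io_tlb_default_mem;
}

/* Accessor for the one field callers previously read directly. */
bool is_swiotlb_allocated(void)
{
	return !!io_tlb_default_mem.nslabs;
}
```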
0c6874a6ac |
swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated
If swiotlb is allocated, immediately return 0, so callers do not have to check io_tlb_default_mem.nslabs explicitly. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
3fa6456ebe |
dma-contiguous: check for memory region overlap
While parsing the DTS, check whether the memory region specified by the CMA node overlaps the kernel text region already reserved by memblock, before calling early_init_fdt_scan_reserved_mem. Signed-off-by: Binglei Wang <l3b2w1@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
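A sketch of where such a check can sit, assuming memblock_is_region_reserved() as the overlap test; the function name below is made up, and the exact placement and error handling in the commit may differ.

```c
#include <linux/errno.h>
#include <linux/memblock.h>
#include <linux/of_reserved_mem.h>

static int __init cma_region_overlap_check(struct reserved_mem *rmem)
{
	/*
	 * Reject a DTS-specified CMA area that overlaps memory already
	 * reserved in memblock, such as the kernel text.
	 */
	if (memblock_is_region_reserved(rmem->base, rmem->size))
		return -EBUSY;
	return 0;
}
```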
bf29bfaa54 |
dma-contiguous: support numa CMA for specified node
The kernel parameter 'cma_pernuma=' only supports reserving the same size of CMA area for each node. We need to reserve different sizes of CMA area for specific nodes when their devices belong to different nodes. Add another kernel parameter 'numa_cma=' to reserve a CMA area for a specified node. To use either of these parameters, DMA_NUMA_CMA must be enabled. At the same time, print the node id in cma_declare_contiguous_nid() if CONFIG_NUMA is enabled. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
22e4a348f8 |
dma-contiguous: support per-numa CMA for all architectures
In the commit
|
|
|
|
3d6f126b15 |
dma-mapping: move arch_dma_set_mask() declaration to header
This function has a __weak definition and an override that is only used on freescale powerpc chips. The powerpc definition however does not see the declaration that is in a .c file: arch/powerpc/kernel/dma-mask.c:7:6: error: no previous prototype for 'arch_dma_set_mask' [-Werror=missing-prototypes] Move it into the linux/dma-map-ops.h header where the other arch_dma_* functions are declared. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
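The shape of the fix, sketched: one shared prototype next to the other arch_dma_* hooks gives both the __weak default and the powerpc override a visible declaration, silencing -Wmissing-prototypes.

```c
/* linux/dma-map-ops.h (sketch of the added declaration) */
#include <linux/types.h>

struct device;

void arch_dma_set_mask(struct device *dev, u64 mask);
```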
42e584a985 |
swiotlb: unexport is_swiotlb_active
Drivers have no business looking at dma-mapping or swiotlb internals. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> |
|
|
|
f71f64210d |
dma-mapping fixes for Linux 6.5
- swiotlb area sizing fixes (Petr Tesarik) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmSq57sLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYOh0xAAkwklIxxzXvNlgjvy2hdgPWImPS8tGPDSIsqA9TDD WDZq89nz/ndnchdPObDvyJXmfBgqa0qCHqopBVPqMKv5a1pKZhrRXYlbajGQQwji MIqefTLZ/VGw7bDpEivOt+yadwQ3KxuxWWs7/JPKLLReSJ22H8P+jkrK7P7kBL5X YaMtZG9d86fvFHnQHKdAOlF1iCvnoZDHPcvaZbI6m5mMSZ+HIYqK5pP1MTUEAbIU MX4ZSI7/mL0q67+kZuM/NCSeq1pH0Cd0D2DGm+8k/y87G81GS6E5Wgg+xO7vYiXf 5ChuwlAO9K9KhH7NIRkKhkad/Ii89ENXSyv5gmPRoKYK5FXajnHSlJTUrZojV6XC Pbsd9ATVzV0rY61EPyh6G1a+Ttp/pwMp+W0I2fi032GVAePQ/vhB9x9O+2O/3QiC v80nUSatkciZncWqkguhp3NONsRmLKep3CCQnEAA/gLs27B0avdQeslnqbOOUQKd Si+djIoel8ONjQ+mW8eFCsVYMH1xFSo0aGsgGe0y2cyBE3DN1AW9eRnOXWT4C1PR UyDlx8ACs87ojec+YRQFYk2/PbsU7CQiH1pteXvBHcbhiVUAvrtXtg6ANQ+7066P IIduAZmlHcHb1BhyrSQbAtRllVLIp/l9IAkCSY9SvL0tjh/B5CaRBD5m2Taow5I/ KUI= =4Lfc -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - swiotlb area sizing fixes (Petr Tesarik) * tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: reduce the number of areas to match actual memory pool size swiotlb: always set the number of areas before allocating the pool |
|
|
|
1e6d5dea34 |
dma-mapping updates for Linux 6.5
- swiotlb cleanups (Petr Tesarik) - use kvmalloc_array (gaoxu) - a small step towards removing is_swiotlb_active (Christoph Hellwig) - fix a Kconfig typo Sui Jingfeng) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmScHTELHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYOoGhAAuKdcWAumhEuKXu/vuA4/xH5Kvn2uhrAVs/RSKKh7 Wbh2WXh+9GOQs4dwpYI5/chRmCJgyC2ics/E7dny2ovJumM/RGOR9zff/+zJk5yh 3tpiZ4VW/2x/9qmoRpdQVTaj1uB2Ug8wpztUohyeEYPpjLX0/Vq/zTItAyvQs4kO vskg2u9EGwzgV4qmHWq2fjowljHS1sMZq2URKGX8JDHrwOh0+uCke626ze/WznSQ 5quWW4mE0IYq/nldIMTvJxUx6273zBMm+yoZSfFY6K0VrP3WDE5endP5oYR7d251 zNRPyWqW78HQT0z/gntaPOKDB0t7YxsWVLwam98Yv8wl1wLJQYe5sRoL7qXL0yBz cu4uTOGU6Kvw3ZHWQOk2qkGxYAxDLnaFejz8ZDCycWQA7T2+TNdaVXNnl/vE+aRN 25mnT0JtClfZOvHihhHKaiRXMKE0WgPy4vBZBT0SAUVuUR7JwtYDv/5giGf6Bi3x QwNeXAmFD75VELJoqX1KL/UiJh8FzyWZMI7l7XmDVy9lJsA1P3PO+npI8SOmA+e8 q4GSD36BxDoDE06Hg/BQcRqPohDCEsrRgfvOaa7xld9SCwe1lGaN3kDsyv6HoBzX NRS3aishdqHf21dNctjX9f+II+30NFkPbAa64B8L4tP5aQLy3bgjNLweLP9fhKgl KQs= =yYCS -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.5-2023-06-28' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - swiotlb cleanups (Petr Tesarik) - use kvmalloc_array (gaoxu) - a small step towards removing is_swiotlb_active (Christoph Hellwig) - fix a Kconfig typo Sui Jingfeng) * tag 'dma-mapping-6.5-2023-06-28' of git://git.infradead.org/users/hch/dma-mapping: drm/nouveau: stop using is_swiotlb_active swiotlb: use the atomic counter of total used slabs if available swiotlb: remove unused field "used" from struct io_tlb_mem dma-remap: use kvmalloc_array/kvfree for larger dma memory remap dma-mapping: fix a Kconfig typo |
|
|
|
8ac0406335 |
swiotlb: reduce the number of areas to match actual memory pool size
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.
For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.
If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.
Fixes:
|
|
|
|
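The arithmetic above, condensed into a sketch of the clamp; the commit names its helper limit_nareas(), and the constant shown is the usual swiotlb value, assumed here.

```c
#define IO_TLB_SEGSIZE	128	/* minimum area size in slots (assumed) */

static unsigned int limit_nareas_example(unsigned int nareas,
					 unsigned long nslots)
{
	/*
	 * A 4 MiB pool has 2048 slots of 2 KiB, so at most
	 * 2048 / 128 = 16 areas can keep per-area locking sound.
	 */
	if (nslots < (unsigned long)nareas * IO_TLB_SEGSIZE)
		return nslots / IO_TLB_SEGSIZE;
	return nareas;
}
```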
aabd12609f |
swiotlb: always set the number of areas before allocating the pool
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.
While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.
Fixes:
|
|
|
|
370645f41e |
dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned
For direct DMA, if the size is small enough to have originated from a kmalloc() cache below ARCH_DMA_MINALIGN, check its alignment against dma_get_cache_alignment() and bounce if necessary. For larger sizes, it is the responsibility of the DMA API caller to ensure proper alignment. At this point, the kmalloc() caches are properly aligned but this will change in a subsequent patch. Architectures can opt in by selecting DMA_BOUNCE_UNALIGNED_KMALLOC. Link: https://lkml.kernel.org/r/20230612153201.554742-15-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
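A simplified sketch of the bounce decision described above. The helper name is made up; dev_is_dma_coherent() and dma_get_cache_alignment() are the real kernel accessors, and the logic the series actually adds has more cases than shown.

```c
#include <linux/align.h>
#include <linux/cache.h>
#include <linux/dma-map-ops.h>
#include <linux/dma-mapping.h>

static bool kmalloc_buf_may_need_bounce(struct device *dev, size_t size,
					enum dma_data_direction dir)
{
	/* Coherent devices and DMA_TO_DEVICE never dirty cache lines. */
	if (dev_is_dma_coherent(dev) || dir == DMA_TO_DEVICE)
		return false;

	/*
	 * A size below ARCH_DMA_MINALIGN may come from a small kmalloc()
	 * cache; bounce unless it is a cache-line-aligned multiple.
	 */
	return size < ARCH_DMA_MINALIGN &&
	       !IS_ALIGNED(size, dma_get_cache_alignment());
}
```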
cb147bbe22 |
dma-mapping: name SG DMA flag helpers consistently
sg_is_dma_bus_address() is inconsistent with the naming pattern of its corresponding setters and its own kerneldoc, so take the majority vote and rename it sg_dma_is_bus_address() (and fix up the missing underscores in the kerneldoc too). This gives us a nice clear pattern where SG DMA flags are SG_DMA_<NAME>, and the helpers for acting on them are sg_dma_<action>_<name>(). Link: https://lkml.kernel.org/r/20230612153201.554742-14-catalin.marinas@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Link: https://lore.kernel.org/r/fa2eca2862c7ffc41b50337abffb2dfd2864d3ea.1685036694.git.robin.murphy@arm.com Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
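The resulting pattern in use, sketched with the renamed bus-address trio:

```c
#include <linux/scatterlist.h>

static void sg_dma_flag_example(struct scatterlist *sg)
{
	sg_dma_mark_bus_address(sg);		/* set SG_DMA_BUS_ADDRESS */
	if (sg_dma_is_bus_address(sg))		/* the renamed predicate */
		sg_dma_unmark_bus_address(sg);	/* clear it again */
}
```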
af2880ec44 |
scatterlist: add dedicated config for DMA flags
The DMA flags field will be useful for users beyond PCI P2P, so upgrade to its own dedicated config option. [catalin.marinas@arm.com: use #ifdef CONFIG_NEED_SG_DMA_FLAGS in scatterlist.h] [catalin.marinas@arm.com: update PCI_P2PDMA dma_flags comment in scatterlist.h] Link: https://lkml.kernel.org/r/20230612153201.554742-13-catalin.marinas@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
693405cf11 |
swiotlb: use the atomic counter of total used slabs if available
If DEBUG_FS is enabled, the cost of keeping an exact number of total used slabs is already paid. In this case, there is no reason to use an inexact number for statistics and kernel messages. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
51ff97d54f |
dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
If dma_direct_alloc() allocates 64MB of memory, the inner function dma_common_contiguous_remap() will allocate a 128KB array by invoking kmalloc_array(), and that 128KB allocation can fail. Call trace: [14977.928623] qcrosvm: page allocation failure: order:5, mode:0x40cc0 [14977.928638] dump_backtrace.cfi_jt+0x0/0x8 [14977.928647] dump_stack_lvl+0x80/0xb8 [14977.928652] warn_alloc+0x164/0x200 [14977.928657] __alloc_pages_slowpath+0x9f0/0xb4c [14977.928660] __alloc_pages+0x21c/0x39c [14977.928662] kmalloc_order+0x48/0x108 [14977.928666] kmalloc_order_trace+0x34/0x154 [14977.928668] __kmalloc+0x548/0x7e4 [14977.928673] dma_direct_alloc+0x11c/0x4f8 [14977.928678] dma_alloc_attrs+0xf4/0x138 [14977.928680] gh_vm_ioctl_set_fw_name+0x3c4/0x610 [gunyah] [14977.928698] gh_vm_ioctl+0x90/0x14c [gunyah] [14977.928705] __arm64_sys_ioctl+0x184/0x210 Work around this by using kvmalloc_array() instead. Signed-off-by: Gao Xu <gaoxu2@hihonor.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
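A sketch of the pattern after the fix, loosely following dma_common_contiguous_remap(): the page-pointer array is only needed while vmap() runs, and kvmalloc_array() lets it fall back to vmalloc when an order-5 kmalloc would fail.

```c
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *contiguous_remap_example(struct page *page, size_t size,
				      pgprot_t prot)
{
	size_t count = size >> PAGE_SHIFT;
	struct page **pages;
	void *vaddr;
	size_t i;

	pages = kvmalloc_array(count, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;
	for (i = 0; i < count; i++)
		pages[i] = nth_page(page, i);
	vaddr = vmap(pages, count, VM_MAP, prot);
	kvfree(pages);		/* the array is only needed during vmap() */
	return vaddr;
}
```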
4d3af20eaf |
dma-mapping: fix a Kconfig typo
'thing' -> 'think' Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
b28e6315a0 |
dma-mapping updates for Linux 6.4
- fix a PageHighMem check in dma-coherent initialization (Doug Berger)
- clean up the coherency default initialization (Jiaxun Yang)
- add cacheline to user/kernel dma-debug space dump messages
(Desnes Nunes, Geert Uytterhoeven)
- swiotlb statistics improvements (Michael Kelley)
- misc cleanups (Petr Tesarik)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmRLYsoLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYP4+RAAwpIqI198CrPxodCuBdwetuxznwncdwFvU3W+NQLF
cC5gDeUB2ZZevVh3moKITV7gXHrbTJF7jQs9jpWV0QEA5APzu0WDf3Y0m4sXPVpn
E9jS3jGJyntZ9rIMzHFs/lguI37xzT1YRAHAYgoZ84b7K/9g94NgEE2HecfNKVqZ
D6PN0UJcA4KQo+5UJ7MWiQxWM3QAwVfSKsP1mXv51tiRGo4UUzNW77Ej2nKRJjhK
wDNiZ+08khfeS2BuF9J2ebAzpgma5EgweH2z7zmx8Ch5t4Cx6hVAQ4Z6axbZMGjP
HxXPw5rIwZTnQYoaGU86BrxrFH2j2bb963kWoDzliH+4PQrJ/iIEpkF7vu5Y2oWr
WtXdOo6CsdQh1rT1UWA87ZYDtkWgj3/ITv5xJrXf8VyD9WHHSPst616XHLzBLGzo
Hc+lAPhnVm59XZhQbVgXZy37Eqa9qHEG6GIRUkwD13nttSSfLfizO0IlXlH+awQV
2A+TjbAt2lneUaRzMPfxG/yFt3rPqbBfSWj3o2ClPPn9sKksKxj7IjNW0v81Ztq/
H6UmYRuq+wlQJzlwiF8+6SzoBXObztrmtIa2ipiM5k+xePG1jsPGFLm98UMlPcxN
5IMz78DQ/hE3K3fKRt6clImd98xq5R0H9iUQPor2I7C/67fpTjThDRdHDUina1tk
Oxo=
=vAit
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- fix a PageHighMem check in dma-coherent initialization (Doug Berger)
- clean up the coherency default initialization (Jiaxun Yang)
- add cacheline to user/kernel dma-debug space dump messages (Desnes
Nunes, Geert Uytterhoeven)
- swiotlb statistics improvements (Michael Kelley)
- misc cleanups (Petr Tesarik)
* tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Omit total_used and used_hiwater if !CONFIG_DEBUG_FS
swiotlb: track and report io_tlb_used high water marks in debugfs
swiotlb: fix debugfs reporting of reserved memory pools
swiotlb: relocate PageHighMem test away from rmem_swiotlb_setup
of: address: always use dma_default_coherent for default coherency
dma-mapping: provide CONFIG_ARCH_DMA_DEFAULT_COHERENT
dma-mapping: provide a fallback dma_default_coherent
dma-debug: Use %pa to format phys_addr_t
dma-debug: add cacheline to user/kernel space dump messages
dma-debug: small dma_debug_entry's comment and variable name updates
dma-direct: cleanup parameters to dma_direct_optimal_gfp_mask
|
|
|
|
7fa8a8ee94 |
- Nick Piggin's "shoot lazy tlbs" series, to improve the performance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page().
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
than exclusive, and has fixed a bunch of errors which were caused by its
unintuitive meaning.
- Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also converted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page reclaim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the performance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also converted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page reclaim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
|
|
|
|
da46b58ff8 |
hyperv-next for v6.4
-----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmRHJSgTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXjSOCAClsmFmyP320yAB74vQer5cSzxbIpFW 3qt/P3D8zABn0UxjjmD8+LTHuyB+72KANU6qQ9No6zdYs8yaA1vGX8j8UglWWHuj fmaAD4DuZl+V+fmqDgHukgaPlhakmW0m5tJkR+TW3kCgnyrtvSWpXPoxUAe6CLvj Kb/SPl6ylHRWlIAEZ51gy0Ipqxjvs5vR/h9CWpTmRMuZvxdWUro2Cm82wJgzXPqq 3eLbAzB29kLFEIIUpba9a/rif1yrWgVFlfpuENFZ+HUYuR78wrPB9evhwuPvhXd2 +f+Wk0IXORAJo8h7aaMMIr6bd4Lyn98GPgmS5YSe92HRIqjBvtYs3Dq8 =F6+n -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - PCI passthrough for Hyper-V confidential VMs (Michael Kelley) - Hyper-V VTL mode support (Saurabh Sengar) - Move panic report initialization code earlier (Long Li) - Various improvements and bug fixes (Dexuan Cui and Michael Kelley) * tag 'hyperv-next-signed-20230424' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (22 commits) PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg Drivers: hv: move panic report code from vmbus to hv early init code x86/hyperv: VTL support for Hyper-V Drivers: hv: Kconfig: Add HYPERV_VTL_MODE x86/hyperv: Make hv_get_nmi_reason public x86/hyperv: Add VTL specific structs and hypercalls x86/init: Make get/set_rtc_noop() public x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushes x86/hyperv: Add callback filter to cpumask_to_vpset() Drivers: hv: vmbus: Remove the per-CPU post_msg_page clocksource: hyper-v: make sure Invariant-TSC is used if it is available PCI: hv: Enable PCI pass-thru devices in Confidential VMs Drivers: hv: Don't remap addresses that are above shared_gpa_boundary hv_netvsc: Remove second mapping of send and recv buffers Drivers: hv: vmbus: Remove second way of mapping ring buffers Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages swiotlb: Remove bounce buffer remapping for Hyper-V Driver: VMBus: Add Devicetree support dt-bindings: bus: Add Hyper-V VMBus Drivers: hv: vmbus: Convert acpi_device to more generic platform_device ... |
|
|
|
b6a7828502 |
modules-6.4-rc1
The summary of the changes for this pull request is:
* Song Liu's new struct module_memory replacement
* Nick Alcock's MODULE_LICENSE() removal for non-modules
* My cleanups and enhancements to reduce the areas where we vmalloc
module memory for duplicates, and the respective debug code which
proves the remaining vmalloc pressure comes from userspace.
Most of the changes have been in linux-next for quite some time except
the minor fixes I made to check if a module was already loaded
prior to allocating the final module memory with vmalloc and the
respective debug code it introduces to help clarify the issue. Although
the functional change is small it is rather safe as it can only *help*
reduce vmalloc space for duplicates and is confirmed to fix a bootup
issue with over 400 CPUs with KASAN enabled. I don't expect stable
kernels to pick up that fix as the cleanups would have also had to have
been picked up. Folks on larger CPU systems with modules will want to
just upgrade if vmalloc space has been an issue on bootup.
Given the size of this request, here's some more elaborate details
on this pull request.
The functional change in this pull request is the very first
patch from Song Liu which replaces the struct module_layout with a new
struct module_memory. The old data structure tried to put together all
types of supported module memory types in one data structure, the new
one abstracts the differences in memory types in a module to allow each
one to provide their own set of details. This paves the way in the
future so we can deal with them in a cleaner way. If you look at changes
they also provide a nice cleanup of how we handle these different memory
areas in a module. This change has been in linux-next since before the
merge window opened for v6.3 so to provide more than a full kernel cycle
of testing. It's a good thing as quite a bit of fixes have been found
for it.
Jason Baron then made dynamic debug a first class citizen module user by
using module notifier callbacks to allocate / remove module specific
dynamic debug information.
Nick Alcock has done quite a bit of work cross-tree to remove module
license tags from things which cannot possibly be module at my request
so to:
a) help him with his longer term tooling goals which require a
deterministic evaluation of whether a piece of symbol code could ever be
part of a module or not. But quite recently it has been made
clear that tooling is not the only one that would benefit.
Disambiguating symbols also helps efforts such as live patching,
kprobes and BPF, but for other reasons and R&D on this area
is active with no clear solution in sight.
b) help us inch closer to the now generally accepted long term goal
of automating all the MODULE_LICENSE() tags from SPDX license tags
In so far as a) is concerned, although module license tags are a no-op
for non-modules, tools which would want to create a mapping of possible
modules can only rely on the module license tag after the commit
|
|
|
|
ec274aff21 |
swiotlb: Omit total_used and used_hiwater if !CONFIG_DEBUG_FS
The tracking of used_hiwater adds an atomic operation to the hot path. This is acceptable only when debugging the kernel. To make sure that the fields can never be used by mistake, do not even include them in struct io_tlb_mem if CONFIG_DEBUG_FS is not set. The build fails after doing that. To fix it, it is necessary to remove all code specific to debugfs and instead provide a stub implementation of swiotlb_create_debugfs_files(). As a bonus, this change makes it possible to remove one __maybe_unused attribute. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
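The shape of the change, sketched; the field and stub names follow the message, but the structure below is abbreviated and illustrative, not the kernel's definition.

```c
#include <linux/atomic.h>

struct io_tlb_mem_example {
	/* ...shared allocator fields... */
#ifdef CONFIG_DEBUG_FS
	atomic_long_t total_used;	/* hot-path atomics, debug-only */
	atomic_long_t used_hiwater;
#endif
};

#ifdef CONFIG_DEBUG_FS
void swiotlb_create_debugfs_files(struct io_tlb_mem *mem, const char *name);
#else
static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
						const char *name)
{
	/* stub: callers stay unconditional, no __maybe_unused needed */
}
#endif
```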
0459ff4873 |
swiotlb: Remove bounce buffer remapping for Hyper-V
With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove swiotlb_unencrypted_base and the associated code. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1679838727-87310-8-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> |
|
|
|
8b0977ecc8 |
swiotlb: track and report io_tlb_used high water marks in debugfs
swiotlb currently reports the total number of slabs and the instantaneous in-use slabs in debugfs. But with increased usage of swiotlb for all I/O in Confidential Computing (coco) VMs, it has become difficult to know how much memory to allocate for swiotlb bounce buffers, either via the automatic algorithm in the kernel or by specifying a value on the kernel boot line. The current automatic algorithm generously allocates swiotlb bounce buffer memory, and may be wasting significant memory in many use cases. To support better understanding of swiotlb usage, add tracking of the high water mark for usage of the default swiotlb bounce buffer memory pool and any reserved memory pools. Report these high water marks in debugfs along with the other swiotlb pool metrics. Allow the high water marks to be reset to zero at runtime by writing to them. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
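The high-water update is a classic lock-free maximum; here is a sketch close to what the commit describes, with approximate names.

```c
#include <linux/atomic.h>

static void track_used_hiwater(atomic_long_t *total_used,
			       atomic_long_t *used_hiwater,
			       unsigned int nslots)
{
	long new_used, old_hiwater;

	new_used = atomic_long_add_return(nslots, total_used);
	old_hiwater = atomic_long_read(used_hiwater);
	do {
		if (new_used <= old_hiwater)
			break;	/* no new maximum, nothing to record */
	} while (!atomic_long_try_cmpxchg(used_hiwater,
					  &old_hiwater, new_used));
}
```

Resetting from debugfs then only needs an atomic_long_set() back to zero.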
5499d01c02 |
swiotlb: fix debugfs reporting of reserved memory pools
For io_tlb_nslabs, the debugfs code reports the correct value for a
specific reserved memory pool. But for io_tlb_used, the value reported
is always for the default pool, not the specific reserved pool. Fix this.
Fixes:
|
|
|
|
a90922fa25 |
swiotlb: relocate PageHighMem test away from rmem_swiotlb_setup
The reservedmem_of_init_fn's are invoked very early at boot before the
memory zones have even been defined. This makes it inappropriate to test
whether the page corresponding to a PFN is in ZONE_HIGHMEM from within
one.
Removing the check allows an ARM 32-bit kernel with SPARSEMEM enabled to
boot properly since otherwise we would be de-referencing an
uninitialized sparsemem map to perform the pfn_to_page() check.
The arm64 architecture happens to work (and also has no high memory) but
other 32-bit architectures could also be having similar issues.
While it would be nice to provide early feedback about a reserved DMA
pool residing in highmem, it is not possible to do that until the first
time we try to use it, which is where the check is moved to.
Fixes:
|
|
|
|
114da4b026 |
dma-mapping: benchmark: remove MODULE_LICENSE in non-modules
Since commit
|
|
|
|
1d3f56b295 |
dma-mapping: provide CONFIG_ARCH_DMA_DEFAULT_COHERENT
Provide a kconfig option to allow arches to manipulate default value of dma_default_coherent in Kconfig. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
fe4e5efa40 |
dma-mapping: provide a fallback dma_default_coherent
dma_default_coherent was declared unconditionally in kernel/dma/mapping.c, but only declared in dma-map-ops.h when one of the non-coherent options is enabled. Guard the declaration in mapping.c with the non-coherent options and provide a fallback definition. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
bbb73a103f |
swiotlb: fix a braino in the alignment check fix
The alignment mask in swiotlb_do_find_slots() masks off the high
bits which are not relevant for the alignment, so multiple
requirements are combined with a bitwise OR rather than AND.
In plain English, the stricter the alignment, the more bits must
be set in iotlb_align_mask.
Confusion may arise from the fact that the same variable is also
used to mask off the offset within a swiotlb slot, which is
achieved with a bitwise AND.
Fixes:
|
|
|
|
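The rule in miniature, sketched: alignment requirements combine with OR because a stricter requirement sets more low bits, whereas extracting an in-slot offset is the AND use of the same mask.

```c
static unsigned int combine_align_masks(unsigned int min_align_mask,
					unsigned int alloc_align_mask)
{
	unsigned int iotlb_align_mask = min_align_mask;

	/*
	 * Buggy version: iotlb_align_mask &= alloc_align_mask;
	 * which silently drops one side's requirement.
	 */
	iotlb_align_mask |= alloc_align_mask;	/* union of requirements */
	return iotlb_align_mask;
}
```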
23baf831a3 |
mm, treewide: redefine MAX_ORDER sanely
MAX_ORDER is currently defined as the number of orders the page allocator supports: a user can ask the buddy allocator for a page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and leads to a number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders a user can ask from the buddy allocator is now 0..MAX_ORDER. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
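The loop idiom the redefinition changes, as a sketch:

```c
#include <linux/mm.h>

static void walk_buddy_orders(void)
{
	unsigned int order;

	/* Before: for (order = 0; order < MAX_ORDER; order++) */
	for (order = 0; order <= MAX_ORDER; order++)	/* now inclusive */
		pr_debug("order %u: %lu pages\n", order, 1UL << order);
}
```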
b31507dcaf |
dma-debug: Use %pa to format phys_addr_t
On 32-bit without LPAE:
kernel/dma/debug.c: In function ‘debug_dma_dump_mappings’:
kernel/dma/debug.c:537:7: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 9 has type ‘phys_addr_t’ {aka ‘unsigned int’} [-Wformat=]
kernel/dma/debug.c: In function ‘dump_show’:
kernel/dma/debug.c:568:59: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 11 has type ‘phys_addr_t’ {aka ‘unsigned int’} [-Wformat=]
Fixes:
|
|
|
|
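The fix pattern, sketched: %pa formats a phys_addr_t portably on both 32- and 64-bit, and unlike the %llx-plus-cast idiom it takes a pointer to the value.

```c
#include <linux/printk.h>
#include <linux/types.h>

static void dump_mapping_example(phys_addr_t phys)
{
	pr_info("mapping at %pa\n", &phys);	/* note the & */
}
```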
bd89d69a52 |
dma-debug: add cacheline to user/kernel space dump messages
Having the cacheline also printed on the debug_dma_dump_mappings() and dump_show() is useful for debugging. Furthermore, this also standardizes the messages shown on both dump functions. Signed-off-by: Desnes Nunes <desnesn@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
479623fd0c |
dma-debug: small dma_debug_entry's comment and variable name updates
Small update on dma_debug_entry's struct commentary and also standardize the usage of 'dma_addr' variable name from debug_dma_map_page() on debug_dma_unmap_page(), and similarly on debug_dma_free_coherent() Signed-off-by: Desnes Nunes <desnesn@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
25a4ce5649 |
dma-direct: cleanup parameters to dma_direct_optimal_gfp_mask
Since both callers of dma_direct_optimal_gfp_mask() pass dev->coherent_dma_mask as the second argument, it is better to remove that parameter altogether. Not only is reducing number of parameters good for readability, but the new function signature is also more logical: The optimal flags depend only on data contained in struct device. While touching this code, let's also rename phys_mask to phys_limit in dma_direct_alloc_from_pool(), because it is indeed a limit. Signed-off-by: Petr Tesarik <petrtesarik@huaweicloud.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
0eee5ae102 |
swiotlb: fix slot alignment checks
Explicit alignment and page alignment are used only to calculate the stride, not when checking the actual slot physical address. Originally, only page alignment was implemented, and that worked, because the whole SWIOTLB is allocated on a page boundary, so aligning the start index was sufficient to ensure a page-aligned slot. When commit |
|
|
|
39e7d2ab6e |
swiotlb: use wrap_area_index() instead of open-coding it
No functional change, just use an existing helper. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
7c3940bf81 |
swiotlb: fix the deadlock in swiotlb_do_find_slots
In general, if swiotlb is sufficient, the logic of index =
wrap_area_index(mem, index + 1) is fine: it will quickly take a slot and
release the area->lock. But if swiotlb is insufficient and the device
has min_align_mask requirements, such as NVMe, we may never satisfy
index == wrap and so never exit the loop properly. In this case, other
kernel threads will not be able to acquire the area->lock and release
the slot, resulting in a deadlock.
The current implementation of wrap_area_index does not involve a modulo
operation, so adjusting the wrap to ensure the loop ends is not trivial.
Introduce a new variable to record the number of loops and exit the loop
after completing the traversal.
Backtraces:
Other CPUs are waiting this core to exit the swiotlb_do_find_slots
loop.
[10199.924391] RIP: 0010:swiotlb_do_find_slots+0x1fe/0x3e0
[10199.924403] Call Trace:
[10199.924404] <TASK>
[10199.924405] swiotlb_tbl_map_single+0xec/0x1f0
[10199.924407] swiotlb_map+0x5c/0x260
[10199.924409] ? nvme_pci_setup_prps+0x1ed/0x340
[10199.924411] dma_direct_map_page+0x12e/0x1c0
[10199.924413] nvme_map_data+0x304/0x370
[10199.924415] nvme_prep_rq.part.0+0x31/0x120
[10199.924417] nvme_queue_rq+0x77/0x1f0
...
[ 9639.596311] NMI backtrace for cpu 48
[ 9639.596336] Call Trace:
[ 9639.596337]
[ 9639.596338] _raw_spin_lock_irqsave+0x37/0x40
[ 9639.596341] swiotlb_do_find_slots+0xef/0x3e0
[ 9639.596344] swiotlb_tbl_map_single+0xec/0x1f0
[ 9639.596347] swiotlb_map+0x5c/0x260
[ 9639.596349] dma_direct_map_sg+0x7a/0x280
[ 9639.596352] __dma_map_sg_attrs+0x30/0x70
[ 9639.596355] dma_map_sgtable+0x1d/0x30
[ 9639.596356] nvme_map_data+0xce/0x370
...
[ 9639.595665] NMI backtrace for cpu 50
[ 9639.595682] Call Trace:
[ 9639.595682]
[ 9639.595683] _raw_spin_lock_irqsave+0x37/0x40
[ 9639.595686] swiotlb_release_slots.isra.0+0x86/0x180
[ 9639.595688] dma_direct_unmap_sg+0xcf/0x1a0
[ 9639.595690] nvme_unmap_data.part.0+0x43/0xc0
Fixes:
|
|
|
|
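A sketch of the bounded search the fix introduces; slot_is_suitable() stands in for the real size/alignment checks, and the other names are illustrative.

```c
struct io_tlb_area;

/* assumed predicate standing in for the real slot checks */
bool slot_is_suitable(struct io_tlb_area *area, unsigned int index);

static int find_slot_bounded(struct io_tlb_area *area,
			     unsigned int nslots, unsigned int start)
{
	unsigned int slots_checked = 0;
	unsigned int index = start;

	/*
	 * Count slots visited instead of waiting for index == wrap,
	 * which a min_align_mask-adjusted stride may never satisfy.
	 */
	while (slots_checked < nslots) {
		if (slot_is_suitable(area, index))
			return index;
		index = (index + 1 < nslots) ? index + 1 : 0;
		slots_checked++;
	}
	return -1;	/* full traversal done: fail instead of spinning */
}
```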
9b07d27d0f |
swiotlb: mark swiotlb_memblock_alloc() as __init
swiotlb_memblock_alloc() calls memblock_alloc(), which calls
(__init) memblock_alloc_try_nid(). However, swiotlb_memblock_alloc()
can be marked as __init since it is only called by swiotlb_init_remap(),
which is already marked as __init. This prevents a modpost build
warning/error:
WARNING: modpost: vmlinux.o: section mismatch in reference: swiotlb_memblock_alloc (section: .text) -> memblock_alloc_try_nid (section: .init.text)
This fixes the build warning/error seen on ARM64, PPC64, S390, i386,
and x86_64.
Fixes:
|
|
|
|
5e7b9a6ae8 |
swiotlb: remove swiotlb_max_segment
swiotlb_max_segment has always been a bogus API, so remove it now that the remaining callers are gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> |
|
|
|
3622b86f49 |
dma-mapping: reject GFP_COMP for noncoherent allocations
While not quite as bogus as for the dma-coherent allocations that were fixed earlier, GFP_COMP for these allocations has no benefits for the dma-direct case, and can't be supported at all by the dma-iommu backend, which splits up allocations into smaller orders. Due to an oversight in |
|
|
|
ffcb754584 |
dma-mapping: reject __GFP_COMP in dma_alloc_attrs
DMA allocations can never be turned back into a page pointer, so requesting compound pages doesn't make sense and it can't even be supported at all by various backends. Reject __GFP_COMP with a warning in dma_alloc_attrs, and stop clearing the flag in the arm dma ops and dma-iommu. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> |
|
|
|
8d58aa4849 |
swiotlb: reduce the swiotlb buffer size on allocation failure
At the moment the AMD encrypted platform reserves 6% of RAM for SWIOTLB or 1GB, whichever is less. However, it is possible that there is no block big enough in low memory, which makes the SWIOTLB allocation fail, and the kernel continues without DMA. In such a case a VM hangs on DMA. This moves alloc+remap to a helper and calls it from a loop where the size is halved on each iteration. This also updates default_nslabs on successful allocation; not doing so looks like an oversight, since it would have broken callers of swiotlb_size_or_default(). Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
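The retry loop in sketch form; alloc_and_remap() and the constants are assumptions standing in for the commit's helper and swiotlb's real limits.

```c
#include <linux/align.h>
#include <linux/printk.h>

#define IO_TLB_SEGSIZE	 128	/* assumed: slots per segment */
#define IO_TLB_MIN_SLABS 512	/* assumed: smallest pool worth having */

int alloc_and_remap(unsigned long nslabs);	/* assumed helper */
static unsigned long default_nslabs;

static void swiotlb_shrinking_init_example(unsigned long nslabs)
{
	while (nslabs >= IO_TLB_MIN_SLABS) {
		if (!alloc_and_remap(nslabs)) {
			/* keep swiotlb_size_or_default() truthful */
			default_nslabs = nslabs;
			return;
		}
		nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE); /* halve, retry */
	}
	pr_warn("swiotlb: bounce buffer allocation failed\n");
}
```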
27bc50fc90 |
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam R. Howlett. An overlapping range-based tree for vmas. It it apparently slight more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat (https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com). This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY0HaPgAKCRDdBJ7gKXxA joPjAQDZ5LlRCMWZ1oxLP2NOTp6nm63q9PWcGnmY50FjD/dNlwEAnx7OejCLWGWf bbTuk6U2+TKgJa4X7+pbbejeoqnt5QU= =xfWx -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. 
It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ... |
|
|
|
7ade4f1077 |
dma: kmsan: unpoison DMA mappings
KMSAN doesn't know about DMA memory writes performed by devices. We unpoison such memory when it's mapped to avoid false positive reports. Link: https://lkml.kernel.org/r/20220915150417.722975-22-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
639205ed20 |
swiotlb: don't panic!
The panics in swiotlb are relics of a bygone era, some of them inadvertently inherited from a memblock refactor, and all of them unnecessary since they are in places that may also fail gracefully anyway. Convert the panics in swiotlb_init_remap() into non-fatal warnings more consistent with the other bail-out paths there and in swiotlb_init_late() (but don't bother trying to roll anything back, since if anything does actually fail that early, the aim is merely to keep going as far as possible to get more diagnostic information out of the inevitably-dying kernel). It's not for SWIOTLB to decide that the system is terminally compromised just because there *might* turn out to be one or more 32-bit devices that might want to make streaming DMA mappings, especially since we already handle the no-buffer case later if it turns out someone did want it. Similarly though, downgrade that panic in swiotlb_tbl_map_single(), since even if we do get to that point it's an overly extreme reaction. It makes little difference to the DMA API caller whether a mapping fails because the buffer is full or because there is no buffer, and once again it's not for SWIOTLB to presume that any particular DMA mapping is so fundamental to the operation of the system that it must be terminal if it could never succeed. Even if the caller handles failure by futilely retrying forever, a single stuck thread is considerably less impactful to the user than a needless panic. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
1d61261bfe |
swiotlb: replace kmap_atomic() with memcpy_{from,to}_page()
The use of kmap_atomic() is being deprecated in favor of
kmap_local_page(), which can also be used in atomic context (including
interrupts).
Replace kmap_atomic() with kmap_local_page(). Instead of open coding
mapping, memcpy(), and un-mapping, use the memcpy_{from,to}_page() helper.
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
|
|
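Before and after, sketched:

```c
#include <linux/highmem.h>

static void bounce_copy_example(struct page *page, size_t offset,
				char *buf, size_t len)
{
	/*
	 * Before, open-coded:
	 *	vaddr = kmap_atomic(page);
	 *	memcpy(buf, vaddr + offset, len);
	 *	kunmap_atomic(vaddr);
	 */

	/* After: one helper, built on kmap_local_page() internally. */
	memcpy_from_page(buf, page, offset, len);
}
```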
9fc18f6d56 |
dma-mapping: mark dma_supported static
Now that the remaining users in drivers are gone, this function can be marked static. Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
43b919017f |
swiotlb: fix a typo
"overwirte" isn't a word. It should be "overwrite". Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
3f0461613e |
swiotlb: avoid potential left shift overflow
The second operand passed to slot_addr() is declared as int or unsigned int
in all call sites. The left-shift to get the offset of a slot can overflow
if swiotlb size is larger than 4G.
Convert the macro to an inline function and declare the second argument as
phys_addr_t to avoid the potential overflow.
Fixes:
|
|
|
|
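The conversion, sketched: with a phys_addr_t second argument the shift is done in 64 bits, so a slot index into a larger-than-4G swiotlb can no longer truncate.

```c
#include <linux/types.h>

#define IO_TLB_SHIFT 11		/* 2 KiB slots, assumed for the sketch */

/* Before: #define slot_addr(start, idx) ((start) + ((idx) << IO_TLB_SHIFT)) */
static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
{
	return start + (idx << IO_TLB_SHIFT);	/* idx now 64-bit wide */
}
```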
2995b8002c |
dma-debug: improve search for partial syncs
When bucket_find_contains() tries to find the original entry for a partial sync, it manages to constrain its search in a way that is both too restrictive and not restrictive enough. A driver which only uses single mappings rather than scatterlists might not set max_seg_size, but could still technically perform a partial sync at an offset of more than 64KB into a sufficiently large mapping, so we could stop searching too early before reaching a legitimate entry. Conversely, if no valid entry is present and max_range is large enough, we can pointlessly search buckets that we've already searched, or that represent an impossible wrapping around the bottom of the address space. At worst, the (legitimate) case of max_seg_size == UINT_MAX can make the loop infinite. Replace the fragile and frankly hard-to-follow "range" logic with a simple counted loop for the number of possible hash buckets below the given address. Reported-by: Yunfei Wang <yf.wang@mediatek.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
81c12e922b |
Revert "swiotlb: panic if nslabs is too small"
This reverts commit |