mirror of https://github.com/torvalds/linux.git
2668 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
930b7c32ea |
overlayfs.rst: update metacopy section in overlayfs documentation
- Provide info about trusted.overlay.metacopy extended attribute - Minor rephrasing regarding copy-up operation with metacopy=on Signed-off-by: Yuriy Belikov <yuriybelikov1@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
|
|
|
d224338aa1 |
Linux 6.11-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmbUG7oeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG7LUH/26M4QJ5UGJHsehd bbHlE4or0jibFyMbUiYDOElqLITjCVH6mi3Kv3E7sfyLxSsglVRRNzLCTq/UgTf8 E1L90q4wCySElzzIhH6cltuQdAhs7pRWs5BETByvIW+g+ayN0LZxUPbvB8yl/nOU Zx8flBEuM2isuRlnx+iRccbf2PxNadSkSYg2TlmZr8mfFKCiRxjU7x355Q3UcylQ b8S2jVgq69CSDF3IBOzwHZjdq5OceDsO8he0KcfSTvSgyFMcwhntAT397YEnFXnk KKjKPNCu3KqHtTxsi4Sc0wOxVcgctDv4OPethaL8yROQ7jdBTkvNpPT1yMf7bca8 ZLpSo5Y= =TBcj -----END PGP SIGNATURE----- Merge tag 'v6.11-rc6' into docs-mw This is done primarily to get a docs build fix merged via another tree so that "make htmldocs" stops failing. |
|
|
|
673f0c3ffc |
tools: usb: p9_fwd: add usb gadget packet forwarder script
This patch is adding an small python tool to forward 9pfs requests from the USB gadget to an existing 9pfs TCP server. Since currently all 9pfs servers lack support for the usb transport this tool is an useful helper to get started. Refer the Documentation section "USBG Example" in Documentation/filesystems/9p.rst on how to use it. Signed-off-by: Jan Luebbe <jlu@pengutronix.de> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-3-9a27de5160e0@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
a3be076dc1 |
net/9p/usbg: Add new usb gadget function transport
Add the new gadget function for 9pfs transport. This function is defining an simple 9pfs transport interface that consists of one in and one out endpoint. The endpoints transmit and receive the 9pfs protocol payload when mounting a 9p filesystem over usb. Tested-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-2-9a27de5160e0@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
420e05d0de |
fs: remove calls to set and clear the folio error flag
Nobody checks the folio error flag any more, so we can stop setting and clearing it. Also remove the documentation suggesting to not bother setting the error bit. Link: https://lkml.kernel.org/r/20240807193528.1865100-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
3717bbcb59 |
doc: correcting the idmapping mount example
In step 2, we obtain the kernel id `k1000`. So in next step (step 3), we should translate the `k1000` not `k21000`. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20240816063611.1961910-1-lihongbo22@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
28c7658b2c |
Fix spelling and gramatical errors
Fixed 3 typos in design.rst Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Link: https://lore.kernel.org/r/20240807070536.14536-1-shenxiaxi26@gmail.com Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
f92a24ae7c |
Documentation/fs/9p: Expand goo.gl link
The goo.gl URL shortener is deprecated and is due to stop expanding existing links in 2025. The old goo.gl link in the 9p docs doesn't work anyway, replace it by a kernel.org link suggested by Randy instead. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240725180041.80862-1-linux@treblig.org |
|
|
|
5c6154ffd4 |
Changes since last update:
- Allow large folios on compressed inodes;
- Fix invalid memory accesses if z_erofs_gbuf_growsize()
partially fails;
- Two minor cleanups.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmbF1s8RHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qqzghAAjSi/OITIcpjYe3GOxDNEbIxTFf+5d7ge
66KRMBm5FG6xBaXOK4xhit2EdFsDVMAna6FjYyRJYpDDQX5MAUsifDeZeGZCKOV5
KpUk2CfhIp1zi9CeVA5Tl6rfdz0b0SKz77uRK9BN8PWmCdbeenOKgK6HU06S5enC
uPud9QegQELXad0bIe+uo+4zGQnqxwGMfkNEW+cfbp200m6kpig/SgttoXCtz1mm
CEZpv87MlrQLbLtox+KcesFhYofxWdx/My5ckJVbaHGZdM29HKehRLIsBFX3di5N
4OkGnFbh16lhx6s4a/b1pCeDNH1NcmP7uqCoGr+dCVKQ79glSZQvfiV4myQk7vRN
4Orj+zq4aBSmYOvizIFMH++IGDioPBxkhKv3ReeCe4t/L8LHPkGEaeT+iM9b+b4n
uK0faj6qREygf2rHIRr6ciq7GdxlMwdlF7AeNUIyCHNO84J39+GkAID3Nqs+vzuF
wSJAuqp959WgJf2Co5ugb4vjbrpq8TKfZG0yk7KnIOWJkhAKDcIwta+ooloIYtD8
5gHhMRodIgKjYKPOi9vx9cQPlISTMsqVWpRuyzH9vCvqNvkzDriAez1kFq01tZTO
gcK5QGekSgft/5h7PLFrbVV1kVfVdjNapsIY+XAEmBGGCJCScCrlIxjbsRwsseFW
JIpQXilIU3c=
=gFZW
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.11-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"As I mentioned in the merge window pull request, there is a regression
which could cause system hang due to page migration. The corresponding
fix landed upstream through MM tree last week (commit 2e6506e1c4ee:
"mm/migrate: fix deadlock in migrate_pages_batch() on large folios"),
therefore large folios can be safely allowed for compressed inodes and
stress tests have been running on my fleet for over 20 days without
any regression. Users have explicitly requested this for months, so
let's allow large folios for EROFS full cases now for wider testing.
Additionally, there is a fix which addresses invalid memory accesses
on a failure path triggered by fault injection and two minor cleanups
to simplify the codebase.
Summary:
- Allow large folios on compressed inodes
- Fix invalid memory accesses if z_erofs_gbuf_growsize() partially
fails
- Two minor cleanups"
* tag 'erofs-for-6.11-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix out-of-bound access when z_erofs_gbuf_growsize() partially fails
erofs: allow large folios for compressed files
erofs: get rid of check_layout_compatibility()
erofs: simplify readdir operation
|
|
|
|
ac6731870e |
documentation: add IPE documentation
Add IPE's admin and developer documentation to the kernel tree. Co-developed-by: Fan Wu <wufan@linux.microsoft.com> Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com> |
|
|
|
7c373e4f14 |
fsverity: expose verified fsverity built-in signatures to LSMs
This patch enhances fsverity's capabilities to support both integrity and authenticity protection by introducing the exposure of built-in signatures through a new LSM hook. This functionality allows LSMs, e.g. IPE, to enforce policies based on the authenticity and integrity of files, specifically focusing on built-in fsverity signatures. It enables a policy enforcement layer within LSMs for fsverity, offering granular control over the usage of authenticity claims. For instance, a policy could be established to only permit the execution of all files with verified built-in fsverity signatures. The introduction of a security_inode_setintegrity() hook call within fsverity's workflow ensures that the verified built-in signature of a file is exposed to LSMs. This enables LSMs to recognize and label fsverity files that contain a verified built-in fsverity signature. This hook is invoked subsequent to the fsverity_verify_signature() process, guaranteeing the signature's verification against fsverity's keyring. This mechanism is crucial for maintaining system security, as it operates in kernel space, effectively thwarting attempts by malicious binaries to bypass user space stack interactions. The second to last commit in this patch set will add a link to the IPE documentation in fsverity.rst. Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> |
|
|
|
e080a26725 |
erofs: allow large folios for compressed files
As commit
|
|
|
|
4fdd8664c8 |
ksmbd: fix spelling mistakes in documentation
There are a couple of spelling mistakes in the documentation. This patch fixes them. Signed-off-by: Victor Timofei <victor@vtimothy.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> |
|
|
|
889ced4c93
|
netfs: clean up after renaming FSCACHE_DEBUG config
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to
CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First,
NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas
the actual intended config is called NETFS_SUPPORT. Second, the config
renaming misses to adjust the documentation of the functionality of this
config.
Clean up those two points.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
|
|
1da86618bd
|
fs: Convert aops->write_begin to take a folio
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
a225800f32
|
fs: Convert aops->write_end to take a folio
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
fbc90c042c |
- 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN
walkers") is known to cause a performance regression (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff). Yu has a fix which I'll send along later via the hotfixes branch. - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Is anyone reading this stuff? If so, email me! - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8= =z0Lf -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ... |
|
|
|
6706415bf9 |
gfs2 fixes and cleanups
- Revise the glock reference counting model
- Several quota related fixes
- Clean up the glock demote logic
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmaVQ4gUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZToLog//T6Iljxro4CMkNvGJx2B3puo2rtbd
mToxam0ZTkE/xXcxwRJDMFjxLdQ74xtiZLFJF8l/OwpiUpkKjh+hXdH4IABZG9Xm
hNSvYBFiUCt86pcDKc/ia7dH/xSBN3nH1IpNtr6dCFBXHkc1tK1v+QJ+RnFDZ9Re
kgMMYjmGKRfBRuR+r0uxF0V09jQYHmQ5K/o4arF5NX6ifUKX0tnnr8wB8bfXCznp
uXG6Jf8TWSGDcJI+phi7o5tNUN4187RRlODPewHsBmS0bdZla5buu5q9ATBDk1Ca
Btst+Oa6uIc6MHv9e9e59mVIp1NScYNnfDedFLLfxigskcGo2f7kaTlTNmccrMrm
sQNyVhWG5zlUJL7OcdonKP64XAJZbFt5I29RJOiqY1Z4OxCB3OH7Rl7MhbCATq/o
6jjN6+1DOGDKy3vbxIwaIsjC1E9H5hzIsmbzjEya0TpjrHENFPOCAFvWH2PdV2yc
hNhAiIKn6ihZs5QiXDGs9F46Qxb8C4nMDI/UAm2S7qAABlsD7m34PIZcfsVd+ouF
ySZOhf2xfLFovk1+QqAzaOGxtxUHqNkUhpakKPcjocb2NIdwUo1k8QM+plPOvnrm
1arppql5u1DAdzAvKI2LLjeKd64Wl8TG7SJuQQq8PiBMSa/MRSlKB5N3LFiqnbzC
iM/MYm1dPQ088wA=
=BfXp
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
"Fixes and cleanups:
- Revise the glock reference counting model and LRU list handling to
be more sensible
- Several quota related fixes: clean up the quota code, add some
missing locking, and work around the on-disk corruption that the
reverted patch "gfs2: ignore negated quota changes" causes
- Clean up the glock demote logic in glock_work_func()"
* tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits)
gfs2: Clean up glock demote logic
gfs2: Revert "check for no eligible quota changes"
gfs2: Be more careful with the quota sync generation
gfs2: Get rid of some unnecessary quota locking
gfs2: Add some missing quota locking
gfs2: Fold qd_fish into gfs2_quota_sync
gfs2: quota need_sync cleanup
gfs2: Fix and clean up function do_qc
gfs2: Revert "Add quota_change type"
gfs2: Revert "ignore negated quota changes"
gfs2: qd_check_sync cleanups
gfs2: Revert "introduce qd_bh_get_or_undo"
gfs2: Check quota consistency on mount
gfs2: Minor gfs2_quota_init error path cleanup
gfs2: Get rid of demote_ok checks
Revert "GFS2: Don't add all glocks to the lru"
gfs2: Revise glock reference counting model
gfs2: Switch to a per-filesystem glock workqueue
gfs2: Report when glocks cannot be freed for a long time
gfs2: gfs2_glock_get cleanup
...
|
|
|
|
4f5e249ec0 |
vfs-6.11.iomap
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHLQAKCRCRxhvAZXjc
ot3sAP9TBUM+vzUcQ5SVcUnSX+y3dhOGYnquORBbRc/Y6AzLMAEAu3TcsvdoaWfy
6ImUaju6iLqy9cCY3uDlNmJR16E4IgE=
=Bwpy
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
"This contains some minor work for the iomap subsystem:
- Add documentation on the design of iomap and how to port to it
- Optimize iomap_read_folio()
- Bring back the change to iomap_write_end() to no increase i_size.
This is accompanied by a change to xfs to reserve blocks for
truncating large realtime inodes to avoid exposing stale data when
iomap_write_end() stops increasing i_size"
* tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: don't increase i_size in iomap_write_end()
xfs: reserve blocks for truncating large realtime inode
Documentation: the design of iomap and how to port
iomap: Optimize iomap_read_folio
|
|
|
|
b8fc1bd73a |
vfs-6.11.mount.api
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGjAAKCRCRxhvAZXjc
okXfAP4tFUYszUsSqYdsgy9UvXw3Dr5zOIzQmN++NdjGkbU5fgEAs2ystqEfJgr3
v7XvGbu65CvL4/slNhBZOU4yekGx5Qc=
=C4QD
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount API updates from Christian Brauner:
- Add a generic helper to parse uid and gid mount options.
Currently we open-code the same logic in various filesystems which is
error prone, especially since the verification of uid and gid mount
options is a sensitive operation in the face of idmappings.
Add a generic helper and convert all filesystems over to it. Make
sure that filesystems that are mountable in unprivileged containers
verify that the specified uid and gid can be represented in the
owning namespace of the filesystem.
- Convert hostfs to the new mount api.
* tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fuse: Convert to new uid/gid option parsing helpers
fuse: verify {g,u}id mount options correctly
fat: Convert to new uid/gid option parsing helpers
fat: Convert to new mount api
fat: move debug into fat_mount_options
vboxsf: Convert to new uid/gid option parsing helpers
tracefs: Convert to new uid/gid option parsing helpers
smb: client: Convert to new uid/gid option parsing helpers
tmpfs: Convert to new uid/gid option parsing helpers
ntfs3: Convert to new uid/gid option parsing helpers
isofs: Convert to new uid/gid option parsing helpers
hugetlbfs: Convert to new uid/gid option parsing helpers
ext4: Convert to new uid/gid option parsing helpers
exfat: Convert to new uid/gid option parsing helpers
efivarfs: Convert to new uid/gid option parsing helpers
debugfs: Convert to new uid/gid option parsing helpers
autofs: Convert to new uid/gid option parsing helpers
fs_parse: add uid & gid option option parsing helpers
hostfs: Add const qualifier to host_root in hostfs_fill_super()
hostfs: convert hostfs to use the new mount API
|
|
|
|
c10cb9148e |
docs/procfs: call out ioctl()-based PROCMAP_QUERY command existence
Call out PROCMAP_QUERY ioctl() existence in the section describing /proc/PID/maps file in documentation. We refer user to UAPI header for low-level details of this programmatic interface. Link: https://lkml.kernel.org/r/20240627170900.1672542-5-andrii@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
9f111059e7
|
fs_parse: add uid & gid option option parsing helpers
Multiple filesystems take uid and gid as options, and the code to create the ID from an integer and validate it is standard boilerplate that can be moved into common helper functions, so do that for consistency and less cut&paste. This also helps avoid the buggy pattern noted by Seth Jenkins at https://lore.kernel.org/lkml/CALxfFW4BXhEwxR0Q5LSkg-8Vb4r2MONKCcUCVioehXQKr35eHg@mail.gmail.com/ because uid/gid parsing will fail before any assignment in most filesystems. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Link: https://lore.kernel.org/r/de859d0a-feb9-473d-a5e2-c195a3d47abb@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
399ab86ea5 |
/proc/pid/smaps: add mseal info for vma
Add sl in /proc/pid/smaps to indicate vma is sealed
Link: https://lkml.kernel.org/r/20240614232014.806352-2-jeffxu@google.com
Fixes:
|
|
|
|
a7ca193bc9
|
Documentation: the design of iomap and how to port
Capture the design of iomap and how to port filesystems to use it. Apologies for all the rst formatting, but it's necessary to distinguish code from regular text. A lot of this has been collected from various email conversations, code comments, commit messages, my own understanding of iomap, and Ritesh/Luis' previous efforts to create a document. Please note a large part of this has been taken from Dave's reply to last iomap doc patchset. Thanks to Ritesh, Luis, Dave, Darrick, Matthew, Christoph and other iomap developers who have taken time to explain the iomap design in various emails, commits, comments etc. Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Cc: Jan Kara <jack@suse.cz> Cc: Luis Chamberlain <mcgrof@kernel.org> Inspired-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240614214347.GK6125@frogsfrogsfrogs Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
713f883438 |
gfs2: Get rid of demote_ok checks
The demote_ok glock operation is only still used to prevent the inode glocks of the "jindex" and "rindex" directories from getting recycled while they are still referenced by sdp->sd_jindex and sdp->sd_rindex. However, the LRU walking code will no longer recycle glocks which are referenced, so the demote_ok glock operation is obsolete and can be removed. Each of a glock's holders in the gl_holders list is holding a reference on the glock, so when the list of holders isn't empty in demote_ok(), the existing reference count check will already prevent the glock from getting released. This means that demote_ok() is obsolete as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> |
|
|
|
97d6fdcd79 |
gfs2: Update glocks documentation
Rearrange the table of locking modes and associated caching capability to be in order of increasing caching capability. Update the description of the glock operations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> |
|
|
|
9b62e02e63 |
16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes, various singletons fixing various issues in various parts. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZlIOUgAKCRDdBJ7gKXxA jrYnAP9UeOw8YchTIsjEllmAbTMAqWGI+54CU/qD78jdIHoVWAEAmp0QqgFW3r2p jze4jBkh3lGQjykTjkUskaR71h9AZww= =AHeV -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes, 11 of which are cc:stable. A few nilfs2 fixes, the remainder are for MM: a couple of selftests fixes, various singletons fixing various issues in various parts" * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/ksm: fix possible UAF of stable_node mm/memory-failure: fix handling of dissolved but not taken off from buddy pages mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again nilfs2: fix potential hang in nilfs_detach_log_writer() nilfs2: fix unexpected freezing of nilfs_segctor_sync() nilfs2: fix use-after-free of timer for log writer thread selftests/mm: fix build warnings on ppc64 arm64: patching: fix handling of execmem addresses selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages selftests/mm: compaction_test: fix bogus test success on Aarch64 mailmap: update email address for Satya Priya mm/huge_memory: don't unpoison huge_zero_folio kasan, fortify: properly rename memintrinsics lib: add version into /proc/allocinfo output mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL |
|
|
|
74eca356f6 |
We have a series from Xiubo that adds support for additional access
checks based on MDS auth caps which were recently made available to clients. This is needed to prevent scenarios where the MDS quietly discards updates that a UID-restricted client previously (wrongfully) acked to the user. Other than that, just a documentation fixup. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmZRsm0THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi6++B/4o7j4CNzjJcBw9UgxEUugwJYBe2Ht3 vSTUkHD9NILVgrYSHNhgkCdvU8ckv8Sd+7W/Kb0BC/GRyQd57F7zoM6mvR1WMozt lLbYU/kdVI+TcIY2bhupMma5f+nWv6vIBTzca78UhogEBuIEYHAG3BnNaT/AEqF+ yZ3uQEVQ2bHmwbPn3A5dibYxOR8zLyhmaq/RUvqpiiYcSkEfZ6QqKiMlZkcVD7F8 c+NfjwXXGNTXDhfIbG4VndQi7xLXk3GI5E8xvdo2ALwumfp2KdVlMEYT8SHggSPS A8bWq+d5o7uVV6WviNK63XcGRpadCSSSL310vR78K+tNIIq5dnI81IsU =6c1e -----END PGP SIGNATURE----- Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "A series from Xiubo that adds support for additional access checks based on MDS auth caps which were recently made available to clients. This is needed to prevent scenarios where the MDS quietly discards updates that a UID-restricted client previously (wrongfully) acked to the user. Other than that, just a documentation fixup" * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client: doc: ceph: update userspace command to get CephFS metadata ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit ceph: check the cephx mds auth access for async dirop ceph: check the cephx mds auth access for open ceph: check the cephx mds auth access for setattr ceph: add ceph_mds_check_access() helper ceph: save cap_auths in MDS client when session is opened |
|
|
|
a38568a0b4 |
lib: add version into /proc/allocinfo output
Add version string and a header at the beginning of /proc/allocinfo to
allow later format changes. Example output:
> head /proc/allocinfo
allocinfo - version: 1.0
# <size> <calls> <tag info>
0 0 init/main.c:1314 func:do_initcalls
0 0 init/do_mounts.c:353 func:mount_nodev_root
0 0 init/do_mounts.c:187 func:mount_root_generic
0 0 init/do_mounts.c:158 func:do_mount_root
0 0 init/initramfs.c:493 func:unpack_to_rootfs
0 0 init/initramfs.c:492 func:unpack_to_rootfs
0 0 init/initramfs.c:491 func:unpack_to_rootfs
512 1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
128 1 arch/x86/events/rapl.c:571 func:rapl_cpu_online
[akpm@linux-foundation.org: remove stray newline from struct allocinfo_private]
Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
93a2221c9c |
doc: ceph: update userspace command to get CephFS metadata
According to ceph documentation [1], "getfattr -d /some/dir" no longer displays the list of all extended attributes. Both CephFS kernel and FUSE clients hide this information. To retrieve the information you have to specify the particular attribute name e.g. "getfattr -n ceph.dir.rbytes /some/dir". [1] https://docs.ceph.com/en/latest/cephfs/quota/ Signed-off-by: Artem Ikonnikov <artem@datacrunch.io> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> |
|
|
|
5ad8b6ad9a |
getting rid of bogus set_blocksize() uses, switching it
to struct file * and verifying that caller has device opened exclusively. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwkfQAKCRBZ7Krx/gZQ 62C3AQDW5vuXNx2+KDPma5YStjFpPLC0xtSyAS5D3YANjtyRFgD/TOcCarq7rvBt KubxHVFsfW+eu6ASeaoMRB83w5OIzwk= =Liix -----END PGP SIGNATURE----- Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs blocksize updates from Al Viro: "This gets rid of bogus set_blocksize() uses, switches it over to be based on a 'struct file *' and verifies that the caller has the device opened exclusively" * tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make set_blocksize() fail unless block device is opened exclusive set_blocksize(): switch to passing struct file * btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens swsusp: don't bother with setting block size zram: don't bother with reopening - just use O_EXCL for open swapon(2): open swap with O_EXCL swapon(2)/swapoff(2): don't bother with block size pktcdvd: sort set_blocksize() calls out bcache_register(): don't bother with set_blocksize() |
|
|
|
72ece20127 |
f2fs update for 6.10-rc1
In this round, we've tried to address some performance issues on zoned storage
such as direct IO and write_hints. In addition, we've migrated some IO paths
using folio. Meanwhile, there are multiple bug fixes in the compression paths,
sanity check conditions, and error handlers.
Enhancement:
- allow direct io of pinned files for zoned storage
- assign the write hint per stream by default
- convert read paths and test_writeback to folio
- avoid allocating WARM_DATA segment for direct IO
Bug fix:
- fix false alarm on invalid block address
- fix to add missing iput() in gc_data_segment()
- fix to release node block count in error path of f2fs_new_node_page()
- compress: don't allow unaligned truncation on released compress inode
- compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
- compress: fix error path of inc_valid_block_count()
- compress: fix to update i_compr_blocks correctly
- fix block migration when section is not aligned to pow2
- don't trigger OPU on pinfile for direct IO
- fix to do sanity check on i_xattr_nid in sanity_check_inode()
- write missing last sum blk of file pinning section
- clear writeback when compression failed
- fix to adjust appropirate defragment pg_end
As usual, there are several minor code clean-ups, and fixes to manage missing
corner cases in the error paths.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmZLpYcACgkQQBSofoJI
UNJQTw/+NaY7a1EgkMUpBAzxrJMKHcuBtyG42QKqgk6new0XejQGjPHojL2nPrw/
t5G9TsbZbkHNMuhAkkTZMH+DFg92QYhByJlq79fxzya0XyGH4OaY1i4u67FLu0Qz
PS/UKRkEI2B9lH+bGwa//XNMDSnzcao46bNi1SFbCNPGzU1cS35uOy/YgAdFlqTM
WKJmM/AcNir4xtL30tBCVU//0OTtzT8+5YFVyPTeFR4WACsF6eTJAre9938xw1Ef
p6ed6Wl2GYehqgFrAdAF07veZ1hVDSRAAB/1Mu1WKnNp57VBRjJW3DFDyApf+fIe
2KJIDJd9/ece3dycuiZP/LXPV0sODqOI1/5s9RbFVq/QAhTSME5xq8hNXTejdl28
PV6M2tKcTKMRpykppQg/K/N9PaO5Q6oFz0xlrOsrGoAhT1YnZfJi/DmzCZCCwYxW
jyZor/r+849yDDdjhB94ZaByvj5S3OVqgsaunnbMBcGy+DDe0rUMXvRzVK4gTcCF
lSTSp895BggWXLyPuXVNTjC4GIbzVbEDaHILPicfbqi0h5OCXG8YybKHiRs+ss6z
ZrKJQxSVVvhjyHTVcBhb/Nc1s7Fm7DkX+KjV9GV3gwzB+AlVIgPlwyMTc2fZp3ST
dUbmBR5+g4UUz2v4v4ZStAGy9eUFktO89u/roet8/74ppklj73E=
=3mwj
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tried to address some performance issues on zoned
storage such as direct IO and write_hints. In addition, we've migrated
some IO paths using folio. Meanwhile, there are multiple bug fixes in
the compression paths, sanity check conditions, and error handlers.
Enhancements:
- allow direct io of pinned files for zoned storage
- assign the write hint per stream by default
- convert read paths and test_writeback to folio
- avoid allocating WARM_DATA segment for direct IO
Bug fixes:
- fix false alarm on invalid block address
- fix to add missing iput() in gc_data_segment()
- fix to release node block count in error path of
f2fs_new_node_page()
- compress:
- don't allow unaligned truncation on released compress inode
- cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
- fix error path of inc_valid_block_count()
- fix to update i_compr_blocks correctly
- fix block migration when section is not aligned to pow2
- don't trigger OPU on pinfile for direct IO
- fix to do sanity check on i_xattr_nid in sanity_check_inode()
- write missing last sum blk of file pinning section
- clear writeback when compression failed
- fix to adjust appropirate defragment pg_end
As usual, there are several minor code clean-ups, and fixes to manage
missing corner cases in the error paths"
* tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
f2fs: initialize last_block_in_bio variable
f2fs: Add inline to f2fs_build_fault_attr() stub
f2fs: fix some ambiguous comments
f2fs: fix to add missing iput() in gc_data_segment()
f2fs: allow dirty sections with zero valid block for checkpoint disabled
f2fs: compress: don't allow unaligned truncation on released compress inode
f2fs: fix to release node block count in error path of f2fs_new_node_page()
f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
f2fs: compress: fix error path of inc_valid_block_count()
f2fs: compress: fix typo in f2fs_reserve_compress_blocks()
f2fs: compress: fix to update i_compr_blocks correctly
f2fs: check validation of fault attrs in f2fs_build_fault_attr()
f2fs: fix to limit gc_pin_file_threshold
f2fs: remove unused GC_FAILURE_PIN
f2fs: use f2fs_{err,info}_ratelimited() for cleanup
f2fs: fix block migration when section is not aligned to pow2
f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
f2fs: fix to avoid allocating WARM_DATA segment for direct IO
f2fs: remove redundant parameter in is_next_segment_free()
...
|
|
|
|
119d1b8a5d |
New code for 6.10:
* Introduce Parent Pointer extended attribute for inodes.
* Online Repair
- Implement atomic file content exchanges i.e. exchange ranges of bytes
between two files atomically.
- Create temporary files to repair file-based metadata. This uses atomic
file content exchange facility to swap file fork mappings between the
temporary file and the metadata inode.
- Allow callers of directory/xattr code to set an explicit owner number to
be written into the header fields of any new blocks that are created.
This is required to avoid walking every block of the new structure and
modify their ownership during online repair.
- Repair
- Extended attributes
- Inode unlinked state
- Directories
- Symbolic links
- AGI's unlinked inode list.
- Parent pointers.
- Move Orphan files to lost and found directory.
- Fixes for Inode repair functionality.
- Introduce a new sub-AG FITRIM implementation to reduce the duration for
which the AGF lock is held.
- Updates for the design documentation.
- Use Parent Pointers to assist in checking directories, parent pointers,
extended attributes, and link counts.
* Bring back delalloc support for realtime devices which have an extent size
that is equal to filesystem's block size.
* Improve performance of log incompat feature handling.
* Fixes
- Prevent userspace from reading invalid file data due to incorrect.
updation of file size when performing a non-atomic clone operation.
- Minor fixes to online repair.
- Fix confusing return values from xfs_bmapi_write().
- Fix an out of bounds access due to incorrect h_size during log recovery.
- Defer upgrading the extent counters in xfs_reflink_end_cow_extent() until
we know we are going to modify the extent mapping.
- Remove racy access to if_bytes check in xfs_reflink_end_cow_extent().
- Fix sparse warnings.
* Cleanups
- Hold inode locks on all files involved in a rename until the completion
of the operation. This is in preparation for the parent pointers patchset
where parent pointers are applied in a separate chained update from the
actual directory update.
- Compile out v4 support when disabled.
- Cleanup xfs_extent_busy_clear().
- Remove unused flags and fields from struct xfs_da_args.
- Remove definitions of unused functions.
- Improve extended attribute validation.
- Add higher level directory operations helpers to remove duplication of
code.
- Cleanup quota (un)reservation interfaces.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZjZC0wAKCRAH7y4RirJu
9HsCAPoCQvmPefDv56aMb5JEQNpv9dPz2Djj14hqLytQs5P/twD+LF5NhJgQNDUo
Lwnb0tmkAhmG9Y4CCiN1FwSj1rq59gE=
=2hXB
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
"Online repair feature continues to be expanded. Also, we now support
delayed allocation for realtime devices which have an extent size that
is equal to filesystem's block size.
New code:
- Introduce Parent Pointer extended attribute for inodes
- Bring back delalloc support for realtime devices which have an
extent size that is equal to filesystem's block size
- Improve performance of log incompat feature handling
Online Repair:
- Implement atomic file content exchanges i.e. exchange ranges of
bytes between two files atomically
- Create temporary files to repair file-based metadata. This uses
atomic file content exchange facility to swap file fork mappings
between the temporary file and the metadata inode
- Allow callers of directory/xattr code to set an explicit owner
number to be written into the header fields of any new blocks that
are created. This is required to avoid walking every block of the
new structure and modify their ownership during online repair
- Repair more data structures:
- Extended attributes
- Inode unlinked state
- Directories
- Symbolic links
- AGI's unlinked inode list
- Parent pointers
- Move Orphan files to lost and found directory
- Fixes for Inode repair functionality
- Introduce a new sub-AG FITRIM implementation to reduce the duration
for which the AGF lock is held
- Updates for the design documentation
- Use Parent Pointers to assist in checking directories, parent
pointers, extended attributes, and link counts
Fixes:
- Prevent userspace from reading invalid file data due to incorrect.
updation of file size when performing a non-atomic clone operation
- Minor fixes to online repair
- Fix confusing return values from xfs_bmapi_write()
- Fix an out of bounds access due to incorrect h_size during log
recovery
- Defer upgrading the extent counters in xfs_reflink_end_cow_extent()
until we know we are going to modify the extent mapping
- Remove racy access to if_bytes check in
xfs_reflink_end_cow_extent()
- Fix sparse warnings
Cleanups:
- Hold inode locks on all files involved in a rename until the
completion of the operation. This is in preparation for the parent
pointers patchset where parent pointers are applied in a separate
chained update from the actual directory update
- Compile out v4 support when disabled
- Cleanup xfs_extent_busy_clear()
- Remove unused flags and fields from struct xfs_da_args
- Remove definitions of unused functions
- Improve extended attribute validation
- Add higher level directory operations helpers to remove duplication
of code
- Cleanup quota (un)reservation interfaces"
* tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (221 commits)
xfs: simplify iext overflow checking and upgrade
xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
xfs: xfs_quota_unreserve_blkres can't fail
xfs: consolidate the xfs_quota_reserve_blkres definitions
xfs: clean up buffer allocation in xlog_do_recovery_pass
xfs: fix log recovery buffer allocation for the legacy h_size fixup
xfs: widen flags argument to the xfs_iflags_* helpers
xfs: minor cleanups of xfs_attr3_rmt_blocks
xfs: create a helper to compute the blockcount of a max sized remote value
xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate
xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
xfs: fix error returns from xfs_bmapi_write
...
|
|
|
|
16dbfae867 |
bcachefs changes for 6.10-rc1
- More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmZKJQgACgkQE6szbY3K bnZETg//SU9H0OHnBSMB/cteF6PKo9QR+dhT+3n+gWTxl0o/egbGTqwbzVqGtd2f J6II1BsDk8VoTOb/gFfLRShlmJfnj2jpRThU265faR/7LQYeSaqndDPkjOpTayAD Nj/DJyiSUTL753rZh3yUhOpOIHf7iapH6wuaZCPfhdfk+yvZNW8iz07JHjHLKRp8 I2cFH0r6kN916NdRkt9oDCz68WouT8eWTqwcKra04XsLEZjNJHxLpKMq4M8UdPc7 YynJPVt+aP8+VduGIq6pV8Co3afCP2oUywo11JpRmvLsw4tex/59wxOYtpMfgn6k 4H+9WqiBwkbmnLDrfFHWRameS6F/7+GRAOVuz9nkmfk61UPU15gLjSRffqZ6u2YC 7vbrXgebId/sZXtBpQd83RMMX52BnEJah0upNJ54IsSqfDYkU9lwl6CEyYpcX1hf YNBGBTbspZztc3AB13b3ow421FMhaySUg0FDmntMR9O8Z6/BXk7Ykc7b8DPEfrFs W6JY7q+ARBxr+EgFcV74fvMCf7NJTAhyv80AKryo7NFU2JZOyyaTxcTGSnolX4Mi lyHiOgicmOX+vy3vbC1dZoDcmIDJ4Uc0vixYcpKiZqxlR8XJ+wpevC50TEhxrcW+ ZO4SloQvgyjI34xu/gZgjRYb3BhXK3x+ougVFpRG8V8zQ/+ccWg= =MKrF -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - More safety fixes, primarily found by syzbot - Run the upgrade/downgrade paths in nochnages mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory. The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more lilkely to work for data recovery - but norecovery is the correct option for this. - btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector. - Trigger improvements that are prep work for online fsck - BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there. - bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak. - Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging. - Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim. - We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info. - Safety fixes for devices references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it. Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume. Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed. - The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior nodes) we won't have to scan the whole device(s). - New coding style document, which among other things talks about the correct usage of assertions * tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits) bcachefs: add no_invalid_checks flag bcachefs: add counters for failed shrinker reclaim bcachefs: Fix sb_field_downgrade validation bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() bcachefs: s/bkey_invalid_flags/bch_validate_flags bcachefs: fsync() should not return -EROFS bcachefs: Invalid devices are now checked for by fsck, not .invalid methods bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() bcachefs: bch2_dev_get_ioref() checks for device not present bcachefs: bch2_dev_get_ioref2(); io_read.c bcachefs: bch2_dev_get_ioref2(); debug.c bcachefs: bch2_dev_get_ioref2(); journal_io.c bcachefs: bch2_dev_get_ioref2(); io_write.c bcachefs: bch2_dev_get_ioref2(); btree_io.c bcachefs: bch2_dev_get_ioref2(); backpointers.c bcachefs: bch2_dev_get_ioref2(); alloc_background.c bcachefs: for_each_bset() declares loop iter bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h bcachefs: Improve sysfs internal/btree_cache ... |
|
|
|
61307b7be4 |
The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
|
|
|
|
0cc6f45cec |
IOMMU Updates for Linux v6.10
Including:
- Core:
- IOMMU memory usage observability - This will make the memory used
for IO page tables explicitly visible.
- Simplify arch_setup_dma_ops()
- Intel VT-d:
- Consolidate domain cache invalidation
- Remove private data from page fault message
- Allocate DMAR fault interrupts locally
- Cleanup and refactoring
- ARM-SMMUv2:
- Support for fault debugging hardware on Qualcomm implementations
- Re-land support for the ->domain_alloc_paging() callback
- ARM-SMMUv3:
- Improve handling of MSI allocation failure
- Drop support for the "disable_bypass" cmdline option
- Major rework of the CD creation code, following on directly from the
STE rework merged last time around.
- Add unit tests for the new STE/CD manipulation logic
- AMD-Vi:
- Final part of SVA changes with generic IO page fault handling
- Renesas IPMMU:
- Add support for R8A779H0 hardware
- A couple smaller fixes and updates across the sub-tree
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmZHJMkACgkQK/BELZcB
GuND1Q/+M4RN5jM66XCfhqoP8QaI8I7zDlPDd14ismx0bjtOZhoiXpptKkAA8guo
7mS57MLqBw/hKYucm1mw+F1qi1HnRWSstKXiCPmzDm3UXYgZJlKkrOw6vydFeHJH
zx2ei7TmBrc0SrsybWK3NWRfVBBkO8enGZTmti0DfHL/rOFcUM0LHegY51GcDaaH
SlDr+LLDMeGynSQWhRlVNJVmEI5gpVPitY/mDUpVPoELiW9C0WGk8kPlR11z2pCR
eUNiqGJUcGasOhmfiYnpJR462eg7J41glquu+YHj8ivPbbu3C4wxgruY/tR4dmJG
8s6AMAWR53JzG2SrCCwtzyRPSXmKfvixF+VKmlB2Ksc7VAn1xA0DYnY5Tx99EtXu
qcEaR4SICMti0urmBGo/cGFdXi2TB1ccXqwoRtp1N3KiYnnOaQdLNO9qZdl9uUTI
uleXACzkCVSssSpBfGjFcPyHU4r3WjMfX0f5ZJPpFMoQmvwV1yeMX7xTEZz4Sxew
cHfBt9FAW9+4mBMTQfokBt0hZ6jwKcYl/z3Xi2oD+Ik/Qrzx5kcLA8LZLEVRXIBa
SZh2ASazq/dr8YoZ744VRmlmi+nISAIHbbQMeqQEQgYQh0HpwS9g5HtpsBzNP6aB
91RHqZSccb/zNdi8e+RH79Y7pX/G5QcuVKcW6KQUBcAAb6hAgOg=
=JUzp
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core:
- IOMMU memory usage observability - This will make the memory used
for IO page tables explicitly visible.
- Simplify arch_setup_dma_ops()
Intel VT-d:
- Consolidate domain cache invalidation
- Remove private data from page fault message
- Allocate DMAR fault interrupts locally
- Cleanup and refactoring
ARM-SMMUv2:
- Support for fault debugging hardware on Qualcomm implementations
- Re-land support for the ->domain_alloc_paging() callback
ARM-SMMUv3:
- Improve handling of MSI allocation failure
- Drop support for the "disable_bypass" cmdline option
- Major rework of the CD creation code, following on directly from
the STE rework merged last time around.
- Add unit tests for the new STE/CD manipulation logic
AMD-Vi:
- Final part of SVA changes with generic IO page fault handling
Renesas IPMMU:
- Add support for R8A779H0 hardware
... and a couple smaller fixes and updates across the sub-tree"
* tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits)
iommu/arm-smmu-v3: Make the kunit into a module
arm64: Properly clean up iommu-dma remnants
iommu/amd: Enable Guest Translation after reading IOMMU feature register
iommu/vt-d: Decouple igfx_off from graphic identity mapping
iommu/amd: Fix compilation error
iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
iommu/arm-smmu-v3: Move the CD generation for SVA into a function
iommu/arm-smmu-v3: Allocate the CD table entry in advance
iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
iommu/arm-smmu-v3: Consolidate clearing a CD table entry
iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
iommu/arm-smmu-v3: Add an ops indirection to the STE code
iommu/arm-smmu-qcom: Don't build debug features as a kernel module
iommu/amd: Add SVA domain support
iommu: Add ops->domain_alloc_sva()
iommu/amd: Initial SVA support for AMD IOMMU
iommu/amd: Add support for enable/disable IOPF
iommu/amd: Add IO page fault notifier handler
...
|
|
|
|
1b10b390d9 |
EFI updates for v6.10:
- Additional cleanup by Tim for the efivarfs variable name length
confusion
- Avoid freeing a bogus pointer when virtual remapping is omitted in the
EFI boot stub
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZkMPmAAKCRAwbglWLn0t
XNw2AQC96tWvh7piDGw2LTGp4zqRcoe2LV09hNE8rgk63g+2LgEA81TOhEKwpeie
23GI9WiEABdRT8kt6ebuVatfzrn5jgk=
=8la7
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Only a handful of changes this cycle, consisting of cleanup work and a
low-prio bugfix:
- Additional cleanup by Tim for the efivarfs variable name length
confusion
- Avoid freeing a bogus pointer when virtual remapping is omitted in
the EFI boot stub"
* tag 'efi-next-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: libstub: only free priv.runtime_map when allocated
efi: Clear up misconceptions about a maximum variable name size
efivarfs: Remove unused internal struct members
Documentation: Mark the 'efivars' sysfs interface as removed
efi: pstore: Request at most 512 bytes for variable names
|
|
|
|
8815da98e0 |
Another not-too-busy cycle for documentation, including:
- Some build-system changes to detect the variable fonts installed by some
distributions that can break the PDF build.
- Various updates and additions to the Spanish, Chinese, Italian, and
Japanese translations.
- Update the stable-kernel rules to match modern practice
...and the usual array of corrections, updates, and typo fixes.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmY9ASYACgkQF0NaE2wM
flhPAwf/SYwHTBhKo0Xy3WsY3PHm4hsYVDwQ/Nfr6oa1mF+x4npxcN1RzPJd8iB9
zXlynnBkptwvEoukJV2hw+gVwO9ixyqJzIt7AmRFgA5cywhklpxQQAVelQG4ISR2
8M7LOXIjROJdY3OymPcQ2YF1m000tB9Khx7uvWrvMZEasXND/ITi9mFIJiOk841C
5wGTHmYKjJwuqTm6CsghAgLJkRYGHD+gtp4w8wQwQzIHJ6B8SnbVPSnYYqJ8Qt/V
31AEBgV3WJhmNiyNgP/p3rtDTCXBowSK8klOMa5CW3FQEIb4SQL/uBZ8qR8FQo2c
l1zsuPKKJOqe9T+POWHXdjoryZn1Ug==
=8fUD
-----END PGP SIGNATURE-----
Merge tag 'docs-6.10' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Another not-too-busy cycle for documentation, including:
- Some build-system changes to detect the variable fonts installed by
some distributions that can break the PDF build.
- Various updates and additions to the Spanish, Chinese, Italian, and
Japanese translations.
- Update the stable-kernel rules to match modern practice
... and the usual array of corrections, updates, and typo fixes"
* tag 'docs-6.10' of git://git.lwn.net/linux: (42 commits)
cgroup: Add documentation for missing zswap memory.stat
kernel-doc: Added "*" in $type_constants2 to fix 'make htmldocs' warning.
docs:core-api: fixed typos and grammar in printk-index page
Documentation: tracing: Fix spelling mistakes
docs/zh_CN/rust: Update the translation of quick-start to 6.9-rc4
docs/zh_CN/rust: Update the translation of general-information to 6.9-rc4
docs/zh_CN/rust: Update the translation of coding-guidelines to 6.9-rc4
docs/zh_CN/rust: Update the translation of arch-support to 6.9-rc4
docs: stable-kernel-rules: fix typo sent->send
docs/zh_CN: remove two inconsistent spaces
docs: scripts/check-variable-fonts.sh: Improve commands for detection
docs: stable-kernel-rules: create special tag to flag 'no backporting'
docs: stable-kernel-rules: explain use of stable@kernel.org (w/o @vger.)
docs: stable-kernel-rules: remove code-labels tags and a indention level
docs: stable-kernel-rules: call mainline by its name and change example
docs: stable-kernel-rules: reduce redundancy
docs, kprobes: Add riscv as supported architecture
Docs: typos/spelling
docs: kernel_include.py: Cope with docutils 0.21
docs: ja_JP/howto: Catch up update in v6.8
...
|
|
|
|
706833dbe3 |
bcachefs: CodingStyle
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
|
|
|
5ec5aab775 |
doc: split buffer.rst out of api-summary.rst
Buffer heads are no longer a generic filesystem API but an optional filesystem support library. Make the documentation structure reflect that, and include the fine documentation kept in buffer_head.h. We could give a better overview of what buffer heads are all about, but my enthusiasm for documenting it is limited. [willy@infradead.org: fix kerneldoc warning] Link: https://lkml.kernel.org/r/20240417015933.453505-1-willy@infradead.org [akpm@linux-foundation.org: remove newline at EOF] Link: https://lkml.kernel.org/r/20240416031754.4076917-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
d18a867958 |
make set_blocksize() fail unless block device is opened exclusive
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
da51bbcdba |
Docs: typos/spelling
Fix spelling and grammar in Docs descriptions Signed-off-by: Remington Brasga <rbrasga@uci.edu> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240429225527.2329-1-rbrasga@uci.edu |
|
|
|
22d407b164 |
lib: add allocation tagging support for memory allocation profiling
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily instrument memory allocators. It registers an "alloc_tags" codetag type with /proc/allocinfo interface to output allocation tag information when the feature is enabled. CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory allocation profiling instrumentation. Memory allocation profiling can be enabled or disabled at runtime using /proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n. CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation profiling by default. [surenb@google.com: Documentation/filesystems/proc.rst: fix allocinfo title] Link: https://lkml.kernel.org/r/20240326073813.727090-1-surenb@google.com [surenb@google.com: do limited memory accounting for modules with ARCH_NEEDS_WEAK_PER_CPU] Link: https://lkml.kernel.org/r/20240402180933.1663992-2-surenb@google.com [klarasmodin@gmail.com: explicitly include irqflags.h in alloc_tag.h] Link: https://lkml.kernel.org/r/20240407133252.173636-1-klarasmodin@gmail.com [surenb@google.com: fix alloc_tag_init() to prevent passing NULL to PTR_ERR()] Link: https://lkml.kernel.org/r/20240417003349.2520094-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-14-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Klara Modin <klarasmodin@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
7643f3fe27 |
f2fs: assign the write hint per stream by default
This reverts commit
|
|
|
|
67bdcd4999 |
docs: describe xfs directory tree online fsck
I've added a scrubber that checks the directory tree structure and fixes them; describe this in the design documentation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
c91fe20e5a |
docs: update offline parent pointer repair strategy
Now update how xfs_repair checks and repairs parent pointer info. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
5220727ce8 |
docs: update online directory and parent pointer repair sections
Update the case studies of online directory and parent pointer reconstruction to reflect what they actually do in the final version. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
d85fe250f2 |
docs: update the parent pointers documentation to the final version
Now that we've decided on the ondisk format of parent pointers, update the documentation to reflect that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
e6c9e75fbe |
xfs: move files to orphanage instead of letting nlinks drop to zero
If we encounter an inode with a nonzero link count but zero observed links, move it to the orphanage. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
1e58a8ccf2 |
xfs: move orphan files to the orphanage
When we're repairing a directory structure or fixing the dotdot entry of a subdirectory, it's possible that we won't ever find a parent for the subdirectory. When this is the case, move it to the orphanage, aka /lost+found. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
f783529bee |
docs: update swapext -> exchmaps language
Start reworking the atomic swapext design documentation to refer to its new file contents/mapping exchange name. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
5302a5c8be |
xfs: only clear log incompat flags at clean unmount
While reviewing the online fsck patchset, someone spied the xfs_swapext_can_use_without_log_assistance function and wondered why we go through this inverted-bitmask dance to avoid setting the XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature. (The same principles apply to the logged extended attribute update feature bit in the since-merged LARP series.) The reason for this dance is that xfs_add_incompat_log_feature is an expensive operation -- it forces the log, pushes the AIL, and then if nobody's beaten us to it, sets the feature bit and issues a synchronous write of the primary superblock. That could be a one-time cost amortized over the life of the filesystem, but the log quiesce and cover operations call xfs_clear_incompat_log_features to remove feature bits opportunistically. On a moderately loaded filesystem this leads to us cycling those bits on and off over and over, which hurts performance. Why do we clear the log incompat bits? Back in ~2020 I think Dave and I had a conversation on IRC[2] about what the log incompat bits represent. IIRC in that conversation we decided that the log incompat bits protect unrecovered log items so that old kernels won't try to recover them and barf. Since a clean log has no protected log items, we could clear the bits at cover/quiesce time. As Dave Chinner pointed out in the thread, clearing log incompat bits at unmount time has positive effects for golden root disk image generator setups, since the generator could be running a newer kernel than what gets written to the golden image -- if there are log incompat fields set in the golden image that was generated by a newer kernel/OS image builder then the provisioning host cannot mount the filesystem even though the log is clean and recovery is unnecessary to mount the filesystem. Given that it's expensive to set log incompat bits, we really only want to do that once per bit per mount. Therefore, I propose that we only clear log incompat bits as part of writing a clean unmount record. Do this by adding an operational state flag to the xfs mount that guards whether or not the feature bit clearing can actually take place. This eliminates the l_incompat_users rwsem that we use to protect a log cleaning operation from clearing a feature bit that a frontend thread is trying to set -- this lock adds another way to fail w.r.t. locking. For the swapext series, I shard that into multiple locks just to work around the lockdep complaints, and that's fugly. Link: https://lore.kernel.org/linux-xfs/20240131230043.GA6180@frogsfrogsfrogs/ Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> |
|
|
|
212c5c078d |
iommu: account IOMMU allocated memory
In order to be able to limit the amount of memory that is allocated by IOMMU subsystem, the memory must be accounted. Account IOMMU as part of the secondary pagetables as it was discussed at LPC. The value of SecPageTables now contains mmeory allocation by IOMMU and KVM. There is a difference between GFP_ACCOUNT and what NR_IOMMU_PAGES shows. GFP_ACCOUNT is set only where it makes sense to charge to user processes, i.e. IOMMU Page Tables, but there more IOMMU shared data that should not really be charged to a specific process. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240413002522.1101315-12-pasha.tatashin@soleen.com Signed-off-by: Joerg Roedel <jroedel@suse.de> |
|
|
|
5b625181fb |
Documentation: Mark the 'efivars' sysfs interface as removed
The 'efivars' sysfs interface was removed in commit
|
|
|
|
aa98e70fc6 |
Documentation: filesystems: Add bcachefs toctree
Commit |
|
|
|
c5d9ab85eb |
f2fs update for 6.9-rc1
In this round, there are a number of updates on mainly two areas: Zoned block
device support and Per-file compression. For example, we've found several issues
to support Zoned block device especially having large sections regarding to GC
and file pinning used for Android devices. In compression side, we've fixed many
corner race conditions that had broken the design assumption.
Enhancement:
- Support file pinning for Zoned block device having large section
- Enhance the data recovery after sudden power cut on Zoned block device
- Add more error injection cases to easily detect the kernel panics
- add a proc entry show the entire disk layout
- Improve various error paths paniced by BUG_ON in block allocation and GC
- support SEEK_DATA and SEEK_HOLE for compression files
Bug fix:
- fix to avoid use-after-free issue in f2fs_filemap_fault
- fix some race conditions to break the atomic write design assumption
- fix to truncate meta inode pages forcely
- resolve various per-file compression issues wrt the space management and
compression policies
- fix some swap-related bugs
In addition, we removed deprecated codes such as io_bits and heap_allocation,
and also fixed minor error handling routines with neat debugging messages.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmX4gS0ACgkQQBSofoJI
UNLmgBAAg4mvbWjmJ5VbXs4zGLOgLRJYcY1sZRO5Ufg4LhWzoGRxL1Dru+TELw0t
1Ck2EQvP91XZ5weA5AZOfWbxcijy4+8L3P8L7ohOShudfACci0wQsx6IaUUWWylC
ILA4+DkovpZrlu6th12Gj9QAM6TN9gdy3V1VLT5O/KmE1x6Pekwp2hQoIvVJRH5L
I3KxOf5fTe3oWLvEN6m7yCz/8qGqz8+w0ae90UG0fqi0wVEuZJ99zsVPnuhu6uBo
riFm2A6ra0I/JqoPyqn2QM6ApItM867ULo9EoyQVgq56Q1w31ENOJXsU9N7N4Wxt
olgujH1SijkWk9ni57iKtMhR68e3Rs+pVsuNFmJuOPq0HASoggB66QRrVvCgM9JG
z3D//CB2ONtX2XiKJMiTcX9VqIqrMw6L1eVxEZu0P96C3CS70MoBU69mdSR9Og2S
5nQXja3yzFhdk3thp6+wAJ3I04ZQkf3qoHZB+0chU2Xl1pV+5NIkBgBsSw8g/TY3
EIHMfK+TX0SBSNCvkUDEJ+Z8ZRID6tcbAquTSsBr6wxB+F9mq7onEvI8O7xwyH9W
DU8xhymOE2QUoluNtyW7ww6HK913ripXIenI9LaYJnuj0XeDAcMIoPsgR7AGU5UG
hshvirFdUdWRMTfXxNNUrvhOWI0qurQSVx+VV6Qb62DGqR5ofOw=
=Qpvy
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this round, there are a number of updates on mainly two areas:
Zoned block device support and Per-file compression. For example,
we've found several issues to support Zoned block device especially
having large sections regarding to GC and file pinning used for
Android devices. In compression side, we've fixed many corner race
conditions that had broken the design assumption.
Enhancements:
- Support file pinning for Zoned block device having large section
- Enhance the data recovery after sudden power cut on Zoned block
device
- Add more error injection cases to easily detect the kernel panics
- add a proc entry show the entire disk layout
- Improve various error paths paniced by BUG_ON in block allocation
and GC
- support SEEK_DATA and SEEK_HOLE for compression files
Bug fixes:
- avoid use-after-free issue in f2fs_filemap_fault
- fix some race conditions to break the atomic write design
assumption
- fix to truncate meta inode pages forcely
- resolve various per-file compression issues wrt the space
management and compression policies
- fix some swap-related bugs
In addition, we removed deprecated codes such as io_bits and
heap_allocation, and also fixed minor error handling routines with
neat debugging messages"
* tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault
f2fs: truncate page cache before clearing flags when aborting atomic write
f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag
f2fs: prevent atomic write on pinned file
f2fs: fix to handle error paths of {new,change}_curseg()
f2fs: unify the error handling of f2fs_is_valid_blkaddr
f2fs: zone: fix to remove pow2 check condition for zoned block device
f2fs: fix to truncate meta inode pages forcely
f2fs: compress: fix reserve_cblocks counting error when out of space
f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks
f2fs: add a proc entry show disk layout
f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup
f2fs: fix to check return value of f2fs_gc_range
f2fs: fix to check return value __allocate_new_segment
f2fs: fix to do sanity check in update_sit_entry
f2fs: fix to reset fields for unloaded curseg
f2fs: clean up new_curseg()
f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate()
f2fs: fix blkofs_end correctly in f2fs_migrate_blocks()
f2fs: ro: don't start discard thread for readonly image
...
|
|
|
|
32a50540c3 |
bcachefs updates for 6.9
- Subvolume children btree; this is needed for providing a userspace
interface for walking subvolumes, which will come later
- Lots of improvements to directory structure checking
- Improved journal pipelining, significantly improving performance on
high iodepth write workloads
- Discard path improvements: the discard path is more efficient, and no
longer flushes the journal unnecessarily
- Buffered write path can now avoid taking the inode lock
- new mm helper: memalloc_flags_{save|restore}
- mempool now does kvmalloc mempools
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXycEcACgkQE6szbY3K
bnYUTg/+K4Nv2EdAqOCyHRTKaF2OgJDUb25ZDmbGpfT1XyPrNB7/+CxHqSdEP7/e
FVuhtP61vnQAImDv82u9iZiab/TnuCZPUrjSobFEvrWYoGRtP9Bm9MyYB28NzmMa
AXGmS4yJGVwtxxrFNxZP98IbiHYiHSoYbkqxX2E5VgLag8Ru8peb7oD0Ro3zw0rb
z+6UM/seJ7on5i/9IJEMKKXFVEoZC2J5DAVoe1TghG2kgOw3cKu5OUdltLPOY5jL
jkm5J5wa6Ep46nufHat92yiMxXIQrf4U9LkXxzTi5ThoSmt+Af2qXcBjqTTVqd2D
1dGxj+UG8iu4DCCbQC6EA7J5EMvxfJM0+9lk1ULUgxUs3X69co6nlI6XH1fwEMqk
KpIqd35+Y/IYgogt9ioXI0dtXyL7dbaTVt6NZhc9SaPGPX+C2V0+l4bqToFdNaPH
0KATjjyQaJRE4ZFIjr6GliYOtKWDLi/HPEyoBivniUn7cF5vjSvti+cSQwNDSPpa
6jOd5Y923Iq9ZqDAPM3+mvTH8nNaaf2T2fmbPNrc5pdWbha9bGwOU71zvKHNFGm/
66ZsnwhKSk+uwglTMZHPKSkJJXUYAHESw3slQtEWHZVlliArc55+pBHwE00bvRt7
KHUUqkqXBUPzbp/kdZGylMAdH9+8j9TE5QJ2RaoryFm/eCfexmI=
=6xnj
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs updates from Kent Overstreet:
- Subvolume children btree; this is needed for providing a userspace
interface for walking subvolumes, which will come later
- Lots of improvements to directory structure checking
- Improved journal pipelining, significantly improving performance on
high iodepth write workloads
- Discard path improvements: the discard path is more efficient, and no
longer flushes the journal unnecessarily
- Buffered write path can now avoid taking the inode lock
- new mm helper: memalloc_flags_{save|restore}
- mempool now does kvmalloc mempools
* tag 'bcachefs-2024-03-13' of https://evilpiepirate.org/git/bcachefs: (128 commits)
bcachefs: time_stats: shrink time_stat_buffer for better alignment
bcachefs: time_stats: split stats-with-quantiles into a separate structure
bcachefs: mean_and_variance: put struct mean_and_variance_weighted on a diet
bcachefs: time_stats: add larger units
bcachefs: pull out time_stats.[ch]
bcachefs: reconstruct_alloc cleanup
bcachefs: fix bch_folio_sector padding
bcachefs: Fix btree key cache coherency during replay
bcachefs: Always flush write buffer in delete_dead_inodes()
bcachefs: Fix order of gc_done passes
bcachefs: fix deletion of indirect extents in btree_gc
bcachefs: Prefer struct_size over open coded arithmetic
bcachefs: Kill unused flags argument to btree_split()
bcachefs: Check for writing superblocks with nonsense member seq fields
bcachefs: fix bch2_journal_buf_to_text()
lib/generic-radix-tree.c: Make nodes more reasonably sized
bcachefs: copy_(to|from)_user_errcode()
bcachefs: Split out bkey_types.h
bcachefs: fix lost journal buf wakeup due to improved pipelining
bcachefs: intercept mountoption value for bool type
...
|
|
|
|
eb386617be |
bcachefs: Errcode tracepoint, documentation
Add a tracepoint for downcasting private errors to standard errors, so they can be recovered even when not logged; also, add some documentation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> |
|
|
|
babbcc0232 |
New code for 6.9:
* Online Repair;
** New ondisk structures being repaired.
- Inode's mode field by trying to obtain file type value from the a
directory entry.
- Quota counters.
- Link counts of inodes.
- FS summary counters.
- rmap btrees.
Support for in-memory btrees has been added to support repair of rmap
btrees.
** Misc changes
- Report corruption of metadata to the health tracking subsystem.
- Enable indirect health reporting when resources are scarce.
- Reduce memory usage while reparing refcount btree.
- Extend "Bmap update" intent item to support atomic extent swapping on
the realtime device.
- Extend "Bmap update" intent item to support extended attribute fork and
unwritten extents.
** Code cleanups
- Bmap log intent.
- Btree block pointer checking.
- Btree readahead.
- Buffer target.
- Symbolic link code.
* Remove mrlock wrapper around the rwsem.
* Convert all the GFP_NOFS flag usages to use the scoped
memalloc_nofs_save() API instead of direct calls with the GFP_NOFS.
* Refactor and simplify xfile abstraction. Lower level APIs in
shmem.c are required to be exported in order to achieve this.
* Skip checking alignment constraints for inode chunk allocations when block
size is larger than inode chunk size.
* Do not submit delwri buffers collected during log recovery when an error
has been encountered.
* Fix SEEK_HOLE/DATA for file regions which have active COW extents.
* Fix lock order inversion when executing error handling path during
shrinking a filesystem.
* Remove duplicate ifdefs.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZemMkgAKCRAH7y4RirJu
9ON5AP0Vda6sMn/ZUYoLo9ZUrUvlUb8L0dhEN5JL0XfyWW5ogAD/bH4G6pKSNyTw
cSEjryuDakirdHLt5g0c+QHd2a/fzw0=
=ymKk
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
- Online repair updates:
- More ondisk structures being repaired:
- Inode's mode field by trying to obtain file type value from
the a directory entry
- Quota counters
- Link counts of inodes
- FS summary counters
- Support for in-memory btrees has been added to support repair
of rmap btrees
- Misc changes:
- Report corruption of metadata to the health tracking subsystem
- Enable indirect health reporting when resources are scarce
- Reduce memory usage while repairing refcount btree
- Extend "Bmap update" intent item to support atomic extent
swapping on the realtime device
- Extend "Bmap update" intent item to support extended attribute
fork and unwritten extents
- Code cleanups:
- Bmap log intent
- Btree block pointer checking
- Btree readahead
- Buffer target
- Symbolic link code
- Remove mrlock wrapper around the rwsem
- Convert all the GFP_NOFS flag usages to use the scoped
memalloc_nofs_save() API instead of direct calls with the GFP_NOFS
- Refactor and simplify xfile abstraction. Lower level APIs in shmem.c
are required to be exported in order to achieve this
- Skip checking alignment constraints for inode chunk allocations when
block size is larger than inode chunk size
- Do not submit delwri buffers collected during log recovery when an
error has been encountered
- Fix SEEK_HOLE/DATA for file regions which have active COW extents
- Fix lock order inversion when executing error handling path during
shrinking a filesystem
- Remove duplicate ifdefs
* tag 'xfs-6.9-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (183 commits)
xfs: shrink failure needs to hold AGI buffer
mm/shmem.c: Use new form of *@param in kernel-doc
kernel-doc: Add unary operator * to $type_param_ref
xfs: use kvfree() in xlog_cil_free_logvec()
xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL
xfs: fix scrub stats file permissions
xfs: fix log recovery erroring out on refcount recovery failure
xfs: move symlink target write function to libxfs
xfs: move remote symlink target read function to libxfs
xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h
xfs: xfs_bmap_finish_one should map unwritten extents properly
xfs: support deferred bmap updates on the attr fork
xfs: support recovering bmap intent items targetting realtime extents
xfs: add a realtime flag to the bmap update log redo items
xfs: add a xattr_entry helper
xfs: fix xfs_bunmapi to allow unmapping of partial rt extents
xfs: move xfs_bmap_defer_add to xfs_bmap_item.c
xfs: reuse xfs_bmap_update_cancel_item
xfs: add a bi_entry helper
xfs: remove xfs_trans_set_bmap_flags
...
|
|
|
|
1f44039766 |
A moderatly busy cycle for development this time around.
- Some cleanup of the main index page for easier navigation
- Rework some of the other top-level pages for better readability and, with
luck, fewer merge conflicts in the future.
- Submit-checklist improvements, hopefully the first of many.
- New Italian translations
- A fair number of kernel-doc fixes and improvements. We have also dropped
the recommendation to use an old version of Sphinx.
- A new document from Thorsten on bisection
...and lots of fixes and updates.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmXvKVIACgkQF0NaE2wM
flik1gf/ZFS1mHwDdmHA/vpx8UxdUlFEo0Pms8V24iPSW5aEIqkZ406c9DSyMTtp
CXTzW+RSCfB1Q3ciYtakHBgv0RzZ5+RyaEZ1l7zVmMyw4nYvK6giYKmg8Y0EVPKI
fAVuPWo5iE7io0sNVbKBKJJkj9Z8QEScM48hv/CV1FblMvHYn0lie6muJrF9G6Ez
HND+hlYZtWkbRd5M86CDBiFeGMLVPx17T+psQyQIcbUYm9b+RUqZRHIVRLYbad7r
18r9+83DsOhXTVJCBBSfCSZwzF8yAm+eD1w47sxnSItF8OiIjqCzQgXs3BZe9TXH
h2YyeWbMN3xByA4mEgpmOPP44RW7Pg==
=SC60
-----END PGP SIGNATURE-----
Merge tag 'docs-6.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A moderatly busy cycle for development this time around.
- Some cleanup of the main index page for easier navigation
- Rework some of the other top-level pages for better readability
and, with luck, fewer merge conflicts in the future.
- Submit-checklist improvements, hopefully the first of many.
- New Italian translations
- A fair number of kernel-doc fixes and improvements. We have also
dropped the recommendation to use an old version of Sphinx.
- A new document from Thorsten on bisection
... and lots of fixes and updates"
* tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
docs: verify/bisect: fixes, finetuning, and support for Arch
docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
docs: submit-checklist: use subheadings
docs: submit-checklist: structure by category
docs: new text on bisecting which also covers bug validation
docs: drop the version constraints for sphinx and dependencies
docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
docs: Restore "smart quotes" for quotes
docs/zh_CN: accurate translation of "function"
docs: Include simplified link titles in main index
docs: Correct formatting of title in admin-guide/index.rst
docs: kernel_feat.py: fix build error for missing files
MAINTAINERS: Set the field name for subsystem profile section
kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
Fixed case issue with 'fault-injection' in documentation
kernel-doc: handle #if in enums as well
Documentation: update mailing list addresses
doc: kerneldoc.py: fix indentation
scripts/kernel-doc: simplify signature printing
...
|
|
|
|
3bf95d567d |
fscrypt updates for 6.9
Fix flakiness in a test by releasing the quota synchronously when a key is removed, and other minor cleanups. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZe/STxQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOKyVAAQCJQr5l3fU+rm1FVpuVg8q/pbPdi5wJ N31pYFvY3AehtQEArdPNtBbXW3V7i9OL6CDmesuNtGr3Il5KRV1h89yyYgY= =RGab -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Fix flakiness in a test by releasing the quota synchronously when a key is removed, and other minor cleanups" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: shrink the size of struct fscrypt_inode_info slightly fscrypt: write CBC-CTS instead of CTS-CBC fscrypt: clear keyring before calling key_put() fscrypt: explicitly require that inode->i_blkbits be set |
|
|
|
77417942e4 |
vfs-6.9.ntfs
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem42QAKCRCRxhvAZXjc
opOtAQDUkiJNaOu3fR6ENLvDZSFmaI2jQXIL8ulHYpEiFrXmKwD9EZQ8bmEYU7uO
WN4VM8p8UwQ7BmIV9b+jvwciF8Qi8QI=
=T03q
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull ntfs update from Christian Brauner:
"This removes the old ntfs driver. The new ntfs3 driver is a full
replacement that was merged over two years ago. We've went through
various userspace and either they use ntfs3 or they use the fuse
version of ntfs and thus build neither ntfs nor ntfs3. I think that's
a clear sign that we should risk removing the legacy ntfs driver.
Quoting from Arch Linux and Debian:
- Debian does neither build the legacy ntfs nor the new ntfs3:
"Not currently built with Debian's kernel packages, 'ntfs' has been
symlinked to 'ntfs-3g' as it relates to fstab and mount commands.
Debian kernels are built without support of the ntfs3 driver
developed by Paragon Software." (cf. [2])
- Archlinux provides ntfs3 as their default since 5.15:
"All officially supported kernels with versions 5.15 or newer are
built with CONFIG_NTFS3_FS=m and thus support it. Before 5.15,
NTFS read and write support is provided by the NTFS-3G FUSE file
system." (cf. [1]).
It's unmaintained apart from various odd fixes as well. Worst case we
have to reintroduce it if someone really has a valid dependency on it.
But it's worth trying to see whether we can remove it"
Link: https://wiki.archlinux.org/title/NTFS [1]
Link: https://wiki.debian.org/NTFS [2]
* tag 'vfs-6.9.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: remove NTFS classic from docum. index
fs: Remove NTFS classic
|
|
|
|
7ea65c89d8 |
vfs-6.9.misc
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem3wQAKCRCRxhvAZXjc
otRMAQDeo8qsuuIAcS2KUicKqZR5yMVvrY9r4sQzf7YRcJo5HQD+NQXkKwQuv1VO
OUeScsic/+I+136AgdjWnlEYO5dp0go=
=4WKU
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Misc features, cleanups, and fixes for vfs and individual filesystems.
Features:
- Support idmapped mounts for hugetlbfs.
- Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
where the passed offset is ignored if the file is O_APPEND. The new
flag allows a caller to enforce that the offset is honored to
conform to posix even if the file was opened in append mode.
- Move i_mmap_rwsem in struct address_space to avoid false sharing
between i_mmap and i_mmap_rwsem.
- Convert efs, qnx4, and coda to use the new mount api.
- Add a generic is_dot_dotdot() helper that's used by various
filesystems and the VFS code instead of open-coding it multiple
times.
- Recently we've added stable offsets which allows stable ordering
when iterating directories exported through NFS on e.g., tmpfs
filesystems. Originally an xarray was used for the offset map but
that caused slab fragmentation issues over time. This switches the
offset map to the maple tree which has a dense mode that handles
this scenario a lot better. Includes tests.
- Finally merge the case-insensitive improvement series Gabriel has
been working on for a long time. This cleanly propagates case
insensitive operations through ->s_d_op which in turn allows us to
remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
It also improves performance by trying a case-sensitive comparison
first and then fallback to case-insensitive lookup if that fails.
This also fixes a bug where overlayfs would be able to be mounted
over a case insensitive directory which would lead to all sort of
odd behaviors.
Cleanups:
- Make file_dentry() a simple accessor now that ->d_real() is
simplified because of the backing file work we did the last two
cycles.
- Use the dedicated file_mnt_idmap helper in ntfs3.
- Use smp_load_acquire/store_release() in the i_size_read/write
helpers and thus remove the hack to handle i_size reads in the
filemap code.
- The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
fs/
- It's no longer necessary to perform a second built-in initramfs
unpack call because we retain the contents of the previous
extraction. Remove it.
- Now that we have removed various allocators kfree_rcu() always
works with kmem caches and kmalloc(). So simplify various places
that only use an rcu callback in order to handle the kmem cache
case.
- Convert the pipe code to use a lockdep comparison function instead
of open-coding the nesting making lockdep validation easier.
- Move code into fs-writeback.c that was located in a header but can
be made static as it's only used in that one file.
- Rewrite the alignment checking iterators for iovec and bvec to be
easier to read, and also significantly more compact in terms of
generated code. This saves 270 bytes of text on x86-64 (with
clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
saves a bit of time for the same workload.
- Switch various places to use KMEM_CACHE instead of
kmem_cache_create().
- Use inode_set_ctime_to_ts() in inode_set_ctime_current()
- Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.
- Various smaller cleanups for eventfds.
Fixes:
- Fix various comments and typos, and unneeded initializations.
- Fix stack allocation hack for clang in the select code.
- Improve dump_mapping() debug code on a best-effort basis.
- Fix build errors in various selftests.
- Avoid wrap-around instrumentation in various places.
- Don't allow user namespaces without an idmapping to be used for
idmapped mounts.
- Fix sysv sb_read() call.
- Fix fallback implementation of the get_name() export operation"
* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
hugetlbfs: support idmapped mounts
qnx4: convert qnx4 to use the new mount api
fs: use inode_set_ctime_to_ts to set inode ctime to current time
libfs: Drop generic_set_encrypted_ci_d_ops
ubifs: Configure dentry operations at dentry-creation time
f2fs: Configure dentry operations at dentry-creation time
ext4: Configure dentry operations at dentry-creation time
libfs: Add helper to choose dentry operations at mount-time
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
fscrypt: Drop d_revalidate once the key is added
fscrypt: Drop d_revalidate for valid dentries during lookup
fscrypt: Factor out a helper to configure the lookup dentry
ovl: Always reject mounting over case-insensitive directories
libfs: Attempt exact-match comparison first during casefolded lookup
efs: remove SLAB_MEM_SPREAD flag usage
jfs: remove SLAB_MEM_SPREAD flag usage
minix: remove SLAB_MEM_SPREAD flag usage
openpromfs: remove SLAB_MEM_SPREAD flag usage
proc: remove SLAB_MEM_SPREAD flag usage
qnx6: remove SLAB_MEM_SPREAD flag usage
...
|
|
|
|
8b10d36537 |
f2fs: introduce FAULT_NO_SEGMENT
Use it to simulate no free segment case during block allocation. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
|
|
4e0197f993 |
f2fs: kill heap-based allocation
No one uses this feature. Let's kill it. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
|
|
2f944c66ae |
fscrypt: write CBC-CTS instead of CTS-CBC
Calling CBC with ciphertext stealing "CBC-CTS" seems to be more common than calling it "CTS-CBC". E.g., CBC-CTS is used by OpenSSL, Crypto++, RFC3962, and RFC6803. The NIST SP800-38A addendum uses CBC-CS1, CBC-CS2, and CBC-CS3, distinguishing between different CTS conventions but similarly putting the CBC part first. In the interest of avoiding any idiosyncratic terminology, update the fscrypt documentation and the fscrypt_mode "friendly names" to align with the more common convention. Changing the "friendly names" only affects some log messages. The actual mode constants in the API are unchanged; those call it simply "CTS". Add a note to the documentation that clarifies that "CBC" and "CTS" in the API really mean CBC-ESSIV and CBC-CTS, respectively. Link: https://lore.kernel.org/r/20240224053550.44659-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |
|
|
|
a095686a23 |
xfs: support in-memory btrees
Adapt the generic btree cursor code to be able to create a btree whose buffers come from a (presumably in-memory) buftarg with a header block that's specific to in-memory btrees. We'll connect this to other parts of online scrub in the next patches. Note that in-memory btrees always have a block size matching the system memory page size for efficiency reasons. There are also a few things we need to do to finalize a btree update; that's covered in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> |
|
|
|
e5a2f47cff |
xfs: remove xfile_{get,put}_page
These functions aren't used anymore, so get rid of them. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> |
|
|
|
e47e2e0ba9 |
xfs: remove the xfile_pread/pwrite APIs
All current and pending xfile users use the xfile_obj_load and xfile_obj_store API, so make those the actual implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> |
|
|
|
87161a2b0a |
f2fs: deprecate io_bits
Let's deprecate an unused io_bits feature to save CPU cycles and memory. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
|
|
2d1ab26ace |
docs: proc.rst: comm: mention the included NUL
Indicate that the actual value will be one character less. Signed-off-by: Christoph Anton Mitterer <mail@christoph.anton.mitterer.name> Link: https://lore.kernel.org/r/20240205154100.736499-1-mail@christoph.anton.mitterer.name [jc: did s/null/NUL/ ] Signed-off-by: Jonathan Corbet <corbet@lwn.net> |
|
|
|
ef560389ca |
docs: filesystems: fix typo in docs
This patch resolves a spelling error in the filesystem documentation. It is submitted as part of my application to the "Linux Kernel Bug Fixing Spring Unpaid 2024" mentorship program of the Linux Kernel Foundation. Signed-off-by: Vincenzo Mezzela <vincenzo.mezzela@gmail.com> Link: https://lore.kernel.org/r/20240208162032.109184-1-vincenzo.mezzela@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
11b3f8ae70 |
fs: remove the inode argument to ->d_real() method
The only remaining user of ->d_real() method is d_real_inode(), which passed NULL inode argument to get the real data dentry. There are no longer any users that call ->d_real() with a non-NULL inode argument for getting a detry from a specific underlying layer. Remove the inode argument of the method and replace it with an integer 'type' argument, to allow callers to request the real metadata dentry instead of the real data dentry. All the current users of d_real_inode() (e.g. uprobe) continue to get the real data inode. Caller that need to get the real metadata inode (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202110132.1584111-3-amir73il@gmail.com Tested-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
c7115e094c |
f2fs: introduce FAULT_BLKADDR_CONSISTENCE
We will encounter below inconsistent status when FAULT_BLKADDR type fault injection is on. Info: checkpoint state = d6 : nat_bits crc fsck compacted_summary orphan_inodes sudden-power-off [ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c100 has i_blocks: 000000c0, but has 191 blocks [FIX] (fsck_chk_inode_blk:1260) --> [0x1c100] i_blocks=0x000000c0 -> 0xbf [FIX] (fsck_chk_inode_blk:1269) --> [0x1c100] i_compr_blocks=0x00000026 -> 0x27 [ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1cadb has i_blocks: 0000002f, but has 46 blocks [FIX] (fsck_chk_inode_blk:1260) --> [0x1cadb] i_blocks=0x0000002f -> 0x2e [FIX] (fsck_chk_inode_blk:1269) --> [0x1cadb] i_compr_blocks=0x00000011 -> 0x12 [ASSERT] (fsck_chk_inode_blk:1254) --> ino: 0x1c62c has i_blocks: 00000002, but has 1 blocks [FIX] (fsck_chk_inode_blk:1260) --> [0x1c62c] i_blocks=0x00000002 -> 0x1 After we inject fault into f2fs_is_valid_blkaddr() during truncation, a) it missed to increase @nr_free or @valid_blocks b) it can cause in blkaddr leak in truncated dnode Which may cause inconsistent status. This patch separates FAULT_BLKADDR_CONSISTENCE from FAULT_BLKADDR, and rename FAULT_BLKADDR to FAULT_BLKADDR_VALIDITY so that we can: a) use FAULT_BLKADDR_CONSISTENCE in f2fs_truncate_data_blocks_range() to simulate inconsistent issue independently, then it can verify fsck repair flow. b) FAULT_BLKADDR_VALIDITY fault will not cause any inconsistent status, we can just use it to check error path handling in kernel side. Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
|
|
|
06b8db3a7d
|
fs: remove NTFS classic from docum. index
With the remove of the NTFS classic filesystem, also remove its
documentation entry from the filesystems index to prevent a
kernel-doc warning:
Documentation/filesystems/index.rst:63: WARNING: toctree contains reference to nonexisting document 'filesystems/ntfs'
Fixes: 9c67092ed339 ("fs: Remove NTFS classic")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240124011424.731-1-rdunlap@infradead.org
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: <linux-doc@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
|
|
7ffa8f3d30
|
fs: Remove NTFS classic
The replacement, NTFS3, was merged over two years ago. It is now time to remove the original from the tree as it is the last user of several APIs, and it is not worth changing. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240115072025.2071931-1-willy@infradead.org Acked-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Dave Chinner <david@fromorbit.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
420332b941 |
ovl: mark xwhiteouts directory with overlay.opaque='x'
An opaque directory cannot have xwhiteouts, so instead of marking an
xwhiteouts directory with a new xattr, overload overlay.opaque xattr
for marking both opaque dir ('y') and xwhiteouts dir ('x').
This is more efficient as the overlay.opaque xattr is checked during
lookup of directory anyway.
This also prevents unnecessary checking the xattr when reading a
directory without xwhiteouts, i.e. most of the time.
Note that the xwhiteouts marker is not checked on the upper layer and
on the last layer in lowerstack, where xwhiteouts are not expected.
Fixes:
|
|
|
|
8cb1bb178c |
4 ksmbd server fixes
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmWobwQACgkQiiy9cAdy T1GTXgv/amjpAd1kEVwgfyUGvM9rsN+DtojoXt1z5xDkzJrnszI0s/WARz7o2gc/ 6nIOxKWfpxb0QHRcLebZHbN7mrZeelHLyMqbx0Wphy5Y0cUQlq1C50l6xAkua1dd /uPklGZVW9LDRvZYzdwa4Spi0tVwDgsnmK2UiWckCfc1yu9BVz3mg1gtaeg0Z/Z4 caojvSTXdJ/35xKV4udE4lo0PpbCC2h990c5iR8iOXgMuBlgIv3JQYC+3avuXXdS Erneof5Vx9sUV7p7SWCR71VFaWnMsF1/DtXMKTxAJKjYGZPckQGQjxOJeonaRltg 0YjzeMT5/d5Fv3w0L1Mtn9pxyXTQ4ywKJ3Vpm21geuw7pl/70sqdQtg+ujljfN6j uzb6mzwFezp3LGK8RI6h//JKUrnurIM/dc9FuRCoU2N91rfhGuiAvTwzrjh67WW8 HTmsnhNWnWuhH7OQJrOPQVEEs6DSUgA6MvHjslXoj7V+ksoKMe57JyZIr6Hx0CX9 W4Q0j6Hk =gj+T -----END PGP SIGNATURE----- Merge tag '6.8-rc-smb-server-fixes-part2' of git://git.samba.org/ksmbd Pull more smb server updates from Steve French: - Fix for incorrect oplock break on directories when leases disabled - UAF fix for race between create and destroy of tcp connection - Important session setup SPNEGO fix - Update ksmbd feature status summary * tag '6.8-rc-smb-server-fixes-part2' of git://git.samba.org/ksmbd: ksmbd: only v2 leases handle the directory ksmbd: fix UAF issue in ksmbd_tcp_new_connection() ksmbd: validate mech token in session setup ksmbd: update feature status in documentation |
|
|
|
16df6e07d6 |
vfs-6.8.netfs
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZabMrQAKCRCRxhvAZXjc
ovnUAQDgCOonb1tjtTvC8s8IMDUEoaVYZI91KVfsZQSJYN1sdQD+KfJmX1BhJnWG
l0cEffGfnWGXMZkZqDgLPHUIPzFrmws=
=1b3j
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner:
"This extends the netfs helper library that network filesystems can use
to replace their own implementations. Both afs and 9p are ported. cifs
is ready as well but the patches are way bigger and will be routed
separately once this is merged. That will remove lots of code as well.
The overal goal is to get high-level I/O and knowledge of the page
cache and ouf of the filesystem drivers. This includes knowledge about
the existence of pages and folios
The pull request converts afs and 9p. This removes about 800 lines of
code from afs and 300 from 9p. For 9p it is now possible to do writes
in larger than a page chunks. Additionally, multipage folio support
can be turned on for 9p. Separate patches exist for cifs removing
another 2000+ lines. I've included detailed information in the
individual pulls I took.
Summary:
- Add NFS-style (and Ceph-style) locking around DIO vs buffered I/O
calls to prevent these from happening at the same time.
- Support for direct and unbuffered I/O.
- Support for write-through caching in the page cache.
- O_*SYNC and RWF_*SYNC writes use write-through rather than writing
to the page cache and then flushing afterwards.
- Support for write-streaming.
- Support for write grouping.
- Skip reads for which the server could only return zeros or EOF.
- The fscache module is now part of the netfs library and the
corresponding maintainer entry is updated.
- Some helpers from the fscache subsystem are renamed to mark them as
belonging to the netfs library.
- Follow-up fixes for the netfs library.
- Follow-up fixes for the 9p conversion"
* tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (50 commits)
netfs: Fix wrong #ifdef hiding wait
cachefiles: Fix signed/unsigned mixup
netfs: Fix the loop that unmarks folios after writing to the cache
netfs: Fix interaction between write-streaming and cachefiles culling
netfs: Count DIO writes
netfs: Mark netfs_unbuffered_write_iter_locked() static
netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs"
netfs: Rearrange netfs_io_subrequest to put request pointer first
9p: Use length of data written to the server in preference to error
9p: Do a couple of cleanups
9p: Fix initialisation of netfs_inode for 9p
cachefiles: Fix __cachefiles_prepare_write()
9p: Use netfslib read/write_iter
afs: Use the netfs write helpers
netfs: Export the netfs_sreq tracepoint
netfs: Optimise away reads above the point at which there can be no data
netfs: Implement a write-through caching option
netfs: Provide a launder_folio implementation
netfs: Provide a writepages implementation
netfs, cachefiles: Pass upper bound length to allow expansion
...
|
|
|
|
fdfd6dde43 |
ksmbd: update feature status in documentation
Update ksmbd feature status in documentation file. - add support for v2 lease feature and SMB3 CCM/GCM256 encryption. - add planned compression, quic, gmac signing features. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> |
|
|
|
499aa1ca4e |
dcache stuff for this cycle
change of locking rules for __dentry_kill(), regularized refcounting
rules in that area, assorted cleanups and removal of weird corner
cases (e.g. now ->d_iput() on child is always called before the parent
might hit __dentry_kill(), etc.)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+sQQAKCRBZ7Krx/gZQ
6ybjAQDM5jiS93IUzfHjCWq0nVBX5YGbDAkZOeqxbmIdQb+2UAEA6elP5r0fBBcA
seo3bry4DirQMDaA/Cjh4+8r71YSOQs=
=7+Hk
-----END PGP SIGNATURE-----
Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dcache updates from Al Viro:
"Change of locking rules for __dentry_kill(), regularized refcounting
rules in that area, assorted cleanups and removal of weird corner
cases (e.g. now ->d_iput() on child is always called before the parent
might hit __dentry_kill(), etc)"
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
dcache: remove unnecessary NULL check in dget_dlock()
kill DCACHE_MAY_FREE
__d_unalias() doesn't use inode argument
d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant
get rid of DCACHE_GENOCIDE
d_genocide(): move the extern into fs/internal.h
simple_fill_super(): don't bother with d_genocide() on failure
nsfs: use d_make_root()
d_alloc_pseudo(): move setting ->d_op there from the (sole) caller
kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller
retain_dentry(): introduce a trimmed-down lockless variant
__dentry_kill(): new locking scheme
d_prune_aliases(): use a shrink list
switch select_collect{,2}() to use of to_shrink_list()
to_shrink_list(): call only if refcount is 0
fold dentry_kill() into dput()
don't try to cut corners in shrink_lock_dentry()
fold the call of retain_dentry() into fast_dput()
Call retain_dentry() with refcount 0
dentry_kill(): don't bother with retain_dentry() on slow path
...
|
|
|
|
bf4e7080ae |
fix directory locking scheme on rename
broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+lyQAKCRBZ7Krx/gZQ 60MWAP94hTqeMIpjhsUIkrTnylrIFaiw4UCWFJzIRG1QQYKqCgD/XUaWI9np7dL6 0wR/j4CQSdJjiEFKUFE2pD3QoSuJYAQ= =+x0+ -----END PGP SIGNATURE----- Merge tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change |
|
|
|
5b9b41617b |
Another moderately busy cycle for documentation, including:
- The minimum Sphinx requirement has been raised to 2.4.4, following a
warning that was added in 6.2.
- Some reworking of the Documentation/process front page to, hopefully,
make it more useful.
- Various kernel-doc tweaks to, for example, make it deal properly with
__counted_by annotations.
- We have also restored a warning for documentation of nonexistent
structure members that disappeared a while back. That had the delightful
consequence of adding some 600 warnings to the docs build. A sustained
effort by Randy, Vegard, and myself has addressed almost all of those,
bringing the documentation back into sync with the code. The fixes are
going through the appropriate maintainer trees.
- Various improvements to the HTML rendered docs, including automatic links
to Git revisions and a nice new pulldown to make translations easy to
access.
- Speaking of translations, more of those for Spanish and Chinese.
...plus the usual stream of documentation updates and typo fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmWcRKMPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YTKIH/AxBt/3iWt40dPf18arZHLU6tdUbmg01ttef
CNKWkniCmABGKc//KYDXvjZMRDt0YlrS0KgUzrb8nIQTBlZG40D+88EwjXE0HeGP
xt1Fk7OPOiJEqBZ3HEe0PDVfOiA+4yR6CmDKklCJuKg77X9atklneBwPUw/cOASk
CWj+BdbwPBiSNQv48Lp87rGusKwnH/g0MN2uS0z9MPr1DYjM1K8+ngZjGW24lZHt
qs5yhP43mlZGBF/lwNJXQp/xhnKAqJ9XwylBX9Wmaoxaz9yyzNVsADGvROMudgzi
9YB+Jdy7Z0JSrVoLIRhUuDOv7aW8vk+8qLmGJt2aTIsqehbQ6pk=
=fCtT
-----END PGP SIGNATURE-----
Merge tag 'docs-6.8' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"Another moderately busy cycle for documentation, including:
- The minimum Sphinx requirement has been raised to 2.4.4, following
a warning that was added in 6.2
- Some reworking of the Documentation/process front page to,
hopefully, make it more useful
- Various kernel-doc tweaks to, for example, make it deal properly
with __counted_by annotations
- We have also restored a warning for documentation of nonexistent
structure members that disappeared a while back. That had the
delightful consequence of adding some 600 warnings to the docs
build. A sustained effort by Randy, Vegard, and myself has
addressed almost all of those, bringing the documentation back into
sync with the code. The fixes are going through the appropriate
maintainer trees
- Various improvements to the HTML rendered docs, including automatic
links to Git revisions and a nice new pulldown to make translations
easy to access
- Speaking of translations, more of those for Spanish and Chinese
... plus the usual stream of documentation updates and typo fixes"
* tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits)
MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL
A reworked process/index.rst
ring-buffer/Documentation: Add documentation on buffer_percent file
Translated the RISC-V architecture boot documentation.
Docs: remove mentions of fdformat from util-linux
Docs/zh_CN: Fix the meaning of DEBUG to pr_debug()
Documentation: move driver-api/dcdbas to userspace-api/
Documentation: move driver-api/isapnp to userspace-api/
Documentation/core-api : fix typo in workqueue
Documentation/trace: Fixed typos in the ftrace FLAGS section
kernel-doc: handle a void function without producing a warning
scripts/get_abi.pl: ignore some temp files
docs: kernel_abi.py: fix command injection
scripts/get_abi: fix source path leak
CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer
docs: translations: add translations links when they exist
kernel-doc: Align quick help and the code
MAINTAINERS: add reviewer for Spanish translations
docs: ignore __counted_by attribute in structure definitions
scripts: kernel-doc: Clarify missing struct member description
..
|
|
|
|
4d925f6057 |
overlayfs updates for 6.8
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmWeozwACgkQEVvVyTe/
1WqnyA//U2Ka5ZIncs/hA5D03LMyuCh9qlH5qAGce5vrBTxogTlFuTGGKtsUuCB5
Y4GALO+fw8aWAowt5X1XfHD3TETLVbCshT7dYjKsKy/ojANCbgkCipXBudYx+l9m
fllwTZyueK0UY14kCU2DAV5PYsI/XVVykk71GSMOMLCUfRJfDI7R0vBD0NaUd7Kz
Wp/M6t0MnXX23nGUdgNoroZPPj3Ts/gK2MXID+QHXGaR2+M1B1lLKfSu6TcRDLtn
tbe/ivaw4y1jj3jfFwMC7sSSDyIJeZh9tBB4Rvv2vsMiYU8zAC6Eg35eIbPONu42
pUMd0QQa79H3cyYEDtUzyskzur0Jry5azzb8JdQWipgVKFh5g3CHce2XAFlVjw2a
9RyCKg41A9LvdB5l/PvBtsxig2PzaYqE09rXAfUM7eLNFlOLbL99uc1WJbIFfG43
Czh9vPxsuJ5RkdwS7R0m4GYDw8+BKW6WjpaC+Eje4I8X1rAQK0H/BLTCxe2dLRB7
0neAg8e3g6NdisRSLOP74xoEn/dhijNP7ENOFF1EdP/BFPHL7+sRsV6XYwwBeUAc
c6YsxeAPylm6gvIq/ESoRiY+e5QWvImHIWP+zB/cySYdT0fQHL9WjO6/uZW0ALuv
oZugICSmZ15pYlACIU8iYztRkS19CJZrUV7Gbq4+AurUKP8kCEI=
=2Ohx
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
"This is a very small update with no bug fixes and no new features.
The larger update of overlayfs for this cycle, the re-factoring of
overlayfs code into generic backing_file helpers, was already merged
via Christian.
Summary:
- Simplify/clarify some code
No bug fixes here, just some changes following questions from Al
about overlayfs code that could be a little more simple to follow.
- Overlayfs documentation style fixes
Mainly fixes for ReST formatting suggested by documentation
developers"
* tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
overlayfs.rst: fix ReST formatting
overlayfs.rst: use consistent feature names
ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up()
ovl: remove redundant ofs->indexdir member
|
|
|
|
17b9e388c6 |
fscrypt updates for 6.8
Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZZx4UBQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK85+AQCBHoG6R5UuPqafoDtabcCpxRW/ZHdo WzOwjvHz1/tq5AEApogvjPI/3v2gelLnG9ZrXUBZMWZN6W0LQbH/k1VHjQ8= =nvWY -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: "Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fs: move fscrypt keyring destruction to after ->put_super f2fs: move release of block devices to after kill_block_super() fscrypt: document that CephFS supports fscrypt now fscrypt: update comment for do_remove_key() fscrypt.rst: update definition of struct fscrypt_context_v2 |
|
|
|
12958e9c4c |
New code for 6.8:
* New features/functionality
* Online repair
* Reserve disk space for online repairs.
* Fix misinteraction between the AIL and btree bulkloader because of
which the bulk load fails to queue a buffer for writeback if it
happens to be on the AIL list.
* Prevent transaction reservation overflows when reaping blocks during
online repair.
* Whenever possible, bulkloader now copies multiple records into a
block.
* Support repairing of
1. Per-AG free space, inode and refcount btrees.
2. Ondisk inodes.
3. File data and attribute fork mappings.
* Verify the contents of
1. Inode and data fork of realtime bitmap file.
2. Quota files.
* Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks about
a pmem device being removed.
* Bug fixes
* Fix memory leak of recovered attri intent items.
* Fix UAF during log intent recovery.
* Fix realtime geometry integer overflows.
* Prevent scrub from live locking in xchk_iget.
* Prevent fs shutdown when removing files during low free disk space.
* Prevent transaction reservation overflow when extending an RT device.
* Prevent incorrect warning from being printed when extending a
filesystem.
* Fix an off-by-one error in xreap_agextent_binval.
* Serialize access to perag radix tree during deletion operation.
* Fix perag memory leak during growfs.
* Allow allocation of minlen realtime extent when the maximum sized
realtime free extent is minlen in size.
* Cleanups
* Remove duplicate boilerplate code spread across functionality associated
with different log items.
* Cleanup resblks interfaces.
* Pass defer ops pointer to defer helpers instead of an enum.
* Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings.
* Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size of
structures and structure member offsets. This is done in order to be
able to share the code with userspace.
* Move XFS documentation under a new directory specific to XFS.
* Do not invoke deferred ops' ->create_done callback if the deferred
operation does not have an intent item associated with it.
* Remove duplicate inclusion of header files from scrub/health.c.
* Refactor Realtime code.
* Cleanup attr code.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZZJQbwAKCRAH7y4RirJu
9JjkAP9Zg0QZNmAMsZwvgEBbuF/OnHKl4GmPA5uq0jPmSWCOqAEA0HjlOmuNfQWn
93fIw6CPbt+9QCluTYBwUisKLIJ/wgA=
=qmO0
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
"New features/functionality:
- Online repair:
- Reserve disk space for online repairs
- Fix misinteraction between the AIL and btree bulkloader because
of which the bulk load fails to queue a buffer for writeback if
it happens to be on the AIL list
- Prevent transaction reservation overflows when reaping blocks
during online repair
- Whenever possible, bulkloader now copies multiple records into
a block
- Support repairing of
1. Per-AG free space, inode and refcount btrees
2. Ondisk inodes
3. File data and attribute fork mappings
- Verify the contents of
1. Inode and data fork of realtime bitmap file
2. Quota files
- Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks
about a pmem device being removed
Bug fixes:
- Fix memory leak of recovered attri intent items
- Fix UAF during log intent recovery
- Fix realtime geometry integer overflows
- Prevent scrub from live locking in xchk_iget
- Prevent fs shutdown when removing files during low free disk space
- Prevent transaction reservation overflow when extending an RT
device
- Prevent incorrect warning from being printed when extending a
filesystem
- Fix an off-by-one error in xreap_agextent_binval
- Serialize access to perag radix tree during deletion operation
- Fix perag memory leak during growfs
- Allow allocation of minlen realtime extent when the maximum sized
realtime free extent is minlen in size
Cleanups:
- Remove duplicate boilerplate code spread across functionality
associated with different log items
- Cleanup resblks interfaces
- Pass defer ops pointer to defer helpers instead of an enum
- Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings
- Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size
of structures and structure member offsets. This is done in order
to be able to share the code with userspace
- Move XFS documentation under a new directory specific to XFS
- Do not invoke deferred ops' ->create_done callback if the deferred
operation does not have an intent item associated with it
- Remove duplicate inclusion of header files from scrub/health.c
- Refactor Realtime code
- Cleanup attr code"
* tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits)
xfs: use the op name in trace_xlog_intent_recovery_failed
xfs: fix a use after free in xfs_defer_finish_recovery
xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts
xfs: remove xfs_attr_sf_hdr_t
xfs: remove struct xfs_attr_shortform
xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue
xfs: remove xfs_attr_shortform_lookup
xfs: simplify xfs_attr_sf_findname
xfs: move the xfs_attr_sf_lookup tracepoint
xfs: return if_data from xfs_idata_realloc
xfs: make if_data a void pointer
xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc
xfs: simplify and optimize the RT allocation fallback cascade
xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc
xfs: remove XFS_RTMIN/XFS_RTMAX
xfs: remove rt-wrappers from xfs_format.h
xfs: factor out a xfs_rtalloc_sumlevel helper
xfs: tidy up xfs_rtallocate_extent_exact
xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block
xfs: reflow the tail end of xfs_rtallocate_extent_block
...
|
|
|
|
9f2a635235 |
Quite a lot of kexec work this time around. Many singleton patches in
many places. The notable patch series are:
- nilfs2 folio conversion from Matthew Wilcox in "nilfs2: Folio
conversions for file paths".
- Additional nilfs2 folio conversion from Ryusuke Konishi in "nilfs2:
Folio conversions for directory paths".
- IA64 remnant removal in Heiko Carstens's "Remove unused code after
IA-64 removal".
- Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere
in "Treewide: enable -Wmissing-prototypes". This had some followup
fixes:
- Nathan Chancellor has cleaned up the hexagon build in the series
"hexagon: Fix up instances of -Wmissing-prototypes".
- Nathan also addressed some s390 warnings in "s390: A couple of
fixes for -Wmissing-prototypes".
- Arnd Bergmann addresses the same warnings for MIPS in his series
"mips: address -Wmissing-prototypes warnings".
- Baoquan He has made kexec_file operate in a top-down-fitting manner
similar to kexec_load in the series "kexec_file: Load kernel at top of
system RAM if required"
- Baoquan He has also added the self-explanatory "kexec_file: print out
debugging message if required".
- Some checkstack maintenance work from Tiezhu Yang in the series
"Modify some code about checkstack".
- Douglas Anderson has disentangled the watchdog code's logging when
multiple reports are occurring simultaneously. The series is "watchdog:
Better handling of concurrent lockups".
- Yuntao Wang has contributed some maintenance work on the crash code in
"crash: Some cleanups and fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZ2R6AAKCRDdBJ7gKXxA
juCVAP4t76qUISDOSKugB/Dn5E4Nt9wvPY9PcufnmD+xoPsgkQD+JVl4+jd9+gAV
vl6wkJDiJO5JZ3FVtBtC3DFA/xHtVgk=
=kQw+
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Quite a lot of kexec work this time around. Many singleton patches in
many places. The notable patch series are:
- nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
conversions for file paths'.
- Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
Folio conversions for directory paths'.
- IA64 remnant removal in Heiko Carstens's 'Remove unused code after
IA-64 removal'.
- Arnd Bergmann has enabled the -Wmissing-prototypes warning
everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
some followup fixes:
- Nathan Chancellor has cleaned up the hexagon build in the series
'hexagon: Fix up instances of -Wmissing-prototypes'.
- Nathan also addressed some s390 warnings in 's390: A couple of
fixes for -Wmissing-prototypes'.
- Arnd Bergmann addresses the same warnings for MIPS in his series
'mips: address -Wmissing-prototypes warnings'.
- Baoquan He has made kexec_file operate in a top-down-fitting manner
similar to kexec_load in the series 'kexec_file: Load kernel at top
of system RAM if required'
- Baoquan He has also added the self-explanatory 'kexec_file: print
out debugging message if required'.
- Some checkstack maintenance work from Tiezhu Yang in the series
'Modify some code about checkstack'.
- Douglas Anderson has disentangled the watchdog code's logging when
multiple reports are occurring simultaneously. The series is
'watchdog: Better handling of concurrent lockups'.
- Yuntao Wang has contributed some maintenance work on the crash code
in 'crash: Some cleanups and fixes'"
* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
crash_core: fix and simplify the logic of crash_exclude_mem_range()
x86/crash: use SZ_1M macro instead of hardcoded value
x86/crash: remove the unused image parameter from prepare_elf_headers()
kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
scripts/decode_stacktrace.sh: strip unexpected CR from lines
watchdog: if panicking and we dumped everything, don't re-enable dumping
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
kexec_core: fix the assignment to kimage->control_page
x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
lib/trace_readwrite.c:: replace asm-generic/io with linux/io
nilfs2: cpfile: fix some kernel-doc warnings
stacktrace: fix kernel-doc typo
scripts/checkstack.pl: fix no space expression between sp and offset
x86/kexec: fix incorrect argument passed to kexec_dprintk()
x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
nilfs2: add missing set_freezable() for freezable kthread
kernel: relay: remove relay_file_splice_read dead code, doesn't work
docs: submit-checklist: remove all of "make namespacecheck"
...
|
|
|
|
fb46e22a9e |
Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the
series
"maple_tree: add mt_free_one() and mt_attr() helpers"
"Some cleanups of maple tree"
- In the series "mm: use memmap_on_memory semantics for dax/kmem"
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few
fixes) in the patch series
"Add folio_zero_tail() and folio_fill_tail()"
"Make folio_start_writeback return void"
"Fix fault handler's handling of poisoned tail pages"
"Convert aops->error_remove_page to ->error_remove_folio"
"Finish two folio conversions"
"More swap folio conversions"
- Kefeng Wang has also contributed folio-related work in the series
"mm: cleanup and use more folio in page fault"
- Jim Cromie has improved the kmemleak reporting output in the
series "tweak kmemleak report format".
- In the series "stackdepot: allow evicting stack traces" Andrey
Konovalov to permits clients (in this case KASAN) to cause
eviction of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series "mm:
page_alloc: fixes for high atomic reserve caluculations".
- Dmitry Rokosov has added to the samples/ dorectory some sample
code for a userspace memcg event listener application. See the
series "samples: introduce cgroup events listeners".
- Some mapletree maintanance work from Liam Howlett in the series
"maple_tree: iterator state changes".
- Nhat Pham has improved zswap's approach to writeback in the
series "workload-specific and memory pressure-driven zswap
writeback".
- DAMON/DAMOS feature and maintenance work from SeongJae Park in
the series
"mm/damon: let users feed and tame/auto-tune DAMOS"
"selftests/damon: add Python-written DAMON functionality tests"
"mm/damon: misc updates for 6.8"
- Yosry Ahmed has improved memcg's stats flushing in the series
"mm: memcg: subtree stats flushing and thresholds".
- In the series "Multi-size THP for anonymous memory" Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series "More buffer_head
cleanups".
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
"userfaultfd move option". UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a "KSM Advisor", in the series
"mm/ksm: Add ksm advisor". This is a governor which tunes KSM's
scanning aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory
use in the series "mm/zswap: dstmem reuse optimizations and
cleanups".
- Matthew Wilcox has performed some maintenance work on the
writeback code, both code and within filesystems. The series is
"Clean up the writeback paths".
- Andrey Konovalov has optimized KASAN's handling of alloc and
free stack traces for secondary-level allocators, in the series
"kasan: save mempool stack traces".
- Andrey also performed some KASAN maintenance work in the series
"kasan: assorted clean-ups".
- David Hildenbrand has gone to town on the rmap code. Cleanups,
more pte batching, folio conversions and more. See the series
"mm/rmap: interface overhaul".
- Kinsey Ho has contributed some maintenance work on the MGLRU
code in the series "mm/mglru: Kconfig cleanup".
- Matthew Wilcox has contributed lruvec page accounting code
cleanups in the series "Remove some lruvec page accounting
functions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
=0NHs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Peng Zhang has done some mapletree maintainance work in the series
'maple_tree: add mt_free_one() and mt_attr() helpers'
'Some cleanups of maple tree'
- In the series 'mm: use memmap_on_memory semantics for dax/kmem'
Vishal Verma has altered the interworking between memory-hotplug
and dax/kmem so that newly added 'device memory' can more easily
have its memmap placed within that newly added memory.
- Matthew Wilcox continues folio-related work (including a few fixes)
in the patch series
'Add folio_zero_tail() and folio_fill_tail()'
'Make folio_start_writeback return void'
'Fix fault handler's handling of poisoned tail pages'
'Convert aops->error_remove_page to ->error_remove_folio'
'Finish two folio conversions'
'More swap folio conversions'
- Kefeng Wang has also contributed folio-related work in the series
'mm: cleanup and use more folio in page fault'
- Jim Cromie has improved the kmemleak reporting output in the series
'tweak kmemleak report format'.
- In the series 'stackdepot: allow evicting stack traces' Andrey
Konovalov to permits clients (in this case KASAN) to cause eviction
of no longer needed stack traces.
- Charan Teja Kalla has fixed some accounting issues in the page
allocator's atomic reserve calculations in the series 'mm:
page_alloc: fixes for high atomic reserve caluculations'.
- Dmitry Rokosov has added to the samples/ dorectory some sample code
for a userspace memcg event listener application. See the series
'samples: introduce cgroup events listeners'.
- Some mapletree maintanance work from Liam Howlett in the series
'maple_tree: iterator state changes'.
- Nhat Pham has improved zswap's approach to writeback in the series
'workload-specific and memory pressure-driven zswap writeback'.
- DAMON/DAMOS feature and maintenance work from SeongJae Park in the
series
'mm/damon: let users feed and tame/auto-tune DAMOS'
'selftests/damon: add Python-written DAMON functionality tests'
'mm/damon: misc updates for 6.8'
- Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
memcg: subtree stats flushing and thresholds'.
- In the series 'Multi-size THP for anonymous memory' Ryan Roberts
has added a runtime opt-in feature to transparent hugepages which
improves performance by allocating larger chunks of memory during
anonymous page faults.
- Matthew Wilcox has also contributed some cleanup and maintenance
work against eh buffer_head code int he series 'More buffer_head
cleanups'.
- Suren Baghdasaryan has done work on Andrea Arcangeli's series
'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
compaction algorithms to move userspace's pages around rather than
UFFDIO_COPY'a alloc/copy/free.
- Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
Add ksm advisor'. This is a governor which tunes KSM's scanning
aggressiveness in response to userspace's current needs.
- Chengming Zhou has optimized zswap's temporary working memory use
in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
- Matthew Wilcox has performed some maintenance work on the writeback
code, both code and within filesystems. The series is 'Clean up the
writeback paths'.
- Andrey Konovalov has optimized KASAN's handling of alloc and free
stack traces for secondary-level allocators, in the series 'kasan:
save mempool stack traces'.
- Andrey also performed some KASAN maintenance work in the series
'kasan: assorted clean-ups'.
- David Hildenbrand has gone to town on the rmap code. Cleanups, more
pte batching, folio conversions and more. See the series 'mm/rmap:
interface overhaul'.
- Kinsey Ho has contributed some maintenance work on the MGLRU code
in the series 'mm/mglru: Kconfig cleanup'.
- Matthew Wilcox has contributed lruvec page accounting code cleanups
in the series 'Remove some lruvec page accounting functions'"
* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
mm, treewide: introduce NR_PAGE_ORDERS
selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
selftests/mm: skip test if application doesn't has root privileges
selftests/mm: conform test to TAP format output
selftests: mm: hugepage-mmap: conform to TAP format output
selftests/mm: gup_test: conform test to TAP format output
mm/selftests: hugepage-mremap: conform test to TAP format output
mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
mm/memcontrol: remove __mod_lruvec_page_state()
mm/khugepaged: use a folio more in collapse_file()
slub: use a folio in __kmalloc_large_node
slub: use folio APIs in free_large_kmalloc()
slub: use alloc_pages_node() in alloc_slab_page()
mm: remove inc/dec lruvec page state functions
mm: ratelimit stat flush from workingset shrinker
kasan: stop leaking stack trace handles
mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
mm/mglru: add dummy pmd_dirty()
...
|
|
|
|
3f6984e730 |
vfs-6.8.super
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUx4wAKCRCRxhvAZXjc
osaNAQC/c+xXVfiq/pFbuK9MQLna4RGZaGcG9k312YniXbHq0AD9HAf4aPcZwPy1
/wkD4pauj3UZ3f0xBSyazGBvAXyN0Qc=
=iFAQ
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs super updates from Christian Brauner:
"This contains the super work for this cycle including the long-awaited
series by Jan to make it possible to prevent writing to mounted block
devices:
- Writing to mounted devices is dangerous and can lead to filesystem
corruption as well as crashes. Furthermore syzbot comes with more
and more involved examples how to corrupt block device under a
mounted filesystem leading to kernel crashes and reports we can do
nothing about. Add tracking of writers to each block device and a
kernel cmdline argument which controls whether other writeable
opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are
allowed.
Note that this effectively only prevents modification of the
particular block device's page cache by other writers. The actual
device content can still be modified by other means - e.g. by
issuing direct scsi commands, by doing writes through devices lower
in the storage stack (e.g. in case loop devices, DM, or MD are
involved) etc. But blocking direct modifications of the block
device page cache is enough to give filesystems a chance to perform
data validation when loading data from the underlying storage and
thus prevent kernel crashes.
Syzbot can use this cmdline argument option to avoid uninteresting
crashes. Also users whose userspace setup does not need writing to
mounted block devices can set this option for hardening. We expect
that this will be interesting to quite a few workloads.
Btrfs is currently opted out of this because they still haven't
merged patches we require for this to work from three kernel
releases ago.
- Reimplement block device freezing and thawing as holder operations
on the block device.
This allows us to extend block device freezing to all devices
associated with a superblock and not just the main device. It also
allows us to remove get_active_super() and thus another function
that scans the global list of superblocks.
Freezing via additional block devices only works if the filesystem
chooses to use @fs_holder_ops for these additional devices as well.
That currently only includes ext4 and xfs.
Earlier releases switched get_tree_bdev() and mount_bdev() to use
@fs_holder_ops. The remaining nilfs2 open-coded version of
mount_bdev() has been converted to rely on @fs_holder_ops as well.
So block device freezing for the main block device will continue to
work as before.
There should be no regressions in functionality. The only special
case is btrfs where block device freezing for the main block device
never worked because sb->s_bdev isn't set. Block device freezing
for btrfs can be fixed once they can switch to @fs_holder_ops but
that can happen whenever they're ready"
* tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
block: Fix a memory leak in bdev_open_by_dev()
super: don't bother with WARN_ON_ONCE()
super: massage wait event mechanism
ext4: Block writes to journal device
xfs: Block writes to log device
fs: Block writes to mounted block devices
btrfs: Do not restrict writes to btrfs devices
block: Add config option to not allow writing to mounted devices
block: Remove blkdev_get_by_*() functions
bcachefs: Convert to bdev_open_by_path()
fs: handle freezing from multiple devices
fs: remove dead check
nilfs2: simplify device handling
fs: streamline thaw_super_locked
ext4: simplify device handling
xfs: simplify device handling
fs: simplify setup_bdev_super() calls
blkdev: comment fs_holder_ops
porting: document block device freeze and thaw changes
fs: remove unused helper
...
|
|
|
|
c1f1f5bf41 |
fscrypt: document that CephFS supports fscrypt now
The help text for CONFIG_FS_ENCRYPTION and the fscrypt.rst documentation file both list the filesystems that support fscrypt. CephFS added support for fscrypt in v6.6, so add CephFS to the list. Link: https://lore.kernel.org/r/20231227045158.87276-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> |
|
|
|
4498a8eccc |
netfs, fscache: Remove ->begin_cache_operation
Remove ->begin_cache_operation() in favour of just calling fscache directly. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com |
|
|
|
3485b88390 |
mm: thp: introduce multi-size THP sysfs interface
In preparation for adding support for anonymous multi-size THP, introduce new sysfs structure that will be used to control the new behaviours. A new directory is added under transparent_hugepage for each supported THP size, and contains an `enabled` file, which can be set to "inherit" (to inherit the global setting), "always", "madvise" or "never". For now, the kernel still only supports PMD-sized anonymous THP, so only 1 directory is populated. The first half of the change converts transhuge_vma_suitable() and hugepage_vma_check() so that they take a bitfield of orders for which the user wants to determine support, and the functions filter out all the orders that can't be supported, given the current sysfs configuration and the VMA dimensions. The resulting functions are renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders() respectively. Convenience functions that take a single, unencoded order and return a boolean are also defined as thp_vma_suitable_order() and thp_vma_allowable_order(). The second half of the change implements the new sysfs interface. It has been done so that each supported THP size has a `struct thpsize`, which describes the relevant metadata and is itself a kobject. This is pretty minimal for now, but should make it easy to add new per-thpsize files to the interface if needed in future (e.g. per-size defrag). Rather than keep the `enabled` state directly in the struct thpsize, I've elected to directly encode it into huge_anon_orders_[always|madvise|inherit] bitfields since this reduces the amount of work required in thp_vma_allowable_orders() which is called for every page fault. See Documentation/admin-guide/mm/transhuge.rst, as modified by this commit, for details of how the new sysfs interface works. [ryan.roberts@arm.com: fix build warning when CONFIG_SYSFS is disabled] Link: https://lkml.kernel.org/r/20231211125320.3997543-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20231207161211.2374093-4-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Barry Song <v-songbaohua@oppo.com> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Tested-by: John Hubbard <jhubbard@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Rientjes <rientjes@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Itaru Kitayama <itaru.kitayama@gmail.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
d17bb4620f |
overlayfs.rst: fix ReST formatting
Fix some indentation issues and fix missing newlines in quoted text by converting quoted text to code blocks. Reported-by: Christian Brauner <brauner@kernel.org> Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
|
|
|
bdc10bdf4b |
overlayfs.rst: use consistent feature names
Use the feature names "metacopy" and "index" consistently throughout the document. Covert the numbered list of features "redirect_dir", "index", "xino" to section headings, so that those features could be referenced in the document by their name. Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> |
|
|
|
48aa137e5a |
docs: filesystems: document the squashfs specific mount options
When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set, the "threads" mount option can be used to specify the decompression mode: single-threaded, multi-threaded, percpu or the number of threads used for decompression. When SQUASHFS_CHOICE_DECOMP_BY_MOUNT is not set, SQUASHFS_DECOMP_MULTI and SQUASHFS_MOUNT_DECOMP_THREADS are both set, the "threads" option can also be used to specify the number of threads used for decompression. This mount option is only mentioned in fs/squashfs/Kconfig, which makes it difficult to find. Another mount option available is "errors", which can be configured to panic the kernel when squashfs errors are encountered. Add both these options to the squashfs documentation, making them more noticeable. Link: https://lkml.kernel.org/r/20231117161215.140282-1-amiculas@cisco.com Signed-off-by: Ariel Miculas <amiculas@cisco.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
af7628d6ec |
fs: convert error_remove_page to error_remove_folio
There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
33318c0e6b |
fscrypt.rst: update definition of struct fscrypt_context_v2
Get the copy of the fscrypt_context_v2 definition in the documentation
in sync with the actual definition, which was changed recently by
commit
|
|
|
|
011f129fee |
Documentation: xfs: consolidate XFS docs into its own subdirectory
XFS docs are currently in upper-level Documentation/filesystems. Although these are currently 4 docs, they are already outstanding as a group and can be moved to its own subdirectory. Consolidate them into Documentation/filesystems/xfs/. Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> |
|
|
|
11ca77cdcc |
docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
By default, shared mmap is disabled in FUSE DIRECT_IO mode. However, when the DIRECT_IO_ALLOW_MMAP flag is enabled in the FUSE_INIT reply, shared mmap is allowed. Signed-off-by: Tyler Fanelli <tfanelli@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> |
|
|
|
a8b0026847 |
rename(): avoid a deadlock in the case of parents having no common ancestor
... and fix the directory locking documentation and proof of correctness. Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the case where we really don't want it is splicing the root of disconnected tree to somewhere. In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an ancestor of Y" only if X and Y are already in the same tree. Otherwise it can go from false to true, and one can construct a deadlock on that. Make lock_two_directories() report an error in such case and update the callers of lock_rename()/lock_rename_child() to handle such errors. And yes, such conditions are not impossible to create ;-/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |