mirror of https://github.com/torvalds/linux.git
167 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
099b847ccc |
ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr
A syzbot fuzzed image triggered a BUG_ON in ext4_update_inline_data() when an inode had the INLINE_DATA_FL flag set but was missing the system.data extended attribute. Since this can happen due to a maiciouly fuzzed file system, we shouldn't BUG, but rather, report it as a corrupted file system. Add similar replacements of BUG_ON with EXT4_ERROR_INODE() ii ext4_create_inline_data() and ext4_inline_data_truncate(). Reported-by: syzbot+544248a761451c0df72f@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
90f097b140 |
ext4: refactor the inline directory conversion and new directory codepaths
There was a lot of common code in the codepaths used to convert an inline directory and to creaet a new directory. To address this, rename ext4_init_dot_dotdot() to ext4_init_dirblock() and then move common code into that function. This reduces the lines of code count in fs/ext4/inline.c and fs/ext4/namei.c, as well as reducing the size of their object files. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20250712181249.434530-3-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
a35454ecf8 |
ext4: use memcpy() instead of strcpy()
The strcpy() function is considered dangerous and eeeevil by people
who are using sophisticated code analysis tools such as "grep". This
is true even when a quick inspection would show that the source is a
constant string ("." or "..") and the destination is a fixed array
which is guaranteed to have enough space. Make the "grep" code
analysis tool happy by using memcpy() isstead of strcpy(). :-)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20250712181249.434530-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
91b8ca8b26 |
ext4: Make sure BH_New bit is cleared in ->write_end handler
Currently we clear BH_New bit in case of error and also in the standard
ext4_write_end() handler (in block_commit_write()). However
ext4_journalled_write_end() misses this clearing and thus we are leaving
stale BH_New bits behind. Generally ext4_block_write_begin() clears
these bits before any harm can be done but in case blocksize < pagesize
and we hit some error when processing a page with these stale bits,
we'll try to zero buffers with these stale BH_New bits and jbd2 will
complain (as buffers were not prepared for writing in this transaction).
Fix the problem by clearing BH_New bits in ext4_journalled_write_end()
and WARN if ext4_block_write_begin() sees stale BH_New bits.
Reported-by: Baolin Liu <liubaolin12138@163.com>
Reported-by: Zhi Long <longzhi@sangfor.com.cn>
Fixes:
|
|
|
|
57661f2875 |
ext4: replace ext4_writepage_trans_blocks()
After ext4 supports large folios, the semantics of reserving credits in pages is no longer applicable. In most scenarios, reserving credits in extents is sufficient. Therefore, introduce ext4_chunk_trans_extent() to replace ext4_writepage_trans_blocks(). move_extent_per_page() is the only remaining location where we are still processing extents in pages. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
227cb4ca5a |
ext4: inline: fix len overflow in ext4_prepare_inline_data
When running the following code on an ext4 filesystem with inline_data
feature enabled, it will lead to the bug below.
fd = open("file1", O_RDWR | O_CREAT | O_TRUNC, 0666);
ftruncate(fd, 30);
pwrite(fd, "a", 1, (1UL << 40) + 5UL);
That happens because write_begin will succeed as when
ext4_generic_write_inline_data calls ext4_prepare_inline_data, pos + len
will be truncated, leading to ext4_prepare_inline_data parameter to be 6
instead of 0x10000000006.
Then, later when write_end is called, we hit:
BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
at ext4_write_inline_data.
Fix it by using a loff_t type for the len parameter in
ext4_prepare_inline_data instead of an unsigned int.
[ 44.545164] ------------[ cut here ]------------
[ 44.545530] kernel BUG at fs/ext4/inline.c:240!
[ 44.545834] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 44.546172] CPU: 3 UID: 0 PID: 343 Comm: test Not tainted 6.15.0-rc2-00003-g9080916f4863 #45 PREEMPT(full) 112853fcebfdb93254270a7959841d2c6aa2c8bb
[ 44.546523] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 44.546523] RIP: 0010:ext4_write_inline_data+0xfe/0x100
[ 44.546523] Code: 3c 0e 48 83 c7 48 48 89 de 5b 41 5c 41 5d 41 5e 41 5f 5d e9 e4 fa 43 01 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc 0f 0b <0f> 0b 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 20 49
[ 44.546523] RSP: 0018:ffffb342008b79a8 EFLAGS: 00010216
[ 44.546523] RAX: 0000000000000001 RBX: ffff9329c579c000 RCX: 0000010000000006
[ 44.546523] RDX: 000000000000003c RSI: ffffb342008b79f0 RDI: ffff9329c158e738
[ 44.546523] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 44.546523] R10: 00007ffffffff000 R11: ffffffff9bd0d910 R12: 0000006210000000
[ 44.546523] R13: fffffc7e4015e700 R14: 0000010000000005 R15: ffff9329c158e738
[ 44.546523] FS: 00007f4299934740(0000) GS:ffff932a60179000(0000) knlGS:0000000000000000
[ 44.546523] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 44.546523] CR2: 00007f4299a1ec90 CR3: 0000000002886002 CR4: 0000000000770eb0
[ 44.546523] PKRU: 55555554
[ 44.546523] Call Trace:
[ 44.546523] <TASK>
[ 44.546523] ext4_write_inline_data_end+0x126/0x2d0
[ 44.546523] generic_perform_write+0x17e/0x270
[ 44.546523] ext4_buffered_write_iter+0xc8/0x170
[ 44.546523] vfs_write+0x2be/0x3e0
[ 44.546523] __x64_sys_pwrite64+0x6d/0xc0
[ 44.546523] do_syscall_64+0x6a/0xf0
[ 44.546523] ? __wake_up+0x89/0xb0
[ 44.546523] ? xas_find+0x72/0x1c0
[ 44.546523] ? next_uptodate_folio+0x317/0x330
[ 44.546523] ? set_pte_range+0x1a6/0x270
[ 44.546523] ? filemap_map_pages+0x6ee/0x840
[ 44.546523] ? ext4_setattr+0x2fa/0x750
[ 44.546523] ? do_pte_missing+0x128/0xf70
[ 44.546523] ? security_inode_post_setattr+0x3e/0xd0
[ 44.546523] ? ___pte_offset_map+0x19/0x100
[ 44.546523] ? handle_mm_fault+0x721/0xa10
[ 44.546523] ? do_user_addr_fault+0x197/0x730
[ 44.546523] ? do_syscall_64+0x76/0xf0
[ 44.546523] ? arch_exit_to_user_mode_prepare+0x1e/0x60
[ 44.546523] ? irqentry_exit_to_user_mode+0x79/0x90
[ 44.546523] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 44.546523] RIP: 0033:0x7f42999c6687
[ 44.546523] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
[ 44.546523] RSP: 002b:00007ffeae4a7930 EFLAGS: 00000202 ORIG_RAX: 0000000000000012
[ 44.546523] RAX: ffffffffffffffda RBX: 00007f4299934740 RCX: 00007f42999c6687
[ 44.546523] RDX: 0000000000000001 RSI: 000055ea6149200f RDI: 0000000000000003
[ 44.546523] RBP: 00007ffeae4a79a0 R08: 0000000000000000 R09: 0000000000000000
[ 44.546523] R10: 0000010000000005 R11: 0000000000000202 R12: 0000000000000000
[ 44.546523] R13: 00007ffeae4a7ac8 R14: 00007f4299b86000 R15: 000055ea61493dd8
[ 44.546523] </TASK>
[ 44.546523] Modules linked in:
[ 44.568501] ---[ end trace 0000000000000000 ]---
[ 44.568889] RIP: 0010:ext4_write_inline_data+0xfe/0x100
[ 44.569328] Code: 3c 0e 48 83 c7 48 48 89 de 5b 41 5c 41 5d 41 5e 41 5f 5d e9 e4 fa 43 01 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc 0f 0b <0f> 0b 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 20 49
[ 44.570931] RSP: 0018:ffffb342008b79a8 EFLAGS: 00010216
[ 44.571356] RAX: 0000000000000001 RBX: ffff9329c579c000 RCX: 0000010000000006
[ 44.571959] RDX: 000000000000003c RSI: ffffb342008b79f0 RDI: ffff9329c158e738
[ 44.572571] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 44.573148] R10: 00007ffffffff000 R11: ffffffff9bd0d910 R12: 0000006210000000
[ 44.573748] R13: fffffc7e4015e700 R14: 0000010000000005 R15: ffff9329c158e738
[ 44.574335] FS: 00007f4299934740(0000) GS:ffff932a60179000(0000) knlGS:0000000000000000
[ 44.575027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 44.575520] CR2: 00007f4299a1ec90 CR3: 0000000002886002 CR4: 0000000000770eb0
[ 44.576112] PKRU: 55555554
[ 44.576338] Kernel panic - not syncing: Fatal exception
[ 44.576517] Kernel Offset: 0x1a600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Reported-by: syzbot+fe2a25dae02a207717a0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fe2a25dae02a207717a0
Fixes:
|
|
|
|
4d3266463e |
ext4: for committing inode, make ext4_fc_track_inode wait
If the inode that's being requested to track using ext4_fc_track_inode is being committed, then wait until the inode finishes the commit. Also, add calls to ext4_fc_track_inode at the right places. With this patch, now calling ext4_reserve_inode_write() results in inode being tracked for next fast commit. This ensures that by the time ext4_reserve_inode_write() returns, it is ready to be modified and won't be committed until the corresponding handle is open. A subtle lock ordering requirement with i_data_sem (which is documented in the code) requires that ext4_fc_track_inode() be called before grabbing i_data_sem. So, this patch also adds explicit ext4_fc_track_inode() calls in places where i_data_sem grabbed. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20250508175908.1004880-3-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
eb0ece1602 |
- The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was founf to be incorrect. - The 4 patch series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The 17 patch series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The 5 patch series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The 4 patch series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The 12 patch series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The 2 patch series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The 3 patch series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The 4 patch series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The 4 patch series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The 4 patch series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The 18 patch series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The 27 patch series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The 12 patch series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The 8 patch series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The 2 patch series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The 3 patch series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The 3 patch series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The 3 patch series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The 9 patch series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The 5 patch series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The 13 patch series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The 13 patch series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The 3 patch series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The 8 patch series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The 2 patch series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The 3 patch series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The 4 patch series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The 3 patch series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The 5 patch series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The 5 patch series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The 4 patch series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The 2 patch series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. -----BEGIN PGP SIGNATURE----- iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF Ee93qSSLR1BkNdDw+931Pu0mXfbnBw== =Pn2K -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ... |
|
|
|
e224fa3b8a |
ext4: remove redundant function ext4_has_metadata_csum
Since commit
|
|
|
|
63a23847dc |
fs: convert block_commit_write() to take a folio
All callers now have a folio, so pass it in instead of converting folio->page->folio. Link: https://lkml.kernel.org/r/20250217192009.437916-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
477aa77cce |
ext4: remove unused input "inode" in ext4_find_dest_de
Remove unused input "inode" in ext4_find_dest_de. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250123162050.2114499-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
0a1b2f5ea9 |
ext4: add ext4_emergency_state() helper function
Since both SHUTDOWN and EMERGENCY_RO are emergency states of the ext4 file system, and they are checked in similar locations, we have added a helper function, ext4_emergency_state(), to determine whether the current file system is in one of these two emergency states. Then, replace calls to ext4_forced_shutdown() with ext4_emergency_state() in those functions that could potentially trigger write operations. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
30cbe84d48 |
ext4: Refactor out ext4_try_to_write_inline_data()
Refactor ext4_try_to_write_inline_data() to simplify its implementation by directly invoking ext4_generic_write_inline_data(). Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107045730.1837808-1-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
f9bdb042df |
ext4: Replace ext4_da_write_inline_data_begin() with ext4_generic_write_inline_data().
Replace the call to ext4_da_write_inline_data_begin() with ext4_generic_write_inline_data(), and delete the ext4_da_write_inline_data_begin(). Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107045710.1837756-1-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
3db572f780 |
ext4: Introduce a new helper function ext4_generic_write_inline_data()
A new function, ext4_generic_write_inline_data(), is introduced to provide a generic implementation of the common logic found in ext4_da_write_inline_data_begin() and ext4_try_to_write_inline_data(). This function will be utilized in the subsequent two patches. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250107045549.1837589-1-sunjunchao2870@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
056f8c437d |
Lots of cleanups and bug fixes this cycle, primarily in the block
allocation, extent management, fast commit, and journalling. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmbsGRcACgkQ8vlZVpUN gaP+pwgAop3LUpOFQ9dPRTR3+37AJI8adfabfLIDkEkoVA7lyYY/6Q8pcQ0rklq3 wE1WxrJ7MaE1GaFCwRIDIL6TP+uYRK0pPjqbFBxGakhDc+WXrTcALOWWofb7J7PL FLwP264lRRfKfpMHdK8bx6YHnEN8425PR+ZNXGVPsw+wjo72mmnq54w+ct1iOKiw dKfIrwwCGKlBsNdYHS/XsSx7MMK8e7nsKoSq0UtpJ4PqF11/asOtlYYODc4hd27U E3I3UDKuntmz+meAscDejOJqQk5FT184HIt/Y5JfetKU2zpUFj9IKqXDzMjijdaj vGn9RkTXfJdxMPm1ouF2R6KIRJollg== =V7+A -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of cleanups and bug fixes this cycle, primarily in the block allocation, extent management, fast commit, and journalling" * tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (93 commits) ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2C ext4: check stripe size compatibility on remount as well ext4: fix i_data_sem unlock order in ext4_ind_migrate() ext4: remove the special buffer dirty handling in do_journal_get_write_access ext4: fix a potential assertion failure due to improperly dirtied buffer ext4: hoist ext4_block_write_begin and replace the __block_write_begin ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers ext4: dax: keep orphan list before truncate overflow allocated blocks ext4: fix error message when rejecting the default hash ext4: save unnecessary indentation in ext4_ext_create_new_leaf() ext4: make some fast commit functions reuse extents path ext4: refactor ext4_swap_extents() to reuse extents path ext4: get rid of ppath in convert_initialized_extent() ext4: get rid of ppath in ext4_ext_handle_unwritten_extents() ext4: get rid of ppath in ext4_ext_convert_to_initialized() ext4: get rid of ppath in ext4_convert_unwritten_extents_endio() ext4: get rid of ppath in ext4_split_convert_extents() ext4: get rid of ppath in ext4_split_extent() ext4: get rid of ppath in ext4_force_split_extent_at() ext4: get rid of ppath in ext4_split_extent_at() ... |
|
|
|
3352633ce6 |
vfs-6.12.file
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
osS0AQCgIpvey9oW5DMyMw6Bv0hFMRv95gbNQZfHy09iK+NMNAD9GALhb/4cMIVB
7YrZGXEz454lpgcs8AnrOVjVNfctOQg=
=e9s9
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This is the work to cleanup and shrink struct file significantly.
Right now, (focusing on x86) struct file is 232 bytes. After this
series struct file will be 184 bytes aka 3 cacheline and a spare 8
bytes for future extensions at the end of the struct.
With struct file being as ubiquitous as it is this should make a
difference for file heavy workloads and allow further optimizations in
the future.
- struct fown_struct was embedded into struct file letting it take up
32 bytes in total when really it shouldn't even be embedded in
struct file in the first place. Instead, actual users of struct
fown_struct now allocate the struct on demand. This frees up 24
bytes.
- Move struct file_ra_state into the union containg the cleanup hooks
and move f_iocb_flags out of the union. This closes a 4 byte hole
we created earlier and brings struct file to 192 bytes. Which means
struct file is 3 cachelines and we managed to shrink it by 40
bytes.
- Reorder struct file so that nothing crosses a cacheline.
I suspect that in the future we will end up reordering some members
to mitigate false sharing issues or just because someone does
actually provide really good perf data.
- Shrinking struct file to 192 bytes is only part of the work.
Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache
is created with SLAB_TYPESAFE_BY_RCU the free pointer must be
located outside of the object because the cache doesn't know what
part of the memory can safely be overwritten as it may be needed to
prevent object recycling.
That has the consequence that SLAB_TYPESAFE_BY_RCU may end up
adding a new cacheline.
So this also contains work to add a new kmem_cache_create_rcu()
function that allows the caller to specify an offset where the
freelist pointer is supposed to be placed. Thus avoiding the
implicit addition of a fourth cacheline.
- And finally this removes the f_version member in struct file.
The f_version member isn't particularly well-defined. It is mainly
used as a cookie to detect concurrent seeks when iterating
directories. But it is also abused by some subsystems for
completely unrelated things.
It is mostly a directory and filesystem specific thing that doesn't
really need to live in struct file and with its wonky semantics it
really lacks a specific function.
For pipes, f_version is (ab)used to defer poll notifications until
a write has happened. And struct pipe_inode_info is used by
multiple struct files in their ->private_data so there's no chance
of pushing that down into file->private_data without introducing
another pointer indirection.
But pipes don't rely on f_pos_lock so this adds a union into struct
file encompassing f_pos_lock and a pipe specific f_pipe member that
pipes can use. This union of course can be extended to other file
types and is similar to what we do in struct inode already"
* tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
fs: remove f_version
pipe: use f_pipe
fs: add f_pipe
ubifs: store cookie in private data
ufs: store cookie in private data
udf: store cookie in private data
proc: store cookie in private data
ocfs2: store cookie in private data
input: remove f_version abuse
ext4: store cookie in private data
ext2: store cookie in private data
affs: store cookie in private data
fs: add generic_llseek_cookie()
fs: use must_set_pos()
fs: add must_set_pos()
fs: add vfs_setpos_cookie()
s390: remove unused f_version
ceph: remove unused f_version
adi: remove unused f_version
mm: Removed @freeptr_offset to prevent doc warning
...
|
|
|
|
4f05ee2f82
|
ext4: store cookie in private data
Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-11-6d3e4816aa7b@kernel.org Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
cb3de5fc87 |
ext4: fix a potential assertion failure due to improperly dirtied buffer
On an old kernel version(4.19, ext3, data=journal, pagesize=64k),
an assertion failure will occasionally be triggered by the line below:
-----------
jbd2_journal_commit_transaction
{
...
J_ASSERT_BH(bh, !buffer_dirty(bh));
/*
* The buffer on BJ_Forget list and not jbddirty means
...
}
-----------
The same condition may also be applied to the lattest kernel version.
When blocksize < pagesize and we truncate a file, there can be buffers in
the mapping tail page beyond i_size. These buffers will be filed to
transaction's BJ_Forget list by ext4_journalled_invalidatepage() during
truncation. When the transaction doing truncate starts committing, we can
grow the file again. This calls __block_write_begin() which allocates new
blocks under these buffers in the tail page we go through the branch:
if (buffer_new(bh)) {
clean_bdev_bh_alias(bh);
if (folio_test_uptodate(folio)) {
clear_buffer_new(bh);
set_buffer_uptodate(bh);
mark_buffer_dirty(bh);
continue;
}
...
}
Hence buffers on BJ_Forget list of the committing transaction get marked
dirty and this triggers the jbd2 assertion.
Teach ext4_block_write_begin() to properly handle files with data
journalling by avoiding dirtying them directly. Instead of
folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which
takes care of handling journalling. We also don't need to mark new uptodate
buffers as dirty in ext4_block_write_begin(). That will be either done
either by block_commit_write() in case of success or by
folio_zero_new_buffers() in case of failure.
Reported-by: Baolin Liu <liubaolin@kylinos.cn>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shida Zhang <zhangshida@kylinos.cn>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240830053739.3588573-4-zhangshida@kylinos.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
6b730a4050 |
ext4: hoist ext4_block_write_begin and replace the __block_write_begin
Using __block_write_begin() make it inconvenient to journal the user data dirty process. We can't tell the block layer maintainer, ‘Hey, we want to trace the dirty user data in ext4, can we add some special code for ext4 in __block_write_begin?’:P So use ext4_block_write_begin() instead. The two functions are basically doing the same thing except for the fscrypt related code. Remove the unnecessary #ifdef since fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION. And hoist the ext4_block_write_begin so that it can be used in other files. Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240830053739.3588573-3-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
c6b72f5d82 |
ext4: avoid OOB when system.data xattr changes underneath the filesystem
When looking up for an entry in an inlined directory, if e_value_offs is
changed underneath the filesystem by some change in the block device, it
will lead to an out-of-bounds access that KASAN detects as an UAF.
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
loop0: detected capacity change from 2048 to 2047
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
Read of size 1 at addr ffff88803e91130f by task syz-executor269/5103
CPU: 0 UID: 0 PID: 5103 Comm: syz-executor269 Not tainted 6.11.0-rc4-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
ext4_find_inline_entry+0x4be/0x5e0 fs/ext4/inline.c:1697
__ext4_find_entry+0x2b4/0x1b30 fs/ext4/namei.c:1573
ext4_lookup_entry fs/ext4/namei.c:1727 [inline]
ext4_lookup+0x15f/0x750 fs/ext4/namei.c:1795
lookup_one_qstr_excl+0x11f/0x260 fs/namei.c:1633
filename_create+0x297/0x540 fs/namei.c:3980
do_symlinkat+0xf9/0x3a0 fs/namei.c:4587
__do_sys_symlinkat fs/namei.c:4610 [inline]
__se_sys_symlinkat fs/namei.c:4607 [inline]
__x64_sys_symlinkat+0x95/0xb0 fs/namei.c:4607
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3e73ced469
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff4d40c258 EFLAGS: 00000246 ORIG_RAX: 000000000000010a
RAX: ffffffffffffffda RBX: 0032656c69662f2e RCX: 00007f3e73ced469
RDX: 0000000020000200 RSI: 00000000ffffff9c RDI: 00000000200001c0
RBP: 0000000000000000 R08: 00007fff4d40c290 R09: 00007fff4d40c290
R10: 0023706f6f6c2f76 R11: 0000000000000246 R12: 00007fff4d40c27c
R13: 0000000000000003 R14: 431bde82d7b634db R15: 00007fff4d40c2b0
</TASK>
Calling ext4_xattr_ibody_find right after reading the inode with
ext4_get_inode_loc will lead to a check of the validity of the xattrs,
avoiding this problem.
Reported-by: syzbot+0c2508114d912a54ee79@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0c2508114d912a54ee79
Fixes:
|
|
|
|
4d231b91a9 |
ext4: return error on ext4_find_inline_entry
In case of errors when reading an inode from disk or traversing inline directory entries, return an error-encoded ERR_PTR instead of returning NULL. ext4_find_inline_entry only caller, __ext4_find_entry already returns such encoded errors. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-3-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
9f04609f74
|
buffer: Convert __block_write_begin() to take a folio
Almost all callers have a folio now, so change __block_write_begin() to take a folio and remove a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
1da86618bd
|
fs: Convert aops->write_begin to take a folio
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
8dc9c3da79 |
ext4: fix uninitialized variable in ext4_inlinedir_to_tree
Syzbot has found an uninit-value bug in ext4_inlinedir_to_tree This error happens because ext4_inlinedir_to_tree does not handle the case when ext4fs_dirhash returns an error This can be avoided by checking the return value of ext4fs_dirhash and propagating the error, similar to how it's done with ext4_htree_store_dirent Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Reported-and-tested-by: syzbot+eaba5abe296837a640c0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=eaba5abe296837a640c0 Link: https://patch.msgid.link/20240501033017.220000-1-shenxiaxi26@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
a4fc4a0c45 |
mm: add folio_zero_tail() and use it in ext4
Patch series "Add folio_zero_tail() and folio_fill_tail()". I'm trying to make it easier for filesystems with tailpacking / stuffing / inline data to use folios. The primary function here is folio_fill_tail(). You give it a pointer to memory where the data currently is, and it takes care of copying it into the folio at that offset. That works for gfs2 & iomap. Then There's Ext4. Rather than gin up some kind of specialist "Here's a two pointers to two blocks of memory" routine, just let it do its current thing, and let it call folio_zero_tail(), which is also called by folio_fill_tail(). Other filesystems can be converted later; these ones seemed like good examples as they're already partly or completely converted to folios. This patch (of 3): Instead of unmapping the folio after copying the data to it, then mapping it again to zero the tail, provide folio_zero_tail() to zero the tail of an already-mapped folio. [akpm@linux-foundation.org: fix kerneldoc argument ordering] Link: https://lkml.kernel.org/r/20231107212643.3490372-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231107212643.3490372-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
b898ab2336
|
ext4: convert to new timestamp accessors
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-33-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
3ef96fcfd5 |
Many ext4 and jbd2 cleanups and bug fixes for v6.6-rc1.
* Cleanups in the ext4 remount code when going to and from read-only
* Cleanups in ext4's multiblock allocator
* Cleanups in the jbd2 setup/mounting code paths
* Performance improvements when appending to a delayed allocation file
* Miscenallenous syzbot and other bug fixes
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmTwqUMACgkQ8vlZVpUN
gaMqgwf6Aui6MlrtNJx6CrJt4dxLANQ8G6bsJ2Zr+6QNS1X/GAUrCCyLWWom1dfb
OJ/n4/JUCNc9v5yLCTqHOE5ZFTdQItOBJUKXbJYff8EdnR+zCUULpj6bPbEs5BKp
U7CiiZ9TIi9S2TWezvIJKIa2VxgPej7CH/HOt8ISh/Msq8nHvcEEJIyOEvVk9odQ
LEkiQCsikWaljB7qEOIYo+xgFffMZfttc4zuTkdr/h1I6OWhvQYmlwSnTuAiE7BS
BVf3ebD2Dg8TChUMXOsk2d743iZNWf/+yTfbXVu93/uEM9vgF0+HO6EerTK8RMeM
yxhshg9z7ccuFjdY/2NYDXe6pEuDKw==
=cMIX
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Many ext4 and jbd2 cleanups and bug fixes:
- Cleanups in the ext4 remount code when going to and from read-only
- Cleanups in ext4's multiblock allocator
- Cleanups in the jbd2 setup/mounting code paths
- Performance improvements when appending to a delayed allocation file
- Miscellaneous syzbot and other bug fixes"
* tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
ext4: fix slab-use-after-free in ext4_es_insert_extent()
libfs: remove redundant checks of s_encoding
ext4: remove redundant checks of s_encoding
ext4: reject casefold inode flag without casefold feature
ext4: use LIST_HEAD() to initialize the list_head in mballoc.c
ext4: do not mark inode dirty every time when appending using delalloc
ext4: rename s_error_work to s_sb_upd_work
ext4: add periodic superblock update check
ext4: drop dio overwrite only flag and associated warning
ext4: add correct group descriptors and reserved GDT blocks to system zone
ext4: remove unused function declaration
ext4: mballoc: avoid garbage value from err
ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple()
ext4: change the type of blocksize in ext4_mb_init_cache()
ext4: fix unttached inode after power cut with orphan file feature enabled
jbd2: correct the end of the journal recovery scan range
ext4: ext4_get_{dev}_journal return proper error value
ext4: cleanup ext4_get_dev_journal() and ext4_get_journal()
jbd2: jbd2_journal_init_{dev,inode} return proper error return value
jbd2: drop useless error tag in jbd2_journal_wipe()
...
|
|
|
|
eb8ab4443a |
ext4: make ext4_forced_shutdown() take struct super_block
Currently ext4_forced_shutdown() takes struct ext4_sb_info but most callers need to get it from struct super_block anyway. So just pass in struct super_block to save all callers from some boilerplate code. No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616165109.21695-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
1bc33893e7 |
ext4: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-40-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
ed5d285b3f |
ext4: make ext4_es_remove_extent() return void
Now ext4_es_remove_extent() never fails, so make it return void. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-10-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
d19500da4b |
ext4: Make ext4_write_inline_data_end() use folio
ext4_write_inline_data_end() is completely converted to work with folio. Also all callers of ext4_write_inline_data_end() already works on folio except ext4_da_write_end(). Mostly for consistency and saving few instructions maybe, this patch just converts ext4_da_write_end() to work with folio which makes the last caller of ext4_write_inline_data_end() also converted to work with folio. We then make ext4_write_inline_data_end() take folio instead of page. Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/1bcea771720ff451a5a59b3f1bcd5fae51cb7ce7.1684122756.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
0b956de151 |
ext4: kill unused function ext4_journalled_write_inline_data
Commit
|
|
|
|
2a534e1d0d |
ext4: bail out of ext4_xattr_ibody_get() fails for any reason
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any reason, it's best if we just fail as opposed to stumbling on, especially if the failure is EFSCORRUPTED. Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
2220eaf909 |
ext4: add bounds checking in get_max_inline_xattr_value_size()
Normally the extended attributes in the inode body would have been checked when the inode is first opened, but if someone is writing to the block device while the file system is mounted, it's possible for the inode table to get corrupted. Add bounds checking to avoid reading beyond the end of allocated memory if this happens. Reported-by: syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7 Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
f4ce24f54d |
ext4: fix deadlock when converting an inline directory in nojournal mode
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock. There is a similar self-deadlock in
ext4_incvert_inline_data_nolock() for data files which we'll fix at
the same time.
A simple reproducer demonstrating the problem:
mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
mount -t ext4 -o dirsync /dev/vdc /vdc
cd /vdc
mkdir file0
cd file0
touch file0
touch file1
attr -s BurnSpaceInEA -V abcde .
touch supercalifragilisticexpialidocious
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu
Reported-by: syzbot+91dccab7c64e2850a4e5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
|
|
7fa8a8ee94 |
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page().
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
than exclusive, and has fixed a bunch of errors which were caused by its
unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
|
|
|
|
6b90d4130a |
ext4: Convert ext4_write_inline_data_end() to use a folio
Convert the incoming page to a folio so that we call compound_head() only once instead of seven times. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230324180129.1220691-16-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
6b87fbe415 |
ext4: Convert ext4_read_inline_page() to ext4_read_inline_folio()
All callers now have a folio, so pass it and use it. The folio may be large, although I doubt we'll want to use a large folio for an inline file. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230324180129.1220691-15-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
9a9d01f081 |
ext4: Convert ext4_da_write_inline_data_begin() to use a folio
Saves a number of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20230324180129.1220691-14-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
4ed9b598ac |
ext4: Convert ext4_da_convert_inline_data_to_extent() to use a folio
Saves a number of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20230324180129.1220691-13-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
f8f8c89f59 |
ext4: Convert ext4_try_to_write_inline_data() to use a folio
Saves a number of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20230324180129.1220691-12-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
83eba701cf |
ext4: Convert ext4_convert_inline_data_to_extent() to use a folio
Saves a number of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20230324180129.1220691-11-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
3edde93e07 |
ext4: Convert ext4_readpage_inline() to take a folio
Use the folio API in this function, saves a few calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230324180129.1220691-10-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
1dcdce5919 |
ext4: move where set the MAY_INLINE_DATA flag is set
The only caller of ext4_find_inline_data_nolock() that needs setting of EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode(). In ext4_write_inline_data_end() we just need to update inode->i_inline_off. Since we are going to add one more caller that does not need to set EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA out to ext4_iget_extra_inode(). Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
66267814ba |
fs/ext4: replace ternary operator with min()/max() and min_t()
Fix the following coccicheck warning: fs/ext4/inline.c:183: WARNING opportunity for min(). fs/ext4/extents.c:2631: WARNING opportunity for max(). fs/ext4/extents.c:2632: WARNING opportunity for min(). fs/ext4/extents.c:5559: WARNING opportunity for max(). fs/ext4/super.c:6908: WARNING opportunity for min(). min()/max() and min_t() macro is defined in include/linux/minmax.h. It avoids multiple evaluations of the arguments when non-constant and performs strict type-checking. Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220817025928.612851-1-13667453960@163.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
c9fd167d57 |
ext4: correct max_inline_xattr_value_size computing
If the ext4 inode does not have xattr space, 0 is returned in the get_max_inline_xattr_value_size function. Otherwise, the function returns a negative value when the inode does not contain EXT4_STATE_XATTR. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |
|
|
|
5a57bca905 |
ext4: fix reading leftover inlined symlinks
Since commit |
|
|
|
fdaf9a5840 |
Page cache changes for 5.19
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first argument
like ->read_folio
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp
gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O
7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj
xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d
nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26
bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX
WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A==
=djLv
-----END PGP SIGNATURE-----
Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
|
|
|
|
ef09ed5d37 |
ext4: fix bug_on in ext4_writepages
we got issue as follows: EXT4-fs error (device loop0): ext4_mb_generate_buddy:1141: group 0, block bitmap and bg descriptor inconsistent: 25 vs 31513 free cls ------------[ cut here ]------------ kernel BUG at fs/ext4/inode.c:2708! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 2 PID: 2147 Comm: rep Not tainted 5.18.0-rc2-next-20220413+ #155 RIP: 0010:ext4_writepages+0x1977/0x1c10 RSP: 0018:ffff88811d3e7880 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88811c098000 RDX: 0000000000000000 RSI: ffff88811c098000 RDI: 0000000000000002 RBP: ffff888128140f50 R08: ffffffffb1ff6387 R09: 0000000000000000 R10: 0000000000000007 R11: ffffed10250281ea R12: 0000000000000001 R13: 00000000000000a4 R14: ffff88811d3e7bb8 R15: ffff888128141028 FS: 00007f443aed9740(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020007200 CR3: 000000011c2a4000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> do_writepages+0x130/0x3a0 filemap_fdatawrite_wbc+0x83/0xa0 filemap_flush+0xab/0xe0 ext4_alloc_da_blocks+0x51/0x120 __ext4_ioctl+0x1534/0x3210 __x64_sys_ioctl+0x12c/0x170 do_syscall_64+0x3b/0x90 It may happen as follows: 1. write inline_data inode vfs_write new_sync_write ext4_file_write_iter ext4_buffered_write_iter generic_perform_write ext4_da_write_begin ext4_da_write_inline_data_begin -> If inline data size too small will allocate block to write, then mapping will has dirty page ext4_da_convert_inline_data_to_extent ->clear EXT4_STATE_MAY_INLINE_DATA 2. fallocate do_vfs_ioctl ioctl_preallocate vfs_fallocate ext4_fallocate ext4_convert_inline_data ext4_convert_inline_data_nolock ext4_map_blocks -> fail will goto restore data ext4_restore_inline_data ext4_create_inline_data ext4_write_inline_data ext4_set_inode_state -> set inode EXT4_STATE_MAY_INLINE_DATA 3. writepages __ext4_ioctl ext4_alloc_da_blocks filemap_flush filemap_fdatawrite_wbc do_writepages ext4_writepages if (ext4_has_inline_data(inode)) BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) The root cause of this issue is we destory inline data until call ext4_writepages under delay allocation mode. But there maybe already convert from inline to extent. To solve this issue, we call filemap_flush first.. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220516122634.1690462-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> |