- The 10 patch series "__vmalloc()/kvmalloc() and no-block support" from
Uladzislau Rezki reworks the vmalloc() code to support non-blocking
allocations (GFP_ATOIC, GFP_NOWAIT).
- The 2 patch series "ksm: fix exec/fork inheritance" from xu xin fixes
a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited
across fork/exec.
- The 4 patch series "mm/zswap: misc cleanup of code and documentations"
from SeongJae Park does some light maintenance work on the zswap code.
- The 5 patch series "mm/page_owner: add debugfs files 'show_handles'
and 'show_stacks_handles'" from Mauricio Faria de Oliveira enhances the
/sys/kernel/debug/page_owner debug feature. It adds unique identifiers
to differentiate the various stack traces so that userspace monitoring
tools can better match stack traces over time.
- The 2 patch series "mm/page_alloc: pcp->batch cleanups" from Joshua
Hahn makes some minor alterations to the page allocator's per-cpu-pages
feature.
- The 2 patch series "Improve UFFDIO_MOVE scalability by removing
anon_vma lock" from Lokesh Gidra addresses a scalability issue in
userfaultfd's UFFDIO_MOVE operation.
- The 2 patch series "kasan: cleanups for kasan_enabled() checks" from
Sabyrzhan Tasbolatov performs some cleanup in the KASAN code.
- The 2 patch series "drivers/base/node: fold node register and
unregister functions" from Donet Tom cleans up the NUMA node handling
code a little.
- The 4 patch series "mm: some optimizations for prot numa" from Kefeng
Wang provides some cleanups and small optimizations to the NUMA
allocation hinting code.
- The 5 patch series "mm/page_alloc: Batch callers of
free_pcppages_bulk" from Joshua Hahn addresses long lock hold times at
boot on large machines. These were causing (harmless) softlockup
warnings.
- The 2 patch series "optimize the logic for handling dirty file folios
during reclaim" from Baolin Wang removes some now-unnecessary work from
page reclaim.
- The 10 patch series "mm/damon: allow DAMOS auto-tuned for per-memcg
per-node memory usage" from SeongJae Park enhances the DAMOS auto-tuning
feature.
- The 2 patch series "mm/damon: fixes for address alignment issues in
DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan fixes DAMON_LRU_SORT
and DAMON_RECLAIM with certain userspace configuration.
- The 15 patch series "expand mmap_prepare functionality, port more
users" from Lorenzo Stoakes enhances the new(ish)
file_operations.mmap_prepare() method and ports additional callsites
from the old ->mmap() over to ->mmap_prepare().
- The 8 patch series "Fix stale IOTLB entries for kernel address space"
from Lu Baolu fixes a bug (and possible security issue on non-x86) in
the IOMMU code. In some situations the IOMMU could be left hanging onto
a stale kernel pagetable entry.
- The 4 patch series "mm/huge_memory: cleanup __split_unmapped_folio()"
from Wei Yang cleans up and optimizes the folio splitting code.
- The 5 patch series "mm, swap: misc cleanup and bugfix" from Kairui
Song implements some cleanups and a minor fix in the swap discard code.
- The 8 patch series "mm/damon: misc documentation fixups" from SeongJae
Park does as advertised.
- The 9 patch series "mm/damon: support pin-point targets removal" from
SeongJae Park permits userspace to remove a specific monitoring target
in the middle of the current targets list.
- The 2 patch series "mm: MISC follow-up patches for linux/pgalloc.h"
from Harry Yoo implements a couple of cleanups related to mm header file
inclusion.
- The 2 patch series "mm/swapfile.c: select swap devices of default
priority round robin" from Baoquan He improves the selection of swap
devices for NUMA machines.
- The 3 patch series "mm: Convert memory block states (MEM_*) macros to
enums" from Israel Batista changes the memory block labels from macros
to enums so they will appear in kernel debug info.
- The 3 patch series "ksm: perform a range-walk to jump over holes in
break_ksm" from Pedro Demarchi Gomes addresses an inefficiency when KSM
unmerges an address range.
- The 22 patch series "mm/damon/tests: fix memory bugs in kunit tests"
from SeongJae Park fixes leaks and unhandled malloc() failures in DAMON
userspace unit tests.
- The 2 patch series "some cleanups for pageout()" from Baolin Wang
cleans up a couple of minor things in the page scanner's
writeback-for-eviction code.
- The 2 patch series "mm/hugetlb: refactor sysfs/sysctl interfaces" from
Hui Zhu moves hugetlb's sysfs/sysctl handling code into a new file.
- The 9 patch series "introduce VM_MAYBE_GUARD and make it sticky" from
Lorenzo Stoakes makes the VMA guard regions available in /proc/pid/smaps
and improves the mergeability of guarded VMAs.
- The 2 patch series "mm: perform guard region install/remove under VMA
lock" from Lorenzo Stoakes reduces mmap lock contention for callers
performing VMA guard region operations.
- The 2 patch series "vma_start_write_killable" from Matthew Wilcox
starts work in permitting applications to be killed when they are
waiting on a read_lock on the VMA lock.
- The 11 patch series "mm/damon/tests: add more tests for online
parameters commit" from SeongJae Park adds additional userspace testing
of DAMON's "commit" feature.
- The 9 patch series "mm/damon: misc cleanups" from SeongJae Park does
that.
- The 2 patch series "make VM_SOFTDIRTY a sticky VMA flag" from Lorenzo
Stoakes addresses the possible loss of a VMA's VM_SOFTDIRTY flag when
that VMA is merged with another.
- The 16 patch series "mm: support device-private THP" from Balbir Singh
introduces support for Transparent Huge Page (THP) migration in zone
device-private memory.
- The 3 patch series "Optimize folio split in memory failure" from Zi
Yan optimizes folio split operations in the memory failure code.
- The 2 patch series "mm/huge_memory: Define split_type and consolidate
split support checks" from Wei Yang provides some more cleanups in the
folio splitting code.
- The 16 patch series "mm: remove is_swap_[pte, pmd]() + non-swap
entries, introduce leaf entries" from Lorenzo Stoakes cleans up our
handling of pagetable leaf entries by introducing the concept of
'software leaf entries', of type softleaf_t.
- The 4 patch series "reparent the THP split queue" from Muchun Song
reparents the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory resources.
- The 3 patch series "unify PMD scan results and remove redundant
cleanup" from Wei Yang does a little cleanup in the hugepage collapse
code.
- The 6 patch series "zram: introduce writeback bio batching" from
Sergey Senozhatsky improves zram writeback efficiency by introducing
batched bio writeback support.
- The 4 patch series "memcg: cleanup the memcg stats interfaces" from
Shakeel Butt cleans up our handling of the interrupt safety of some
memcg stats.
- The 4 patch series "make vmalloc gfp flags usage more apparent" from
Vishal Moola cleans up vmalloc's handling of incoming GFP flags.
- The 6 patch series "mm: Add soft-dirty and uffd-wp support for RISC-V"
from Chunyan Zhang teches soft dirty and userfaultfd write protect
tracking to use RISC-V's Svrsw60t59b extension.
- The 5 patch series "mm: swap: small fixes and comment cleanups" from
Youngjun Park fixes a small bug and cleans up some of the swap code.
- The 4 patch series "initial work on making VMA flags a bitmap" from
Lorenzo Stoakes starts work on converting the vma struct's flags to a
bitmap, so we stop running out of them, especially on 32-bit.
- The 2 patch series "mm/swapfile: fix and cleanup swap list iterations"
from Youngjun Park addresses a possible bug in the swap discard code and
cleans things up a little.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaTEb0wAKCRDdBJ7gKXxA
jjfIAP94W4EkCCwNOupnChoG+YWw/JW21anXt5NN+i5svn1yugEAwzvv6A+cAFng
o+ug/fyrfPZG7PLp2R8WFyGIP0YoBA4=
=IUzS
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improves the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
Added:
support timestamps prior to epoch;
do not overwrite uptodate pages;
disable readahead for compressed files;
setting of dummy blocksize to read boot_block when mounting;
the run_lock initialization when loading $Extend;
initialization of allocated memory before use;
support for the NTFS3_IOC_SHUTDOWN ioctl;
check for minimum alignment when performing direct I/O reads;
check for shutdown in fsync.
Fixed:
mount failure for sparse runs in run_unpack();
use-after-free of sbi->options in cmp_fnames;
KMSAN uninit bug after failed mi_read in mi_format_new;
uninit error after buffer allocation by __getname();
KMSAN uninit-value in ni_create_attr_list;
double free of sbi->options->nls and ownership of fc->fs_private;
incorrect vcn adjustments in attr_collapse_range();
mode update when ACL can be reduced to mode;
memory leaks in add sub record.
Changed:
refactor code, updated terminology, spelling;
do not kmap pages in (de)compression code;
after ntfs_look_free_mft(), code that fails must put mi;
default mount options for "acl" and "prealloc".
Replaced:
use unsafe_memcpy() to avoid memcpy size warning;
ntfs_bio_pages with page cache for compressed files.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmkwF6UACgkQqbAzH4Mk
B7aG3w/9H3iUua41pQWH1g94PW7qhRxyN+2hVvnlJvNRB5bV/rhVHnEEkyxNHjau
I+pknP2KceUrHYQO032b3kuuwab7sMCfVhVQCIzQ1dsp0V3HFwx5IWz8hGiAMrAN
7pIQuUqqG5NxArVM028HWUXOq1myjRWxuwLTEaSYDOT0/Fl+a/nXKhlg+S5Js8K1
FlkorvWdmXKUp3Dp96yzlcmcbqNFCeyfAlUu57KDx1wLjSvzEnCol8VHpSl9myc3
FET6RcsL00jy3Tr3/6xBk6ef+9Eoas6hiafLaHay6jMkzW2hPzVwNgUsGOYQ0IwQ
6jIWiTsAGfi6/GZRVQrEMFr4pvlIlG//uTezfu9pE7ld6wMwYfT4ZvDuHMYvshiw
keRIEeF4oCakut9Cy8OVh0FtUCv/RYaT332rKJhtFZyCnewqwHd29/ihjB9yNnDx
qruFEpjBHOffsbcd0PLDsATutGi8sjd4eNyv1KuRf2xKL3fZ22d79LOzBHRGhAnR
1MyBNo6Q9eF3W8hbf1aMJ0X3O76EVwXxuAWc32EwHn/35XMjb9ajjmn0G1Xeix7+
0N/cdXZO4fiZhj2fOVUuLduHpoNCcaQya0GD6YJ+jgVnFSs0ms/pGDryQa2kooqu
ewav36eEmtU6WOXAxMKVYzok1A+/fMtpDK7cOvW61Vz91X6woCU=
=RGJu
-----END PGP SIGNATURE-----
Merge tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"New code:
- support timestamps prior to epoch
- do not overwrite uptodate pages
- disable readahead for compressed files
- setting of dummy blocksize to read boot_block when mounting
- the run_lock initialization when loading $Extend
- initialization of allocated memory before use
- support for the NTFS3_IOC_SHUTDOWN ioctl
- check for minimum alignment when performing direct I/O reads
- check for shutdown in fsync
Fixes:
- mount failure for sparse runs in run_unpack()
- use-after-free of sbi->options in cmp_fnames
- KMSAN uninit bug after failed mi_read in mi_format_new
- uninit error after buffer allocation by __getname()
- KMSAN uninit-value in ni_create_attr_list
- double free of sbi->options->nls and ownership of fc->fs_private
- incorrect vcn adjustments in attr_collapse_range()
- mode update when ACL can be reduced to mode
- memory leaks in add sub record
Changes:
- refactor code, updated terminology, spelling
- do not kmap pages in (de)compression code
- after ntfs_look_free_mft(), code that fails must put mft_inode
- default mount options for "acl" and "prealloc"
Replaced:
- use unsafe_memcpy() to avoid memcpy size warning
- ntfs_bio_pages with page cache for compressed files"
* tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits)
fs/ntfs3: check for shutdown in fsync
fs/ntfs3: change the default mount options for "acl" and "prealloc"
fs/ntfs3: Prevent memory leaks in add sub record
fs/ntfs3: out1 also needs to put mi
fs/ntfs3: Fix spelling mistake "recommened" -> "recommended"
fs/ntfs3: update mode in xattr when ACL can be reduced to mode
fs/ntfs3: check minimum alignment for direct I/O
fs/ntfs3: implement NTFS3_IOC_SHUTDOWN ioctl
fs/ntfs3: correct attr_collapse_range when file is too fragmented
ntfs3: fix double free of sbi->options->nls and clarify ownership of fc->fs_private
fs/ntfs3: Initialize allocated memory before use
fs/ntfs3: remove ntfs_bio_pages and use page cache for compressed I/O
ntfs3: avoid memcpy size warning
fs/ntfs3: fix KMSAN uninit-value in ni_create_attr_list
ntfs3: init run lock for extend inode
ntfs: set dummy blocksize to read boot_block when mounting
fs/ntfs3: disable readahead for compressed files
ntfs3: Fix uninit buffer allocated by __getname()
ntfs3: fix uninit memory after failed mi_read in mi_format_new
ntfs3: fix use-after-free of sbi->options in cmp_fnames
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZgAKCRCRxhvAZXjc
oq2EAQD09y/qVU81E7Qg7Cn4n5/3WTlnQjx0aSvhb4p6dFUcFwD+K9uVJNP8x8tA
xTaPt59nZbEX9BIAwtLChSPa4CZsnwM=
=XrvE
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull fs header updates from Christian Brauner:
"This contains initial work to start splitting up fs.h.
Begin the long-overdue work of splitting up the monolithic fs.h
header. The header has grown to over 3000 lines and includes types and
functions for many different subsystems, making it difficult to
navigate and causing excessive compilation dependencies.
This series introduces new focused headers for superblock-related
code:
- Rename fs_types.h to fs_dirent.h to better reflect its actual
content (directory entry types)
- Add fs/super_types.h containing superblock type definitions
- Add fs/super.h containing superblock function declarations
This is the first step in a longer effort to modularize the VFS
headers.
Cleanups:
- Inode Field Layout Optimization (Mateusz Guzik)
Move inode fields used during fast path lookup closer together to
improve cache locality during path resolution.
- current_umask() Optimization (Mateusz Guzik)
Inline current_umask() and move it to fs_struct.h. This improves
performance by avoiding function call overhead for this
frequently-used function, and places it in a more appropriate
header since it operates on fs_struct"
* tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: move inode fields used during fast path lookup closer together
fs: inline current_umask() and move it to fs_struct.h
fs: add fs/super.h header
fs: add fs/super_types.h header
fs: rename fs_types.h to fs_dirent.h
Ensure fsync() returns -EIO when the ntfs3 filesystem is in forced
shutdown, instead of silently succeeding via generic_file_fsync().
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Switch the "acl" and "prealloc" mount parameters to fsparam_flag_no(),
making them enabled by default and allowing users to disable them with
"noacl" and "noprealloc".
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
If a rb node with the same ino already exists in the rb tree, the newly
alloced mft_inode in ni_add_subrecord() will not have its memory cleaned
up, which leads to the memory leak issue reported by syzbot.
The best option to avoid this issue is to put the newly alloced mft node
when a rb node with the same ino already exists in the rb tree and return
the rb node found in the rb tree to the parent layer.
syzbot reported:
BUG: memory leak
unreferenced object 0xffff888110bef280 (size 128):
backtrace (crc 126a088f):
ni_add_subrecord+0x31/0x180 fs/ntfs3/frecord.c:317
ntfs_look_free_mft+0xf0/0x790 fs/ntfs3/fsntfs.c:715
BUG: memory leak
unreferenced object 0xffff888109093400 (size 1024):
backtrace (crc 7197c55e):
mi_init+0x2b/0x50 fs/ntfs3/record.c:105
mi_format_new+0x40/0x220 fs/ntfs3/record.c:422
Fixes: 4342306f0f ("fs/ntfs3: Add file operations and implementation")
Reported-by: syzbot+3932ccb896e06f7414c9@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
After ntfs_look_free_mft() executes successfully, all subsequent code
that fails to execute must put mi.
Fixes: 4342306f0f ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
There is a spelling mistake in a ntfs_info message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
If a file's ACL can be reduced to standard mode bits, update mode
accordingly, persist the change, and update the cached ACL. This keeps
mode and ACL consistent and avoids redundant xattrs.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Add a check for minimum alignment when performing direct I/O reads. If the
file offset or user buffer is not aligned to the device's logical block
size, fall back to buffered I/O instead of continuing with unaligned direct I/O.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Add support for the NTFS3_IOC_SHUTDOWN ioctl, allowing userspace to
request a filesystem shutdown. The ioctl number is shared with other
filesystems such as ext4, exfat, and f2fs.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Fix incorrect VCN adjustments in attr_collapse_range() that caused
filesystem errors or corruption on very fragmented NTFS files when
performing collapse-range operations.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
It's useful to be able to determine the size of a VMA descriptor range
used on f_op->mmap_prepare, expressed both in bytes and pages, so add
helpers for both and update code that could make use of it to do so.
Link: https://lkml.kernel.org/r/74ef338203c9ff08a9ace73a8f1f6116a79112a0.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit 02f312754c ("ntfs3: fix use-after-free of sbi->options in cmp_fnames") introduced a use-after-free bug
due to improper handling of sbi->options in error paths. This resulted in crashes when superblock cleanup
is performed in ntfs_put_super.
This patch ensures that the options structure and its subfields are properly freed, preventing the memory
corruption and use-after-free errors.
Fixes: 02f312754c ("ntfs3: fix use-after-free of sbi->options in cmp_fnames")
Reported-by: syzbot+cc433e4cd6d54736bf80@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cc433e4cd6d54736bf80
Signed-off-by: YangWen <anmuxixixi@gmail.com>
[almaz.alexandrovich@paragon-software.com: added fixes and closes tags]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Replace the use of ntfs_bio_pages with the disk page cache for reading and
writing compressed files. This slightly improves performance when reading
compressed data and simplifies the I/O logic.
When an XPRESS or LZX compressed file is opened for writing, it is now
decompressed into a normal file before modification. A new argument (`int copy`)
is added to ni_read_frame() to handle writing of decompressed and mapped data.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
There are more entries after the structure, use unsafe_memcpy() to avoid
this warning.
syzbot reported:
memcpy: detected field-spanning write (size 3656) of single field "hdr1" at fs/ntfs3/index.c:1927 (size 16)
Call Trace:
indx_insert_entry+0x1a0/0x460 fs/ntfs3/index.c:1996
ni_add_name+0x4dd/0x820 fs/ntfs3/frecord.c:2995
ni_rename+0x98/0x170 fs/ntfs3/frecord.c:3026
ntfs_rename+0xab9/0xf00 fs/ntfs3/namei.c:332
Reported-by: syzbot+3a1878433bc1cb97b42a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3a1878433bc1cb97b42a
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
The call to kmalloc() to allocate the attribute list buffer is given a
size of al_aligned(rs). This size can be larger than the data
subsequently copied into the buffer, leaving trailing bytes uninitialized.
This can trigger a KMSAN "uninit-value" warning if that memory is
later accessed.
Fix this by using kzalloc() instead, which ensures the entire
allocated buffer is zero-initialized, preventing the warning.
Reported-by: syzbot+83c9dd5c0dcf6184fdbf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=83c9dd5c0dcf6184fdbf
Signed-off-by: Nirbhay Sharma <nirbhay.lkd@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
After setting the inode mode of $Extend to a regular file, executing the
truncate system call will enter the do_truncate() routine, causing the
run_lock uninitialized error reported by syzbot.
Prior to patch 4e8011ffec, if the inode mode of $Extend was not set to
a regular file, the do_truncate() routine would not be entered.
Add the run_lock initialization when loading $Extend.
syzbot reported:
INFO: trying to register non-static key.
Call Trace:
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
assign_lock_key+0x133/0x150 kernel/locking/lockdep.c:984
register_lock_class+0x105/0x320 kernel/locking/lockdep.c:1299
__lock_acquire+0x99/0xd20 kernel/locking/lockdep.c:5112
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_write+0x96/0x1f0 kernel/locking/rwsem.c:1590
ntfs_set_size+0x140/0x200 fs/ntfs3/inode.c:860
ntfs_extend+0x1d9/0x970 fs/ntfs3/file.c:387
ntfs_setattr+0x2e8/0xbe0 fs/ntfs3/file.c:808
Fixes: 4e8011ffec ("ntfs3: pretend $Extend records as regular files")
Reported-by: syzbot+bdeb22a4b9a09ab9aa45@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bdeb22a4b9a09ab9aa45
Tested-by: syzbot+bdeb22a4b9a09ab9aa45@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
When mounting, sb->s_blocksize is used to read the boot_block without
being defined or validated. Set a dummy blocksize before attempting to
read the boot_block.
The issue can be triggered with the following syz reproducer:
mkdirat(0xffffffffffffff9c, &(0x7f0000000080)='./file1\x00', 0x0)
r4 = openat$nullb(0xffffffffffffff9c, &(0x7f0000000040), 0x121403, 0x0)
ioctl$FS_IOC_SETFLAGS(r4, 0x40081271, &(0x7f0000000980)=0x4000)
mount(&(0x7f0000000140)=@nullb, &(0x7f0000000040)='./cgroup\x00',
&(0x7f0000000000)='ntfs3\x00', 0x2208004, 0x0)
syz_clone(0x88200200, 0x0, 0x0, 0x0, 0x0, 0x0)
Here, the ioctl sets the bdev block size to 16384. During mount,
get_tree_bdev_flags() calls sb_set_blocksize(sb, block_size(bdev)),
but since block_size(bdev) > PAGE_SIZE, sb_set_blocksize() leaves
sb->s_blocksize at zero.
Later, ntfs_init_from_boot() attempts to read the boot_block while
sb->s_blocksize is still zero, which triggers the bug.
Reported-by: syzbot+f4f84b57a01d6b8364ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f4f84b57a01d6b8364ad
Signed-off-by: Pedro Demarchi Gomes <pedrodemargomes@gmail.com>
[almaz.alexandrovich@paragon-software.com: changed comment style, added
return value handling]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
There is no good reason to have this as a func call, other than avoiding
the churn of adding fs_struct.h as needed.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251104170448.630414-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reading large compressed files is extremely slow when readahead is enabled.
For example, reading a 4 GB XPRESS-4K compressed file (compression ratio
≈ 4:1) takes about 230 minutes with readahead enabled, but only around 3
minutes when readahead is disabled.
The issue was first observed in January 2025 and is reproducible with large
compressed NTFS files. Disabling readahead for compressed files avoids this
performance regression, although this may not be the ideal long-term fix.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Fix uninit errors caused after buffer allocation given to 'de'; by
initializing the buffer with zeroes. The fix was found by using KMSAN.
Reported-by: syzbot+332bd4e9d148f11a87dc@syzkaller.appspotmail.com
Fixes: 78ab59fee0 ("fs/ntfs3: Rework file operations")
Signed-off-by: Sidharth Seela <sidharthseela@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Fix a KMSAN un-init bug found by syzkaller.
ntfs_get_bh() expects a buffer from sb_getblk(), that buffer may not be
uptodate. We do not bring the buffer uptodate before setting it as
uptodate. If the buffer were to not be uptodate, it could mean adding a
buffer with un-init data to the mi record. Attempting to load that record
will trigger KMSAN.
Avoid this by setting the buffer as uptodate, if it’s not already, by
overwriting it.
Reported-by: syzbot+7a2ba6b7b66340cff225@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7a2ba6b7b66340cff225
Tested-by: syzbot+7a2ba6b7b66340cff225@syzkaller.appspotmail.com
Fixes: 4342306f0f ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Raphael Pinsonneault-Thibeault <rpthibeault@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
The root cause is that sbi->options points directly to fc->fs_private.
If fc->fs_private is freed while sbi still exists, sbi->options becomes
a dangling pointer.
This patch ensures that sbi->options is a separate copy of fc->fs_private
and duplicates nls_name if present. On superblock release or error,
sbi->options->nls_name and sbi->options are freed and sbi->options
is set to NULL to avoid any dangling pointer.
Reported-by: syzbot+d77c546c60db651a389c@syzkaller.appspotmail.com
Signed-off-by: YangWen <anmuxixixi@gmail.com>
[almaz.alexandrovich@paragon-software.com: remove syzbot logs from description]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
When reading a compressed file, we may read several pages in addition to
the one requested. The current code will overwrite pages in the page
cache with the data from disc which can definitely result in changes
that have been made being lost.
For example if we have four consecutie pages ABCD in the file compressed
into a single extent, on first access, we'll bring in ABCD. Then we
write to page B. Memory pressure results in the eviction of ACD.
When we attempt to write to page C, we will overwrite the data in page
B with the data currently on disk.
I haven't investigated the decompression code to check whether it's
OK to overwrite a clean page or whether it might be possible to see
corrupt data. Out of an abundance of caution, decline to overwrite
uptodate pages, not just dirty pages.
Fixes: 4342306f0f (fs/ntfs3: Add file operations and implementation)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
These pages are accessed through vmap; they are not accessed
by calling page_address(), so they do not need to be kmapped.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
These pages are accessed through DMA and vmap; they are not accessed
by calling page_address(), so they do not need to be kmapped.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Some NTFS volumes failed to mount because sparse data runs were not
handled correctly during runlist unpacking. The code performed arithmetic
on the special SPARSE_LCN64 marker, leading to invalid LCN values and
mount errors.
Add an explicit check for the case described above, marking the run as
sparse without applying arithmetic.
Fixes: 736fc7bf5f ("fs: ntfs3: Fix integer overflow in run_unpack()")
Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Reformatted the driver code according to the current .clang-format rules
and updated description of used terminology. No functional changes
intended.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Before it used an unsigned 64-bit type, which prevented proper handling
of timestamps earlier than 1970-01-01. Switch to a signed 64-bit type to
support pre-epoch timestamps. The issue was caught by xfstests.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Stop using the obsolete write_cache_pages and use writeback_iter directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Index allocation requires at least one bit in the $BITMAP attribute to
track usage of index entries. If the bitmap is empty while index blocks
are already present, this reflects on-disk corruption.
syzbot triggered this condition using a malformed NTFS image. During a
rename() operation involving a long filename (which spans multiple
index entries), the empty bitmap allowed the name to be added without
valid tracking. Subsequent deletion of the original entry failed with
-ENOENT, due to unexpected index state.
Reject such cases by verifying that the bitmap is not empty when index
blocks exist.
Reported-by: syzbot+b0373017f711c06ada64@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b0373017f711c06ada64
Fixes: d99208b919 ("fs/ntfs3: cancle set bad inode after removing name fails")
Tested-by: syzbot+b0373017f711c06ada64@syzkaller.appspotmail.com
Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Add put_bh() to decrease the refcount of 'bh' after the job
is finished, preventing a resource leak.
Fixes: 3f3b442b5a ("fs/ntfs3: Add bitmap")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
The MFT record relative to the file being opened contains its runlist,
an array containing information about the file's location on the physical
disk. Analysis of all Call Stack paths showed that the values of the
runlist array, from which LCNs are calculated, are not validated before
run_unpack function.
The run_unpack function decodes the compressed runlist data format
from MFT attributes (for example, $DATA), converting them into a runs_tree
structure, which describes the mapping of virtual clusters (VCN) to
logical clusters (LCN). The NTFS3 subsystem also has a shortcut for
deleting files from MFT records - in this case, the RUN_DEALLOCATE
command is sent to the run_unpack input, and the function logic
provides that all data transferred to the runlist about file or
directory is deleted without creating a runs_tree structure.
Substituting the runlist in the $DATA attribute of the MFT record for an
arbitrary file can lead either to access to arbitrary data on the disk
bypassing access checks to them (since the inode access check
occurs above) or to destruction of arbitrary data on the disk.
Add overflow check for addition operation.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 4342306f0f ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Vitaly Grigoryev <Vitaly.Grigoryev@kaspersky.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Since commit af153bb63a ("vfs: catch invalid modes in may_open()")
requires any inode be one of S_IFDIR/S_IFLNK/S_IFREG/S_IFCHR/S_IFBLK/
S_IFIFO/S_IFSOCK type, use S_IFREG for $Extend records.
Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Add support for the FS_IOC_SETFSLABEL ioctl.
Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Add support for the FS_IOC_GETFSLABEL ioctl.
Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Use the common FSLABEL_MAX constant instead of a hardcoded magic
constant of 256.
Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCgQAKCRCRxhvAZXjc
os+nAP9LFHUwWO6EBzHJJGEVjJvvzsbzqeYrRFamYiMc5ulPJwD+KW4RIgJa/MWO
pcYE40CacaekD8rFWwYUyszpgmv6ewc=
=wCwp
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull mmap_prepare updates from Christian Brauner:
"Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b ("mm:
introduce new .mmap_prepare() file callback").
This is preferred to the existing f_op->mmap() hook as it does require
a VMA to be established yet, thus allowing the mmap logic to invoke
this hook far, far earlier, prior to inserting a VMA into the virtual
address space, or performing any other heavy handed operations.
This allows for much simpler unwinding on error, and for there to be a
single attempt at merging a VMA rather than having to possibly
reattempt a merge based on potentially altered VMA state.
Far more importantly, it prevents inappropriate manipulation of
incompletely initialised VMA state, which is something that has been
the cause of bugs and complexity in the past.
The intent is to gradually deprecate f_op->mmap, and in that vein this
series coverts the majority of file systems to using f_op->mmap_prepare.
Prerequisite steps are taken - firstly ensuring all checks for mmap
capabilities use the file_has_valid_mmap_hooks() helper rather than
directly checking for f_op->mmap (which is now not a valid check) and
secondly updating daxdev_mapping_supported() to not require a VMA
parameter to allow ext4 and xfs to be converted.
Commit bb666b7c27 ("mm: add mmap_prepare() compatibility layer for
nested file systems") handles the nasty edge-case of nested file
systems like overlayfs, which introduces a compatibility shim to allow
f_op->mmap_prepare() to be invoked from an f_op->mmap() callback.
This allows for nested filesystems to continue to function correctly
with all file systems regardless of which callback is used. Once we
finally convert all file systems, this shim can be removed.
As a result, ecryptfs, fuse, and overlayfs remain unaltered so they
can nest all other file systems.
We additionally do not update resctl - as this requires an update to
remap_pfn_range() (or an alternative to it) which we defer to a later
series, equally we do not update cramfs which needs a mixed mapping
insertion with the same issue, nor do we update procfs, hugetlbfs,
syfs or kernfs all of which require VMAs for internal state and hooks.
We shall return to all of these later"
* tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: update porting, vfs documentation to describe mmap_prepare()
fs: replace mmap hook with .mmap_prepare for simple mappings
fs: convert most other generic_file_*mmap() users to .mmap_prepare()
fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()
mm/filemap: introduce generic_file_*_mmap_prepare() helpers
fs/xfs: transition from deprecated .mmap hook to .mmap_prepare
fs/ext4: transition from deprecated .mmap hook to .mmap_prepare
fs/dax: make it possible to check dev dax support without a VMA
fs: consistently use can_mmap_file() helper
mm/nommu: use file_has_valid_mmap_hooks() helper
mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaIM/KwAKCRCRxhvAZXjc
opT+AP407JwhRSBjUEmHg5JzUyDoivkOySdnthunRjaBKD8rlgEApM6SOIZYucU7
cPC3ZY6ORFM6Mwaw+iDW9lasM5ucHQ8=
=CHha
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc VFS updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add ext4 IOCB_DONTCACHE support
This refactors the address_space_operations write_begin() and
write_end() callbacks to take const struct kiocb * as their first
argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate
to the filesystem's buffered I/O path.
Ext4 is updated to implement handling of the IOCB_DONTCACHE flag
and advertises support via the FOP_DONTCACHE file operation flag.
Additionally, the i915 driver's shmem write paths are updated to
bypass the legacy write_begin/write_end interface in favor of
directly calling write_iter() with a constructed synchronous kiocb.
Another i915 change replaces a manual write loop with
kernel_write() during GEM shmem object creation.
Cleanups:
- don't duplicate vfs_open() in kernel_file_open()
- proc_fd_getattr(): don't bother with S_ISDIR() check
- fs/ecryptfs: replace snprintf with sysfs_emit in show function
- vfs: Remove unnecessary list_for_each_entry_safe() from
evict_inodes()
- filelock: add new locks_wake_up_waiter() helper
- fs: Remove three arguments from block_write_end()
- VFS: change old_dir and new_dir in struct renamedata to dentrys
- netfs: Remove unused declaration netfs_queue_write_request()
Fixes:
- eventpoll: Fix semi-unbounded recursion
- eventpoll: fix sphinx documentation build warning
- fs/read_write: Fix spelling typo
- fs: annotate data race between poll_schedule_timeout() and
pollwake()
- fs/pipe: set FMODE_NOWAIT in create_pipe_files()
- docs/vfs: update references to i_mutex to i_rwsem
- fs/buffer: remove comment about hard sectorsize
- fs/buffer: remove the min and max limit checks in __getblk_slow()
- fs/libfs: don't assume blocksize <= PAGE_SIZE in
generic_check_addressable
- fs_context: fix parameter name in infofc() macro
- fs: Prevent file descriptor table allocations exceeding INT_MAX"
* tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits)
netfs: Remove unused declaration netfs_queue_write_request()
eventpoll: fix sphinx documentation build warning
ext4: support uncached buffered I/O
mm/pagemap: add write_begin_get_folio() helper function
fs: change write_begin/write_end interface to take struct kiocb *
drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter
drm/i915: Use kernel_write() in shmem object create
eventpoll: Fix semi-unbounded recursion
vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes()
fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable
fs/buffer: remove the min and max limit checks in __getblk_slow()
fs: Prevent file descriptor table allocations exceeding INT_MAX
fs: Remove three arguments from block_write_end()
fs/ecryptfs: replace snprintf with sysfs_emit in show function
fs: annotate suspected data race between poll_schedule_timeout() and pollwake()
docs/vfs: update references to i_mutex to i_rwsem
fs/buffer: remove comment about hard sectorsize
fs_context: fix parameter name in infofc() macro
VFS: change old_dir and new_dir in struct renamedata to dentrys
proc_fd_getattr(): don't bother with S_ISDIR() check
...
The basic rules are simple:
* stores to dentry->d_flags are OK under dentry->d_lock.
* stores to dentry->d_flags are OK in the dentry constructor, before
becomes potentially visible to other threads.
Unfortunately, there's a couple of exceptions to that, and that's where the
headache comes from.
Main PITA comes from d_set_d_op(); that primitive sets ->d_op
of dentry and adjusts the flags that correspond to presence of individual
methods. It's very easy to misuse; existing uses _are_ safe, but proof
of correctness is brittle.
Use in __d_alloc() is safe (we are within a constructor), but we
might as well precalculate the initial value of ->d_flags when we set
the default ->d_op for given superblock and set ->d_flags directly
instead of messing with that helper.
The reasons why other uses are safe are bloody convoluted; I'm not going
to reproduce it here. See https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/
for gory details, if you care. The critical part is using d_set_d_op() only
just prior to d_splice_alias(), which makes a combination of d_splice_alias()
with setting ->d_op, etc. a natural replacement primitive. Better yet, if
we go that way, it's easy to take setting ->d_op and modifying ->d_flags
under ->d_lock, which eliminates the headache as far as ->d_flags exclusion
rules are concerned. Other exceptions are minor and easy to deal with.
What this series does:
* d_set_d_op() is no longer available; new primitive (d_splice_alias_ops())
is provided, equivalent to combination of d_set_d_op() and d_splice_alias().
* new field of struct super_block - ->s_d_flags. Default value of ->d_flags
to be used when allocating dentries on this filesystem.
* new primitive for setting ->s_d_op: set_default_d_op(). Replaces stores
to ->s_d_op at mount time. All in-tree filesystems converted; out-of-tree
ones will get caught by compiler (->s_d_op is renamed, so stores to it will
be caught). ->s_d_flags is set by the same primitive to match the ->s_d_op.
* a lot of filesystems had ->s_d_op->d_delete equal to always_delete_dentry;
that is equivalent to setting DCACHE_DONTCACHE in ->d_flags, so such filesystems
can bloody well set that bit in ->s_d_flags and drop ->d_delete() from
dentry_operations. In quite a few cases that results in empty dentry_operations,
which means that we can get rid of those.
* kill simple_dentry_operations - not needed anymore.
* massage d_alloc_parallel() to get rid of the other exception wrt ->d_flags
stores - we can set DCACHE_PAR_LOOKUP as soon as we allocate the new dentry;
no need to delay that until we commit to using the sucker.
As the result, ->d_flags stores are all either under ->d_lock or done before
the dentry becomes visible in any shared data structures.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIQ/tQAKCRBZ7Krx/gZQ
66AhAQDgQ+S224x5YevNXc9mDoGUBMF4OG0n0fIla9rfdL4I6wEAqpOWMNDcVPCZ
GwYOvJ9YuqNdz+MyprAI18Yza4GOmgs=
=rTYB
-----END PGP SIGNATURE-----
Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull dentry d_flags updates from Al Viro:
"The current exclusion rules for dentry->d_flags stores are rather
unpleasant. The basic rules are simple:
- stores to dentry->d_flags are OK under dentry->d_lock
- stores to dentry->d_flags are OK in the dentry constructor, before
becomes potentially visible to other threads
Unfortunately, there's a couple of exceptions to that, and that's
where the headache comes from.
The main PITA comes from d_set_d_op(); that primitive sets ->d_op of
dentry and adjusts the flags that correspond to presence of individual
methods. It's very easy to misuse; existing uses _are_ safe, but proof
of correctness is brittle.
Use in __d_alloc() is safe (we are within a constructor), but we might
as well precalculate the initial value of 'd_flags' when we set the
default ->d_op for given superblock and set 'd_flags' directly instead
of messing with that helper.
The reasons why other uses are safe are bloody convoluted; I'm not
going to reproduce it here. See [1] for gory details, if you care. The
critical part is using d_set_d_op() only just prior to
d_splice_alias(), which makes a combination of d_splice_alias() with
setting ->d_op, etc a natural replacement primitive.
Better yet, if we go that way, it's easy to take setting ->d_op and
modifying 'd_flags' under ->d_lock, which eliminates the headache as
far as 'd_flags' exclusion rules are concerned. Other exceptions are
minor and easy to deal with.
What this series does:
- d_set_d_op() is no longer available; instead a new primitive
(d_splice_alias_ops()) is provided, equivalent to combination of
d_set_d_op() and d_splice_alias().
- new field of struct super_block - 's_d_flags'. This sets the
default value of 'd_flags' to be used when allocating dentries on
this filesystem.
- new primitive for setting 's_d_op': set_default_d_op(). This
replaces stores to 's_d_op' at mount time.
All in-tree filesystems converted; out-of-tree ones will get caught
by the compiler ('s_d_op' is renamed, so stores to it will be
caught). 's_d_flags' is set by the same primitive to match the
's_d_op'.
- a lot of filesystems had sb->s_d_op->d_delete equal to
always_delete_dentry; that is equivalent to setting
DCACHE_DONTCACHE in 'd_flags', so such filesystems can bloody well
set that bit in 's_d_flags' and drop 'd_delete()' from
dentry_operations.
In quite a few cases that results in empty dentry_operations, which
means that we can get rid of those.
- kill simple_dentry_operations - not needed anymore
- massage d_alloc_parallel() to get rid of the other exception wrt
'd_flags' stores - we can set DCACHE_PAR_LOOKUP as soon as we
allocate the new dentry; no need to delay that until we commit to
using the sucker.
As the result, 'd_flags' stores are all either under ->d_lock or done
before the dentry becomes visible in any shared data structures"
Link: https://lore.kernel.org/all/20250224010624.GT1977892@ZenIV/ [1]
* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
configfs: use DCACHE_DONTCACHE
debugfs: use DCACHE_DONTCACHE
efivarfs: use DCACHE_DONTCACHE instead of always_delete_dentry()
9p: don't bother with always_delete_dentry
ramfs, hugetlbfs, mqueue: set DCACHE_DONTCACHE
kill simple_dentry_operations
devpts, sunrpc, hostfs: don't bother with ->d_op
shmem: no dentry retention past the refcount reaching zero
d_alloc_parallel(): set DCACHE_PAR_LOOKUP earlier
make d_set_d_op() static
simple_lookup(): just set DCACHE_DONTCACHE
tracefs: Add d_delete to remove negative dentries
set_default_d_op(): calculate the matching value for ->d_flags
correct the set of flags forbidden at d_set_d_op() time
split d_flags calculation out of d_set_d_op()
new helper: set_default_d_op()
fuse: no need for special dentry_operations for root dentry
switch procfs from d_set_d_op() to d_splice_alias_ops()
new helper: d_splice_alias_ops()
procfs: kill ->proc_dops
...
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.
Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
This reverts commit 69505fe98f.
Initially, conditional lock acquisition was removed to fix an xfstest bug
that was observed during internal testing. The deadlock reported by syzbot
is resolved by reintroducing conditional acquisition. The xfstest bug no
longer occurs on kernel version 6.16-rc1 during internal testing. I
assume that changes in other modules may have contributed to this.
Fixes: 69505fe98f ("fs/ntfs3: Replace inode_trylock with inode_lock")
Reported-by: syzbot+a91fcdbd2698f99db8f4@syzkaller.appspotmail.com
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Use ntfs_inode field 'ni_bad' to mark inode as bad (if something went wrong)
and to avoid any operations
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
The reproducer uses a file0 on a ntfs3 file system with a corrupted i_link.
When renaming, the file0's inode is marked as a bad inode because the file
name cannot be deleted.
The underlying bug is that make_bad_inode() is called on a live inode.
In some cases it's "icache lookup finds a normal inode, d_splice_alias()
is called to attach it to dentry, while another thread decides to call
make_bad_inode() on it - that would evict it from icache, but we'd already
found it there earlier".
In some it's outright "we have an inode attached to dentry - that's how we
got it in the first place; let's call make_bad_inode() on it just for shits
and giggles".
Fixes: 78ab59fee0 ("fs/ntfs3: Rework file operations")
Reported-by: syzbot+1aa90f0eb1fc3e77d969@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1aa90f0eb1fc3e77d969
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
After applying this patch, could correctly create symlink:
ln -s "relative/path/to/file" symlink
Signed-off-by: Rong Zhang <ulin0208@gmail.com>
[almaz.alexandrovich@paragon-software.com: added cpu_to_le32 macro to
rs->Flags assignment]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
The symlinks created in windows will be broken in linux by ntfs3,
the patch fixes it.
Signed-off-by: Rong Zhang <ulin0208@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>