linux/fs
Linus Torvalds 7cd122b552 Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
 A reference is grabbed and not released, but it's not actually _stored_
 anywhere.  That works, but it's hard to follow and verify; among other
 things, we have no way to tell _which_ of the increments is intended
 to be an unpaired one.  Worse, on removal we need to decide whether
 the reference had already been dropped, which can be non-trivial if
 that removal is on umount and we need to figure out if this dentry is
 pinned due to e.g. unlink() not done.  Usually that is handled by using
 kill_litter_super() as ->kill_sb(), but there are open-coded special
 cases of the same (consider e.g. /proc/self).
 
 Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
 marking those "leaked" dentries.  Having it set claims responsibility
 for +1 in refcount.
 
 The end result this series is aiming for:
 
 * get these unbalanced dget() and dput() replaced with new primitives that
   would, in addition to adjusting refcount, set and clear persistency flag.
 * instead of having kill_litter_super() mess with removing the remaining
   "leaked" references (e.g. for all tmpfs files that hadn't been removed
   prior to umount), have the regular shrink_dcache_for_umount() strip
   DCACHE_PERSISTENT of all dentries, dropping the corresponding
   reference if it had been set.  After that kill_litter_super() becomes
   an equivalent of kill_anon_super().
 
 Doing that in a single step is not feasible - it would affect too many places
 in too many filesystems.  It has to be split into a series.
 
 This work has really started early in 2024; quite a few preliminary pieces
 have already gone into mainline.  This chunk is finally getting to the
 meat of that stuff - infrastructure and most of the conversions to it.
 
 Some pieces are still sitting in the local branches, but the bulk of
 that stuff is here.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
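
The flag-owns-the-reference idea described in the pull message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the struct, the TOY_PERSISTENT bit and every toy_* function are hypothetical names invented for the illustration, and the real primitives may behave differently. For instance, the commit list above says d_make_discardable() warns when given a non-persistent dentry, whereas this model just returns false. Making the pin operation idempotent is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; all names here are hypothetical, not kernel API. */
#define TOY_PERSISTENT 0x1u

struct toy_dentry {
	unsigned int flags;
	int refcount;
};

/* Pin: take one reference and record in the flag that it is owned. */
static void toy_make_persistent(struct toy_dentry *d)
{
	if (d->flags & TOY_PERSISTENT)
		return;		/* already pinned; do not take a second reference */
	d->flags |= TOY_PERSISTENT;
	d->refcount++;
}

/*
 * Unpin: clear the flag and drop the reference it stood for.
 * Returns false if the dentry was not pinned, so nothing is dropped twice.
 */
static bool toy_make_discardable(struct toy_dentry *d)
{
	if (!(d->flags & TOY_PERSISTENT))
		return false;
	d->flags &= ~TOY_PERSISTENT;
	d->refcount--;
	return true;
}

/*
 * Umount-time sweep: strip the flag from whatever is still pinned.
 * Returns how many references were dropped.
 */
static size_t toy_shrink_for_umount(struct toy_dentry *v, size_t n)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_make_discardable(&v[i]))
			dropped++;
	return dropped;
}

/*
 * Scenario: three objects are created, one is removed before umount.
 * Returns the number of references the sweep dropped, or -1 if any
 * refcount is left unbalanced.
 */
int toy_demo(void)
{
	struct toy_dentry d[3] = { {0, 0}, {0, 0}, {0, 0} };
	size_t dropped;

	for (size_t i = 0; i < 3; i++)
		toy_make_persistent(&d[i]);
	toy_make_persistent(&d[0]);	/* repeated pin is a no-op */
	toy_make_discardable(&d[1]);	/* models removal before umount */

	dropped = toy_shrink_for_umount(d, 3);
	for (size_t i = 0; i < 3; i++)
		if (d[i].refcount != 0 || d[i].flags != 0)
			return -1;
	return (int)dropped;
}
```

Because the flag records which dentries still hold the extra reference, the sweep needs no per-filesystem knowledge of what was or was not removed earlier; in this model it drops exactly the two references that were still outstanding.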
9p vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
adfs vfs-6.17-rc1.mmap_prepare 2025-07-28 13:43:25 -07:00
affs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
afs Networking changes for 6.19. 2025-12-03 17:24:33 -08:00
autofs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
befs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
bfs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
btrfs for-6.19-tag 2025-12-03 20:03:46 -08:00
cachefiles vfs-6.19-rc1.directory.locking 2025-12-01 16:13:46 -08:00
ceph printk changes for 6.19 2025-12-03 12:42:36 -08:00
coda Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
configfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
cramfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
crypto vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
debugfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
devpts convert devpts 2025-11-16 01:35:04 -05:00
dlm net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
ecryptfs vfs-6.19-rc1.directory.locking 2025-12-01 16:13:46 -08:00
efivarfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
efs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
erofs Changes since last update: 2025-12-03 20:14:44 -08:00
exfat vfs-6.18-rc7.fixes 2025-11-17 09:11:27 -08:00
exportfs
ext2 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
ext4 New features and improvements for the ext4 file system 2025-12-03 20:37:15 -08:00
f2fs vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
fat vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
freevxfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
fuse Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
gfs2 gfs2 changes 2025-12-03 20:28:50 -08:00
hfs hfs/hfsplus updates for v6.19 2025-12-03 20:08:32 -08:00
hfsplus hfs/hfsplus updates for v6.19 2025-12-03 20:08:32 -08:00
hostfs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
hpfs vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
hugetlbfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
iomap vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
isofs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
jbd2 jbd2: fix the inconsistency between checksum and data in memory for journal sb 2025-11-26 17:05:47 -05:00
jffs2 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
jfs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
kernfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
lockd tools: ynl-gen: add regeneration comment 2025-11-25 19:20:42 -08:00
minix vfs-6.19-rc1.minix 2025-12-01 15:22:40 -08:00
netfs vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
nfs vfs-6.19-rc1.directory.delegations 2025-12-01 15:34:41 -08:00
nfs_common NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file 2025-08-05 16:45:40 -07:00
nfsd Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
nilfs2 vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
nls
notify vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
ntfs3 Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
ocfs2 Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
omfs vfs-6.19-rc1.fs_header 2025-12-01 14:18:01 -08:00
openpromfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
orangefs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
overlayfs vfs-6.19-rc1.ovl 2025-12-01 16:31:21 -08:00
proc Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
pstore Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
qnx4 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
qnx6 Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
quota Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
ramfs Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
resctrl Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
romfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
smb Forty four smb client and server changesets 2025-12-03 20:23:41 -08:00
squashfs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
sysfs sysfs: check visibility before changing group attribute ownership 2025-10-17 09:48:34 +02:00
tests
tracefs convert tracefs 2025-11-16 01:35:03 -05:00
ubifs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
udf Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
ufs Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
unicode
vboxsf simplify vboxsf_dir_atomic_open() 2025-09-16 23:59:38 -04:00
verity Optimize fsverity with 2-way interleaved hashing 2025-09-29 15:55:20 -07:00
xfs xfs: new code for v6.19 2025-12-03 20:19:38 -08:00
zonefs vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
Kconfig Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
Kconfig.binfmt binfmt_elf: preserve original ELF e_flags for core dumps 2025-09-03 20:49:32 -07:00
Makefile fs: rename fs_types.h to fs_dirent.h 2025-11-05 09:51:30 +01:00
aio.c aio: use credential guards 2025-11-04 12:36:33 +01:00
anon_inodes.c anon_inodes: convert to FD_ADD() 2025-11-28 12:42:31 +01:00
attr.c filelock: add struct delegated_inode 2025-11-12 09:38:34 +01:00
backing-file.c kernel-6.19-rc1.cred 2025-12-01 13:45:41 -08:00
bad_inode.c
binfmt_elf.c rseq: Provide and use rseq_set_ids() 2025-11-04 08:33:33 +01:00
binfmt_elf_fdpic.c execve updates for v6.17 2025-07-28 17:11:40 -07:00
binfmt_flat.c
binfmt_misc.c Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
binfmt_script.c
bpf_fs_kfuncs.c bpf...d_path(): constify path argument 2025-09-15 21:17:08 -04:00
buffer.c vfs-6.19-rc1.folio 2025-12-01 10:26:38 -08:00
char_dev.c
compat_binfmt_elf.c
coredump.c Networking changes for 6.19. 2025-12-03 17:24:33 -08:00
d_path.c
dax.c Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
dcache.c Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
direct-io.c Summary of significant series in this pull request: 2025-07-31 14:57:54 -07:00
drop_caches.c Coccinelle-based conversion to use ->i_state accessors 2025-10-20 20:22:26 +02:00
eventfd.c eventfd: convert do_eventfd() to FD_PREPARE() 2025-11-28 12:42:31 +01:00
eventpoll.c eventpoll: convert do_epoll_create() to FD_PREPARE() 2025-11-28 12:42:32 +01:00
exec.c A large overhaul of the restartable sequences and CID management: 2025-12-02 08:48:53 -08:00
fcntl.c vfs: expose delegation support to userland 2025-11-12 09:38:37 +01:00
fhandle.c fhandle: convert do_handle_open() to FD_ADD() 2025-11-28 12:42:31 +01:00
file.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
file_attr.c fs: remove spurious exports in fs/file_attr.c 2025-11-19 12:17:31 +01:00
file_table.c fs: update comment in init_file() 2025-10-07 12:48:33 +02:00
filesystems.c
fs-writeback.c vfs-6.19-rc1.writeback 2025-12-01 09:20:51 -08:00
fs_context.c change the calling conventions for vfs_parse_fs_string() 2025-09-04 15:20:51 -04:00
fs_dirent.c fs: rename fs_types.h to fs_dirent.h 2025-11-05 09:51:30 +01:00
fs_parser.c
fs_pin.c
fs_struct.c fs: inline current_umask() and move it to fs_struct.h 2025-11-05 22:51:23 +01:00
fsopen.c fscontext: do not consume log entries when returning -EMSGSIZE 2025-08-11 14:52:41 +02:00
init.c vfs: make vfs_symlink break delegations on parent dir 2025-11-12 09:38:36 +01:00
inode.c vfs-6.19-rc1.inode 2025-12-01 09:02:34 -08:00
internal.h Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
ioctl.c fs: remove vfs_ioctl export 2025-09-01 13:08:01 +02:00
kernel_read_file.c
libfs.c Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
locks.c filelock: __fcntl_getlease: fix kernel-doc warnings 2025-11-28 10:30:41 +01:00
mbcache.c
mnt_idmapping.c
mount.h fs: use boolean to indicate anonymous mount namespace 2025-11-11 10:01:31 +01:00
mpage.c mpage: convert do_mpage_readpage() to return void type 2025-09-21 14:22:16 -07:00
namei.c vfs-6.19-rc1.directory.locking 2025-12-01 16:13:46 -08:00
namespace.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
nsfs.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
open.c vfs-6.19-rc1.fd_prepare.fs 2025-12-01 17:32:07 -08:00
pidfs.c vfs-6.19-rc1.coredump 2025-12-01 10:17:39 -08:00
pipe.c Summary 2025-12-05 11:15:37 -08:00
pnode.c umount_tree(): take all victims out of propagation graph at once 2025-09-15 21:26:44 -04:00
pnode.h umount_tree(): take all victims out of propagation graph at once 2025-09-15 21:26:44 -04:00
posix_acl.c filelock: add struct delegated_inode 2025-11-12 09:38:34 +01:00
proc_namespace.c
read_write.c copy_file_range: limit size if in compat mode 2025-08-15 16:11:47 +02:00
readdir.c
remap_range.c
select.c select: Convert to scoped user access 2025-11-04 08:28:34 +01:00
seq_file.c
signalfd.c signalfd: convert do_signalfd4() to FD_ADD() 2025-11-28 12:42:32 +01:00
splice.c fs/splice.c: trivial fix: pipes -> pipe's 2025-11-25 10:11:16 +01:00
stack.c
stat.c constify path argument of vfs_statx_path() 2025-09-15 21:17:07 -04:00
statfs.c
super.c Some filesystems use a kinda-sorta controlled dentry refcount leak to pin 2025-12-05 14:36:21 -08:00
sync.c vfs-6.19-rc1.writeback 2025-12-01 09:20:51 -08:00
sysctls.c
timerfd.c timerfd: convert timerfd_create() to FD_ADD() 2025-11-28 12:42:32 +01:00
userfaultfd.c Significant patch series in this merge are as follows: 2025-12-05 13:52:43 -08:00
utimes.c vfs-6.19-rc1.directory.delegations 2025-12-01 15:34:41 -08:00
xattr.c filelock: add struct delegated_inode 2025-11-12 09:38:34 +01:00