linux

Commit Graph

Author	SHA1	Message	Date
Mike Snitzer	f2060bdc21	nfs/localio: add refcounting for each iocb IO associated with NFS pgio header Improve completion handling of as many as 3 IOs associated with each misaligned DIO by using an atomic_t to track completion of each IO. Update nfs_local_pgio_done() to use precise atomic_t accounting for remaining iov_iter (up to 3) associated with each iocb, so that each NFS LOCALIO pgio header is only released after all IOs have completed. But also allow early return if/when a short read or write occurs. Fixes reported BUG: KASAN: slab-use-after-free in nfs_local_call_read: https://lore.kernel.org/linux-nfs/aPSvi5Yr2lGOh5Jh@dell-per750-06-vm-07.rhts.eng.pek2.redhat.com/ Reported-by: Yongcheng Yang <yoyang@redhat.com> Fixes: `c817248fc8` ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-11-10 10:32:28 -05:00
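The completion accounting described in f2060bdc21 can be sketched in user space with C11 atomics. This is only an illustration of the counting pattern, not the kernel code: `struct pgio_hdr`, `pgio_hdr_init()` and `pgio_hdr_io_done()` are hypothetical names, `atomic_int` stands in for the kernel's `atomic_t`, and the early return on a short read or write is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pgio header shared by up to 3 component IOs. */
struct pgio_hdr {
    atomic_int remaining;   /* component IOs still in flight */
    bool released;          /* set once the header has been "released" */
};

/* Record how many component IOs (1..3) will be issued against the header. */
static void pgio_hdr_init(struct pgio_hdr *hdr, int nr_ios)
{
    assert(nr_ios >= 1 && nr_ios <= 3);
    atomic_init(&hdr->remaining, nr_ios);
    hdr->released = false;
}

/*
 * Called once per completed component IO. Only the caller that drops the
 * count to zero releases the header, so no IO can touch it after release.
 * Returns true if this call performed the release.
 */
static bool pgio_hdr_io_done(struct pgio_hdr *hdr)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    if (atomic_fetch_sub(&hdr->remaining, 1) == 1) {
        hdr->released = true;
        return true;
    }
    return false;
}
```

With three IOs outstanding, the first two calls to `pgio_hdr_io_done()` return false and only the third releases the header, whichever order the IOs complete in.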
Mike Snitzer	51a491f270	nfs/localio: remove unnecessary ENOTBLK handling in DIO WRITE support Each filesystem is meant to fall back to retrying DIO using buffered IO when it encounters -ENOTBLK while issuing DIO (which can happen if the VFS cannot invalidate the page cache). So NFS doesn't need special handling for -ENOTBLK. Also, explicitly initialize a couple of DIO-related iocb members rather than simply relying on data structure zeroing. Fixes: `c817248fc8` ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-11-10 10:32:28 -05:00
Trond Myklebust	fb2cba0854	NFS: Check the TLS certificate fields in nfs_match_client() If the TLS security policy is of type RPC_XPRTSEC_TLS_X509, then the cert_serial and privkey_serial fields need to match as well since they define the client's identity, as presented to the server. Fixes: `90c9550a8d` ("NFS: support the kernel keyring for TLS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-11-10 10:32:28 -05:00
Trond Myklebust	8ab523ce78	pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS The default setting for the transport security policy must be RPC_XPRTSEC_NONE, when using a TCP or RDMA connection without TLS. Conversely, when using TLS, the security policy needs to be set. Fixes: `6c0a8c5fcf` ("NFS: Have struct nfs_client carry a TLS policy field") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-11-10 10:32:28 -05:00
Trond Myklebust	28e19737e1	pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect() Don't try to add an RDMA transport to a client that is already marked as being a TCP/TLS transport. Fixes: `a35518cae4` ("NFSv4.1/pnfs: fix NFS with TLS in pnfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-11-10 10:32:28 -05:00
Trond Myklebust	7aca00d950	pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect() Don't try to add an RDMA transport to a client that is already marked as being a TCP/TLS transport. Fixes: `04a1526366` ("pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-11-10 10:32:27 -05:00
Joshua Watt	9bb3baa9d1	NFS4: Fix state renewals missing after boot Since the last renewal time was initialized to 0 and jiffies start counting at -5 minutes, any clients connected in the first 5 minutes after a reboot would have their renewal timer set to a very long interval. If the connection was idle, this would result in the client state timing out on the server and the next call to the server would return NFS4ERR_BADSESSION. Fix this by initializing the last renewal time to the current jiffies instead of 0. Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
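The arithmetic behind 9bb3baa9d1 can be illustrated with a small user-space model. Everything here is an assumption made for illustration: a 32-bit tick counter that starts 5 minutes before wrapping, a tick rate of 1000, and a hypothetical `renew_delay()` helper that schedules renewal at two thirds of the lease measured from the last renewal. It is not the kernel's renewal code.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000                                  /* assumed tick rate */
#define INITIAL_TICKS ((uint32_t)(-300 * HZ))    /* counter starts at "-5 minutes" */

/*
 * Hypothetical helper: ticks to wait before the next renewal, given the
 * tick value of the last renewal, the current tick value, and the lease
 * length in ticks. The unsigned subtraction followed by a signed cast
 * mirrors how wrapping tick counters are usually compared.
 */
static int32_t renew_delay(uint32_t last_renewal, uint32_t now, int32_t lease)
{
    assert(lease > 0);
    int32_t delay = (2 * lease) / 3 + (int32_t)(last_renewal - now);
    if (delay < 5 * HZ)
        delay = 5 * HZ;   /* never poll more often than every 5 seconds */
    return delay;
}
```

Under these assumptions, with a 90 second lease, `renew_delay(0, INITIAL_TICKS, 90 * HZ)` is 360 seconds, far longer than the lease itself, while `renew_delay(INITIAL_TICKS, INITIAL_TICKS, 90 * HZ)` is the intended 60 seconds. That matches the symptom described above: a last-renewal time of 0 looks as if it were 5 minutes in the future.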
Scott Mayhew	9ff022f382	NFS: check if suid/sgid was cleared after a write as needed I noticed xfstests generic/193 and generic/355 started failing against knfsd after commit `e7a8ebc305` ("NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE"). I ran those same tests against ONTAP (which has had write delegation support for a lot longer than knfsd) and they fail there too... so while it's a new failure against knfsd, it isn't an entirely new failure. Add the NFS_INO_REVAL_FORCED flag so that the presence of a delegation doesn't keep the inode from being revalidated to fetch the updated mode. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
Joshua Watt	7a84394f02	NFS4: Apply delay_retrans to async operations The setting of delay_retrans is applied to synchronous RPC operations because the retransmit count is stored in the same struct nfs4_exception that is passed each time an error is checked. However, for asynchronous operations (READ, WRITE, LOCKU, CLOSE, DELEGRETURN), a new struct nfs4_exception is made on the stack each time the task callback is invoked. This means that the retransmit count is always zero and thus delay_retrans never takes effect. Apply delay_retrans to these operations by tracking and updating their retransmit count. Change-Id: Ieb33e046c2b277cb979caa3faca7f52faf0568c9 Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
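The problem described in 7a84394f02 (a retry counter that lives in a structure rebuilt on every callback) can be reproduced in a few lines. The sketch below is a simplified model, not the NFS code: `struct exception_state`, `struct async_task` and all three functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-check state, analogous to an exception struct. */
struct exception_state {
    int retrans;            /* delay errors seen so far */
};

/* Hypothetical per-task data that survives across callback invocations. */
struct async_task {
    int retrans;
};

/* Count one more delay error; true means the retry limit is exhausted. */
static bool delay_exhausted(struct exception_state *exc, int delay_retrans)
{
    assert(delay_retrans > 0);
    return ++exc->retrans > delay_retrans;
}

/* Broken pattern: fresh state per callback, so the count restarts at 0. */
static bool callback_fresh_state(int delay_retrans)
{
    struct exception_state exc = { .retrans = 0 };
    return delay_exhausted(&exc, delay_retrans);
}

/* Fixed pattern: seed the state from the task and write the count back. */
static bool callback_tracked_state(struct async_task *task, int delay_retrans)
{
    struct exception_state exc = { .retrans = task->retrans };
    bool give_up = delay_exhausted(&exc, delay_retrans);

    task->retrans = exc.retrans;
    return give_up;
}
```

With a limit of 2, `callback_fresh_state()` never reports exhaustion no matter how often it runs, whereas `callback_tracked_state()` reports it on the third consecutive delay error for the same task.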
Mike Snitzer	8db4a1d146	NFSv4/flexfiles: fix to allocate mirror->dss before use Move mirror_array's dss_count initialization and dss allocation to ff_layout_alloc_mirror(), just before the loop that initializes each nfs4_ff_layout_ds_stripe's nfs_file_localio. Also handle NULL return from kcalloc() and remove one level of indent in ff_layout_alloc_mirror(). This commit fixes dangling nfsd_serv refcount issues seen when using NFS LOCALIO and then attempting to stop the NFSD service. Fixes: `20b1d75fb8` ("NFSv4/flexfiles: Add support for striped layouts") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-10-13 14:33:00 -04:00
Linus Torvalds	50647a1176	file->f_path constification Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...	2025-10-03 16:32:36 -07:00
Linus Torvalds	070a542f08	NFS Client Updates for Linux 6.18 New Features: * Add a Kconfig option to redirect dfprintk() to the trace buffer * Enable use of the RWF_DONTCACHE flag on the NFS client * Add striped layout handling to pNFS flexfiles * Add proper localio handling for READ and WRITE O_DIRECT Bugfixes: * Handle NFS4ERR_GRACE errors during delegation recall * Fix NFSv4.1 backchannel max_resp_sz verification check * Fix mount hang after CREATE_SESSION failure * Fix d_parent->d_inode locking in nfs4_setup_readdir() Other Cleanups and Improvements: * Improvements to write handling tracepoints * Fix a few trivial spelling mistakes * Cleanups to the rpcbind cleanup call sites * Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page * Remove unused NFS_WBACK_BUSY() macro * Remove __GFP_NOWARN flags * Unexport rpc_malloc() and rpc_free() Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "New Features: - Add a Kconfig option to redirect dfprintk() to the trace buffer - Enable use of the RWF_DONTCACHE flag on the NFS client - Add striped layout handling to pNFS flexfiles - Add proper localio handling for READ and WRITE O_DIRECT Bugfixes: - Handle NFS4ERR_GRACE errors during delegation recall - Fix NFSv4.1 backchannel max_resp_sz verification check - Fix mount hang after CREATE_SESSION failure - Fix d_parent->d_inode locking in nfs4_setup_readdir() Other Cleanups and Improvements: - Improvements to write handling tracepoints - Fix a few trivial spelling mistakes - Cleanups to the rpcbind cleanup call sites - Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page - Remove unused NFS_WBACK_BUSY() macro - Remove __GFP_NOWARN flags - Unexport rpc_malloc() and rpc_free()" * tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits) NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support nfs/localio: add tracepoints for misaligned DIO READ and WRITE support nfs/localio: add proper O_DIRECT support for READ and WRITE nfs/localio: refactor iocb initialization nfs/localio: refactor iocb and iov_iter_bvec initialization nfs/localio: avoid issuing misaligned IO using O_DIRECT nfs/localio: make trace_nfs_local_open_fh more useful NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support sunrpc: unexport rpc_malloc() and rpc_free() NFSv4/flexfiles: Add support for striped layouts NFSv4/flexfiles: Update layout stats & error paths for striped layouts NFSv4/flexfiles: Write path updates for striped layouts NFSv4/flexfiles: Commit path updates for striped layouts NFSv4/flexfiles: Read path updates for striped layouts NFSv4/flexfiles: Update low level helper functions to be DS stripe aware. NFSv4/flexfiles: Add data structure support for striped layouts NFSv4/flexfiles: Use ds_commit_idx when marking a write commit NFSv4/flexfiles: Remove cred local variable dependency nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing NFS: Enable use of the RWF_DONTCACHE flag on the NFS client ...	2025-10-03 14:20:40 -07:00
Linus Torvalds	829745b75a	finish_no_open calling conventions change Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Merge tag 'pull-finish_no_open' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull finish_no_open updates from Al Viro: "finish_no_open calling conventions change to simplify callers" * tag 'pull-finish_no_open' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: slightly simplify nfs_atomic_open() simplify gfs2_atomic_open() simplify fuse_atomic_open() simplify nfs_atomic_open_v23() simplify vboxsf_dir_atomic_open() simplify cifs_atomic_open() 9p: simplify v9fs_vfs_atomic_open_dotl() 9p: simplify v9fs_vfs_atomic_open() allow finish_no_open(file, ERR_PTR(-E...))	2025-10-03 10:59:31 -07:00
Linus Torvalds	51e9889ab1	vfs_parse_fs_string() stuff change on vfs_parse_fs_string() calling conventions - get rid of the length argument (almost all callers pass strlen() of the string argument there), add vfs_parse_fs_qstr() for the cases that do want separate length Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Merge tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fs_context updates from Al Viro: "Change vfs_parse_fs_string() calling conventions Get rid of the length argument (almost all callers pass strlen() of the string argument there), add vfs_parse_fs_qstr() for the cases that do want separate length" * tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_nfs4_mount(): switch to vfs_parse_fs_string() change the calling conventions for vfs_parse_fs_string()	2025-10-03 10:51:44 -07:00
Mike Snitzer	1f0d4ab0f5	NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support NFS doesn't have DIO alignment constraints, so have NFS respond with accommodating DIO alignment attributes (rather than plumb in GETATTR support for STATX_DIOALIGN and STATX_DIO_READ_ALIGN). The most coarse-grained dio_offset_align is the most accommodating (e.g. PAGE_SIZE, in future larger may be supported). Now that NFS has support, NFS reexport will now handle unaligned DIO (NFSD's NFSD_IO_DIRECT support requires the underlying filesystem support STATX_DIOALIGN and/or STATX_DIO_READ_ALIGN). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	cda94457c2	nfs/localio: add tracepoints for misaligned DIO READ and WRITE support Add nfs_local_dio_class and use it to create nfs_local_dio_read, nfs_local_dio_write and nfs_local_dio_misaligned trace events. These trace events show how NFS LOCALIO splits a given misaligned IO into a mix of misaligned head and/or tail extents and a DIO-aligned middle extent. The misaligned head and/or tail extents are issued using buffered IO and the DIO-aligned middle is issued using O_DIRECT. This combination of trace events is useful for LOCALIO DIO READs: echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_read/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_misaligned/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_read/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_readpage_done/enable echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable This combination of trace events is useful for LOCALIO DIO WRITEs: echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_write/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_misaligned/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	c817248fc8	nfs/localio: add proper O_DIRECT support for READ and WRITE Because the NFS client will already happily handle misaligned O_DIRECT IO (by sending it out to NFSD via RPC) this commit's new capabilities are for the benefit of LOCALIO. LOCALIO will make best effort to transform misaligned IO to DIO-aligned extents when possible. LOCALIO's READ and WRITE DIO that is misaligned will be split into as many as 3 component IOs (@start, @middle and @end) as needed -- IFF the @middle extent is verified to be DIO-aligned, and then the @start and/or @end are misaligned (due to each being a partial page). Otherwise if the @middle isn't DIO-aligned the code will fall back to issuing only a single contiguous buffered IO. The @middle is only DIO-aligned if both the memory and on-disk offsets for the IO are aligned relative to the underlying local filesystem's block device limits (@dma_alignment and @logical_block_size respectively). The misaligned @start and/or @end extents are issued using buffered IO and the DIO-aligned @middle is issued using O_DIRECT. The @start and @end IOs are issued first using buffered IO with IOCB_SYNC and then the @middle is issued last using direct IO with async completion (AIO). This out of order IO completion means that LOCALIO's IO completion code (nfs_local_read_done and nfs_local_write_done) is only called for the IO's last associated iov_iter completion. And in the case of DIO-aligned @middle it completes last using AIO. nfs_local_pgio_done() is updated to handle piece-wise partial completion of each iov_iter. This implementation for LOCALIO's misaligned DIO handling uses 3 iov_iter that share the same backing pages in their bio_vecs (so unfortunately 'struct nfs_local_kiocb' has 3 instead of only 1). [Reducing LOCALIO's per-IO (struct nfs_local_kiocb) memory use can be explored in the future. One logical progression to improve this code, and eliminate explicit loops over up to 3 iov_iter, is by extending 'struct iov_iter' to support iov_iter_clone() and iov_iter_chain() interfaces that are comparable to what 'struct bio' is able to support in the block layer. But even that wouldn't avoid the need to allocate/use up to 3 iov_iter] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
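The @start/@middle/@end split described in c817248fc8 can be sketched as plain offset arithmetic. This is a simplified model under stated assumptions: it checks only file-offset alignment against a single power-of-two `align` value, and ignores the memory alignment (@dma_alignment) check the commit also requires. `struct extent`, `struct dio_split` and `split_misaligned()` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct extent {
    uint64_t off;
    uint64_t len;
};

struct dio_split {
    bool use_dio;                      /* true if an aligned middle exists */
    struct extent start, middle, end;  /* start/end buffered, middle direct */
};

/* Split [off, off + len) around the largest align-sized middle region. */
static struct dio_split split_misaligned(uint64_t off, uint64_t len,
                                         uint64_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0);   /* power of two */

    struct dio_split s = { 0 };
    uint64_t mid_begin = (off + align - 1) & ~(align - 1);  /* round up */
    uint64_t mid_end = (off + len) & ~(align - 1);          /* round down */

    if (mid_begin >= mid_end) {
        /* No aligned middle: issue one contiguous buffered IO. */
        s.start = (struct extent){ off, len };
        return s;
    }

    s.use_dio = true;
    s.start = (struct extent){ off, mid_begin - off };
    s.middle = (struct extent){ mid_begin, mid_end - mid_begin };
    s.end = (struct extent){ mid_end, off + len - mid_end };
    return s;
}
```

For example, a 10000-byte IO at offset 1000 with 4096-byte alignment splits into a 3096-byte head, a 4096-byte aligned middle at offset 4096, and a 2808-byte tail at offset 8192. A 200-byte IO at offset 100 has no aligned middle and stays a single buffered IO.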
Mike Snitzer	e43e9a3a3d	nfs/localio: refactor iocb initialization The goal of this commit's various refactoring is to have LOCALIO's per IO initialization occur in process context so that we don't get into a situation where IO fails to be issued from workqueue (e.g. due to lack of memory, etc). Better to have LOCALIO's iocb initialization fail early. There isn't an immediate need but this commit makes it possible for LOCALIO to fall back to NFS pagelist code in process context to allow for immediate retry over RPC. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	091bdcfcec	nfs/localio: refactor iocb and iov_iter_bvec initialization nfs_local_iter_init() is updated to follow the same pattern for initializing LOCALIO's iov_iter_bvec as was established by nfsd_iter_read(). Other LOCALIO iocb initialization refactoring in this commit offers incremental cleanup that will be taken further by the next commit. No functional change. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:30 -04:00
Mike Snitzer	25ba2b84c3	nfs/localio: avoid issuing misaligned IO using O_DIRECT Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO using O_DIRECT. Any misaligned DIO falls back to using buffered IO. Because misaligned DIO is now handled safely, remove the nfs modparam 'localio_O_DIRECT_semantics' that was added to require users to opt in to the requirement that all O_DIRECT be properly DIO-aligned. Also, introduce nfs_iov_iter_aligned_bvec() which is a variant of iov_iter_aligned_bvec() that also verifies the offset associated with an iov_iter is DIO-aligned. NOTE: in a parallel effort, iov_iter_aligned_bvec() is being removed along with iov_iter_is_aligned(). Lastly, add pr_info_ratelimited if the underlying filesystem returns -EINVAL because it was made to try O_DIRECT for IO that is not DIO-aligned (shouldn't happen, so it's best to be louder if it does). Fixes: `3feec68563` ("nfs/localio: add direct IO enablement with sync and async IO support") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:29 -04:00
Mike Snitzer	fd6d93c2b7	nfs/localio: make trace_nfs_local_open_fh more useful Always trigger trace event when LOCALIO opens a file. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-30 16:10:29 -04:00
Linus Torvalds	b786405685	vfs-6.18-rc1.workqueue Merge tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs workqueue updates from Christian Brauner: "This contains various workqueue changes affecting the filesystem layer. Currently if a user enqueues a work item using schedule_delayed_work() the wq used is "system_wq" (per-cpu wq) while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work(), which uses system_wq, and queue_work(), which again makes use of WORK_CPU_UNBOUND. This replaces the use of system_wq and system_unbound_wq. system_wq is a per-CPU workqueue which isn't very obvious from the name and system_unbound_wq is to be used when locality is not required. So this renames system_wq to system_percpu_wq, and system_unbound_wq to system_dfl_wq. This also adds a new WQ_PERCPU flag to allow the fs subsystem users to explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and WQ_PERCPU flags coexist for one release cycle to allow callers to transition their calls. WQ_UNBOUND will be removed in a next release cycle" * tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: WQ_PERCPU added to alloc_workqueue users fs: replace use of system_wq with system_percpu_wq fs: replace use of system_unbound_wq with system_dfl_wq	2025-09-29 10:27:17 -07:00
Linus Torvalds	56e7b31071	vfs-6.18-rc1.inode Merge tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "This contains a series I originally wrote and that Eric brought over the finish line. It moves the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesystems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations. This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years. On top of this are various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward: - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers - Make the i_state flags an enum - Rework the iput() logic Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible. We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference. Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper - Expand dump_inode() to dump more information about an inode helping in debugging - Add some might_sleep() annotations to iput() and associated helpers" * tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add might_sleep() annotation to iput() and more fs: expand dump_inode() inode: fix whitespace issues fs: add an icount_read helper fs: rework iput logic fs: make the i_state flags an enum fs: stop accessing ->i_count directly in f2fs and gfs2 fsverity: check IS_VERITY() in fsverity_cleanup_inode() fs: remove inode::i_verity_info btrfs: move verity info pointer to fs-specific part of inode f2fs: move verity info pointer to fs-specific part of inode ext4: move verity info pointer to fs-specific part of inode fsverity: add support for info in fs-specific part of inode fs: remove inode::i_crypt_info ceph: move crypt info pointer to fs-specific part of inode ubifs: move crypt info pointer to fs-specific part of inode f2fs: move crypt info pointer to fs-specific part of inode ext4: move crypt info pointer to fs-specific part of inode fscrypt: add support for info in fs-specific part of inode fscrypt: replace raw loads of info pointer with helper function	2025-09-29 09:42:30 -07:00
Linus Torvalds	b7ce6fa90f	vfs-6.18-rc1.misc Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add "initramfs_options" parameter to set initramfs mount options. This allows to add specific mount options to the rootfs to e.g., limit the memory size - Add RWF_NOSIGNAL flag for pwritev2() Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets - Allow to pass pid namespace as procfs mount option Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed) This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include: * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs) But this requires forking off a helper process because you cannot just one-shot this using mount(2) * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process) While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate Things would be much easier if there was a way for userspace to just specify the pidns they want. So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2): fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd); fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0); or classic mount(2) / mount(8): // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid"); Cleanups: - Remove the last references to EXPORT_OP_ASYNC_LOCK - Make file_remove_privs_flags() static - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used - Use try_cmpxchg() in start_dir_add() - Use try_cmpxchg() in sb_init_done_wq() - Replace offsetof() with struct_size() in ioctl_file_dedupe_range() - Remove vfs_ioctl() export - Replace rwlock() with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels - Make ns_entries in fs/proc/namespaces const - Use a switch() statement() in init_special_inode() just like we do in may_open() - Use struct_size() in dir_add() in the initramfs code - Use str_plural() in rd_load_image() - Replace strcpy() with strscpy() in find_link() - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop() - Remove unused arguments from fcntl_{g,s}et_rw_hint() Fixes: - Document @name parameter for name_contains_dotdot() helper - Fix spelling mistake - Always return zero from replace_fd() instead of the file descriptor number - Limit the size for copy_file_range() in compat mode to prevent a signed overflow - Fix debugfs mount options not being applied - Verify the inode mode when loading it from disk in minixfs - Verify the inode mode when loading it from disk in cramfs - Don't trigger automounts with RESOLVE_NO_XDEV If openat2() was called with RESOLVE_NO_XDEV it didn't traverse through automounts, but could still trigger them - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints - Fix unused variable warning in rd_load_image() on s390 - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD - Use ns_capable_noaudit() when determining net sysctl permissions - Don't call path_put() under namespace semaphore in listmount() and statmount()" * tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) fcntl: trim arguments listmount: don't call path_put() under namespace semaphore statmount: don't call path_put() under namespace semaphore pid: use ns_capable_noaudit() when determining net sysctl permissions fs: rename generic_delete_inode() and generic_drop_inode() init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD initramfs: Replace strcpy() with strscpy() in find_link() initrd: Use str_plural() in rd_load_image() initramfs: Use struct_size() helper to improve dir_add() initrd: Fix unused variable warning in rd_load_image() on s390 fs: use the switch statement in init_special_inode() fs/proc/namespaces: make ns_entries const filelock: add FL_RECLAIM to show_fl_flags() macro eventpoll: Replace rwlock with spinlock selftests/proc: add tests for new pidns APIs procfs: add "pidns" mount option pidns: move is-ancestor logic to helper openat2: don't trigger automounts with RESOLVE_NO_XDEV namei: move cross-device check to __traverse_mounts namei: remove LOOKUP_NO_XDEV check from handle_mounts ...	2025-09-29 09:03:07 -07:00
Jonathan Curley	20b1d75fb8	NFSv4/flexfiles: Add support for striped layouts Updates lseg creation path to parse and add striped layouts. Enable support for striped layouts. Limitations: 1. All mirrors must have the same number of stripes. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:43:40 -04:00
Jonathan Curley	8a8e0f5566	NFSv4/flexfiles: Update layout stats & error paths for striped layouts Updates the layout stats logic to be stripe aware. Read and write stats are accumulated on a per DS stripe basis. Also updates error paths to use dss_id where appropriate. Limitations: 1. The layout stats structure is still statically sized to 4 and there is no deduplication logic for deviceids that may appear more than once in a striped layout. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:43:10 -04:00
Jonathan Curley	06d157d6fc	NFSv4/flexfiles: Write path updates for striped layouts Updates write path to calculate and use dss_id to direct IO to the appropriate stripe DS. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:43:10 -04:00
Jonathan Curley	8a8729db67	NFSv4/flexfiles: Commit path updates for striped layouts Updates the commit path to be stripe aware. This required updating the ds_commit_idx to be stripe aware. ds_commit_idx == mirror_idx * dss_count + dss_id. Updates code paths to utilize the new ds_commit_idx and derive dss_id & mirror_idx where appropriate to contact the correct DS using the corresponding parameters. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:41:18 -04:00
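The index relation stated in the commit above, ds_commit_idx == mirror_idx * dss_count + dss_id, can be illustrated with a small worked sketch. This is plain arithmetic in Python, not the kernel code: the function names are hypothetical, and the inverse mapping via integer division and remainder is an assumption about how dss_id and mirror_idx could be derived. It relies on every mirror having the same number of stripes, which the series lists as a limitation.

```python
def ds_commit_idx(mirror_idx: int, dss_id: int, dss_count: int) -> int:
    """Flatten (mirror_idx, dss_id) into one commit index.

    Hypothetical helper: mirrors the relation from the commit message,
    ds_commit_idx == mirror_idx * dss_count + dss_id.
    """
    if dss_count <= 0:
        raise ValueError("dss_count must be positive")
    if mirror_idx < 0 or not 0 <= dss_id < dss_count:
        raise ValueError("index out of range")
    return mirror_idx * dss_count + dss_id


def split_ds_commit_idx(idx: int, dss_count: int) -> tuple[int, int]:
    """Recover (mirror_idx, dss_id) from a flattened commit index.

    Assumed inverse: quotient selects the mirror, remainder the stripe.
    """
    if dss_count <= 0 or idx < 0:
        raise ValueError("invalid index or stripe count")
    return divmod(idx, dss_count)


# With 4 stripes per mirror, stripe 2 of mirror 1 lands at index 6.
idx = ds_commit_idx(mirror_idx=1, dss_id=2, dss_count=4)
mirror_idx, dss_id = split_ds_commit_idx(idx, dss_count=4)
```

With dss_count == 1 the flattened index equals mirror_idx, which matches the earlier preparation commit's note that commit_idx == mirror_idx when striping is not in use.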
Jonathan Curley	4934ccbeae	NFSv4/flexfiles: Read path updates for striped layouts Updates read path to calculate and use dss_id to direct IO to the appropriate stripe DS. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:40:22 -04:00
Jonathan Curley	a1491919c8	NFSv4/flexfiles: Update low level helper functions to be DS stripe aware. Updates common helper functions to be dss_id aware. Most cases simply add a dss_id parameter. The has_available functions have been updated with a loop. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:35:02 -04:00
Jonathan Curley	d442670c0f	NFSv4/flexfiles: Add data structure support for striped layouts Adds a new struct nfs4_ff_layout_ds_stripe that represents a data server stripe within a layout. A new dynamically allocated array of this type has been added to nfs4_ff_layout_mirror and per stripe configuration information has been moved from the mirror type to the stripe based on the RFC. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:32:24 -04:00
Jonathan Curley	eb71428e1a	NFSv4/flexfiles: Use ds_commit_idx when marking a write commit Correct this path to use ds_commit_idx. Another noop preparation change. In current code commit_idx == mirror_idx but when striping is enabled that will not be true. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:30:20 -04:00
Jonathan Curley	fec80afc41	NFSv4/flexfiles: Remove cred local variable dependency No-op preparation change to remove dependency on cred local variable. Subsequent striping diff has a cred per stripe so this local variable can't be trusted to be the same. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-26 15:30:19 -04:00
Al Viro	a890a2e339	nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing Theoretically it's an oopsable race, but I don't believe one can manage to hit it on real hardware; might become doable on a KVM, but it still won't be easy to attack. Anyway, it's easy to deal with - since xdr_encode_hyper() is just a call of put_unaligned_be64(), we can put that under ->d_lock and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:51 -04:00
Trond Myklebust	902893e390	NFS: Enable use of the RWF_DONTCACHE flag on the NFS client The NFS client needs to defer dropbehind until after any writes to the folio have been persisted on the server. Since this may be a 2 step process, use folio_end_writeback_no_dropbehind() to allow release of the writeback flag, and then call folio_end_dropbehind() once the COMMIT is done. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Anna Schumaker	4b7c3b4c67	NFS: Update the flexfilelayout driver to use xdr_set_scratch_folio() Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Anna Schumaker	1a33b629af	NFS: Update the filelayout to use xdr_set_scratch_folio() Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Anna Schumaker	cf289099ab	NFS: Update the blocklayout to use xdr_set_scratch_folio() Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Anna Schumaker	c9cefd7ae8	NFS: Update listxattr to use xdr_set_scratch_folio() Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Anna Schumaker	2f8416f23e	NFS: Update getacl to use xdr_set_scratch_folio() Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Anna Schumaker	670335c0f9	NFS: Update readdir to use a scratch folio Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:29:50 -04:00
Chuck Lever	62c0c0e749	SUNRPC: Move the svc_rpcb_cleanup() call sites Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Chuck Lever	c231cea10d	NFS: Remove rpcbind cleanup for NFSv4.0 callback The NFS client's NFSv4.0 callback listeners are created with SVC_SOCK_ANONYMOUS, therefore svc_setup_socket() does not register them with the client's rpcbind service. And, note that nfs_callback_down_net() does not call svc_rpcb_cleanup() at all when shutting down the callback server. Even if svc_setup_socket() were to attempt to register or unregister these sockets, the callback service has vs_hidden set, which shunts the rpcbind upcalls. The svc_rpcb_cleanup() error flow was introduced by commit `c946556b87` ("NFS: move per-net callback thread initialization to nfs_callback_up_net()"). It doesn't appear in the code that was relocated by that commit. Therefore, there is no need to call svc_rpcb_cleanup() when listener creation fails during callback server start-up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Anthony Iliopoulos	bf75ad0968	NFSv4.1: fix mount hang after CREATE_SESSION failure When client initialization goes through server trunking discovery, it schedules the state manager and then sleeps waiting for nfs_client initialization completion. The state manager can fail during state recovery, and specifically in lease establishment as nfs41_init_clientid() will bail out in case of errors returned from nfs4_proc_create_session(), without ever marking the client ready. The session creation can fail for a variety of reasons e.g. during backchannel parameter negotiation, with status -EINVAL. The error status will propagate all the way to the nfs4_state_manager but the client status will not be marked, and thus the mount process will remain blocked waiting. Fix it by adding -EINVAL error handling to nfs4_state_manager(). Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Anthony Iliopoulos	191512355e	NFSv4.1: fix backchannel max_resp_sz verification check When the client max_resp_sz is larger than what the server encodes in its reply, the nfs4_verify_back_channel_attrs() check fails and this causes nfs4_proc_create_session() to fail, in cases where the client page size is larger than that of the server and the server does not want to negotiate upwards. While this is not a problem with the linux nfs server that will reflect the proposed value in its reply irrespective of the local page size, other nfs server implementations may insist on their own max_resp_sz value, which could be smaller. Fix this by accepting smaller max_resp_sz values from the server, as this does not violate the protocol. The server is allowed to decrease but not increase the proposed size, and as such values smaller than the client-proposed ones are valid. Fixes: `43c2e885be` ("nfs4: fix channel attribute sanity-checks") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
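The rule described in the commit above (the server may decrease but not increase the proposed max_resp_sz) can be sketched as a one-sided comparison. This is a hypothetical Python illustration, not the body of nfs4_verify_back_channel_attrs(); the function name, the argument names, and the choice of -EINVAL as the return value are assumptions made for the example.

```python
import errno


def check_back_channel_max_resp_sz(sent: int, rcvd: int) -> int:
    """Validate the server's backchannel max_resp_sz reply.

    Hypothetical check: a reply at or below the client's proposal is
    accepted; only a reply larger than the proposal is rejected.
    Returns 0 on success or a negative errno, kernel style.
    """
    if rcvd > sent:
        return -errno.EINVAL
    return 0


# Client proposes 64 KiB; a server with smaller pages answers 4 KiB.
status_smaller = check_back_channel_max_resp_sz(sent=65536, rcvd=4096)
# A server that tries to negotiate upwards is still rejected.
status_larger = check_back_channel_max_resp_sz(sent=4096, rcvd=65536)
```

A strict equality check would reject the first case as well, which is the failure mode the commit message describes for clients whose page size exceeds the server's.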
Xichao Zhao	64afd87839	NFSv4: fix "prefered"->"preferred" Trivial fix to spelling mistake in comment text. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Olga Kornievskaia	be390f9524	NFSv4: handle ERR_GRACE on delegation recalls RFC7530 states that clients should be prepared for the return of NFS4ERR_GRACE errors for non-reclaim lock and I/O requests. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Jeff Layton	9082aae154	sunrpc: remove dfprintk_cont() and dfprintk_rcu_cont() KERN_CONT hails from a simpler time, when SMP wasn't the norm. These days, it doesn't quite work right since another printk() can always race in between the first one and the one being "continued". Nothing calls dprintk_rcu_cont(), so just remove it. The only caller of dprintk_cont() is in nfs_commit_release_pages(). Just use a normal dprintk() there instead, since this is not SMP-safe anyway. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Leo Martins	64dd802224	nfs: cleanup tracepoint declarations Cleanup tracepoint declarations by replacing commas with semicolons to better match other tracepoint declarations. No functional changes introduced. Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00
Jeff Layton	83c47ef8ac	nfs: add tracepoints to nfs_writepages() Show the inode info and requested range. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>	2025-09-23 13:28:19 -04:00

7146 Commits