Commit Graph

6970 Commits

Author SHA1 Message Date
Linus Torvalds 1ca0f935a1 nfsd-6.15 fixes:
- v6.15 libcrc clean-up makes invalid configurations possible
 - Fix a potential deadlock introduced during the v6.15 merge window
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmgD2mYACgkQM2qzM29m
 f5fn3xAAlPZP7MInbeszGMeRw9zdWaqbg4C90Gehdvi7sMBIYRo9E36NZuT4vl4D
 dfpHhoMIPMO5gsGrOeJZAT27La1F8M+O4yZOiuheTfIikaBrbZcZZtpwdRQHayGw
 qhHz8XoVNPsqvnoG+95u15XWG4IbI4kme2gNbYxEgF8zGERLv1aKPQTh3FqYG7Bk
 +9ANjoPNCAlnUAYdVHJZ5kwfTR10pDwx+jC45pZA4+hAusLx1Ek43pkcBgLMvoh2
 ylbWa6SKceVdh/kb3rWaKXtVptNjcaMWLUZNSpd6Bu9e3zDCfjJN0hKUamHD+hSn
 HjYKbNU1W/l0wJsq9KKDwrOgE7LDZPP/mQ3MkFtMR/ccyTCAGe9Wy6nv7PnwEAu2
 GqRh34VbNDGMPp4bpuTlp2FAL91BtGyYCkuwDZ1lqUDqnCmg4gRjyzFpkFF4lXPJ
 hq4J3NgA9BjO9wlQ4P3KNKFgE34s/8R0ey7euyrI5hxFoVFKxr5DNgKXVeW0FsY/
 zMaDLzVN2Uv37IxjKhc1XvE2Ml6+J3tRv9awv+Yp05tG8/3vAhogs7So3TtwfWdU
 iWA3G0+zB30ATqdeXwpmJ8XxIqBnixDKhJcnLNOwrT+vAeqqqa0C39M0sOTQTMg3
 VyFYkgt7v8jljp34DRfc4DAhTCsHzBzgAh9H/jrTP1xDlp7i7AU=
 =vCFi
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - v6.15 libcrc clean-up makes invalid configurations possible

 - Fix a potential deadlock introduced during the v6.15 merge window

* tag 'nfsd-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: decrease sc_count directly if fail to queue dl_recall
  nfs: add missing selections of CONFIG_CRC32
2025-04-19 10:38:03 -07:00
Eric Biggers cd35b6cb46 nfs: add missing selections of CONFIG_CRC32
nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available
only when CONFIG_CRC32 is enabled.  But the only NFS kconfig option that
selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and
did not actually guard the use of crc32_le() even on the client.

The code worked around this bug by only actually calling crc32_le() when
CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases.  This
avoided randconfig build errors, and in real kernels the fallback code
was unlikely to be reached since CONFIG_CRC32 is 'default y'.  But, this
really needs to just be done properly, especially now that I'm planning
to update CONFIG_CRC32 to not be 'default y'.

Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select
CONFIG_CRC32.  Then remove the fallback code that becomes unnecessary,
as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG.

Fixes: 1264a2f053 ("NFS: refactor code for calculating the crc32 hash of a filehandle")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-13 16:39:42 -04:00
Linus Torvalds 94d471a4f4 NFS client updates for Linux 6.15
Highlights include:
 
 Bugfixes:
 - 3 Fixes for looping in the NFSv4 state manager delegation code.
 - Fix for the NFSv4 state XDR code from Neil Brown.
 - Fix a leaked reference in nfs_lock_and_join_requests().
 - Fix a use-after-free in the delegation return code.
 
 Features:
 - Implemenation of the NFSv4.2 copy offload OFFLOAD_STATUS operation to
   allow monitoring of an in-progress copy.
 - Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a
   getdents() call.
 - SUNRPC now allows some basic management of an existing RPC client's
   connections using sysfs.
 - Improvements to the automated teardown of a NFS client when the
   container it was initiated from gets killed.
 - Improvements to prevent tasks from getting stuck in a killable wait
   state after calling exit_signals().
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmftuE0ACgkQZwvnipYK
 APIAAhAAqFdJnh88UUT0/R184Qzpd021lR9XhxkwNA3TzhOIzmpuTgBzNE1iMG1j
 EHveYqCpTU2orA1aisAyw5c8meJlsCQREPDvUOQ2i4BTCCmsBHOMxg7KDWwwRdNh
 SVDCezFWrHYz4An81jpgBe3/x6RJaEyAhKC45ZzQruiBtSMeoOX1TAV/DTWwEo0j
 JcLdAUSGVBsfyrj3qT0oJXoj+96o7rbB80loCdNKy8m8PBWHWp0oILwuU00XdXgu
 7jYyjZfxW1013It+vfVFsjTYRVfJ92pq3wiz/U9HXYDe3Arc4oPRw509/Jo3xEWW
 tdUljc/HepD3459ahiubTCLY39JxILl8/GapWe2Fn0J/JJuOGgZX9lqIMKDn4QCA
 6TBOqWK7OEwImj4M7cfPptJQWd+hp91T4AR13xWJeQgp19AR8yOqEW0YX6hVlaBg
 UrBwdR+l6ys5lJJBReUW+JMDCYZmbH9RjuwcqzXn71JmlACHNFi6odwLnQ1mInvF
 P5pEf7aXaZkF6kEz2kmZ1eUgdkERAaIGCNFQTui6intlCSlQodNurrEU7Vx146os
 OvowJYM0HvnVBDOnERrJD04HADKZeDS8jt59ev0uXbP/NFxEJnPRRQgIdiZbfISV
 beQrc2fpUgwdjYAURbW1qWO7XNTJzK9LHJzn02SytfCazX0IQO0=
 =zPX4
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Bugfixes:

   - Three fixes for looping in the NFSv4 state manager delegation code

   - Fix for the NFSv4 state XDR code (Neil Brown)

   - Fix a leaked reference in nfs_lock_and_join_requests()

   - Fix a use-after-free in the delegation return code

  Features:

   - Implement the NFSv4.2 copy offload OFFLOAD_STATUS operation to
     allow monitoring of an in-progress copy

   - Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a
     getdents() call

   - SUNRPC now allows some basic management of an existing RPC client's
     connections using sysfs

   - Improvements to the automated teardown of a NFS client when the
     container it was initiated from gets killed

   - Improvements to prevent tasks from getting stuck in a killable wait
     state after calling exit_signals()"

* tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
  nfs: Add missing release on error in nfs_lock_and_join_requests()
  NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
  NFS: Don't allow waiting for exiting tasks
  SUNRPC: Don't allow waiting for exiting tasks
  NFSv4: Treat ENETUNREACH errors as fatal for state recovery
  NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
  NFSv4: Further cleanups to shutdown loops
  NFS: Shut down the nfs_client only after all the superblocks
  SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
  SUNRPC: rpcbind should never reset the port to the value '0'
  pNFS/flexfiles: Report ENETDOWN as a connection error
  pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
  NFS: Treat ENETUNREACH errors as fatal in containers
  NFS: Add a mount option to make ENETUNREACH errors fatal
  sunrpc: Add a sysfs file for one-step xprt deletion
  sunrpc: Add a sysfs file for adding a new xprt
  sunrpc: Add a sysfs files for rpc_clnt information
  sunrpc: Add a sysfs attr for xprtsec
  NFS: Add implid to sysfs
  NFS: Extend rdirplus mount option with "force|none"
  ...
2025-04-02 17:06:31 -07:00
Dan Carpenter 8e5419d654 nfs: Add missing release on error in nfs_lock_and_join_requests()
Call nfs_release_request() on this error path before returning.

Fixes: c3f2235782 ("nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/3aaaa3d5-1c8a-41e4-98c7-717801ddd171@stanley.mountain
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-04-02 09:53:16 -04:00
Linus Torvalds b6dde1e527 NFSD 6.15 Release Notes
Neil Brown contributed more scalability improvements to NFSD's
 open file cache, and Jeff Layton contributed a menagerie of
 repairs to NFSD's NFSv4 callback / backchannel implementation.
 
 Mike Snitzer contributed a change to NFS re-export support that
 disables support for file locking on a re-exported NFSv4 mount.
 This is because NFSv4 state recovery is currently difficult if
 not impossible for re-exported NFS mounts. The change aims to
 prevent data integrity exposures after the re-export server
 crashes.
 
 Work continues on the evolving NFSD netlink administrative API.
 
 Many thanks to the contributors, reviewers, testers, and bug
 reporters who participated during the v6.15 development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmfmpMIACgkQM2qzM29m
 f5f6DA/+P0YqoRg3Zk/4oWwXZWbfEOMhWFltT+D1PE2QjUfOZpiwUSFQfsfYgXO6
 OFu0iDQ4g8BxBeP6Umv61qy7Cv6n4fVzIHqzymXQvymh9JzoQiXlE9/fA8nAHuiH
 u7kkNPRi7faBz1sMg/WpN9CHctg7STPOhhG/JrZcSFZnh87mU1i4i4bZBNz8tVnK
 ZWf483OUuSmJY2/bUTkwvr4GbceTKBlLWFFjiRhfAKvJBWvu4myfC0DI5QzxmsgI
 MJ62do7AFJP1ww2Ih9LLi2kFIt/yyInSVAgyts1CPhlJ4BfPnTSOw/i2+CuF3D/M
 bZYEAOjH3AqjBZmq58sIQezpD5f9/TOrTSwYwS31zl/THYE413WiW80/MDoWqo0y
 9cSNkD3nJlPVLLCfF58vXLoe7wpLoN/ZbTdxoozzUWEFR5A4Jz3XP8F/Cws0cjem
 uWWAQMItiQpg1+RYJYfu4dg5+iN6dbgYbvzlr7buISwFNXi3Zo99MkJ4wHj9TJbL
 Tpjth1rWGPwwSOMT6ojKiYMq1oUzx5PuAm9Saq9oIzQAbBySmxHF/LSDz3wEuBoO
 MK1jzKroEmMk3fJOOAajSDLOdAbL3vfj6H/xi2IHvKnaz9yHCZNu2YGV05BBMprd
 hWePf69AO5Ky5Q9KuGClEtwvJ9ZR5pb4DO2dqaYu8ximu3O4vPo=
 =e2E2
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Neil Brown contributed more scalability improvements to NFSD's open
  file cache, and Jeff Layton contributed a menagerie of repairs to
  NFSD's NFSv4 callback / backchannel implementation.

  Mike Snitzer contributed a change to NFS re-export support that
  disables support for file locking on a re-exported NFSv4 mount. This
  is because NFSv4 state recovery is currently difficult if not
  impossible for re-exported NFS mounts. The change aims to prevent data
  integrity exposures after the re-export server crashes.

  Work continues on the evolving NFSD netlink administrative API.

  Many thanks to the contributors, reviewers, testers, and bug reporters
  who participated during the v6.15 development cycle"

* tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
  NFSD: Add a Kconfig setting to enable delegated timestamps
  sysctl: Fixes nsm_local_state bounds
  nfsd: use a long for the count in nfsd4_state_shrinker_count()
  nfsd: remove obsolete comment from nfs4_alloc_stid
  nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
  nfsd: reorganize struct nfs4_delegation for better packing
  nfsd: handle errors from rpc_call_async()
  nfsd: move cb_need_restart flag into cb_flags
  nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
  nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
  nfsd: prevent callback tasks running concurrently
  nfsd: disallow file locking and delegations for NFSv4 reexport
  nfsd: filecache: drop the list_lru lock during lock gc scans
  nfsd: filecache: don't repeatedly add/remove files on the lru list
  nfsd: filecache: introduce NFSD_FILE_RECENT
  nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
  nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
  NFSD: Re-organize nfsd_file_gc_worker()
  nfsd: filecache: remove race handling.
  fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning
  ...
2025-03-31 17:28:17 -07:00
Trond Myklebust 9e8f324bd4 NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
Check that the delegation is still attached after taking the spin lock
in nfs_start_delegation_return_locked().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-28 16:38:05 -04:00
Trond Myklebust 8d3ca33102 NFS: Don't allow waiting for exiting tasks
Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-28 16:38:02 -04:00
Trond Myklebust 0af5fb5ed3 NFSv4: Treat ENETUNREACH errors as fatal for state recovery
If a containerised process is killed and causes an ENETUNREACH or
ENETDOWN error to be propagated to the state manager, then mark the
nfs_client as being dead so that we don't loop in functions that are
expecting recovery to succeed.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-26 12:19:58 -04:00
Trond Myklebust c81d5bcb7b NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
If someone calls nfs_mark_client_ready(clp, status) with a negative
value for status, then that should signal that the nfs_client is no
longer valid.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-26 12:19:51 -04:00
Trond Myklebust 3eba080e4d NFSv4: Further cleanups to shutdown loops
Replace the tests for the RPC client being shut down with tests for
whether the nfs_client is in an error state.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-26 12:19:08 -04:00
Trond Myklebust 2d3e998a0b NFS: Shut down the nfs_client only after all the superblocks
The nfs_client manages state for all the superblocks in the
"cl_superblocks" list, so it must not be shut down until all of them are
gone.

Fixes: 7d3e26a054 ("NFS: Cancel all existing RPC tasks when shutdown")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-26 12:17:48 -04:00
Linus Torvalds 26d8e43079 vfs-6.15-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc
 onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q
 N8YtzXArHWt7u0YbcVpy9WK3F72BdwU=
 =VJgY
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs async dir updates from Christian Brauner:
 "This contains cleanups that fell out of the work from async directory
  handling:

   - Change kern_path_locked() and user_path_locked_at() to never return
     a negative dentry. This simplifies the usability of these helpers
     in various places

   - Drop d_exact_alias() from the remaining place in NFS where it is
     still used. This also allows us to drop the d_exact_alias() helper
     completely

   - Drop an unnecessary call to fh_update() from nfsd_create_locked()

   - Change i_op->mkdir() to return a struct dentry

     Change vfs_mkdir() to return a dentry provided by the filesystems
     which is hashed and positive. This allows us to reduce the number
     of cases where the resulting dentry is not positive to very few
     cases. The code in these places becomes simpler and easier to
     understand.

   - Repack DENTRY_* and LOOKUP_* flags"

* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: fix inline emphasis warning
  VFS: Change vfs_mkdir() to return the dentry.
  nfs: change mkdir inode_operation to return alternate dentry if needed.
  fuse: return correct dentry for ->mkdir
  ceph: return the correct dentry on mkdir
  hostfs: store inode in dentry after mkdir if possible.
  Change inode_operations.mkdir to return struct dentry *
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()
  VFS: add common error checks to lookup_one_qstr_excl()
  VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  VFS: repack LOOKUP_ bit flags.
  VFS: repack DENTRY_ flags.
2025-03-24 10:47:14 -07:00
Trond Myklebust aa42add73c pNFS/flexfiles: Report ENETDOWN as a connection error
If the client should see an ENETDOWN when trying to connect to the data
server, it might still be able to talk to the metadata server through
another NIC. If so, report the error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21 12:44:26 -04:00
Trond Myklebust 8c9f0df7a8 pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the pNFS
flexfiles client. In these circumstances, the client needs to treat the
ENETDOWN and ENETUNREACH errors as fatal, and should abandon the
attempted I/O.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21 12:44:24 -04:00
Trond Myklebust 9827144bfb NFS: Treat ENETUNREACH errors as fatal in containers
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the generic
NFS client. If the flag is set, the client will receive ENETDOWN and
ENETUNREACH errors from the RPC layer, and is expected to treat them as
being fatal.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21 12:44:19 -04:00
Trond Myklebust 487fae09d7 NFS: Add a mount option to make ENETUNREACH errors fatal
If the NFS client was initially created in a container, and that
container is torn down, there is usually no possibity to go back and
destroy any NFS clients that are hung because their virtual network
devices have been unlinked.

Add a flag that tells the NFS client that in these circumstances, it
should treat ENETDOWN and ENETUNREACH errors as fatal to the NFS client.

The option defaults to being on when the mount happens from inside a net
namespace that is not "init_net".

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21 12:44:09 -04:00
Anna Schumaker e171b96500 NFS: Add implid to sysfs
The Linux NFS server added support for returning this information during
an EXCHANGE_ID in Linux v6.13. This is something and admin might want to
query, so let's add it to sysfs.

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Link: https://lore.kernel.org/r/20250207204225.594002-2-anna@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-21 09:34:52 -04:00
Benjamin Coddington cfe1f8776f NFS: Extend rdirplus mount option with "force|none"
There are certain users that wish to force the NFS client to choose
READDIRPLUS over READDIR for a particular mount.  Update the "rdirplus" mount
option to optionally accept values.  For "rdirplus=force", the NFS client
will always attempt to use READDDIRPLUS.  The setting of "rdirplus=none" is
aliased to the existing "nordirplus".

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/c4cf0de4c8be0930b91bc74bee310d289781cd3b.1741885071.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-21 09:34:52 -04:00
Chuck Lever 2d6a194f4b NFS: Refactor trace_nfs4_offload_cancel
Add a trace_nfs4_offload_status trace point that looks just like
trace_nfs4_offload_cancel. Promote that event to an event class to
avoid duplicating code.

An alternative approach would be to expand trace_nfs4_offload_status
to report more of the actual OFFLOAD_STATUS result.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250113153235.48706-16-cel@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Chuck Lever adcc0aef9e NFS: Use NFSv4.2's OFFLOAD_STATUS operation
We've found that there are cases where a transport disconnection
results in the loss of callback RPCs. NFS servers typically do not
retransmit callback operations after a disconnect.

This can be a problem for the Linux NFS client's current
implementation of asynchronous COPY, which waits indefinitely for a
CB_OFFLOAD callback. If a transport disconnect occurs while an async
COPY is running, there's a good chance the client will never get the
completing CB_OFFLOAD.

Fix this by implementing the OFFLOAD_STATUS operation so that the
Linux NFS client can probe the NFS server if it doesn't see a
CB_OFFLOAD in a reasonable amount of time.

This patch implements a simplistic check. As future work, the client
might also be able to detect whether there is no forward progress on
the request asynchronous COPY operation, and CANCEL it.

Suggested-by: Olga Kornievskaia <kolga@netapp.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250113153235.48706-15-cel@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Chuck Lever 77dd8a302f NFS: Implement NFSv4.2's OFFLOAD_STATUS operation
Enable the Linux NFS client to observe the progress of an offloaded
asynchronous COPY operation. This new operation will be put to use
in a subsequent patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250113153235.48706-14-cel@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Chuck Lever 8955e7ce61 NFS: Implement NFSv4.2's OFFLOAD_STATUS XDR
Add XDR encoding and decoding functions for the NFSv4.2
OFFLOAD_STATUS operation.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/20250113153235.48706-13-cel@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
NeilBrown 43502f6e8d NFS: fix open_owner_id_maxsz and related fields.
A recent change increased the size of an NFSv4 open owner, but didn't
increase the corresponding max_sz defines.  This is not know to have
caused failure, but should be fixed.

This patch also fixes some relates _maxsz fields that are wrong.

Note that the XXX_owner_id_maxsz values now are only the size of the id
and do NOT include the len field that will always preceed the id in xdr
encoding.  I think this is clearer.

Reported-by: David Disseldorp <ddiss@suse.com>
Fixes: d98f722725 ("nfs: simplify and guarantee owner uniqueness.")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Trond Myklebust e767b59e29 NFSv4: Avoid unnecessary scans of filesystems for delayed delegations
The amount of looping through the list of delegations is occasionally
leading to soft lockups. If the state manager was asked to manage the
delayed return of delegations, then only scan those filesystems
containing delegations that were marked as being delayed.

Fixes: be20037725 ("NFSv4: Fix delegation return in cases where we have to retry")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Trond Myklebust f163aa81a7 NFSv4: Avoid unnecessary scans of filesystems for expired delegations
The amount of looping through the list of delegations is occasionally
leading to soft lockups.  If the state manager was asked to reap the
expired delegations, it should scan only those filesystems that hold
delegations that need to be reaped.

Fixes: 7f156ef0bf ("NFSv4: Clean up nfs_delegation_reap_expired()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Trond Myklebust 35a566a24e NFSv4: Avoid unnecessary scans of filesystems for returning delegations
The amount of looping through the list of delegations is occasionally
leading to soft lockups. If the state manager was asked to return
delegations asynchronously, it should only scan those filesystems that
hold delegations that need to be returned.

Fixes: af3b61bf61 ("NFSv4: Clean up nfs_client_return_marked_delegations()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:53 -04:00
Trond Myklebust 47acca884f NFSv4: Don't trigger uneccessary scans for return-on-close delegations
The amount of looping through the list of delegations is occasionally
leading to soft lockups. Avoid at least some loops by not requiring the
NFSv4 state manager to scan for delegations that are marked for
return-on-close. Instead, either mark them for immediate return (if
possible) or else leave it up to nfs4_inode_return_delegation_on_close()
to return them once the file is closed by the application.

Fixes: b757144fd7 ("NFSv4: Be less aggressive about returning delegations for open files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-17 16:51:52 -04:00
Mike Snitzer 9254c8ae9b nfsd: disallow file locking and delegations for NFSv4 reexport
We do not and cannot support file locking with NFS reexport over
NFSv4.x for the same reason we don't do it for NFSv3: NFS reexport
server reboot cannot allow clients to recover locks because the source
NFS server has not rebooted, and so it is not in grace.  Since the
source NFS server is not in grace, it cannot offer any guarantees that
the file won't have been changed between the locks getting lost and
any attempt to recover/reclaim them.  The same applies to delegations
and any associated locks, so disallow them too.

Clients are no longer allowed to get file locks or delegations from a
reexport server, any attempts will fail with operation not supported.

Update the "Reboot recovery" section accordingly in
Documentation/filesystems/nfs/reexport.rst

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-10 09:11:08 -04:00
Linus Torvalds 1110ce6a1e 33 hotfixes. 24 are cc:stable and the remainder address post-6.13 issues
or aren't considered necessary for -stable kernels.
 
 26 are for MM and 7 are for non-MM.
 
 - "mm: memory_failure: unmap poisoned folio during migrate properly"
   from Ma Wupeng fixes a couple of two year old bugs involving the
   migration of hwpoisoned folios.
 
 - "selftests/damon: three fixes for false results" from SeongJae Park
   fixes three one year old bugs in the SAMON selftest code.
 
 The remainder are singletons and doubletons.  Please see the individual
 changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ8zgnAAKCRDdBJ7gKXxA
 jmTzAP9LsTIpkPkRXDpwxPR/Si5KPwOkE6sGj4ETEqbX3vUvcAEA/Lp7oafc7Vqr
 XxlC1VFush1ZK29Tecxzvnapl2/VSAs=
 =zBvY
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "33 hotfixes. 24 are cc:stable and the remainder address post-6.13
  issues or aren't considered necessary for -stable kernels.

  26 are for MM and 7 are for non-MM.

   - "mm: memory_failure: unmap poisoned folio during migrate properly"
     from Ma Wupeng fixes a couple of two year old bugs involving the
     migration of hwpoisoned folios.

   - "selftests/damon: three fixes for false results" from SeongJae Park
     fixes three one year old bugs in the SAMON selftest code.

  The remainder are singletons and doubletons. Please see the individual
  changelogs for details"

* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
  mm/page_alloc: fix uninitialized variable
  rapidio: add check for rio_add_net() in rio_scan_alloc_net()
  rapidio: fix an API misues when rio_add_net() fails
  MAINTAINERS: .mailmap: update Sumit Garg's email address
  Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
  mm: fix finish_fault() handling for large folios
  mm: don't skip arch_sync_kernel_mappings() in error paths
  mm: shmem: remove unnecessary warning in shmem_writepage()
  userfaultfd: fix PTE unmapping stack-allocated PTE copies
  userfaultfd: do not block on locking a large folio with raised refcount
  mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
  mm: shmem: fix potential data corruption during shmem swapin
  mm: fix kernel BUG when userfaultfd_move encounters swapcache
  selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
  selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
  selftests/damon/damos_quota: make real expectation of quota exceeds
  include/linux/log2.h: mark is_power_of_2() with __always_inline
  NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
  mm, swap: avoid BUG_ON in relocate_cluster()
  mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
  ...
2025-03-08 14:34:06 -10:00
Mike Snitzer ce6d9c1c2b NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for it so
nfs_release_folio() can skip calling nfs_wb_folio() from kcompactd.

Otherwise NFS can deadlock waiting for kcompactd enduced writeback which
recurses back to NFS (which triggers writeback to NFSD via NFS loopback
mount on the same host, NFSD blocks waiting for XFS's call to
__filemap_get_folio):

6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds.

{---
[58] "kcompactd0"
[<0>] folio_wait_bit+0xe8/0x200
[<0>] folio_wait_writeback+0x2b/0x80
[<0>] nfs_wb_folio+0x80/0x1b0 [nfs]
[<0>] nfs_release_folio+0x68/0x130 [nfs]
[<0>] split_huge_page_to_list_to_order+0x362/0x840
[<0>] migrate_pages_batch+0x43d/0xb90
[<0>] migrate_pages_sync+0x9a/0x240
[<0>] migrate_pages+0x93c/0x9f0
[<0>] compact_zone+0x8e2/0x1030
[<0>] compact_node+0xdb/0x120
[<0>] kcompactd+0x121/0x2e0
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x40
[<0>] ret_from_fork_asm+0x1a/0x30
---}

[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/20250225022002.26141-1-snitzer@kernel.org
Fixes: 96780ca55e ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05 21:36:15 -08:00
NeilBrown 8376583b84
nfs: change mkdir inode_operation to return alternate dentry if needed.
mkdir now allows a different dentry to be returned which is sometimes
relevant for nfs.

This patch changes the nfs_rpc_ops mkdir op to return a dentry, and
passes that back to the caller.

The mkdir nfs_rpc_op will return NULL if the original dentry should be
used.  This matches the mkdir inode_operation.

nfs4_do_create() is duplicated to nfs4_do_mkdir() which is changed to
handle the specifics of directories.  Consequently the current special
handling for directories is removed from nfs4_do_create()

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-6-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-05 11:52:50 +01:00
NeilBrown 88d5baf690
Change inode_operations.mkdir to return struct dentry *
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g.  on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns.  For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.

This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes.  Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.

This lookup-after-mkdir requires that the directory remains locked to
avoid races.  Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.

To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
  NULL - the directory was created and no other dentry was used
  ERR_PTR() - an error occurred
  non-NULL - this other dentry was spliced in

This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations.  Subsequent patches will make
further changes to some file-systems to return a correct dentry.

Not all filesystems reliably result in a positive hashed dentry:

- NFS, cifs, hostfs will sometimes need to perform a lookup of
  the name to get inode information.  Races could result in this
  returning something different. Note that this lookup is
  non-atomic which is what we are trying to avoid.  Placing the
  lookup in filesystem code means it only happens when the filesystem
  has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
  operation ensures that lookup will be called to correctly populate
  the dentry.  This could be fixed but I don't think it is important
  to any of the users of vfs_mkdir() which look at the dentry.

The recommendation to use
    d_drop();d_splice_alias()
is ugly but fits with current practice.  A planned future patch will
change this.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 20:00:17 +01:00
Christian Brauner 71628584df
Merge patch series "prep patches for my mkdir series"
NeilBrown <neilb@suse.de> says:

These two patches are cleanup are dependencies for my mkdir changes and
subsequence directory locking changes.

* patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits)
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()

Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-27 09:25:34 +01:00
NeilBrown 3ff6c8707c
nfs/vfs: discard d_exact_alias()
d_exact_alias() is a descendent of d_add_unique() which was introduced
20 years ago mostly likely to work around problems with NFS servers of
the time.  It is now not used in several situations were it was
originally needed and there have been no reports of problems -
presumably the old NFS servers have been improved.  This only place it
is now use is in NFSv4 code and the old problematic servers are thought
to have been v2/v3 only.

There is no clear benefit in reusing a unhashed() dentry which happens
to have the same name as the dentry we are adding.

So this patch removes d_exact_alias() and the one place that it is used.

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250226062135.2043651-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26 09:55:18 +01:00
Stephen Smalley 9084ed79dd lsm,nfs: fix memory leak of lsm_context
commit b530104f50 ("lsm: lsm_context in security_dentry_init_security")
did not preserve the lsm id for subsequent release calls, which results
in a memory leak. Fix it by saving the lsm id in the nfs4_label and
providing it on the subsequent release call.

Fixes: b530104f50 ("lsm: lsm_context in security_dentry_init_security")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-02-25 15:07:24 -05:00
Trond Myklebust 8f8df955f0 NFSv4: Fix a deadlock when recovering state on a sillyrenamed file
If the file is sillyrenamed, and slated for delete on close, it is
possible for a server reboot to triggeer an open reclaim, with can again
race with the application call to close(). When that happens, the call
to put_nfs_open_context() can trigger a synchronous delegreturn call
which deadlocks because it is not marked as privileged.

Instead, ensure that the call to nfs4_inode_return_delegation_on_close()
catches the delegreturn, and schedules it asynchronously.

Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Fixes: adb4b42d19 ("Return the delegation when deleting sillyrenamed files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-02-19 16:45:24 -05:00
Trond Myklebust 88025c67fe NFS: Adjust delegated timestamps for O_DIRECT reads and writes
Adjust the timestamps if O_DIRECT is being combined with attribute
delegations.

Fixes: e12912d941 ("NFSv4: Add support for delegated atime and mtime attributes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-02-19 16:45:24 -05:00
Trond Myklebust fcf857ee19 NFS: O_DIRECT writes must check and adjust the file length
While it is uncommon for delegations to be held while O_DIRECT writes
are in progress, it is possible. The xfstests generic/647 and
generic/729 both end up triggering that state, and end up failing due to
the fact that the file size is not adjusted.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219738
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-02-19 16:45:23 -05:00
NeilBrown 204a575e91
VFS: add common error checks to lookup_one_qstr_excl()
Callers of lookup_one_qstr_excl() often check if the result is negative or
positive.
These changes can easily be moved into lookup_one_qstr_excl() by checking the
lookup flags:
LOOKUP_CREATE means it is NOT an error if the name doesn't exist.
LOOKUP_EXCL means it IS an error if the name DOES exist.

This patch adds these checks, then removes error checks from callers,
and ensures that appropriate flags are passed.

This subtly changes the meaning of LOOKUP_EXCL.  Previously it could
only accompany LOOKUP_CREATE.  Now it can accompany LOOKUP_RENAME_TARGET
as well.  A couple of small changes are needed to accommodate this.  The
NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does
exactly what the name says.

Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250217003020.3170652-3-neilb@suse.de
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-19 14:09:15 +01:00
Linus Torvalds d3d90cc289 Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name
 and parent have their stability guaranteed by the callers;
 ->d_revalidate() is the major exception.
 
 It's easy enough for callers to supply stable values for
 expected name and expected parent of the dentry being
 validated.  That kills quite a bit of boilerplate in
 ->d_revalidate() instances, along with a bunch of races
 where they used to access ->d_name without sufficient
 precautions.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ
 6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm
 xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE=
 =TWY8
 -----END PGP SIGNATURE-----

Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs d_revalidate updates from Al Viro:
 "Provide stable parent and name to ->d_revalidate() instances

  Most of the filesystem methods where we care about dentry name and
  parent have their stability guaranteed by the callers;
  ->d_revalidate() is the major exception.

  It's easy enough for callers to supply stable values for expected name
  and expected parent of the dentry being validated. That kills quite a
  bit of boilerplate in ->d_revalidate() instances, along with a bunch
  of races where they used to access ->d_name without sufficient
  precautions"

* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: fix ->rename_sem exclusion
  orangefs_d_revalidate(): use stable parent inode and name passed by caller
  ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
  nfs: fix ->d_revalidate() UAF on ->d_name accesses
  nfs{,4}_lookup_validate(): use stable parent inode passed by caller
  gfs2_drevalidate(): use stable parent inode and name passed by caller
  fuse_dentry_revalidate(): use stable parent inode and name passed by caller
  vfat_revalidate{,_ci}(): use stable parent inode passed by caller
  exfat_d_revalidate(): use stable parent inode passed by caller
  fscrypt_d_revalidate(): use stable parent inode passed by caller
  ceph_d_revalidate(): propagate stable name down into request encoding
  ceph_d_revalidate(): use stable parent inode passed by caller
  afs_d_revalidate(): use stable name and parent inode passed by caller
  Pass parent directory inode and expected name to ->d_revalidate()
  generic_ci_d_compare(): use shortname_storage
  ext4 fast_commit: make use of name_snapshot primitives
  dissolve external_name.u into separate members
  make take_dentry_name_snapshot() lockless
  dcache: back inline names with a struct-wrapped array of unsigned long
  make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-30 09:13:35 -08:00
Linus Torvalds af13ff1c33 Summary:
All ctl_table declared outside of functions and that remain unmodified after
   initialization are const qualified. This prevents unintended modifications to
   proc_handler function pointers by placing them in the .rodata section. This is
   a continuation of the tree-wide effort started a few releases ago with the
   constification of the ctl_table struct arguments in the sysctl API done in
   78eb4ea25c ("sysctl: treewide: constify the ctl_table argument of
   proc_handlers")
 
 Testing:
 
   Testing was done on 0-day and sysctl selftests in x86_64. The linux-next
   branch was not used for such a big change in order to avoid unnecessary merge
   conflicts
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmeY6L0ACgkQupfNUreW
 QU/REwwAizeoFg3XyfwvGsjKUJKvZ8Ltnv3n4+tkd687UAQJnJHPE7/ODR8hKbpE
 E56G12jFlKQyiFR01wg+cbOy6+TTOT9o5qVmLZbo/zmI491Ygkxqen0Y0Z2mGXqR
 FMqcI8ZBmAAYfUKDjjUo+xUI70aNikWOOKRSmJp4cpgm5242d/UN7sOuKkOgt5DY
 GiyjPGlpKFkcYN4bOegKhlfZKdr9BMFxSgN0TZLtensj6cDrkZyLsrdgmVXy1mRT
 0xTnmonGehweog4XY4hSPt2l6uCUu1fiY/WUcghKdWxUty43x9J3LahfD9b7DiAA
 G+DxHStSH0S/czWsa8Z0peyt/2gW8KZcRgk9W4UyVhpyDknXtVxr2sI3nxbTEFGl
 x2h6C29VCqg9Tn9oljEgGbYUrwlLz5Mah65JLDwlPLTpJmfA4BNbNxaC1V+DiqrX
 eApet8vaqGPlG7F3DRlyRAn7DoG8rs/eX93qqjbSA/pUjKjQUwCk/VBxNr1JBuNG
 elX+8QZi
 =x7aW
 -----END PGP SIGNATURE-----

Merge tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl table constification from Joel Granados:
 "All ctl_table declared outside of functions and that remain unmodified
  after initialization are const qualified.

  This prevents unintended modifications to proc_handler function
  pointers by placing them in the .rodata section.

  This is a continuation of the tree-wide effort started a few releases
  ago with the constification of the ctl_table struct arguments in the
  sysctl API done in 78eb4ea25c ("sysctl: treewide: constify the
  ctl_table argument of proc_handlers")"

* tag 'constfy-sysctl-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  treewide: const qualify ctl_tables where applicable
2025-01-29 10:35:40 -08:00
Linus Torvalds b88fe2b5dd NFS Client Updates for Linux 6.14
New Features:
   * Enable using direct IO with localio
   * Added localio related tracepoints
 
 Bugfixes:
   * Sunrpc fixes for working with a very large cl_tasks list
   * Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   * Fixes for handling reconnections with localio
   * Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   * Fix COPY_NOTIFY xdr_buf size calculations
   * pNFS/Flexfiles fix for retrying requesting a layout segment for reads
   * Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired
 
 Cleanups:
   * Various other nfs & nfsd localio cleanups
   * Prepratory patches for async copy improvements that are under development
   * Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts
   * Add netns inum and srcaddr to debugfs rpc_xprt info
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmeZUzUACgkQ18tUv7Cl
 QOvArw/9HltIlcJHbi7tApGJ4dFpuJCa/fHbA1n5bHvKrCR5aElmFZoiFDdsM1JX
 kFAlMED9n1dW9VmzLJcepxmrLo/t7KXueiZNharHynWTxcszSl6jS+tOFBW6OflG
 Rrrjq/SrsWI2Fu8X4e/7ZV7pqRLGGn5SSMwgbuMbcyzBvVgN8mZM/BneIp1J59AI
 5NOsif5KWetVhQc43zlRlbVWR5cvNGcUK4i58LIaPFzPMt0xq/XJI+QWffj6kv4g
 cHabCNYTdQYMkhiPQC+LLYkw6sMbw2NatajTTYNMWfR/I+7wz9k5ej6CHKPIFCSr
 xjmscypySTLfMFQjrDFZkpX2CwSp/VIbV6go36DJwAlcCRzqz+I7cajlrRK4zvyr
 DyrcaZHvClEczP9QqdPj2wqRXbmIOsDMksOu4ACTUImd4o3f2v1K6DcwRj9oUIhV
 AGR31OEMt2A+RaVvVZYR4PpixJ01vH9LcmsaOu5KkHX8X4q2osQ7eMy+FV4kV09S
 pMnxDMAyszJU8IuzUG1/HfkonNlDMivIbqpgG4ZaVW08Nq4mCxJll1vTAa9FTLz2
 z+9eocqKwf724q1RAgOB7vj4AwOwL4Ul6d18UBtyUitZz3ndLRZ8Yy6r/AhrpCsC
 3co0Y3znZbKeRjmReNl0GLG4qiKE+E7Xh23Lf3IqXg8GE2Mu+Ls=
 =srvH
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Enable using direct IO with localio
   - Added localio related tracepoints

  Bugfixes:
   - Sunrpc fixes for working with a very large cl_tasks list
   - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
   - Fixes for handling reconnections with localio
   - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
   - Fix COPY_NOTIFY xdr_buf size calculations
   - pNFS/Flexfiles fix for retrying requesting a layout segment for
     reads
   - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
     expired

  Cleanups:
   - Various other nfs & nfsd localio cleanups
   - Prepratory patches for async copy improvements that are under
     development
   - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
     xprts
   - Add netns inum and srcaddr to debugfs rpc_xprt info"

* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
  SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
  sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
  pnfs/flexfiles: retry getting layout segment for reads
  NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
  NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
  NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
  NFS: Rename struct nfs4_offloadcancel_data
  NFS: Fix typo in OFFLOAD_CANCEL comment
  NFS: CB_OFFLOAD can return NFS4ERR_DELAY
  nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
  nfs: fix incorrect error handling in LOCALIO
  nfs: probe for LOCALIO when v3 client reconnects to server
  nfs: probe for LOCALIO when v4 client reconnects to server
  nfs/localio: remove redundant code and simplify LOCALIO enablement
  nfs_common: add nfs_localio trace events
  nfs_common: track all open nfsd_files per LOCALIO nfs_client
  nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
  nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
  nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
  nfsd: update percpu_ref to manage references on nfsd_net
  ...
2025-01-28 14:23:46 -08:00
Joel Granados 1751f872cc treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25c ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-01-28 13:48:37 +01:00
Linus Torvalds f34b580514 NFSD 6.14 Release Notes
Jeff Layton contributed an implementation of NFSv4.2+ attribute
 delegation, as described here:
 
 https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
 
 This interoperates with similar functionality introduced into the
 Linux NFS client in v6.11. An attribute delegation permits an NFS
 client to manage a file's mtime, rather than flushing dirty data to
 the NFS server so that the file's mtime reflects the last write,
 which is considerably slower.
 
 Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
 This facility enables NFSD to increase or decrease the number of
 slots per NFS session depending on server memory availability. More
 session slots means greater parallelism.
 
 Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
 encoding screws up when crossing a page boundary in the encoding
 buffer. This is a zero-day bug, but hitting it is rare and depends
 on the NFS client implementation. The Linux NFS client does not
 happen to trigger this issue.
 
 A variety of bug fixes and other incremental improvements fill out
 the list of commits in this release. Great thanks to all
 contributors, reviewers, testers, and bug reporters who participated
 during this development cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmeVPBUACgkQM2qzM29m
 f5dClxAAmW4O2bOJaR8neJ54fzeFYXtFYEXRF/XbIh2KdnCy2LoywT9ux8ndzE0k
 1tsjtv0g4y84IcYfrxXPRTuhh2GO2pHw5L1kGVRezJg2ODSFbpdzGtcVK7SIrs2S
 eijTViqdi9xRgfj1jPqRrvxC99RL3/fmCztqorAPLDFsqYAd/6ZRxZ7+IcZ2h4+J
 cJ0Z6Wx6eh10roacZPXweH13XJ7xWO/ublYZvQFQpK2BAKyO98aXGgLDraNt8k60
 X3DZLSKkGB/eBlNeAlTtcrXec/ot6XGJPKr3b/7zhwfMi8B13RGdSmCR8SxMdRQM
 vCQO4G2YadU4YFS6FFIw9Wc1XDYUuYh2YgcveafjzjXbgi7NnY7rYOxtnTgBi0Xv
 XGjtGqpvD676gPm+8b3DcwqmWI3c/WUdtQIZ1uYRCFZdFqkVP91bySPT2aHtx2GO
 4j3uEyTlypC00kyvu1oL3+tVUG/EFlJCpYvIbOwqDG2m7KWPStzpfJTD5Q9cdlEl
 fdgs6l82EVqe1YyjLTqajDuOcRrLYK2hlR/5STc03hQV+GpKSo5UypRejzE9WtRV
 zT/tyelqhj4+0EZJz4ay/8q9s2Jp+5JGVxoVvjujSuH7+Ulb3T+IDkldtMamO8Fm
 www2y0/fLfU2xIapMJdCoJ+ZKgel2i8RZMPIc0cIfO5ITXm+dOs=
 =hwXx
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "Jeff Layton contributed an implementation of NFSv4.2+ attribute
  delegation, as described here:

    https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html

  This interoperates with similar functionality introduced into the
  Linux NFS client in v6.11. An attribute delegation permits an NFS
  client to manage a file's mtime, rather than flushing dirty data to
  the NFS server so that the file's mtime reflects the last write, which
  is considerably slower.

  Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
  This facility enables NFSD to increase or decrease the number of slots
  per NFS session depending on server memory availability. More session
  slots means greater parallelism.

  Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
  encoding screws up when crossing a page boundary in the encoding
  buffer. This is a zero-day bug, but hitting it is rare and depends on
  the NFS client implementation. The Linux NFS client does not happen to
  trigger this issue.

  A variety of bug fixes and other incremental improvements fill out the
  list of commits in this release. Great thanks to all contributors,
  reviewers, testers, and bug reporters who participated during this
  development cycle"

* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
  sunrpc: Remove gss_generic_token deadcode
  sunrpc: Remove unused xprt_iter_get_xprt
  Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
  nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
  nfsd: handle delegated timestamps in SETATTR
  nfsd: add support for delegated timestamps
  nfsd: rework NFS4_SHARE_WANT_* flag handling
  nfsd: add support for FATTR4_OPEN_ARGUMENTS
  nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
  nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
  nfsd: switch to autogenerated definitions for open_delegation_type4
  nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
  nfsd: fix handling of delegated change attr in CB_GETATTR
  SUNRPC: Document validity guarantees of the pointer returned by reserve_space
  NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
  NFSD: Refactor nfsd4_do_encode_secinfo() again
  NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
  NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
  ...
2025-01-27 17:27:24 -08:00
Al Viro ffeeaada2b nfs: fix ->d_revalidate() UAF on ->d_name accesses
Pass the stable name all the way down to ->rpc_ops->lookup() instances.

Note that passing &dentry->d_name is safe in e.g. nfs_lookup() - it *is*
stable there, as it is in ->create() et.al.

dget_parent() in nfs_instantiate() should be redundant - it'd better be
stable there; if it's not, we have more trouble, since ->d_name would
also be unsafe in such case.

nfs_submount() and nfs4_submount() may or may not require fixes - if
they ever get moved on server with fhandle preserved, we are in trouble
there...

UAF window is fairly narrow here and exfiltration requires the ability
to watch the traffic.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro 39f644a266 nfs{,4}_lookup_validate(): use stable parent inode passed by caller
we can't kill __nfs_lookup_revalidate() completely, but ->d_parent boilerplate
in it is gone

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:24 -05:00
Al Viro 5be1fa8abd Pass parent directory inode and expected name to ->d_revalidate()
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller.  We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.

It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable.  There is a couple
of exceptions (overlayfs and, to less extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.

It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.

This commit only changes the calling conventions; making use of supplied
values is left to followups.

NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate.  This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).

One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but it the next byte might very well be '/' rather than '\0'.  Do not
ignore name->len.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27 19:25:23 -05:00
Mike Snitzer eb3fabde15 pnfs/flexfiles: retry getting layout segment for reads
If ff_layout_pg_get_read()'s attempt to get a layout segment results
in -EAGAIN have ff_layout_pg_init_read() retry it after sleeping.

If "softerr" mount is used, use 'io_maxretrans' to limit the number of
attempts to get a layout segment.

This fixes a long-standing issue of O_DIRECT reads failing with
-EAGAIN (11) when using flexfiles Client Side Mirroring (CSM).

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-22 14:26:46 -05:00
Linus Torvalds f96a974170 lsm/stable-6.14 PR 20250121
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmeQFBoUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPvcA//XCdwMz0bGtWKv58nuyP8vkQx08n6
 //olz/O8te3uWK5O3kRiarzFLwH8qsHQ6A7GYalwwix34hatR4ndJE0Y/guVRWa1
 +aBmJxJ7Jm/q3fvpAEfqiSgreuE6kBoztlDOWEq+hUQGu4qfnQGm2EnvbvfFrAmN
 VheOfIQSU2KCL/Scc3FGnF6uru4WrqN0JJ9RbvrEpfdQgmcyTGLnQsZLljutWSIq
 kDWkteIr7cj3O9J45zpxZsTftvYSgVn/y1iKeXbHI4DBA1eheK12vsHB9AADKI1J
 GwHxOrnLpZtv+ICUKqcfFTmWTl+NmfJJurAT5KXKdBjL3xM5MoJlBvK1A5qE9CMo
 LaHVG/TZR2MmBaoM3EN+gvWhDgWlvT02Q/0cYaafTlVLMez3HtfctxN6OnCvTXTB
 Y8dqYClhhlBm/mHQwYfMoeKw4MftUpzEqBd1Nj7Qe8dbP0f/62Ca3K2B3D6Rf8QV
 pj3ryMlSWYV9mdTerruLNQexTGoN7l66jPwzdWpTbFeL3WmNtfCako8OZGbXgPIu
 Iahm3P+jnSVx8ZQro2c9zwdKXI5xiI335pCBbDZ8aX+JAsfj0OofHsFx5Q5diber
 M7tAEhxDqRisbpz7Ei+/LOAEGg2Z619XKg8ks4z6Y4P5PF7zEgeWTkZJk2iLbxXe
 6LLOjmF7LLw+G4M=
 =fgyr
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Improved handling of LSM "secctx" strings through lsm_context struct

   The LSM secctx string interface is from an older time when only one
   LSM was supported, migrate over to the lsm_context struct to better
   support the different LSMs we now have and make it easier to support
   new LSMs in the future.

   These changes explain the Rust, VFS, and networking changes in the
   diffstat.

 - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are
   enabled

   Small tweak to be a bit smarter about when we build the LSM's common
   audit helpers.

 - Check for absurdly large policies from userspace in SafeSetID

   SafeSetID policies rules are fairly small, basically just "UID:UID",
   it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which
   helps quiet a number of syzbot related issues. While work is being
   done to address the syzbot issues through other mechanisms, this is a
   trivial and relatively safe fix that we can do now.

 - Various minor improvements and cleanups

   A collection of improvements to the kernel selftests, constification
   of some function parameters, removing redundant assignments, and
   local variable renames to improve readability.

* tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lockdown: initialize local array before use to quiet static analysis
  safesetid: check size of policy writes
  net: corrections for security_secid_to_secctx returns
  lsm: rename variable to avoid shadowing
  lsm: constify function parameters
  security: remove redundant assignment to return variable
  lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set
  selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test
  binder: initialize lsm_context structure
  rust: replace lsm context+len with lsm_context
  lsm: secctx provider check on release
  lsm: lsm_context in security_dentry_init_security
  lsm: use lsm_context in security_inode_getsecctx
  lsm: replace context+len with lsm_context
  lsm: ensure the correct LSM context releaser
2025-01-21 20:03:04 -08:00
Olga Kornievskaia 0b96c75d86 NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
LAYOUTSTATS and LAYOUTERROR should be marked MOVEABLE for when we
need to move tasks off a non-functional transport.

Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-21 11:34:50 -05:00