xfs: new code for v6.19

Signed-off-by: Carlos Maiolino <cem@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQSmtYVZ/MfVMGUq1GNcsMJ8RxYuYwUCaS1/jAAKCRBcsMJ8RxYu
 Y+jfAYCF6akuDB6lwjCCIhiRk2tJjVeaTF+NVdztSBwFOnbU7uVOznMUooo4PCN4
 fshIslYBgP0s9DBa5DAP//7PE8IwIbuftrly6UmQMvhJFLg5ZRZaQCSKJK77JhjJ
 REn0g8+Ifg==
 =wHN9
 -----END PGP SIGNATURE-----

Merge tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Carlos Maiolino:
 "There are no major changes in xfs. This contains mostly some code
  cleanups, a few bug fixes and documentation update. Highlights are:

   - Quota locking cleanup

   - Getting rid of old xlog_in_core_2_t type"

* tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits)
  docs: remove obsolete links in the xfs online repair documentation
  xfs: move some code out of xfs_iget_recycle
  xfs: use zi more in xfs_zone_gc_mount
  xfs: remove the unused bv field in struct xfs_gc_bio
  xfs: remove xarray mark for reclaimable zones
  xfs: remove the xlog_in_core_t typedef
  xfs: remove l_iclog_heads
  xfs: remove the xlog_rec_header_t typedef
  xfs: remove xlog_in_core_2_t
  xfs: remove a very outdated comment from xlog_alloc_log
  xfs: cleanup xlog_alloc_log a bit
  xfs: don't use xlog_in_core_2_t in struct xlog_in_core
  xfs: add a on-disk log header cycle array accessor
  xfs: add a XLOG_CYCLE_DATA_SIZE constant
  xfs: reduce ilock roundtrips in xfs_qm_vop_dqalloc
  xfs: move xfs_dquot_tree calls into xfs_qm_dqget_cache_{lookup,insert}
  xfs: move quota locking into xrep_quota_item
  xfs: move quota locking into xqcheck_commit_dquot
  xfs: move q_qlock locking into xqcheck_compare_dquot
  xfs: move q_qlock locking into xchk_quota_item
  ...
This commit is contained in:
Linus Torvalds 2025-12-03 20:19:38 -08:00
commit 3ed1c68307
29 changed files with 364 additions and 744 deletions

View File

@ -105,10 +105,8 @@ occur; this capability aids both strategies.
TLDR; Show Me the Code! TLDR; Show Me the Code!
----------------------- -----------------------
Code is posted to the kernel.org git trees as follows:
Kernel and userspace code has been fully merged as of October 2025.
`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
Each kernel patchset adding an online repair function will use the same branch Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos. name across the kernel, xfsprogs, and fstests git repos.
@ -764,12 +762,8 @@ allow the online fsck developers to compare online fsck against offline fsck,
and they enable XFS developers to find deficiencies in the code base. and they enable XFS developers to find deficiencies in the code base.
Proposed patchsets include Proposed patchsets include
`general fuzzer improvements
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_,
`fuzzing baselines `fuzzing baselines
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_,
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_.
and `improvements in fuzz testing comprehensiveness
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_.
Stress Testing Stress Testing
-------------- --------------
@ -801,11 +795,6 @@ Success is defined by the ability to run all of these tests without observing
any unexpected filesystem shutdowns due to corrupted metadata, kernel hang any unexpected filesystem shutdowns due to corrupted metadata, kernel hang
check warnings, or any other sort of mischief. check warnings, or any other sort of mischief.
Proposed patchsets include `general stress testing
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
and the `evolution of existing per-function stress testing
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
4. User Interface 4. User Interface
================= =================
@ -886,10 +875,6 @@ apply as nice of a priority to IO and CPU scheduling as possible.
This measure was taken to minimize delays in the rest of the filesystem. This measure was taken to minimize delays in the rest of the filesystem.
No such hardening has been performed for the cron job. No such hardening has been performed for the cron job.
Proposed patchset:
`Enabling the xfs_scrub background service
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
Health Reporting Health Reporting
---------------- ----------------
@ -912,13 +897,6 @@ notifications and initiate a repair?
*Answer*: These questions remain unanswered, but should be a part of the *Answer*: These questions remain unanswered, but should be a part of the
conversation with early adopters and potential downstream users of XFS. conversation with early adopters and potential downstream users of XFS.
Proposed patchsets include
`wiring up health reports to correction returns
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
and
`preservation of sickness info during memory reclaim
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.
5. Kernel Algorithms and Data Structures 5. Kernel Algorithms and Data Structures
======================================== ========================================
@ -1310,21 +1288,6 @@ Space allocation records are cross-referenced as follows:
are there the same number of reverse mapping records for each block as the are there the same number of reverse mapping records for each block as the
reference count record claims? reference count record claims?
Proposed patchsets are the series to find gaps in
`refcount btree
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-refcount-gaps>`_,
`inode btree
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-inobt-gaps>`_, and
`rmap btree
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-rmapbt-gaps>`_ records;
to find
`mergeable records
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-mergeable-records>`_;
and to
`improve cross referencing with rmap
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-strengthen-rmap-checking>`_
before starting a repair.
Checking Extended Attributes Checking Extended Attributes
```````````````````````````` ````````````````````````````
@ -1756,10 +1719,6 @@ For scrub, the drain works as follows:
To avoid polling in step 4, the drain provides a waitqueue for scrub threads to To avoid polling in step 4, the drain provides a waitqueue for scrub threads to
be woken up whenever the intent count drops to zero. be woken up whenever the intent count drops to zero.
The proposed patchset is the
`scrub intent drain series
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-drain-intents>`_.
.. _jump_labels: .. _jump_labels:
Static Keys (aka Jump Label Patching) Static Keys (aka Jump Label Patching)
@ -2036,10 +1995,6 @@ The ``xfarray_store_anywhere`` function is used to insert a record in any
null record slot in the bag; and the ``xfarray_unset`` function removes a null record slot in the bag; and the ``xfarray_unset`` function removes a
record from the bag. record from the bag.
The proposed patchset is the
`big in-memory array
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=big-array>`_.
Iterating Array Elements Iterating Array Elements
^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^
@ -2172,10 +2127,6 @@ However, it should be noted that these repair functions only use blob storage
to cache a small number of entries before adding them to a temporary ondisk to cache a small number of entries before adding them to a temporary ondisk
file, which is why compaction is not required. file, which is why compaction is not required.
The proposed patchset is at the start of the
`extended attribute repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_ series.
.. _xfbtree: .. _xfbtree:
In-Memory B+Trees In-Memory B+Trees
@ -2214,11 +2165,6 @@ xfiles enables reuse of the entire btree library.
Btrees built atop an xfile are collectively known as ``xfbtrees``. Btrees built atop an xfile are collectively known as ``xfbtrees``.
The next few sections describe how they actually work. The next few sections describe how they actually work.
The proposed patchset is the
`in-memory btree
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=in-memory-btrees>`_
series.
Using xfiles as a Buffer Cache Target Using xfiles as a Buffer Cache Target
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -2459,14 +2405,6 @@ This enables the log to release the old EFI to keep the log moving forwards.
EFIs have a role to play during the commit and reaping phases; please see the EFIs have a role to play during the commit and reaping phases; please see the
next section and the section about :ref:`reaping<reaping>` for more details. next section and the section about :ref:`reaping<reaping>` for more details.
Proposed patchsets are the
`bitmap rework
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-bitmap-rework>`_
and the
`preparation for bulk loading btrees
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_.
Writing the New Tree Writing the New Tree
```````````````````` ````````````````````
@ -2623,11 +2561,6 @@ The number of records for the inode btree is the number of xfarray records,
but the record count for the free inode btree has to be computed as inode chunk but the record count for the free inode btree has to be computed as inode chunk
records are stored in the xfarray. records are stored in the xfarray.
The proposed patchset is the
`AG btree repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
series.
Case Study: Rebuilding the Space Reference Counts Case Study: Rebuilding the Space Reference Counts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -2716,11 +2649,6 @@ Reverse mappings are added to the bag using ``xfarray_store_anywhere`` and
removed via ``xfarray_unset``. removed via ``xfarray_unset``.
Bag members are examined through ``xfarray_iter`` loops. Bag members are examined through ``xfarray_iter`` loops.
The proposed patchset is the
`AG btree repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
series.
Case Study: Rebuilding File Fork Mapping Indices Case Study: Rebuilding File Fork Mapping Indices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -2757,11 +2685,6 @@ EXTENTS format instead of BMBT, which may require a conversion.
Third, the incore extent map must be reloaded carefully to avoid disturbing Third, the incore extent map must be reloaded carefully to avoid disturbing
any delayed allocation extents. any delayed allocation extents.
The proposed patchset is the
`file mapping repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-file-mappings>`_
series.
.. _reaping: .. _reaping:
Reaping Old Metadata Blocks Reaping Old Metadata Blocks
@ -2843,11 +2766,6 @@ blocks.
As stated earlier, online repair functions use very large transactions to As stated earlier, online repair functions use very large transactions to
minimize the chances of this occurring. minimize the chances of this occurring.
The proposed patchset is the
`preparation for bulk loading btrees
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_
series.
Case Study: Reaping After a Regular Btree Repair Case Study: Reaping After a Regular Btree Repair
```````````````````````````````````````````````` ````````````````````````````````````````````````
@ -2943,11 +2861,6 @@ When the walk is complete, the bitmap disunion operation ``(ag_owner_bitmap &
btrees. btrees.
These blocks can then be reaped using the methods outlined above. These blocks can then be reaped using the methods outlined above.
The proposed patchset is the
`AG btree repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
series.
.. _rmap_reap: .. _rmap_reap:
Case Study: Reaping After Repairing Reverse Mapping Btrees Case Study: Reaping After Repairing Reverse Mapping Btrees
@ -2972,11 +2885,6 @@ methods outlined above.
The rest of the process of rebuildng the reverse mapping btree is discussed The rest of the process of rebuildng the reverse mapping btree is discussed
in a separate :ref:`case study<rmap_repair>`. in a separate :ref:`case study<rmap_repair>`.
The proposed patchset is the
`AG btree repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
series.
Case Study: Rebuilding the AGFL Case Study: Rebuilding the AGFL
``````````````````````````````` ```````````````````````````````
@ -3024,11 +2932,6 @@ more complicated, because computing the correct value requires traversing the
forks, or if that fails, leaving the fields invalid and waiting for the fork forks, or if that fails, leaving the fields invalid and waiting for the fork
fsck functions to run. fsck functions to run.
The proposed patchset is the
`inode
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-inodes>`_
repair series.
Quota Record Repairs Quota Record Repairs
-------------------- --------------------
@ -3045,11 +2948,6 @@ checking are obviously bad limits and timer values.
Quota usage counters are checked, repaired, and discussed separately in the Quota usage counters are checked, repaired, and discussed separately in the
section about :ref:`live quotacheck <quotacheck>`. section about :ref:`live quotacheck <quotacheck>`.
The proposed patchset is the
`quota
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quota>`_
repair series.
.. _fscounters: .. _fscounters:
Freezing to Fix Summary Counters Freezing to Fix Summary Counters
@ -3145,11 +3043,6 @@ long enough to check and correct the summary counters.
| This bug was fixed in Linux 5.17. | | This bug was fixed in Linux 5.17. |
+--------------------------------------------------------------------------+ +--------------------------------------------------------------------------+
The proposed patchset is the
`summary counter cleanup
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters>`_
series.
Full Filesystem Scans Full Filesystem Scans
--------------------- ---------------------
@ -3277,15 +3170,6 @@ Second, if the incore inode is stuck in some intermediate state, the scan
coordinator must release the AGI and push the main filesystem to get the inode coordinator must release the AGI and push the main filesystem to get the inode
back into a loadable state. back into a loadable state.
The proposed patches are the
`inode scanner
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
series.
The first user of the new functionality is the
`online quotacheck
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
series.
Inode Management Inode Management
```````````````` ````````````````
@ -3381,12 +3265,6 @@ To capture these nuances, the online fsck code has a separate ``xchk_irele``
function to set or clear the ``DONTCACHE`` flag to get the required release function to set or clear the ``DONTCACHE`` flag to get the required release
behavior. behavior.
Proposed patchsets include fixing
`scrub iget usage
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iget-fixes>`_ and
`dir iget usage
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-dir-iget-fixes>`_.
.. _ilocking: .. _ilocking:
Locking Inodes Locking Inodes
@ -3443,11 +3321,6 @@ If the dotdot entry changes while the directory is unlocked, then a move or
rename operation must have changed the child's parentage, and the scan can rename operation must have changed the child's parentage, and the scan can
exit early. exit early.
The proposed patchset is the
`directory repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
series.
.. _fshooks: .. _fshooks:
Filesystem Hooks Filesystem Hooks
@ -3594,11 +3467,6 @@ The inode scan APIs are pretty simple:
- ``xchk_iscan_teardown`` to finish the scan - ``xchk_iscan_teardown`` to finish the scan
This functionality is also a part of the
`inode scanner
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
series.
.. _quotacheck: .. _quotacheck:
Case Study: Quota Counter Checking Case Study: Quota Counter Checking
@ -3686,11 +3554,6 @@ needing to hold any locks for a long duration.
If repairs are desired, the real and shadow dquots are locked and their If repairs are desired, the real and shadow dquots are locked and their
resource counts are set to the values in the shadow dquot. resource counts are set to the values in the shadow dquot.
The proposed patchset is the
`online quotacheck
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
series.
.. _nlinks: .. _nlinks:
Case Study: File Link Count Checking Case Study: File Link Count Checking
@ -3744,11 +3607,6 @@ shadow information.
If no parents are found, the file must be :ref:`reparented <orphanage>` to the If no parents are found, the file must be :ref:`reparented <orphanage>` to the
orphanage to prevent the file from being lost forever. orphanage to prevent the file from being lost forever.
The proposed patchset is the
`file link count repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-nlinks>`_
series.
.. _rmap_repair: .. _rmap_repair:
Case Study: Rebuilding Reverse Mapping Records Case Study: Rebuilding Reverse Mapping Records
@ -3828,11 +3686,6 @@ scan for reverse mapping records.
12. Free the xfbtree now that it not needed. 12. Free the xfbtree now that it not needed.
The proposed patchset is the
`rmap repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rmap-btree>`_
series.
Staging Repairs with Temporary Files on Disk Staging Repairs with Temporary Files on Disk
-------------------------------------------- --------------------------------------------
@ -3971,11 +3824,6 @@ Once a good copy of a data file has been constructed in a temporary file, it
must be conveyed to the file being repaired, which is the topic of the next must be conveyed to the file being repaired, which is the topic of the next
section. section.
The proposed patches are in the
`repair temporary files
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
series.
Logged File Content Exchanges Logged File Content Exchanges
----------------------------- -----------------------------
@ -4025,11 +3873,6 @@ The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag
in the superblock protects these new log item records from being replayed on in the superblock protects these new log item records from being replayed on
old kernels. old kernels.
The proposed patchset is the
`file contents exchange
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
series.
+--------------------------------------------------------------------------+ +--------------------------------------------------------------------------+
| **Sidebar: Using Log-Incompatible Feature Flags** | | **Sidebar: Using Log-Incompatible Feature Flags** |
+--------------------------------------------------------------------------+ +--------------------------------------------------------------------------+
@ -4323,11 +4166,6 @@ To repair the summary file, write the xfile contents into the temporary file
and use atomic mapping exchange to commit the new contents. and use atomic mapping exchange to commit the new contents.
The temporary file is then reaped. The temporary file is then reaped.
The proposed patchset is the
`realtime summary repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rtsummary>`_
series.
Case Study: Salvaging Extended Attributes Case Study: Salvaging Extended Attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -4369,11 +4207,6 @@ Salvaging extended attributes is done as follows:
4. Reap the temporary file. 4. Reap the temporary file.
The proposed patchset is the
`extended attribute repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_
series.
Fixing Directories Fixing Directories
------------------ ------------------
@ -4448,11 +4281,6 @@ Unfortunately, the current dentry cache design doesn't provide a means to walk
every child dentry of a specific directory, which makes this a hard problem. every child dentry of a specific directory, which makes this a hard problem.
There is no known solution. There is no known solution.
The proposed patchset is the
`directory repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
series.
Parent Pointers Parent Pointers
``````````````` ```````````````
@ -4612,11 +4440,6 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
7. Reap the temporary directory. 7. Reap the temporary directory.
The proposed patchset is the
`parent pointers directory repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
series.
Case Study: Repairing Parent Pointers Case Study: Repairing Parent Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -4662,11 +4485,6 @@ directory reconstruction:
8. Reap the temporary file. 8. Reap the temporary file.
The proposed patchset is the
`parent pointers repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
series.
Digression: Offline Checking of Parent Pointers Digression: Offline Checking of Parent Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -4755,11 +4573,6 @@ connectivity checks:
4. Move on to examining link counts, as we do today. 4. Move on to examining link counts, as we do today.
The proposed patchset is the
`offline parent pointers repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-fsck>`_
series.
Rebuilding directories from parent pointers in offline repair would be very Rebuilding directories from parent pointers in offline repair would be very
challenging because xfs_repair currently uses two single-pass scans of the challenging because xfs_repair currently uses two single-pass scans of the
filesystem during phases 3 and 4 to decide which files are corrupt enough to be filesystem during phases 3 and 4 to decide which files are corrupt enough to be
@ -4903,12 +4716,6 @@ Repairing the directory tree works as follows:
6. If the subdirectory has zero paths, attach it to the lost and found. 6. If the subdirectory has zero paths, attach it to the lost and found.
The proposed patches are in the
`directory tree repair
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree>`_
series.
.. _orphanage: .. _orphanage:
The Orphanage The Orphanage
@ -4973,11 +4780,6 @@ Orphaned files are adopted by the orphanage as follows:
7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all 7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all
resources. resources.
The proposed patches are in the
`orphanage adoption
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-orphanage>`_
series.
6. Userspace Algorithms and Data Structures 6. Userspace Algorithms and Data Structures
=========================================== ===========================================
@ -5091,14 +4893,6 @@ first workqueue's workers until the backlog eases.
This doesn't completely solve the balancing problem, but reduces it enough to This doesn't completely solve the balancing problem, but reduces it enough to
move on to more pressing issues. move on to more pressing issues.
The proposed patchsets are the scrub
`performance tweaks
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-performance-tweaks>`_
and the
`inode scan rebalance
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-iscan-rebalance>`_
series.
.. _scrubrepair: .. _scrubrepair:
Scheduling Repairs Scheduling Repairs
@ -5179,20 +4973,6 @@ immediately.
Corrupt file data blocks reported by phase 6 cannot be recovered by the Corrupt file data blocks reported by phase 6 cannot be recovered by the
filesystem. filesystem.
The proposed patchsets are the
`repair warning improvements
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-better-repair-warnings>`_,
refactoring of the
`repair data dependency
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-data-deps>`_
and
`object tracking
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-object-tracking>`_,
and the
`repair scheduling
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-scheduling>`_
improvement series.
Checking Names for Confusable Unicode Sequences Checking Names for Confusable Unicode Sequences
----------------------------------------------- -----------------------------------------------
@ -5372,6 +5152,8 @@ The extra flexibility enables several new use cases:
This emulates an atomic device write in software, and can support arbitrary This emulates an atomic device write in software, and can support arbitrary
scattered writes. scattered writes.
(This functionality was merged into mainline as of 2025)
Vectorized Scrub Vectorized Scrub
---------------- ----------------
@ -5393,13 +5175,7 @@ It is hoped that ``io_uring`` will pick up enough of this functionality that
online fsck can use that instead of adding a separate vectored scrub system online fsck can use that instead of adding a separate vectored scrub system
call to XFS. call to XFS.
The relevant patchsets are the
(This functionality was merged into mainline as of 2025)
`kernel vectorized scrub
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub>`_
and
`userspace vectorized scrub
<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub>`_
series.
Quality of Service Targets for Scrub Quality of Service Targets for Scrub
------------------------------------ ------------------------------------

View File

@ -98,6 +98,15 @@ xfs_group_max_blocks(
return xg->xg_mount->m_groups[xg->xg_type].blocks; return xg->xg_mount->m_groups[xg->xg_type].blocks;
} }
static inline xfs_rfsblock_t
xfs_groups_to_rfsbs(
struct xfs_mount *mp,
uint32_t nr_groups,
enum xfs_group_type type)
{
return (xfs_rfsblock_t)mp->m_groups[type].blocks * nr_groups;
}
static inline xfs_fsblock_t static inline xfs_fsblock_t
xfs_group_start_fsb( xfs_group_start_fsb(
struct xfs_group *xg) struct xfs_group *xg)

View File

@ -31,6 +31,7 @@ typedef uint32_t xlog_tid_t;
#define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */ #define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */
#define XLOG_MAX_RECORD_BSIZE (256*1024) #define XLOG_MAX_RECORD_BSIZE (256*1024)
#define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */ #define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */
#define XLOG_CYCLE_DATA_SIZE (XLOG_HEADER_CYCLE_SIZE / BBSIZE)
#define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */ #define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */
#define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */ #define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */
#define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */ #define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */
@ -125,7 +126,17 @@ struct xlog_op_header {
#define XLOG_FMT XLOG_FMT_LINUX_LE #define XLOG_FMT XLOG_FMT_LINUX_LE
#endif #endif
typedef struct xlog_rec_header {
struct xlog_rec_ext_header {
__be32 xh_cycle; /* write cycle of log */
__be32 xh_cycle_data[XLOG_CYCLE_DATA_SIZE];
__u8 xh_reserved[252];
};
/* actual ext header payload size for checksumming */
#define XLOG_REC_EXT_SIZE \
offsetofend(struct xlog_rec_ext_header, xh_cycle_data)
struct xlog_rec_header {
__be32 h_magicno; /* log record (LR) identifier : 4 */ __be32 h_magicno; /* log record (LR) identifier : 4 */
__be32 h_cycle; /* write cycle of log : 4 */ __be32 h_cycle; /* write cycle of log : 4 */
__be32 h_version; /* LR version : 4 */ __be32 h_version; /* LR version : 4 */
@ -135,7 +146,7 @@ typedef struct xlog_rec_header {
__le32 h_crc; /* crc of log record : 4 */ __le32 h_crc; /* crc of log record : 4 */
__be32 h_prev_block; /* block number to previous LR : 4 */ __be32 h_prev_block; /* block number to previous LR : 4 */
__be32 h_num_logops; /* number of log operations in this LR : 4 */ __be32 h_num_logops; /* number of log operations in this LR : 4 */
__be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE];
__be32 h_cycle_data[XLOG_CYCLE_DATA_SIZE];
/* fields added by the Linux port: */ /* fields added by the Linux port: */
__be32 h_fmt; /* format of log record : 4 */ __be32 h_fmt; /* format of log record : 4 */
@ -160,30 +171,19 @@ typedef struct xlog_rec_header {
* (little-endian) architectures. * (little-endian) architectures.
*/ */
__u32 h_pad0; __u32 h_pad0;
} xlog_rec_header_t;
__u8 h_reserved[184];
struct xlog_rec_ext_header h_ext[];
};
#ifdef __i386__ #ifdef __i386__
#define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_size) #define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_size)
#define XLOG_REC_SIZE_OTHER sizeof(struct xlog_rec_header)
#define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_pad0)
#else #else
#define XLOG_REC_SIZE sizeof(struct xlog_rec_header)
#define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_pad0)
#define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_size) #define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_size)
#endif /* __i386__ */ #endif /* __i386__ */
typedef struct xlog_rec_ext_header {
__be32 xh_cycle; /* write cycle of log : 4 */
__be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* : 256 */
} xlog_rec_ext_header_t;
/*
* Quite misnamed, because this union lays out the actual on-disk log buffer.
*/
typedef union xlog_in_core2 {
xlog_rec_header_t hic_header;
xlog_rec_ext_header_t hic_xheader;
char hic_sector[XLOG_HEADER_SIZE];
} xlog_in_core_2_t;
/* not an on-disk structure, but needed by log recovery in userspace */ /* not an on-disk structure, but needed by log recovery in userspace */
struct xfs_log_iovec { struct xfs_log_iovec {
void *i_addr; /* beginning address of region */ void *i_addr; /* beginning address of region */
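
The xlog_pack_data() hunk further down in this diff refers to an xlog_cycle_data() accessor for this split cycle array. Its definition is not part of the hunks shown here, so the following is only a sketch of what such a helper has to do, written against the field names above: slot i lives in h_cycle_data[] for the first XLOG_CYCLE_DATA_SIZE (32k / 512 = 64) basic blocks and spills into the trailing extended headers after that.

        static inline __be32 *
        xlog_cycle_data(struct xlog_rec_header *rhead, unsigned int i)
        {
                /* the first 64 cycle words sit directly in the record header */
                if (i < XLOG_CYCLE_DATA_SIZE)
                        return &rhead->h_cycle_data[i];
                /* the rest live in the extended headers appended at h_ext[] */
                return &rhead->h_ext[i / XLOG_CYCLE_DATA_SIZE - 1]
                                .xh_cycle_data[i % XLOG_CYCLE_DATA_SIZE];
        }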

View File

@ -174,9 +174,11 @@ xfs_check_ondisk_structs(void)
XFS_CHECK_STRUCT_SIZE(struct xfs_rud_log_format, 16); XFS_CHECK_STRUCT_SIZE(struct xfs_rud_log_format, 16);
XFS_CHECK_STRUCT_SIZE(struct xfs_map_extent, 32); XFS_CHECK_STRUCT_SIZE(struct xfs_map_extent, 32);
XFS_CHECK_STRUCT_SIZE(struct xfs_phys_extent, 16); XFS_CHECK_STRUCT_SIZE(struct xfs_phys_extent, 16);
XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 328);
XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 512);
XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 260);
XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 512);
XFS_CHECK_OFFSET(struct xlog_rec_header, h_reserved, 328);
XFS_CHECK_OFFSET(struct xlog_rec_ext_header, xh_reserved, 260);
XFS_CHECK_OFFSET(struct xfs_bui_log_format, bui_extents, 16); XFS_CHECK_OFFSET(struct xfs_bui_log_format, bui_extents, 16);
XFS_CHECK_OFFSET(struct xfs_cui_log_format, cui_extents, 16); XFS_CHECK_OFFSET(struct xfs_cui_log_format, cui_extents, 16);
XFS_CHECK_OFFSET(struct xfs_rui_log_format, rui_extents, 16); XFS_CHECK_OFFSET(struct xfs_rui_log_format, rui_extents, 16);
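
The new constants follow directly from the padded layout in xfs_log_format.h: the legacy xlog_rec_header fields end at byte 328, so the added h_reserved[184] pads the structure to 328 + 184 = 512 bytes (one basic block), and an extended header is 4 bytes of xh_cycle plus 64 * 4 = 256 bytes of xh_cycle_data (260 total) plus xh_reserved[252], again 512 bytes.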

View File

@ -29,11 +29,9 @@ typedef uint8_t xfs_dqtype_t;
* flags for q_flags field in the dquot. * flags for q_flags field in the dquot.
*/ */
#define XFS_DQFLAG_DIRTY (1u << 0) /* dquot is dirty */ #define XFS_DQFLAG_DIRTY (1u << 0) /* dquot is dirty */
#define XFS_DQFLAG_FREEING (1u << 1) /* dquot is being torn down */
#define XFS_DQFLAG_STRINGS \ #define XFS_DQFLAG_STRINGS \
{ XFS_DQFLAG_DIRTY, "DIRTY" }, \
{ XFS_DQFLAG_DIRTY, "DIRTY" }
{ XFS_DQFLAG_FREEING, "FREEING" }
/* /*
* We have the possibility of all three quota types being active at once, and * We have the possibility of all three quota types being active at once, and
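
XFS_DQFLAG_FREEING can go away because the lockref that now carries the dquot reference count has its own notion of a dead object: the cache lookup in xfs_dquot.c below uses lockref_get_not_dead(), which refuses to take a reference once the teardown side has marked the lockref dead. That teardown path is not part of the hunks shown here, but presumably it now does something along these lines (a sketch, not the actual patch):

        spin_lock(&dqp->q_lockref.lock);
        lockref_mark_dead(&dqp->q_lockref);    /* lookups back off instead of testing FREEING */
        spin_unlock(&dqp->q_lockref.lock);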

View File

@ -64,12 +64,6 @@ struct xfs_rtgroup {
*/ */
#define XFS_RTG_FREE XA_MARK_0 #define XFS_RTG_FREE XA_MARK_0
/*
* For zoned RT devices this is set on groups that are fully written and that
* have unused blocks. Used by the garbage collection to pick targets.
*/
#define XFS_RTG_RECLAIMABLE XA_MARK_1
static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg) static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg)
{ {
return container_of(xg, struct xfs_rtgroup, rtg_group); return container_of(xg, struct xfs_rtgroup, rtg_group);
@ -371,4 +365,12 @@ static inline int xfs_initialize_rtgroups(struct xfs_mount *mp,
# define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP) # define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP)
#endif /* CONFIG_XFS_RT */ #endif /* CONFIG_XFS_RT */
static inline xfs_rfsblock_t
xfs_rtgs_to_rfsbs(
struct xfs_mount *mp,
uint32_t nr_groups)
{
return xfs_groups_to_rfsbs(mp, nr_groups, XG_TYPE_RTG);
}
#endif /* __LIBXFS_RTGROUP_H */ #endif /* __LIBXFS_RTGROUP_H */
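
A hypothetical use of the new helper, just to illustrate what it computes; the call site is invented for illustration, with mp->m_sb.sb_rgcount standing in for the filesystem's realtime group count:

        /* capacity, in filesystem blocks, of that many full-size realtime groups */
        xfs_rfsblock_t  rtb = xfs_rtgs_to_rfsbs(mp, mp->m_sb.sb_rgcount);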

View File

@ -155,12 +155,9 @@ xchk_quota_item(
* We want to validate the bmap record for the storage backing this * We want to validate the bmap record for the storage backing this
* dquot, so we need to lock the dquot and the quota file. For quota * dquot, so we need to lock the dquot and the quota file. For quota
* operations, the locking order is first the ILOCK and then the dquot. * operations, the locking order is first the ILOCK and then the dquot.
* However, dqiterate gave us a locked dquot, so drop the dquot lock to
* get the ILOCK.
*/ */
xfs_dqunlock(dq);
xchk_ilock(sc, XFS_ILOCK_SHARED); xchk_ilock(sc, XFS_ILOCK_SHARED);
xfs_dqlock(dq);
mutex_lock(&dq->q_qlock);
/* /*
* Except for the root dquot, the actual dquot we got must either have * Except for the root dquot, the actual dquot we got must either have
@ -251,6 +248,7 @@ xchk_quota_item(
xchk_quota_item_timer(sc, offset, &dq->q_rtb); xchk_quota_item_timer(sc, offset, &dq->q_rtb);
out: out:
mutex_unlock(&dq->q_qlock);
if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
return -ECANCELED; return -ECANCELED;
@ -330,7 +328,7 @@ xchk_quota(
xchk_dqiter_init(&cursor, sc, dqtype); xchk_dqiter_init(&cursor, sc, dqtype);
while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
error = xchk_quota_item(&sqi, dq); error = xchk_quota_item(&sqi, dq);
xfs_qm_dqput(dq);
xfs_qm_dqrele(dq);
if (error) if (error)
break; break;
} }
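
The net effect of the two hunks above is that the dquot iterator now hands scrub an unlocked (but referenced) dquot, so the documented ILOCK-before-dquot ordering can be taken directly instead of dropping and retaking the dquot lock. The shape of the check is now roughly (a sketch of the pattern, not a verbatim copy of the function):

        xchk_ilock(sc, XFS_ILOCK_SHARED);       /* quota inode ILOCK first */
        mutex_lock(&dq->q_qlock);               /* then the dquot itself */
        /* ... validate limits, counters and timers ... */
        mutex_unlock(&dq->q_qlock);
        /* the iterator's reference is dropped by the caller via xfs_qm_dqrele() */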

View File

@ -184,17 +184,13 @@ xrep_quota_item(
/* /*
* We might need to fix holes in the bmap record for the storage * We might need to fix holes in the bmap record for the storage
* backing this dquot, so we need to lock the dquot and the quota file. * backing this dquot, so we need to lock the dquot and the quota file.
* dqiterate gave us a locked dquot, so drop the dquot lock to get the
* ILOCK_EXCL.
*/ */
xfs_dqunlock(dq);
xchk_ilock(sc, XFS_ILOCK_EXCL); xchk_ilock(sc, XFS_ILOCK_EXCL);
xfs_dqlock(dq);
mutex_lock(&dq->q_qlock);
error = xrep_quota_item_bmap(sc, dq, &dirty); error = xrep_quota_item_bmap(sc, dq, &dirty);
xchk_iunlock(sc, XFS_ILOCK_EXCL); xchk_iunlock(sc, XFS_ILOCK_EXCL);
if (error) if (error)
return error;
goto out_unlock_dquot;
/* Check the limits. */ /* Check the limits. */
if (dq->q_blk.softlimit > dq->q_blk.hardlimit) { if (dq->q_blk.softlimit > dq->q_blk.hardlimit) {
@ -246,7 +242,7 @@ xrep_quota_item(
xrep_quota_item_timer(sc, &dq->q_rtb, &dirty); xrep_quota_item_timer(sc, &dq->q_rtb, &dirty);
if (!dirty) if (!dirty)
return 0;
goto out_unlock_dquot;
trace_xrep_dquot_item(sc->mp, dq->q_type, dq->q_id); trace_xrep_dquot_item(sc->mp, dq->q_type, dq->q_id);
@ -257,8 +253,10 @@ xrep_quota_item(
xfs_qm_adjust_dqtimers(dq); xfs_qm_adjust_dqtimers(dq);
} }
xfs_trans_log_dquot(sc->tp, dq); xfs_trans_log_dquot(sc->tp, dq);
error = xfs_trans_roll(&sc->tp);
return xfs_trans_roll(&sc->tp);
xfs_dqlock(dq);
out_unlock_dquot:
mutex_unlock(&dq->q_qlock);
return error; return error;
} }
@ -513,7 +511,7 @@ xrep_quota_problems(
xchk_dqiter_init(&cursor, sc, dqtype); xchk_dqiter_init(&cursor, sc, dqtype);
while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
error = xrep_quota_item(&rqi, dq); error = xrep_quota_item(&rqi, dq);
xfs_qm_dqput(dq);
xfs_qm_dqrele(dq);
if (error) if (error)
break; break;
} }

View File

@ -563,6 +563,7 @@ xqcheck_compare_dquot(
return -ECANCELED; return -ECANCELED;
} }
mutex_lock(&dq->q_qlock);
mutex_lock(&xqc->lock); mutex_lock(&xqc->lock);
error = xfarray_load_sparse(counts, dq->q_id, &xcdq); error = xfarray_load_sparse(counts, dq->q_id, &xcdq);
if (error) if (error)
@ -589,7 +590,9 @@ xqcheck_compare_dquot(
xchk_set_incomplete(xqc->sc); xchk_set_incomplete(xqc->sc);
error = -ECANCELED; error = -ECANCELED;
} }
out_unlock:
mutex_unlock(&xqc->lock); mutex_unlock(&xqc->lock);
mutex_unlock(&dq->q_qlock);
if (error) if (error)
return error; return error;
@ -597,10 +600,6 @@ xqcheck_compare_dquot(
return -ECANCELED; return -ECANCELED;
return 0; return 0;
out_unlock:
mutex_unlock(&xqc->lock);
return error;
} }
/* /*
@ -636,7 +635,7 @@ xqcheck_walk_observations(
return error; return error;
error = xqcheck_compare_dquot(xqc, dqtype, dq); error = xqcheck_compare_dquot(xqc, dqtype, dq);
xfs_qm_dqput(dq);
xfs_qm_dqrele(dq);
if (error) if (error)
return error; return error;
@ -674,7 +673,7 @@ xqcheck_compare_dqtype(
xchk_dqiter_init(&cursor, sc, dqtype); xchk_dqiter_init(&cursor, sc, dqtype);
while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
error = xqcheck_compare_dquot(xqc, dqtype, dq); error = xqcheck_compare_dquot(xqc, dqtype, dq);
xfs_qm_dqput(dq);
xfs_qm_dqrele(dq);
if (error) if (error)
break; break;
} }

View File

@ -52,13 +52,11 @@ xqcheck_commit_dquot(
bool dirty = false; bool dirty = false;
int error = 0; int error = 0;
/* Unlock the dquot just long enough to allocate a transaction. */
xfs_dqunlock(dq);
error = xchk_trans_alloc(xqc->sc, 0); error = xchk_trans_alloc(xqc->sc, 0);
xfs_dqlock(dq);
if (error) if (error)
return error; return error;
mutex_lock(&dq->q_qlock);
xfs_trans_dqjoin(xqc->sc->tp, dq); xfs_trans_dqjoin(xqc->sc->tp, dq);
if (xchk_iscan_aborted(&xqc->iscan)) { if (xchk_iscan_aborted(&xqc->iscan)) {
@ -115,23 +113,12 @@ xqcheck_commit_dquot(
if (dq->q_id) if (dq->q_id)
xfs_qm_adjust_dqtimers(dq); xfs_qm_adjust_dqtimers(dq);
xfs_trans_log_dquot(xqc->sc->tp, dq); xfs_trans_log_dquot(xqc->sc->tp, dq);
return xrep_trans_commit(xqc->sc);
/*
* Transaction commit unlocks the dquot, so we must re-lock it so that
* the caller can put the reference (which apparently requires a locked
* dquot).
*/
error = xrep_trans_commit(xqc->sc);
xfs_dqlock(dq);
return error;
out_unlock: out_unlock:
mutex_unlock(&xqc->lock); mutex_unlock(&xqc->lock);
out_cancel: out_cancel:
xchk_trans_cancel(xqc->sc); xchk_trans_cancel(xqc->sc);
/* Re-lock the dquot so the caller can put the reference. */
xfs_dqlock(dq);
return error; return error;
} }
@ -156,7 +143,7 @@ xqcheck_commit_dqtype(
xchk_dqiter_init(&cursor, sc, dqtype); xchk_dqiter_init(&cursor, sc, dqtype);
while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) {
error = xqcheck_commit_dquot(xqc, dqtype, dq); error = xqcheck_commit_dquot(xqc, dqtype, dq);
xfs_qm_dqput(dq);
xfs_qm_dqrele(dq);
if (error) if (error)
break; break;
} }
@ -187,7 +174,7 @@ xqcheck_commit_dqtype(
return error; return error;
error = xqcheck_commit_dquot(xqc, dqtype, dq); error = xqcheck_commit_dquot(xqc, dqtype, dq);
xfs_qm_dqput(dq);
xfs_qm_dqrele(dq);
if (error) if (error)
return error; return error;

View File

@ -31,7 +31,7 @@
* *
* ip->i_lock * ip->i_lock
* qi->qi_tree_lock * qi->qi_tree_lock
* dquot->q_qlock (xfs_dqlock() and friends)
* dquot->q_qlock
* dquot->q_flush (xfs_dqflock() and friends) * dquot->q_flush (xfs_dqflock() and friends)
* qi->qi_lru_lock * qi->qi_lru_lock
* *
@ -801,10 +801,11 @@ xfs_dq_get_next_id(
static struct xfs_dquot * static struct xfs_dquot *
xfs_qm_dqget_cache_lookup( xfs_qm_dqget_cache_lookup(
struct xfs_mount *mp, struct xfs_mount *mp,
struct xfs_quotainfo *qi,
struct radix_tree_root *tree,
xfs_dqid_t id)
xfs_dqid_t id,
xfs_dqtype_t type)
{ {
struct xfs_quotainfo *qi = mp->m_quotainfo;
struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
struct xfs_dquot *dqp; struct xfs_dquot *dqp;
restart: restart:
@ -816,16 +817,12 @@ xfs_qm_dqget_cache_lookup(
return NULL; return NULL;
} }
xfs_dqlock(dqp);
if (!lockref_get_not_dead(&dqp->q_lockref)) {
if (dqp->q_flags & XFS_DQFLAG_FREEING) {
xfs_dqunlock(dqp);
mutex_unlock(&qi->qi_tree_lock); mutex_unlock(&qi->qi_tree_lock);
trace_xfs_dqget_freeing(dqp); trace_xfs_dqget_freeing(dqp);
delay(1); delay(1);
goto restart; goto restart;
} }
dqp->q_nrefs++;
mutex_unlock(&qi->qi_tree_lock); mutex_unlock(&qi->qi_tree_lock);
trace_xfs_dqget_hit(dqp); trace_xfs_dqget_hit(dqp);
@ -836,8 +833,7 @@ xfs_qm_dqget_cache_lookup(
/* /*
* Try to insert a new dquot into the in-core cache. If an error occurs the * Try to insert a new dquot into the in-core cache. If an error occurs the
* caller should throw away the dquot and start over. Otherwise, the dquot * caller should throw away the dquot and start over. Otherwise, the dquot
* is returned locked (and held by the cache) as if there had been a cache
* is returned (and held by the cache) as if there had been a cache hit.
* hit.
* *
* The insert needs to be done under memalloc_nofs context because the radix * The insert needs to be done under memalloc_nofs context because the radix
* tree can do memory allocation during insert. The qi->qi_tree_lock is taken in * tree can do memory allocation during insert. The qi->qi_tree_lock is taken in
@ -848,11 +844,12 @@ xfs_qm_dqget_cache_lookup(
static int static int
xfs_qm_dqget_cache_insert( xfs_qm_dqget_cache_insert(
struct xfs_mount *mp, struct xfs_mount *mp,
struct xfs_quotainfo *qi,
struct radix_tree_root *tree,
xfs_dqid_t id, xfs_dqid_t id,
xfs_dqtype_t type,
struct xfs_dquot *dqp) struct xfs_dquot *dqp)
{ {
struct xfs_quotainfo *qi = mp->m_quotainfo;
struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
unsigned int nofs_flags; unsigned int nofs_flags;
int error; int error;
@ -860,14 +857,11 @@ xfs_qm_dqget_cache_insert(
mutex_lock(&qi->qi_tree_lock); mutex_lock(&qi->qi_tree_lock);
error = radix_tree_insert(tree, id, dqp); error = radix_tree_insert(tree, id, dqp);
if (unlikely(error)) { if (unlikely(error)) {
/* Duplicate found! Caller must try again. */
trace_xfs_dqget_dup(dqp); trace_xfs_dqget_dup(dqp);
goto out_unlock; goto out_unlock;
} }
/* Return a locked dquot to the caller, with a reference taken. */
lockref_init(&dqp->q_lockref);
xfs_dqlock(dqp);
dqp->q_nrefs = 1;
qi->qi_dquots++; qi->qi_dquots++;
out_unlock: out_unlock:
@ -903,7 +897,7 @@ xfs_qm_dqget_checks(
/* /*
* Given the file system, id, and type (UDQUOT/GDQUOT/PDQUOT), return a * Given the file system, id, and type (UDQUOT/GDQUOT/PDQUOT), return a
* locked dquot, doing an allocation (if requested) as needed.
* dquot, doing an allocation (if requested) as needed.
*/ */
int int
xfs_qm_dqget( xfs_qm_dqget(
@ -913,8 +907,6 @@ xfs_qm_dqget(
bool can_alloc, bool can_alloc,
struct xfs_dquot **O_dqpp) struct xfs_dquot **O_dqpp)
{ {
struct xfs_quotainfo *qi = mp->m_quotainfo;
struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
struct xfs_dquot *dqp; struct xfs_dquot *dqp;
int error; int error;
@ -923,28 +915,30 @@ xfs_qm_dqget(
return error; return error;
restart: restart:
dqp = xfs_qm_dqget_cache_lookup(mp, qi, tree, id);
dqp = xfs_qm_dqget_cache_lookup(mp, id, type);
if (dqp) {
if (dqp)
*O_dqpp = dqp;
goto found;
return 0;
}
error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp); error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp);
if (error) if (error)
return error; return error;
error = xfs_qm_dqget_cache_insert(mp, qi, tree, id, dqp);
error = xfs_qm_dqget_cache_insert(mp, id, type, dqp);
if (error) { if (error) {
/*
* Duplicate found. Just throw away the new dquot and start
* over.
*/
xfs_qm_dqdestroy(dqp); xfs_qm_dqdestroy(dqp);
XFS_STATS_INC(mp, xs_qm_dquot_dups);
if (error == -EEXIST) {
goto restart;
/*
* Duplicate found. Just throw away the new dquot and
* start over.
*/
XFS_STATS_INC(mp, xs_qm_dquot_dups);
goto restart;
}
return error;
} }
trace_xfs_dqget_miss(dqp); trace_xfs_dqget_miss(dqp);
found:
*O_dqpp = dqp; *O_dqpp = dqp;
return 0; return 0;
} }
@ -999,15 +993,16 @@ xfs_qm_dqget_inode(
struct xfs_inode *ip, struct xfs_inode *ip,
xfs_dqtype_t type, xfs_dqtype_t type,
bool can_alloc, bool can_alloc,
struct xfs_dquot **O_dqpp)
struct xfs_dquot **dqpp)
{ {
struct xfs_mount *mp = ip->i_mount; struct xfs_mount *mp = ip->i_mount;
struct xfs_quotainfo *qi = mp->m_quotainfo;
struct radix_tree_root *tree = xfs_dquot_tree(qi, type);
struct xfs_dquot *dqp; struct xfs_dquot *dqp;
xfs_dqid_t id; xfs_dqid_t id;
int error; int error;
ASSERT(!*dqpp);
xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
error = xfs_qm_dqget_checks(mp, type); error = xfs_qm_dqget_checks(mp, type);
if (error) if (error)
return error; return error;
@ -1019,11 +1014,9 @@ xfs_qm_dqget_inode(
id = xfs_qm_id_for_quotatype(ip, type); id = xfs_qm_id_for_quotatype(ip, type);
restart: restart:
dqp = xfs_qm_dqget_cache_lookup(mp, qi, tree, id);
dqp = xfs_qm_dqget_cache_lookup(mp, id, type);
if (dqp) {
if (dqp)
*O_dqpp = dqp;
goto found;
return 0;
}
/* /*
* Dquot cache miss. We don't want to keep the inode lock across * Dquot cache miss. We don't want to keep the inode lock across
@ -1049,7 +1042,6 @@ xfs_qm_dqget_inode(
if (dqp1) { if (dqp1) {
xfs_qm_dqdestroy(dqp); xfs_qm_dqdestroy(dqp);
dqp = dqp1; dqp = dqp1;
xfs_dqlock(dqp);
goto dqret; goto dqret;
} }
} else { } else {
@ -1058,21 +1050,26 @@ xfs_qm_dqget_inode(
return -ESRCH; return -ESRCH;
} }
error = xfs_qm_dqget_cache_insert(mp, qi, tree, id, dqp);
error = xfs_qm_dqget_cache_insert(mp, id, type, dqp);
if (error) { if (error) {
/*
* Duplicate found. Just throw away the new dquot and start
* over.
*/
xfs_qm_dqdestroy(dqp); xfs_qm_dqdestroy(dqp);
XFS_STATS_INC(mp, xs_qm_dquot_dups);
if (error == -EEXIST) {
goto restart;
/*
* Duplicate found. Just throw away the new dquot and
* start over.
*/
XFS_STATS_INC(mp, xs_qm_dquot_dups);
goto restart;
}
return error;
} }
dqret: dqret:
xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
trace_xfs_dqget_miss(dqp); trace_xfs_dqget_miss(dqp);
*O_dqpp = dqp;
found:
trace_xfs_dqattach_get(dqp);
*dqpp = dqp;
return 0; return 0;
} }
@ -1098,45 +1095,21 @@ xfs_qm_dqget_next(
else if (error != 0) else if (error != 0)
break; break;
mutex_lock(&dqp->q_qlock);
if (!XFS_IS_DQUOT_UNINITIALIZED(dqp)) { if (!XFS_IS_DQUOT_UNINITIALIZED(dqp)) {
*dqpp = dqp; *dqpp = dqp;
return 0; return 0;
} }
xfs_qm_dqput(dqp);
mutex_unlock(&dqp->q_qlock);
xfs_qm_dqrele(dqp);
} }
return error; return error;
} }
/* /*
* Release a reference to the dquot (decrement ref-count) and unlock it.
* Release a reference to the dquot.
*
* If there is a group quota attached to this dquot, carefully release that
* too without tripping over deadlocks'n'stuff.
*/
void
xfs_qm_dqput(
struct xfs_dquot *dqp)
{
ASSERT(dqp->q_nrefs > 0);
ASSERT(XFS_DQ_IS_LOCKED(dqp));
trace_xfs_dqput(dqp);
if (--dqp->q_nrefs == 0) {
struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo;
trace_xfs_dqput_free(dqp);
if (list_lru_add_obj(&qi->qi_lru, &dqp->q_lru))
XFS_STATS_INC(dqp->q_mount, xs_qm_dquot_unused);
}
xfs_dqunlock(dqp);
}
/*
* Release a dquot. Flush it if dirty, then dqput() it.
* dquot must not be locked.
*/ */
void void
xfs_qm_dqrele( xfs_qm_dqrele(
@ -1147,14 +1120,16 @@ xfs_qm_dqrele(
trace_xfs_dqrele(dqp); trace_xfs_dqrele(dqp);
xfs_dqlock(dqp);
/*
* We don't care to flush it if the dquot is dirty here.
* That will create stutters that we want to avoid.
* Instead we do a delayed write when we try to reclaim
* a dirty dquot. Also xfs_sync will take part of the burden...
*/
xfs_qm_dqput(dqp);
if (lockref_put_or_lock(&dqp->q_lockref))
return;
if (!--dqp->q_lockref.count) {
struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo;
trace_xfs_dqrele_free(dqp);
if (list_lru_add_obj(&qi->qi_lru, &dqp->q_lru))
XFS_STATS_INC(dqp->q_mount, xs_qm_dquot_unused);
}
spin_unlock(&dqp->q_lockref.lock);
} }
/* /*

View File

@ -71,7 +71,7 @@ struct xfs_dquot {
xfs_dqtype_t q_type; xfs_dqtype_t q_type;
uint16_t q_flags; uint16_t q_flags;
xfs_dqid_t q_id; xfs_dqid_t q_id;
uint q_nrefs;
struct lockref q_lockref;
int q_bufoffset; int q_bufoffset;
xfs_daddr_t q_blkno; xfs_daddr_t q_blkno;
xfs_fileoff_t q_fileoffset; xfs_fileoff_t q_fileoffset;
@ -121,21 +121,6 @@ static inline void xfs_dqfunlock(struct xfs_dquot *dqp)
complete(&dqp->q_flush); complete(&dqp->q_flush);
} }
static inline int xfs_dqlock_nowait(struct xfs_dquot *dqp)
{
return mutex_trylock(&dqp->q_qlock);
}
static inline void xfs_dqlock(struct xfs_dquot *dqp)
{
mutex_lock(&dqp->q_qlock);
}
static inline void xfs_dqunlock(struct xfs_dquot *dqp)
{
mutex_unlock(&dqp->q_qlock);
}
static inline int static inline int
xfs_dquot_type(const struct xfs_dquot *dqp) xfs_dquot_type(const struct xfs_dquot *dqp)
{ {
@ -233,7 +218,6 @@ int xfs_qm_dqget_next(struct xfs_mount *mp, xfs_dqid_t id,
int xfs_qm_dqget_uncached(struct xfs_mount *mp, int xfs_qm_dqget_uncached(struct xfs_mount *mp,
xfs_dqid_t id, xfs_dqtype_t type, xfs_dqid_t id, xfs_dqtype_t type,
struct xfs_dquot **dqpp); struct xfs_dquot **dqpp);
void xfs_qm_dqput(struct xfs_dquot *dqp);
void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *); void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *);
void xfs_dqlockn(struct xfs_dqtrx *q); void xfs_dqlockn(struct xfs_dqtrx *q);
@ -246,9 +230,7 @@ void xfs_dquot_detach_buf(struct xfs_dquot *dqp);
static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp) static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp)
{ {
xfs_dqlock(dqp);
lockref_get(&dqp->q_lockref);
dqp->q_nrefs++;
xfs_dqunlock(dqp);
return dqp; return dqp;
} }
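
With the reference count folded into a struct lockref, holding and releasing a dquot no longer touches q_qlock on the fast path. A minimal usage sketch built from the helpers shown in this header and in xfs_dquot.c; the surrounding variables (mp, id, type, error) are invented for illustration:

        struct xfs_dquot        *dqp;

        error = xfs_qm_dqget(mp, id, type, false, &dqp);        /* referenced, unlocked */
        if (error)
                return error;
        xfs_qm_dqhold(dqp);             /* extra reference: lockref_get(), no q_qlock taken */
        /* ... take dqp->q_qlock only around the fields it still protects ... */
        xfs_qm_dqrele(dqp);             /* drop the extra reference */
        xfs_qm_dqrele(dqp);             /* last put moves the dquot to the LRU */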

View File

@ -132,7 +132,7 @@ xfs_qm_dquot_logitem_push(
if (atomic_read(&dqp->q_pincount) > 0) if (atomic_read(&dqp->q_pincount) > 0)
return XFS_ITEM_PINNED; return XFS_ITEM_PINNED;
if (!xfs_dqlock_nowait(dqp))
if (!mutex_trylock(&dqp->q_qlock))
return XFS_ITEM_LOCKED; return XFS_ITEM_LOCKED;
/* /*
@ -177,7 +177,7 @@ xfs_qm_dquot_logitem_push(
out_relock_ail: out_relock_ail:
spin_lock(&lip->li_ailp->ail_lock); spin_lock(&lip->li_ailp->ail_lock);
out_unlock: out_unlock:
xfs_dqunlock(dqp);
mutex_unlock(&dqp->q_qlock);
return rval; return rval;
} }
@ -195,7 +195,7 @@ xfs_qm_dquot_logitem_release(
* transaction layer, within trans_commit. Hence, no LI_HOLD flag * transaction layer, within trans_commit. Hence, no LI_HOLD flag
* for the logitem. * for the logitem.
*/ */
xfs_dqunlock(dqp);
mutex_unlock(&dqp->q_qlock);
} }
STATIC void STATIC void

View File

@ -358,7 +358,7 @@ xfs_reinit_inode(
static int static int
xfs_iget_recycle( xfs_iget_recycle(
struct xfs_perag *pag, struct xfs_perag *pag,
struct xfs_inode *ip) __releases(&ip->i_flags_lock)
struct xfs_inode *ip)
{ {
struct xfs_mount *mp = ip->i_mount; struct xfs_mount *mp = ip->i_mount;
struct inode *inode = VFS_I(ip); struct inode *inode = VFS_I(ip);
@ -366,20 +366,6 @@ xfs_iget_recycle(
trace_xfs_iget_recycle(ip); trace_xfs_iget_recycle(ip);
if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
return -EAGAIN;
/*
* We need to make it look like the inode is being reclaimed to prevent
* the actual reclaim workers from stomping over us while we recycle
* the inode. We can't clear the radix tree tag yet as it requires
* pag_ici_lock to be held exclusive.
*/
ip->i_flags |= XFS_IRECLAIM;
spin_unlock(&ip->i_flags_lock);
rcu_read_unlock();
ASSERT(!rwsem_is_locked(&inode->i_rwsem)); ASSERT(!rwsem_is_locked(&inode->i_rwsem));
error = xfs_reinit_inode(mp, inode); error = xfs_reinit_inode(mp, inode);
xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_iunlock(ip, XFS_ILOCK_EXCL);
@ -576,10 +562,19 @@ xfs_iget_cache_hit(
/* The inode fits the selection criteria; process it. */ /* The inode fits the selection criteria; process it. */
if (ip->i_flags & XFS_IRECLAIMABLE) { if (ip->i_flags & XFS_IRECLAIMABLE) {
/* Drops i_flags_lock and RCU read lock. */
error = xfs_iget_recycle(pag, ip);
if (error == -EAGAIN)
/*
* We need to make it look like the inode is being reclaimed to
* prevent the actual reclaim workers from stomping over us
* while we recycle the inode. We can't clear the radix tree
* tag yet as it requires pag_ici_lock to be held exclusive.
*/
if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
goto out_skip; goto out_skip;
ip->i_flags |= XFS_IRECLAIM;
spin_unlock(&ip->i_flags_lock);
rcu_read_unlock();
error = xfs_iget_recycle(pag, ip);
if (error) if (error)
return error; return error;
} else { } else {

View File

@ -534,8 +534,8 @@ xlog_state_release_iclog(
*/ */
if ((iclog->ic_state == XLOG_STATE_WANT_SYNC || if ((iclog->ic_state == XLOG_STATE_WANT_SYNC ||
(iclog->ic_flags & XLOG_ICL_NEED_FUA)) && (iclog->ic_flags & XLOG_ICL_NEED_FUA)) &&
!iclog->ic_header.h_tail_lsn) {
!iclog->ic_header->h_tail_lsn) {
iclog->ic_header.h_tail_lsn =
iclog->ic_header->h_tail_lsn =
cpu_to_be64(atomic64_read(&log->l_tail_lsn)); cpu_to_be64(atomic64_read(&log->l_tail_lsn));
} }
@ -1279,11 +1279,12 @@ xlog_get_iclog_buffer_size(
log->l_iclog_size = mp->m_logbsize; log->l_iclog_size = mp->m_logbsize;
/* /*
* # headers = size / 32k - one header holds cycles from 32k of data.
* Combined size of the log record headers. The first 32k cycles
* are stored directly in the xlog_rec_header, the rest in the
* variable number of xlog_rec_ext_headers at its end.
*/ */
log->l_iclog_heads =
DIV_ROUND_UP(mp->m_logbsize, XLOG_HEADER_CYCLE_SIZE);
log->l_iclog_hsize = log->l_iclog_heads << BBSHIFT;
log->l_iclog_hsize = struct_size(log->l_iclog->ic_header, h_ext,
DIV_ROUND_UP(mp->m_logbsize, XLOG_HEADER_CYCLE_SIZE) - 1);
} }
void void
@ -1367,9 +1368,8 @@ xlog_alloc_log(
int num_bblks) int num_bblks)
{ {
struct xlog *log; struct xlog *log;
xlog_rec_header_t *head;
xlog_in_core_t **iclogp;
xlog_in_core_t *iclog, *prev_iclog=NULL;
struct xlog_in_core **iclogp;
struct xlog_in_core *iclog, *prev_iclog = NULL;
int i; int i;
int error = -ENOMEM; int error = -ENOMEM;
uint log2_size = 0; uint log2_size = 0;
@ -1436,13 +1436,6 @@ xlog_alloc_log(
init_waitqueue_head(&log->l_flush_wait); init_waitqueue_head(&log->l_flush_wait);
iclogp = &log->l_iclog; iclogp = &log->l_iclog;
/*
* The amount of memory to allocate for the iclog structure is
* rather funky due to the way the structure is defined. It is
* done this way so that we can use different sizes for machines
* with different amounts of memory. See the definition of
* xlog_in_core_t in xfs_log_priv.h for details.
*/
ASSERT(log->l_iclog_size >= 4096); ASSERT(log->l_iclog_size >= 4096);
for (i = 0; i < log->l_iclog_bufs; i++) { for (i = 0; i < log->l_iclog_bufs; i++) {
size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) * size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) *
@ -1457,26 +1450,25 @@ xlog_alloc_log(
iclog->ic_prev = prev_iclog; iclog->ic_prev = prev_iclog;
prev_iclog = iclog; prev_iclog = iclog;
iclog->ic_data = kvzalloc(log->l_iclog_size,
iclog->ic_header = kvzalloc(log->l_iclog_size,
GFP_KERNEL | __GFP_RETRY_MAYFAIL); GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!iclog->ic_data) if (!iclog->ic_header)
goto out_free_iclog; goto out_free_iclog;
head = &iclog->ic_header; iclog->ic_header->h_magicno =
memset(head, 0, sizeof(xlog_rec_header_t)); cpu_to_be32(XLOG_HEADER_MAGIC_NUM);
head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); iclog->ic_header->h_version = cpu_to_be32(
head->h_version = cpu_to_be32(
xfs_has_logv2(log->l_mp) ? 2 : 1); xfs_has_logv2(log->l_mp) ? 2 : 1);
head->h_size = cpu_to_be32(log->l_iclog_size); iclog->ic_header->h_size = cpu_to_be32(log->l_iclog_size);
/* new fields */ iclog->ic_header->h_fmt = cpu_to_be32(XLOG_FMT);
head->h_fmt = cpu_to_be32(XLOG_FMT); memcpy(&iclog->ic_header->h_fs_uuid, &mp->m_sb.sb_uuid,
memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t)); sizeof(iclog->ic_header->h_fs_uuid));
iclog->ic_datap = (void *)iclog->ic_header + log->l_iclog_hsize;
iclog->ic_size = log->l_iclog_size - log->l_iclog_hsize; iclog->ic_size = log->l_iclog_size - log->l_iclog_hsize;
iclog->ic_state = XLOG_STATE_ACTIVE; iclog->ic_state = XLOG_STATE_ACTIVE;
iclog->ic_log = log; iclog->ic_log = log;
atomic_set(&iclog->ic_refcnt, 0); atomic_set(&iclog->ic_refcnt, 0);
INIT_LIST_HEAD(&iclog->ic_callbacks); INIT_LIST_HEAD(&iclog->ic_callbacks);
iclog->ic_datap = (void *)iclog->ic_data + log->l_iclog_hsize;
init_waitqueue_head(&iclog->ic_force_wait); init_waitqueue_head(&iclog->ic_force_wait);
init_waitqueue_head(&iclog->ic_write_wait); init_waitqueue_head(&iclog->ic_write_wait);
@ -1504,7 +1496,7 @@ xlog_alloc_log(
out_free_iclog: out_free_iclog:
for (iclog = log->l_iclog; iclog; iclog = prev_iclog) { for (iclog = log->l_iclog; iclog; iclog = prev_iclog) {
prev_iclog = iclog->ic_next; prev_iclog = iclog->ic_next;
kvfree(iclog->ic_data); kvfree(iclog->ic_header);
kfree(iclog); kfree(iclog);
if (prev_iclog == log->l_iclog) if (prev_iclog == log->l_iclog)
break; break;
@ -1524,36 +1516,19 @@ xlog_pack_data(
struct xlog_in_core *iclog, struct xlog_in_core *iclog,
int roundoff) int roundoff)
{ {
int i, j, k; struct xlog_rec_header *rhead = iclog->ic_header;
int size = iclog->ic_offset + roundoff; __be32 cycle_lsn = CYCLE_LSN_DISK(rhead->h_lsn);
__be32 cycle_lsn; char *dp = iclog->ic_datap;
char *dp; int i;
cycle_lsn = CYCLE_LSN_DISK(iclog->ic_header.h_lsn); for (i = 0; i < BTOBB(iclog->ic_offset + roundoff); i++) {
*xlog_cycle_data(rhead, i) = *(__be32 *)dp;
dp = iclog->ic_datap;
for (i = 0; i < BTOBB(size); i++) {
if (i >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE))
break;
iclog->ic_header.h_cycle_data[i] = *(__be32 *)dp;
*(__be32 *)dp = cycle_lsn; *(__be32 *)dp = cycle_lsn;
dp += BBSIZE; dp += BBSIZE;
} }
if (xfs_has_logv2(log->l_mp)) { for (i = 0; i < (log->l_iclog_hsize >> BBSHIFT) - 1; i++)
xlog_in_core_2_t *xhdr = iclog->ic_data; rhead->h_ext[i].xh_cycle = cycle_lsn;
for ( ; i < BTOBB(size); i++) {
j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
xhdr[j].hic_xheader.xh_cycle_data[k] = *(__be32 *)dp;
*(__be32 *)dp = cycle_lsn;
dp += BBSIZE;
}
for (i = 1; i < log->l_iclog_heads; i++)
xhdr[i].hic_xheader.xh_cycle = cycle_lsn;
}
} }
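For reference, the scheme this loop implements is unchanged: the first __be32 of every 512-byte block of iclog payload is saved into the header's cycle array and overwritten with the cycle/LSN stamp so torn writes can be detected, and xlog_unpack_data() in the recovery code (further below) performs the inverse. Condensed, using the new accessor:

    /* pack, before the iclog is written */
    *xlog_cycle_data(rhead, i) = *(__be32 *)dp;   /* save the payload word */
    *(__be32 *)dp = cycle_lsn;                    /* stamp the block */

    /* unpack, during log recovery */
    *(__be32 *)dp = *xlog_cycle_data(rhead, i);   /* restore the payload word */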
 /*
@@ -1578,16 +1553,11 @@ xlog_cksum(
 /* ... then for additional cycle data for v2 logs ... */
 if (xfs_has_logv2(log->l_mp)) {
-union xlog_in_core2 *xhdr = (union xlog_in_core2 *)rhead;
-int i;
-int xheads;
-xheads = DIV_ROUND_UP(size, XLOG_HEADER_CYCLE_SIZE);
-for (i = 1; i < xheads; i++) {
-crc = crc32c(crc, &xhdr[i].hic_xheader,
-sizeof(struct xlog_rec_ext_header));
-}
+int xheads, i;
+xheads = DIV_ROUND_UP(size, XLOG_HEADER_CYCLE_SIZE) - 1;
+for (i = 0; i < xheads; i++)
+crc = crc32c(crc, &rhead->h_ext[i], XLOG_REC_EXT_SIZE);
 }
 /* ... and finally for the payload */
@@ -1671,11 +1641,11 @@ xlog_write_iclog(
 iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
-if (is_vmalloc_addr(iclog->ic_data)) {
-if (!bio_add_vmalloc(&iclog->ic_bio, iclog->ic_data, count))
+if (is_vmalloc_addr(iclog->ic_header)) {
+if (!bio_add_vmalloc(&iclog->ic_bio, iclog->ic_header, count))
 goto shutdown;
 } else {
-bio_add_virt_nofail(&iclog->ic_bio, iclog->ic_data, count);
+bio_add_virt_nofail(&iclog->ic_bio, iclog->ic_header, count);
 }
 /*
@@ -1804,19 +1774,19 @@ xlog_sync(
 size = iclog->ic_offset;
 if (xfs_has_logv2(log->l_mp))
 size += roundoff;
-iclog->ic_header.h_len = cpu_to_be32(size);
+iclog->ic_header->h_len = cpu_to_be32(size);
 XFS_STATS_INC(log->l_mp, xs_log_writes);
 XFS_STATS_ADD(log->l_mp, xs_log_blocks, BTOBB(count));
-bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn));
+bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header->h_lsn));
 /* Do we need to split this write into 2 parts? */
 if (bno + BTOBB(count) > log->l_logBBsize)
-xlog_split_iclog(log, &iclog->ic_header, bno, count);
+xlog_split_iclog(log, iclog->ic_header, bno, count);
 /* calculcate the checksum */
-iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header,
+iclog->ic_header->h_crc = xlog_cksum(log, iclog->ic_header,
 iclog->ic_datap, XLOG_REC_SIZE, size);
 /*
 * Intentionally corrupt the log record CRC based on the error injection
@@ -1827,11 +1797,11 @@ xlog_sync(
 */
 #ifdef DEBUG
 if (XFS_TEST_ERROR(log->l_mp, XFS_ERRTAG_LOG_BAD_CRC)) {
-iclog->ic_header.h_crc &= cpu_to_le32(0xAAAAAAAA);
+iclog->ic_header->h_crc &= cpu_to_le32(0xAAAAAAAA);
 iclog->ic_fail_crc = true;
 xfs_warn(log->l_mp,
 "Intentionally corrupted log record at LSN 0x%llx. Shutdown imminent.",
-be64_to_cpu(iclog->ic_header.h_lsn));
+be64_to_cpu(iclog->ic_header->h_lsn));
 }
 #endif
 xlog_verify_iclog(log, iclog, count);
@@ -1843,10 +1813,10 @@ xlog_sync(
 */
 STATIC void
 xlog_dealloc_log(
 struct xlog *log)
 {
-xlog_in_core_t *iclog, *next_iclog;
+struct xlog_in_core *iclog, *next_iclog;
 int i;
 /*
 * Destroy the CIL after waiting for iclog IO completion because an
@@ -1858,7 +1828,7 @@ xlog_dealloc_log(
 iclog = log->l_iclog;
 for (i = 0; i < log->l_iclog_bufs; i++) {
 next_iclog = iclog->ic_next;
-kvfree(iclog->ic_data);
+kvfree(iclog->ic_header);
 kfree(iclog);
 iclog = next_iclog;
 }
@@ -1880,7 +1850,7 @@ xlog_state_finish_copy(
 {
 lockdep_assert_held(&log->l_icloglock);
-be32_add_cpu(&iclog->ic_header.h_num_logops, record_cnt);
+be32_add_cpu(&iclog->ic_header->h_num_logops, record_cnt);
 iclog->ic_offset += copy_bytes;
 }
@@ -2303,7 +2273,7 @@ xlog_state_activate_iclog(
 * We don't need to cover the dummy.
 */
 if (*iclogs_changed == 0 &&
-iclog->ic_header.h_num_logops == cpu_to_be32(XLOG_COVER_OPS)) {
+iclog->ic_header->h_num_logops == cpu_to_be32(XLOG_COVER_OPS)) {
 *iclogs_changed = 1;
 } else {
 /*
@@ -2315,11 +2285,11 @@ xlog_state_activate_iclog(
 iclog->ic_state = XLOG_STATE_ACTIVE;
 iclog->ic_offset = 0;
-iclog->ic_header.h_num_logops = 0;
-memset(iclog->ic_header.h_cycle_data, 0,
-sizeof(iclog->ic_header.h_cycle_data));
-iclog->ic_header.h_lsn = 0;
-iclog->ic_header.h_tail_lsn = 0;
+iclog->ic_header->h_num_logops = 0;
+memset(iclog->ic_header->h_cycle_data, 0,
+sizeof(iclog->ic_header->h_cycle_data));
+iclog->ic_header->h_lsn = 0;
+iclog->ic_header->h_tail_lsn = 0;
 }
 /*
@@ -2411,7 +2381,7 @@ xlog_get_lowest_lsn(
 iclog->ic_state == XLOG_STATE_DIRTY)
 continue;
-lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+lsn = be64_to_cpu(iclog->ic_header->h_lsn);
 if ((lsn && !lowest_lsn) || XFS_LSN_CMP(lsn, lowest_lsn) < 0)
 lowest_lsn = lsn;
 } while ((iclog = iclog->ic_next) != log->l_iclog);
@@ -2446,7 +2416,7 @@ xlog_state_iodone_process_iclog(
 * If this is not the lowest lsn iclog, then we will leave it
 * for another completion to process.
 */
-header_lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+header_lsn = be64_to_cpu(iclog->ic_header->h_lsn);
 lowest_lsn = xlog_get_lowest_lsn(log);
 if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0)
 return false;
@@ -2609,9 +2579,9 @@ xlog_state_get_iclog_space(
 struct xlog_ticket *ticket,
 int *logoffsetp)
 {
 int log_offset;
-xlog_rec_header_t *head;
-xlog_in_core_t *iclog;
+struct xlog_rec_header *head;
+struct xlog_in_core *iclog;
 restart:
 spin_lock(&log->l_icloglock);
@@ -2629,7 +2599,7 @@ xlog_state_get_iclog_space(
 goto restart;
 }
-head = &iclog->ic_header;
+head = iclog->ic_header;
 atomic_inc(&iclog->ic_refcnt); /* prevents sync */
 log_offset = iclog->ic_offset;
@@ -2794,7 +2764,7 @@ xlog_state_switch_iclogs(
 if (!eventual_size)
 eventual_size = iclog->ic_offset;
 iclog->ic_state = XLOG_STATE_WANT_SYNC;
-iclog->ic_header.h_prev_block = cpu_to_be32(log->l_prev_block);
+iclog->ic_header->h_prev_block = cpu_to_be32(log->l_prev_block);
 log->l_prev_block = log->l_curr_block;
 log->l_prev_cycle = log->l_curr_cycle;
@@ -2838,7 +2808,7 @@ xlog_force_and_check_iclog(
 struct xlog_in_core *iclog,
 bool *completed)
 {
-xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header->h_lsn);
 int error;
 *completed = false;
@@ -2850,7 +2820,7 @@ xlog_force_and_check_iclog(
 * If the iclog has already been completed and reused the header LSN
 * will have been rewritten by completion
 */
-if (be64_to_cpu(iclog->ic_header.h_lsn) != lsn)
+if (be64_to_cpu(iclog->ic_header->h_lsn) != lsn)
 *completed = true;
 return 0;
 }
@@ -2983,7 +2953,7 @@ xlog_force_lsn(
 goto out_error;
 iclog = log->l_iclog;
-while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) {
+while (be64_to_cpu(iclog->ic_header->h_lsn) != lsn) {
 trace_xlog_iclog_force_lsn(iclog, _RET_IP_);
 iclog = iclog->ic_next;
 if (iclog == log->l_iclog)
@@ -3249,7 +3219,7 @@ xlog_verify_dump_tail(
 {
 xfs_alert(log->l_mp,
 "ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x",
-iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1,
+iclog ? be64_to_cpu(iclog->ic_header->h_tail_lsn) : -1,
 atomic64_read(&log->l_tail_lsn),
 log->l_ailp->ail_head_lsn,
 log->l_curr_cycle, log->l_curr_block,
@@ -3268,7 +3238,7 @@ xlog_verify_tail_lsn(
 struct xlog *log,
 struct xlog_in_core *iclog)
 {
-xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn);
+xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header->h_tail_lsn);
 int blocks;
 if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
@@ -3322,13 +3292,12 @@ xlog_verify_iclog(
 struct xlog_in_core *iclog,
 int count)
 {
-struct xlog_op_header *ophead;
-xlog_in_core_t *icptr;
-xlog_in_core_2_t *xhdr;
-void *base_ptr, *ptr, *p;
+struct xlog_rec_header *rhead = iclog->ic_header;
+struct xlog_in_core *icptr;
+void *base_ptr, *ptr;
 ptrdiff_t field_offset;
 uint8_t clientid;
-int len, i, j, k, op_len;
+int len, i, op_len;
 int idx;
 /* check validity of iclog pointers */
@@ -3342,11 +3311,10 @@ xlog_verify_iclog(
 spin_unlock(&log->l_icloglock);
 /* check log magic numbers */
-if (iclog->ic_header.h_magicno != cpu_to_be32(XLOG_HEADER_MAGIC_NUM))
+if (rhead->h_magicno != cpu_to_be32(XLOG_HEADER_MAGIC_NUM))
 xfs_emerg(log->l_mp, "%s: invalid magic num", __func__);
-base_ptr = ptr = &iclog->ic_header;
-p = &iclog->ic_header;
+base_ptr = ptr = rhead;
 for (ptr += BBSIZE; ptr < base_ptr + count; ptr += BBSIZE) {
 if (*(__be32 *)ptr == cpu_to_be32(XLOG_HEADER_MAGIC_NUM))
 xfs_emerg(log->l_mp, "%s: unexpected magic num",
@@ -3354,29 +3322,19 @@ xlog_verify_iclog(
 }
 /* check fields */
-len = be32_to_cpu(iclog->ic_header.h_num_logops);
+len = be32_to_cpu(rhead->h_num_logops);
 base_ptr = ptr = iclog->ic_datap;
-ophead = ptr;
-xhdr = iclog->ic_data;
 for (i = 0; i < len; i++) {
-ophead = ptr;
+struct xlog_op_header *ophead = ptr;
+void *p = &ophead->oh_clientid;
 /* clientid is only 1 byte */
-p = &ophead->oh_clientid;
 field_offset = p - base_ptr;
 if (field_offset & 0x1ff) {
 clientid = ophead->oh_clientid;
 } else {
 idx = BTOBBT((void *)&ophead->oh_clientid - iclog->ic_datap);
-if (idx >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) {
-j = idx / (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
-k = idx % (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
-clientid = xlog_get_client_id(
-xhdr[j].hic_xheader.xh_cycle_data[k]);
-} else {
-clientid = xlog_get_client_id(
-iclog->ic_header.h_cycle_data[idx]);
-}
+clientid = xlog_get_client_id(*xlog_cycle_data(rhead, idx));
 }
 if (clientid != XFS_TRANSACTION && clientid != XFS_LOG) {
 xfs_warn(log->l_mp,
@@ -3392,13 +3350,7 @@ xlog_verify_iclog(
 op_len = be32_to_cpu(ophead->oh_len);
 } else {
 idx = BTOBBT((void *)&ophead->oh_len - iclog->ic_datap);
-if (idx >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) {
-j = idx / (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
-k = idx % (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
-op_len = be32_to_cpu(xhdr[j].hic_xheader.xh_cycle_data[k]);
-} else {
-op_len = be32_to_cpu(iclog->ic_header.h_cycle_data[idx]);
-}
+op_len = be32_to_cpu(*xlog_cycle_data(rhead, idx));
 }
 ptr += sizeof(struct xlog_op_header) + op_len;
 }
@@ -3529,19 +3481,19 @@ xlog_force_shutdown(
 STATIC int
 xlog_iclogs_empty(
 struct xlog *log)
 {
-xlog_in_core_t *iclog;
-iclog = log->l_iclog;
+struct xlog_in_core *iclog = log->l_iclog;
 do {
 /* endianness does not matter here, zero is zero in
 * any language.
 */
-if (iclog->ic_header.h_num_logops)
+if (iclog->ic_header->h_num_logops)
 return 0;
 iclog = iclog->ic_next;
 } while (iclog != log->l_iclog);
 return 1;
 }

@@ -940,7 +940,7 @@ xlog_cil_set_ctx_write_state(
 struct xlog_in_core *iclog)
 {
 struct xfs_cil *cil = ctx->cil;
-xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header->h_lsn);
 ASSERT(!ctx->commit_lsn);
 if (!ctx->start_lsn) {
@@ -1458,9 +1458,9 @@ xlog_cil_push_work(
 */
 spin_lock(&log->l_icloglock);
 if (ctx->start_lsn != ctx->commit_lsn) {
-xfs_lsn_t plsn;
-plsn = be64_to_cpu(ctx->commit_iclog->ic_prev->ic_header.h_lsn);
+xfs_lsn_t plsn = be64_to_cpu(
+ctx->commit_iclog->ic_prev->ic_header->h_lsn);
 if (plsn && XFS_LSN_CMP(plsn, ctx->commit_lsn) < 0) {
 /*
 * Waiting on ic_force_wait orders the completion of

@@ -158,10 +158,8 @@ struct xlog_ticket {
 };
 /*
-* - A log record header is 512 bytes. There is plenty of room to grow the
-* xlog_rec_header_t into the reserved space.
-* - ic_data follows, so a write to disk can start at the beginning of
-* the iclog.
+* In-core log structure.
+*
 * - ic_forcewait is used to implement synchronous forcing of the iclog to disk.
 * - ic_next is the pointer to the next iclog in the ring.
 * - ic_log is a pointer back to the global log structure.
@@ -183,7 +181,7 @@ struct xlog_ticket {
 * We'll put all the read-only and l_icloglock fields in the first cacheline,
 * and move everything else out to subsequent cachelines.
 */
-typedef struct xlog_in_core {
+struct xlog_in_core {
 wait_queue_head_t ic_force_wait;
 wait_queue_head_t ic_write_wait;
 struct xlog_in_core *ic_next;
@@ -198,8 +196,7 @@ typedef struct xlog_in_core {
 /* reference counts need their own cacheline */
 atomic_t ic_refcnt ____cacheline_aligned_in_smp;
-xlog_in_core_2_t *ic_data;
-#define ic_header ic_data->hic_header
+struct xlog_rec_header *ic_header;
 #ifdef DEBUG
 bool ic_fail_crc : 1;
 #endif
@@ -207,7 +204,7 @@ typedef struct xlog_in_core {
 struct work_struct ic_end_io_work;
 struct bio ic_bio;
 struct bio_vec ic_bvec[];
-} xlog_in_core_t;
+};
 /*
 * The CIL context is used to aggregate per-transaction details as well be
@@ -409,7 +406,6 @@ struct xlog {
 struct list_head *l_buf_cancel_table;
 struct list_head r_dfops; /* recovered log intent items */
 int l_iclog_hsize; /* size of iclog header */
-int l_iclog_heads; /* # of iclog header sectors */
 uint l_sectBBsize; /* sector size in BBs (2^n) */
 int l_iclog_size; /* size of log in bytes */
 int l_iclog_bufs; /* number of iclog buffers */
@@ -422,7 +418,7 @@ struct xlog {
 /* waiting for iclog flush */
 int l_covered_state;/* state of "covering disk
 * log entries" */
-xlog_in_core_t *l_iclog; /* head log queue */
+struct xlog_in_core *l_iclog; /* head log queue */
 spinlock_t l_icloglock; /* grab to change iclog state */
 int l_curr_cycle; /* Cycle number of log writes */
 int l_prev_cycle; /* Cycle number before last
@@ -711,4 +707,21 @@ xlog_item_space(
 return round_up(nbytes, sizeof(uint64_t));
 }
+/*
+ * Cycles over XLOG_CYCLE_DATA_SIZE overflow into the extended header that was
+ * added for v2 logs. Addressing for the cycles array there is off by one,
+ * because the first batch of cycles is in the original header.
+ */
+static inline __be32 *xlog_cycle_data(struct xlog_rec_header *rhead, unsigned i)
+{
+if (i >= XLOG_CYCLE_DATA_SIZE) {
+unsigned j = i / XLOG_CYCLE_DATA_SIZE;
+unsigned k = i % XLOG_CYCLE_DATA_SIZE;
+return &rhead->h_ext[j - 1].xh_cycle_data[k];
+}
+return &rhead->h_cycle_data[i];
+}
 #endif /* __XFS_LOG_PRIV_H__ */
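To illustrate the off-by-one addressing in xlog_cycle_data() (taking XLOG_CYCLE_DATA_SIZE to be 64, i.e. 32k of cycle data divided by 512-byte blocks; illustrative values, not asserted by this hunk):

    /*
     *   i = 10  -> &rhead->h_cycle_data[10]                  (base header)
     *   i = 64  -> j = 1, k = 0  -> &rhead->h_ext[0].xh_cycle_data[0]
     *   i = 100 -> j = 1, k = 36 -> &rhead->h_ext[0].xh_cycle_data[36]
     *   i = 130 -> j = 2, k = 2  -> &rhead->h_ext[1].xh_cycle_data[2]
     */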

@@ -190,8 +190,8 @@ xlog_bwrite(
 */
 STATIC void
 xlog_header_check_dump(
-xfs_mount_t *mp,
-xlog_rec_header_t *head)
+struct xfs_mount *mp,
+struct xlog_rec_header *head)
 {
 xfs_debug(mp, "%s: SB : uuid = %pU, fmt = %d",
 __func__, &mp->m_sb.sb_uuid, XLOG_FMT);
@@ -207,8 +207,8 @@ xlog_header_check_dump(
 */
 STATIC int
 xlog_header_check_recover(
-xfs_mount_t *mp,
-xlog_rec_header_t *head)
+struct xfs_mount *mp,
+struct xlog_rec_header *head)
 {
 ASSERT(head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM));
@@ -238,8 +238,8 @@ xlog_header_check_recover(
 */
 STATIC int
 xlog_header_check_mount(
-xfs_mount_t *mp,
-xlog_rec_header_t *head)
+struct xfs_mount *mp,
+struct xlog_rec_header *head)
 {
 ASSERT(head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM));
@@ -400,7 +400,7 @@ xlog_find_verify_log_record(
 xfs_daddr_t i;
 char *buffer;
 char *offset = NULL;
-xlog_rec_header_t *head = NULL;
+struct xlog_rec_header *head = NULL;
 int error = 0;
 int smallmem = 0;
 int num_blks = *last_blk - start_blk;
@@ -437,7 +437,7 @@ xlog_find_verify_log_record(
 goto out;
 }
-head = (xlog_rec_header_t *)offset;
+head = (struct xlog_rec_header *)offset;
 if (head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM))
 break;
@@ -1237,7 +1237,7 @@ xlog_find_tail(
 xfs_daddr_t *head_blk,
 xfs_daddr_t *tail_blk)
 {
-xlog_rec_header_t *rhead;
+struct xlog_rec_header *rhead;
 char *offset = NULL;
 char *buffer;
 int error;
@@ -1487,7 +1487,7 @@ xlog_add_record(
 int tail_cycle,
 int tail_block)
 {
-xlog_rec_header_t *recp = (xlog_rec_header_t *)buf;
+struct xlog_rec_header *recp = (struct xlog_rec_header *)buf;
 memset(buf, 0, BBSIZE);
 recp->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM);
@@ -2863,23 +2863,12 @@ xlog_unpack_data(
 char *dp,
 struct xlog *log)
 {
-int i, j, k;
+int i;
-for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)) &&
-i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) {
-*(__be32 *)dp = *(__be32 *)&rhead->h_cycle_data[i];
+for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)); i++) {
+*(__be32 *)dp = *xlog_cycle_data(rhead, i);
 dp += BBSIZE;
 }
-if (xfs_has_logv2(log->l_mp)) {
-xlog_in_core_2_t *xhdr = (xlog_in_core_2_t *)rhead;
-for ( ; i < BTOBB(be32_to_cpu(rhead->h_len)); i++) {
-j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
-k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE);
-*(__be32 *)dp = xhdr[j].hic_xheader.xh_cycle_data[k];
-dp += BBSIZE;
-}
-}
 }
 /*
@@ -3008,7 +2997,7 @@ xlog_do_recovery_pass(
 int pass,
 xfs_daddr_t *first_bad) /* out: first bad log rec */
 {
-xlog_rec_header_t *rhead;
+struct xlog_rec_header *rhead;
 xfs_daddr_t blk_no, rblk_no;
 xfs_daddr_t rhead_blk;
 char *offset;
@@ -3045,7 +3034,7 @@ xlog_do_recovery_pass(
 if (error)
 goto bread_err1;
-rhead = (xlog_rec_header_t *)offset;
+rhead = (struct xlog_rec_header *)offset;
 /*
 * xfsprogs has a bug where record length is based on lsunit but
@@ -3152,7 +3141,7 @@ xlog_do_recovery_pass(
 if (error)
 goto bread_err2;
 }
-rhead = (xlog_rec_header_t *)offset;
+rhead = (struct xlog_rec_header *)offset;
 error = xlog_valid_rec_header(log, rhead,
 split_hblks ? blk_no : 0, h_size);
 if (error)
@@ -3234,7 +3223,7 @@ xlog_do_recovery_pass(
 if (error)
 goto bread_err2;
-rhead = (xlog_rec_header_t *)offset;
+rhead = (struct xlog_rec_header *)offset;
 error = xlog_valid_rec_header(log, rhead, blk_no, h_size);
 if (error)
 goto bread_err2;

@@ -126,14 +126,16 @@ xfs_qm_dqpurge(
 void *data)
 {
 struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo;
-int error = -EAGAIN;
-xfs_dqlock(dqp);
-if ((dqp->q_flags & XFS_DQFLAG_FREEING) || dqp->q_nrefs != 0)
-goto out_unlock;
-dqp->q_flags |= XFS_DQFLAG_FREEING;
+spin_lock(&dqp->q_lockref.lock);
+if (dqp->q_lockref.count > 0 || __lockref_is_dead(&dqp->q_lockref)) {
+spin_unlock(&dqp->q_lockref.lock);
+return -EAGAIN;
+}
+lockref_mark_dead(&dqp->q_lockref);
+spin_unlock(&dqp->q_lockref.lock);
+mutex_lock(&dqp->q_qlock);
 xfs_qm_dqunpin_wait(dqp);
 xfs_dqflock(dqp);
@@ -144,6 +146,7 @@ xfs_qm_dqpurge(
 */
 if (XFS_DQ_IS_DIRTY(dqp)) {
 struct xfs_buf *bp = NULL;
+int error;
 /*
 * We don't care about getting disk errors here. We need
@@ -151,9 +154,9 @@ xfs_qm_dqpurge(
 */
 error = xfs_dquot_use_attached_buf(dqp, &bp);
 if (error == -EAGAIN) {
-xfs_dqfunlock(dqp);
-dqp->q_flags &= ~XFS_DQFLAG_FREEING;
-goto out_unlock;
+/* resurrect the refcount from the dead. */
+dqp->q_lockref.count = 0;
+goto out_funlock;
 }
 if (!bp)
 goto out_funlock;
@@ -177,7 +180,7 @@ xfs_qm_dqpurge(
 !test_bit(XFS_LI_IN_AIL, &dqp->q_logitem.qli_item.li_flags));
 xfs_dqfunlock(dqp);
-xfs_dqunlock(dqp);
+mutex_unlock(&dqp->q_qlock);
 radix_tree_delete(xfs_dquot_tree(qi, xfs_dquot_type(dqp)), dqp->q_id);
 qi->qi_dquots--;
@@ -192,10 +195,6 @@ xfs_qm_dqpurge(
 xfs_qm_dqdestroy(dqp);
 return 0;
-out_unlock:
-xfs_dqunlock(dqp);
-return error;
 }
 /*
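The hunks above replace the q_nrefs counter and the XFS_DQFLAG_FREEING flag with a struct lockref, so "being freed" is now expressed by marking the lockref dead. A hedged sketch of the resulting lifecycle, using only the lockref calls that appear in this series (the reference-side helper shown mirrors xfs_qm_flush_one below, not this hunk):

    /* reference side: fails once the dquot has been marked dead */
    if (!lockref_get_not_dead(&dqp->q_lockref))
            return 0;                       /* dquot is being freed, skip it */

    /* purge side: only proceed when no references remain */
    spin_lock(&dqp->q_lockref.lock);
    if (dqp->q_lockref.count > 0 || __lockref_is_dead(&dqp->q_lockref)) {
            spin_unlock(&dqp->q_lockref.lock);
            return -EAGAIN;
    }
    lockref_mark_dead(&dqp->q_lockref);     /* blocks new lookups from here on */
    spin_unlock(&dqp->q_lockref.lock);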
@@ -288,51 +287,6 @@ xfs_qm_unmount_quotas(
 xfs_qm_destroy_quotainos(mp->m_quotainfo);
 }
-STATIC int
-xfs_qm_dqattach_one(
-struct xfs_inode *ip,
-xfs_dqtype_t type,
-bool doalloc,
-struct xfs_dquot **IO_idqpp)
-{
-struct xfs_dquot *dqp;
-int error;
-xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
-error = 0;
-/*
-* See if we already have it in the inode itself. IO_idqpp is &i_udquot
-* or &i_gdquot. This made the code look weird, but made the logic a lot
-* simpler.
-*/
-dqp = *IO_idqpp;
-if (dqp) {
-trace_xfs_dqattach_found(dqp);
-return 0;
-}
-/*
-* Find the dquot from somewhere. This bumps the reference count of
-* dquot and returns it locked. This can return ENOENT if dquot didn't
-* exist on disk and we didn't ask it to allocate; ESRCH if quotas got
-* turned off suddenly.
-*/
-error = xfs_qm_dqget_inode(ip, type, doalloc, &dqp);
-if (error)
-return error;
-trace_xfs_dqattach_get(dqp);
-/*
-* dqget may have dropped and re-acquired the ilock, but it guarantees
-* that the dquot returned is the one that should go in the inode.
-*/
-*IO_idqpp = dqp;
-xfs_dqunlock(dqp);
-return 0;
-}
 static bool
 xfs_qm_need_dqattach(
 struct xfs_inode *ip)
@@ -372,7 +326,7 @@ xfs_qm_dqattach_locked(
 ASSERT(!xfs_is_metadir_inode(ip));
 if (XFS_IS_UQUOTA_ON(mp) && !ip->i_udquot) {
-error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_USER,
+error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_USER,
 doalloc, &ip->i_udquot);
 if (error)
 goto done;
@@ -380,7 +334,7 @@ xfs_qm_dqattach_locked(
 }
 if (XFS_IS_GQUOTA_ON(mp) && !ip->i_gdquot) {
-error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_GROUP,
+error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_GROUP,
 doalloc, &ip->i_gdquot);
 if (error)
 goto done;
@@ -388,7 +342,7 @@ xfs_qm_dqattach_locked(
 }
 if (XFS_IS_PQUOTA_ON(mp) && !ip->i_pdquot) {
-error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_PROJ,
+error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_PROJ,
 doalloc, &ip->i_pdquot);
 if (error)
 goto done;
@@ -468,7 +422,7 @@ xfs_qm_dquot_isolate(
 struct xfs_qm_isolate *isol = arg;
 enum lru_status ret = LRU_SKIP;
-if (!xfs_dqlock_nowait(dqp))
+if (!spin_trylock(&dqp->q_lockref.lock))
 goto out_miss_busy;
 /*
@@ -476,7 +430,7 @@ xfs_qm_dquot_isolate(
 * from the LRU, leave it for the freeing task to complete the freeing
 * process rather than risk it being free from under us here.
 */
-if (dqp->q_flags & XFS_DQFLAG_FREEING)
+if (__lockref_is_dead(&dqp->q_lockref))
 goto out_miss_unlock;
 /*
@@ -485,16 +439,15 @@ xfs_qm_dquot_isolate(
 * again.
 */
 ret = LRU_ROTATE;
-if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0) {
+if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0)
 goto out_miss_unlock;
-}
 /*
 * This dquot has acquired a reference in the meantime remove it from
 * the freelist and try again.
 */
-if (dqp->q_nrefs) {
-xfs_dqunlock(dqp);
+if (dqp->q_lockref.count) {
+spin_unlock(&dqp->q_lockref.lock);
 XFS_STATS_INC(dqp->q_mount, xs_qm_dqwants);
 trace_xfs_dqreclaim_want(dqp);
@@ -518,10 +471,9 @@ xfs_qm_dquot_isolate(
 /*
 * Prevent lookups now that we are past the point of no return.
 */
-dqp->q_flags |= XFS_DQFLAG_FREEING;
-xfs_dqunlock(dqp);
-ASSERT(dqp->q_nrefs == 0);
+lockref_mark_dead(&dqp->q_lockref);
+spin_unlock(&dqp->q_lockref.lock);
 list_lru_isolate_move(lru, &dqp->q_lru, &isol->dispose);
 XFS_STATS_DEC(dqp->q_mount, xs_qm_dquot_unused);
 trace_xfs_dqreclaim_done(dqp);
@@ -529,7 +481,7 @@ xfs_qm_dquot_isolate(
 return LRU_REMOVED;
 out_miss_unlock:
-xfs_dqunlock(dqp);
+spin_unlock(&dqp->q_lockref.lock);
 out_miss_busy:
 trace_xfs_dqreclaim_busy(dqp);
 XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);
@@ -1316,9 +1268,10 @@ xfs_qm_quotacheck_dqadjust(
 return error;
 }
+mutex_lock(&dqp->q_qlock);
 error = xfs_dquot_attach_buf(NULL, dqp);
 if (error)
-return error;
+goto out_unlock;
 trace_xfs_dqadjust(dqp);
@@ -1348,8 +1301,10 @@ xfs_qm_quotacheck_dqadjust(
 }
 dqp->q_flags |= XFS_DQFLAG_DIRTY;
-xfs_qm_dqput(dqp);
-return 0;
+out_unlock:
+mutex_unlock(&dqp->q_qlock);
+xfs_qm_dqrele(dqp);
+return error;
 }
 /*
@@ -1466,9 +1421,10 @@ xfs_qm_flush_one(
 struct xfs_buf *bp = NULL;
 int error = 0;
-xfs_dqlock(dqp);
-if (dqp->q_flags & XFS_DQFLAG_FREEING)
-goto out_unlock;
+if (!lockref_get_not_dead(&dqp->q_lockref))
+return 0;
+mutex_lock(&dqp->q_qlock);
 if (!XFS_DQ_IS_DIRTY(dqp))
 goto out_unlock;
@@ -1488,7 +1444,8 @@ xfs_qm_flush_one(
 xfs_buf_delwri_queue(bp, buffer_list);
 xfs_buf_relse(bp);
 out_unlock:
-xfs_dqunlock(dqp);
+mutex_unlock(&dqp->q_qlock);
+xfs_qm_dqrele(dqp);
 return error;
 }
@@ -1904,16 +1861,12 @@ xfs_qm_vop_dqalloc(
 struct xfs_dquot *gq = NULL;
 struct xfs_dquot *pq = NULL;
 int error;
-uint lockflags;
 if (!XFS_IS_QUOTA_ON(mp))
 return 0;
 ASSERT(!xfs_is_metadir_inode(ip));
-lockflags = XFS_ILOCK_EXCL;
-xfs_ilock(ip, lockflags);
 if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip))
 gid = inode->i_gid;
@@ -1922,38 +1875,22 @@ xfs_qm_vop_dqalloc(
 * if necessary. The dquot(s) will not be locked.
 */
 if (XFS_NOT_DQATTACHED(mp, ip)) {
+xfs_ilock(ip, XFS_ILOCK_EXCL);
 error = xfs_qm_dqattach_locked(ip, true);
-if (error) {
-xfs_iunlock(ip, lockflags);
+xfs_iunlock(ip, XFS_ILOCK_EXCL);
+if (error)
 return error;
-}
 }
 if ((flags & XFS_QMOPT_UQUOTA) && XFS_IS_UQUOTA_ON(mp)) {
 ASSERT(O_udqpp);
 if (!uid_eq(inode->i_uid, uid)) {
-/*
-* What we need is the dquot that has this uid, and
-* if we send the inode to dqget, the uid of the inode
-* takes priority over what's sent in the uid argument.
-* We must unlock inode here before calling dqget if
-* we're not sending the inode, because otherwise
-* we'll deadlock by doing trans_reserve while
-* holding ilock.
-*/
-xfs_iunlock(ip, lockflags);
 error = xfs_qm_dqget(mp, from_kuid(user_ns, uid),
 XFS_DQTYPE_USER, true, &uq);
 if (error) {
 ASSERT(error != -ENOENT);
 return error;
 }
-/*
-* Get the ilock in the right order.
-*/
-xfs_dqunlock(uq);
-lockflags = XFS_ILOCK_SHARED;
-xfs_ilock(ip, lockflags);
 } else {
 /*
 * Take an extra reference, because we'll return
@@ -1966,16 +1903,12 @@ xfs_qm_vop_dqalloc(
 if ((flags & XFS_QMOPT_GQUOTA) && XFS_IS_GQUOTA_ON(mp)) {
 ASSERT(O_gdqpp);
 if (!gid_eq(inode->i_gid, gid)) {
-xfs_iunlock(ip, lockflags);
 error = xfs_qm_dqget(mp, from_kgid(user_ns, gid),
 XFS_DQTYPE_GROUP, true, &gq);
 if (error) {
 ASSERT(error != -ENOENT);
 goto error_rele;
 }
-xfs_dqunlock(gq);
-lockflags = XFS_ILOCK_SHARED;
-xfs_ilock(ip, lockflags);
 } else {
 ASSERT(ip->i_gdquot);
 gq = xfs_qm_dqhold(ip->i_gdquot);
@@ -1984,16 +1917,12 @@ xfs_qm_vop_dqalloc(
 if ((flags & XFS_QMOPT_PQUOTA) && XFS_IS_PQUOTA_ON(mp)) {
 ASSERT(O_pdqpp);
 if (ip->i_projid != prid) {
-xfs_iunlock(ip, lockflags);
 error = xfs_qm_dqget(mp, prid,
 XFS_DQTYPE_PROJ, true, &pq);
 if (error) {
 ASSERT(error != -ENOENT);
 goto error_rele;
 }
-xfs_dqunlock(pq);
-lockflags = XFS_ILOCK_SHARED;
-xfs_ilock(ip, lockflags);
 } else {
 ASSERT(ip->i_pdquot);
 pq = xfs_qm_dqhold(ip->i_pdquot);
@@ -2001,7 +1930,6 @@ xfs_qm_vop_dqalloc(
 }
 trace_xfs_dquot_dqalloc(ip);
-xfs_iunlock(ip, lockflags);
 if (O_udqpp)
 *O_udqpp = uq;
 else
@@ -2078,7 +2006,7 @@ xfs_qm_vop_chown(
 * back now.
 */
 tp->t_flags |= XFS_TRANS_DIRTY;
-xfs_dqlock(prevdq);
+mutex_lock(&prevdq->q_qlock);
 if (isrt) {
 ASSERT(prevdq->q_rtb.reserved >= ip->i_delayed_blks);
 prevdq->q_rtb.reserved -= ip->i_delayed_blks;
@@ -2086,7 +2014,7 @@ xfs_qm_vop_chown(
 ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks);
 prevdq->q_blk.reserved -= ip->i_delayed_blks;
 }
-xfs_dqunlock(prevdq);
+mutex_unlock(&prevdq->q_qlock);
 /*
 * Take an extra reference, because the inode is going to keep
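Taken together with the xfs_qm_dqget_cache_{lookup,insert} changes named in the merge summary, the hunks above mean xfs_qm_dqget() callers no longer receive a locked dquot and no longer juggle the inode ILOCK across the call. A sketch of the calling convention implied by the quota hunks that follow (xfs_qm_statvfs, xfs_qm_scall_getquota); the error value and locking need are caller-specific:

    error = xfs_qm_dqget(mp, id, type, true, &dqp);
    if (error)
            return error;
    mutex_lock(&dqp->q_qlock);      /* only if the caller needs the dquot locked */
    /* ... read or adjust dqp ... */
    mutex_unlock(&dqp->q_qlock);
    xfs_qm_dqrele(dqp);             /* drop the reference taken by dqget */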

@@ -57,7 +57,7 @@ struct xfs_quotainfo {
 struct xfs_inode *qi_pquotaip; /* project quota inode */
 struct xfs_inode *qi_dirip; /* quota metadir */
 struct list_lru qi_lru;
-int qi_dquots;
+uint64_t qi_dquots;
 struct mutex qi_quotaofflock;/* to serialize quotaoff */
 xfs_filblks_t qi_dqchunklen; /* # BBs in a chunk of dqs */
 uint qi_dqperchunk; /* # ondisk dq in above chunk */

@@ -73,8 +73,10 @@ xfs_qm_statvfs(
 struct xfs_dquot *dqp;
 if (!xfs_qm_dqget(mp, ip->i_projid, XFS_DQTYPE_PROJ, false, &dqp)) {
+mutex_lock(&dqp->q_qlock);
 xfs_fill_statvfs_from_dquot(statp, ip, dqp);
-xfs_qm_dqput(dqp);
+mutex_unlock(&dqp->q_qlock);
+xfs_qm_dqrele(dqp);
 }
 }

@@ -303,13 +303,12 @@ xfs_qm_scall_setqlim(
 }
 defq = xfs_get_defquota(q, xfs_dquot_type(dqp));
-xfs_dqunlock(dqp);
 error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_setqlim, 0, 0, 0, &tp);
 if (error)
 goto out_rele;
-xfs_dqlock(dqp);
+mutex_lock(&dqp->q_qlock);
 xfs_trans_dqjoin(tp, dqp);
 /*
@@ -459,6 +458,7 @@ xfs_qm_scall_getquota(
 * If everything's NULL, this dquot doesn't quite exist as far as
 * our utility programs are concerned.
 */
+mutex_lock(&dqp->q_qlock);
 if (XFS_IS_DQUOT_UNINITIALIZED(dqp)) {
 error = -ENOENT;
 goto out_put;
@@ -467,7 +467,8 @@ xfs_qm_scall_getquota(
 xfs_qm_scall_getquota_fill_qc(mp, type, dqp, dst);
 out_put:
-xfs_qm_dqput(dqp);
+mutex_unlock(&dqp->q_qlock);
+xfs_qm_dqrele(dqp);
 return error;
 }
@@ -497,7 +498,8 @@ xfs_qm_scall_getquota_next(
 *id = dqp->q_id;
 xfs_qm_scall_getquota_fill_qc(mp, type, dqp, dst);
+mutex_unlock(&dqp->q_qlock);
-xfs_qm_dqput(dqp);
+xfs_qm_dqrele(dqp);
 return error;
 }

@@ -65,7 +65,7 @@ xfs_fs_get_quota_state(
 memset(state, 0, sizeof(*state));
 if (!XFS_IS_QUOTA_ON(mp))
 return 0;
-state->s_incoredqs = q->qi_dquots;
+state->s_incoredqs = min_t(uint64_t, q->qi_dquots, UINT_MAX);
 if (XFS_IS_UQUOTA_ON(mp))
 state->s_state[USRQUOTA].flags |= QCI_ACCT_ENABLED;
 if (XFS_IS_UQUOTA_ENFORCED(mp))
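qi_dquots is widened to 64 bits in the xfs_qm.h hunk above while the VFS state field stays 32-bit, hence the clamp. A hypothetical example of the effect:

    /* e.g. qi_dquots = 5,000,000,000 -> s_incoredqs = 4,294,967,295 (UINT_MAX) */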

@@ -1350,7 +1350,7 @@ DECLARE_EVENT_CLASS(xfs_dquot_class,
 __entry->id = dqp->q_id;
 __entry->type = dqp->q_type;
 __entry->flags = dqp->q_flags;
-__entry->nrefs = dqp->q_nrefs;
+__entry->nrefs = data_race(dqp->q_lockref.count);
 __entry->res_bcount = dqp->q_blk.reserved;
 __entry->res_rtbcount = dqp->q_rtb.reserved;
@@ -1399,7 +1399,6 @@ DEFINE_DQUOT_EVENT(xfs_dqadjust);
 DEFINE_DQUOT_EVENT(xfs_dqreclaim_want);
 DEFINE_DQUOT_EVENT(xfs_dqreclaim_busy);
 DEFINE_DQUOT_EVENT(xfs_dqreclaim_done);
-DEFINE_DQUOT_EVENT(xfs_dqattach_found);
 DEFINE_DQUOT_EVENT(xfs_dqattach_get);
 DEFINE_DQUOT_EVENT(xfs_dqalloc);
 DEFINE_DQUOT_EVENT(xfs_dqtobp_read);
@@ -1409,9 +1408,8 @@ DEFINE_DQUOT_EVENT(xfs_dqget_hit);
 DEFINE_DQUOT_EVENT(xfs_dqget_miss);
 DEFINE_DQUOT_EVENT(xfs_dqget_freeing);
 DEFINE_DQUOT_EVENT(xfs_dqget_dup);
-DEFINE_DQUOT_EVENT(xfs_dqput);
-DEFINE_DQUOT_EVENT(xfs_dqput_free);
 DEFINE_DQUOT_EVENT(xfs_dqrele);
+DEFINE_DQUOT_EVENT(xfs_dqrele_free);
 DEFINE_DQUOT_EVENT(xfs_dqflush);
 DEFINE_DQUOT_EVENT(xfs_dqflush_force);
 DEFINE_DQUOT_EVENT(xfs_dqflush_done);
@@ -4934,7 +4932,7 @@ DECLARE_EVENT_CLASS(xlog_iclog_class,
 __entry->refcount = atomic_read(&iclog->ic_refcnt);
 __entry->offset = iclog->ic_offset;
 __entry->flags = iclog->ic_flags;
-__entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+__entry->lsn = be64_to_cpu(iclog->ic_header->h_lsn);
 __entry->caller_ip = caller_ip;
 ),
 TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx flags %s caller %pS",

@@ -393,7 +393,7 @@ xfs_trans_dqlockedjoin(
 unsigned int i;
 ASSERT(q[0].qt_dquot != NULL);
 if (q[1].qt_dquot == NULL) {
-xfs_dqlock(q[0].qt_dquot);
+mutex_lock(&q[0].qt_dquot->q_qlock);
 xfs_trans_dqjoin(tp, q[0].qt_dquot);
 } else if (q[2].qt_dquot == NULL) {
 xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot);
@@ -693,7 +693,7 @@ xfs_trans_unreserve_and_mod_dquots(
 locked = already_locked;
 if (qtrx->qt_blk_res) {
 if (!locked) {
-xfs_dqlock(dqp);
+mutex_lock(&dqp->q_qlock);
 locked = true;
 }
 dqp->q_blk.reserved -=
@@ -701,7 +701,7 @@ xfs_trans_unreserve_and_mod_dquots(
 }
 if (qtrx->qt_ino_res) {
 if (!locked) {
-xfs_dqlock(dqp);
+mutex_lock(&dqp->q_qlock);
 locked = true;
 }
 dqp->q_ino.reserved -=
@@ -710,14 +710,14 @@ xfs_trans_unreserve_and_mod_dquots(
 if (qtrx->qt_rtblk_res) {
 if (!locked) {
-xfs_dqlock(dqp);
+mutex_lock(&dqp->q_qlock);
 locked = true;
 }
 dqp->q_rtb.reserved -=
 (xfs_qcnt_t)qtrx->qt_rtblk_res;
 }
 if (locked && !already_locked)
-xfs_dqunlock(dqp);
+mutex_unlock(&dqp->q_qlock);
 }
 }
@@ -820,7 +820,7 @@ xfs_trans_dqresv(
 struct xfs_dquot_res *blkres;
 struct xfs_quota_limits *qlim;
-xfs_dqlock(dqp);
+mutex_lock(&dqp->q_qlock);
 defq = xfs_get_defquota(q, xfs_dquot_type(dqp));
@@ -887,16 +887,16 @@ xfs_trans_dqresv(
 XFS_IS_CORRUPT(mp, dqp->q_ino.reserved < dqp->q_ino.count))
 goto error_corrupt;
-xfs_dqunlock(dqp);
+mutex_unlock(&dqp->q_qlock);
 return 0;
 error_return:
-xfs_dqunlock(dqp);
+mutex_unlock(&dqp->q_qlock);
 if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
 return -ENOSPC;
 return -EDQUOT;
 error_corrupt:
-xfs_dqunlock(dqp);
+mutex_unlock(&dqp->q_qlock);
 xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 xfs_fs_mark_sick(mp, XFS_SICK_FS_QUOTACHECK);
 return -EFSCORRUPTED;

@@ -103,9 +103,6 @@ xfs_zone_account_reclaimable(
 */
 trace_xfs_zone_emptied(rtg);
-if (!was_full)
-xfs_group_clear_mark(xg, XFS_RTG_RECLAIMABLE);
 spin_lock(&zi->zi_used_buckets_lock);
 if (!was_full)
 xfs_zone_remove_from_bucket(zi, rgno, from_bucket);
@@ -127,7 +124,6 @@ xfs_zone_account_reclaimable(
 xfs_zone_add_to_bucket(zi, rgno, to_bucket);
 spin_unlock(&zi->zi_used_buckets_lock);
-xfs_group_set_mark(xg, XFS_RTG_RECLAIMABLE);
 if (zi->zi_gc_thread && xfs_zoned_need_gc(mp))
 wake_up_process(zi->zi_gc_thread);
 } else if (to_bucket != from_bucket) {
@@ -142,6 +138,28 @@ xfs_zone_account_reclaimable(
 }
 }
+/*
+ * Check if we have any zones that can be reclaimed by looking at the entry
+ * counters for the zone buckets.
+ */
+bool
+xfs_zoned_have_reclaimable(
+struct xfs_zone_info *zi)
+{
+int i;
+spin_lock(&zi->zi_used_buckets_lock);
+for (i = 0; i < XFS_ZONE_USED_BUCKETS; i++) {
+if (zi->zi_used_bucket_entries[i]) {
+spin_unlock(&zi->zi_used_buckets_lock);
+return true;
+}
+}
+spin_unlock(&zi->zi_used_buckets_lock);
+return false;
+}
 static void
 xfs_open_zone_mark_full(
 struct xfs_open_zone *oz)

@@ -117,7 +117,6 @@ struct xfs_gc_bio {
 struct xfs_rtgroup *victim_rtg;
 /* Bio used for reads and writes, including the bvec used by it */
-struct bio_vec bv;
 struct bio bio; /* must be last */
 };
@@ -175,14 +174,13 @@ xfs_zoned_need_gc(
 s64 available, free, threshold;
 s32 remainder;
-if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE))
+if (!xfs_zoned_have_reclaimable(mp->m_zone_info))
 return false;
 available = xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE);
 if (available <
-mp->m_groups[XG_TYPE_RTG].blocks *
-(mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
+xfs_rtgs_to_rfsbs(mp, mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
 return true;
 free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS);
@@ -1184,16 +1182,16 @@ xfs_zone_gc_mount(
 goto out_put_gc_zone;
 }
-mp->m_zone_info->zi_gc_thread = kthread_create(xfs_zoned_gcd, data,
+zi->zi_gc_thread = kthread_create(xfs_zoned_gcd, data,
 "xfs-zone-gc/%s", mp->m_super->s_id);
-if (IS_ERR(mp->m_zone_info->zi_gc_thread)) {
+if (IS_ERR(zi->zi_gc_thread)) {
 xfs_warn(mp, "unable to create zone gc thread");
-error = PTR_ERR(mp->m_zone_info->zi_gc_thread);
+error = PTR_ERR(zi->zi_gc_thread);
 goto out_free_gc_data;
 }
 /* xfs_zone_gc_start will unpark for rw mounts */
-kthread_park(mp->m_zone_info->zi_gc_thread);
+kthread_park(zi->zi_gc_thread);
 return 0;
 out_free_gc_data:

@@ -113,6 +113,7 @@ struct xfs_open_zone *xfs_open_zone(struct xfs_mount *mp,
 int xfs_zone_gc_reset_sync(struct xfs_rtgroup *rtg);
 bool xfs_zoned_need_gc(struct xfs_mount *mp);
+bool xfs_zoned_have_reclaimable(struct xfs_zone_info *zi);
 int xfs_zone_gc_mount(struct xfs_mount *mp);
 void xfs_zone_gc_unmount(struct xfs_mount *mp);

@@ -54,12 +54,10 @@ xfs_zoned_default_resblks(
 {
 switch (ctr) {
 case XC_FREE_RTEXTENTS:
-return (uint64_t)XFS_RESERVED_ZONES *
-mp->m_groups[XG_TYPE_RTG].blocks +
-mp->m_sb.sb_rtreserved;
+return xfs_rtgs_to_rfsbs(mp, XFS_RESERVED_ZONES) +
+mp->m_sb.sb_rtreserved;
 case XC_FREE_RTAVAILABLE:
-return (uint64_t)XFS_GC_ZONES *
-mp->m_groups[XG_TYPE_RTG].blocks;
+return xfs_rtgs_to_rfsbs(mp, XFS_GC_ZONES);
 default:
 ASSERT(0);
 return 0;
@@ -174,7 +172,7 @@ xfs_zoned_reserve_available(
 * processing a pending GC request give up as we're fully out
 * of space.
 */
-if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE) &&
+if (!xfs_zoned_have_reclaimable(mp->m_zone_info) &&
 !xfs_is_zonegc_running(mp))
 break;
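For orientation, xfs_rtgs_to_rfsbs() acts here as a zone-count to filesystem-block conversion, matching the open-coded multiplication it replaces (nr * mp->m_groups[XG_TYPE_RTG].blocks). With illustrative numbers only (the real values of XFS_RESERVED_ZONES, XFS_GC_ZONES and the zone size are not taken from this patch):

    /*
     * 65536-block rtgroups, XFS_RESERVED_ZONES = 2, XFS_GC_ZONES = 1:
     *   XC_FREE_RTEXTENTS   -> 2 * 65536 + sb_rtreserved blocks
     *   XC_FREE_RTAVAILABLE -> 1 * 65536 blocks
     */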