diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst index 55e727b5f12e..3d9233f403db 100644 --- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst @@ -105,10 +105,8 @@ occur; this capability aids both strategies. TLDR; Show Me the Code! ----------------------- -Code is posted to the kernel.org git trees as follows: -`kernel changes `_, -`userspace changes `_, and -`QA test changes `_. +Kernel and userspace code has been fully merged as of October 2025. + Each kernel patchset adding an online repair function will use the same branch name across the kernel, xfsprogs, and fstests git repos. @@ -764,12 +762,8 @@ allow the online fsck developers to compare online fsck against offline fsck, and they enable XFS developers to find deficiencies in the code base. Proposed patchsets include -`general fuzzer improvements -`_, `fuzzing baselines -`_, -and `improvements in fuzz testing comprehensiveness -`_. +`_. Stress Testing -------------- @@ -801,11 +795,6 @@ Success is defined by the ability to run all of these tests without observing any unexpected filesystem shutdowns due to corrupted metadata, kernel hang check warnings, or any other sort of mischief. -Proposed patchsets include `general stress testing -`_ -and the `evolution of existing per-function stress testing -`_. - 4. User Interface ================= @@ -886,10 +875,6 @@ apply as nice of a priority to IO and CPU scheduling as possible. This measure was taken to minimize delays in the rest of the filesystem. No such hardening has been performed for the cron job. -Proposed patchset: -`Enabling the xfs_scrub background service -`_. - Health Reporting ---------------- @@ -912,13 +897,6 @@ notifications and initiate a repair? *Answer*: These questions remain unanswered, but should be a part of the conversation with early adopters and potential downstream users of XFS. -Proposed patchsets include -`wiring up health reports to correction returns -`_ -and -`preservation of sickness info during memory reclaim -`_. - 5. Kernel Algorithms and Data Structures ======================================== @@ -1310,21 +1288,6 @@ Space allocation records are cross-referenced as follows: are there the same number of reverse mapping records for each block as the reference count record claims? -Proposed patchsets are the series to find gaps in -`refcount btree -`_, -`inode btree -`_, and -`rmap btree -`_ records; -to find -`mergeable records -`_; -and to -`improve cross referencing with rmap -`_ -before starting a repair. - Checking Extended Attributes ```````````````````````````` @@ -1756,10 +1719,6 @@ For scrub, the drain works as follows: To avoid polling in step 4, the drain provides a waitqueue for scrub threads to be woken up whenever the intent count drops to zero. -The proposed patchset is the -`scrub intent drain series -`_. - .. _jump_labels: Static Keys (aka Jump Label Patching) @@ -2036,10 +1995,6 @@ The ``xfarray_store_anywhere`` function is used to insert a record in any null record slot in the bag; and the ``xfarray_unset`` function removes a record from the bag. -The proposed patchset is the -`big in-memory array -`_. - Iterating Array Elements ^^^^^^^^^^^^^^^^^^^^^^^^ @@ -2172,10 +2127,6 @@ However, it should be noted that these repair functions only use blob storage to cache a small number of entries before adding them to a temporary ondisk file, which is why compaction is not required. 
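+
+As a rough sketch only (the record layout and helper below are illustrative,
+not the actual repair code), an attribute salvage loop can park each value in
+the blob store and keep just the returned cookie in a fixed-size xfarray
+record, assuming the ``xfblob_store`` and ``xfarray_append`` helpers
+described earlier:
+
+.. code-block:: c
+
+	/* Illustrative staging record: fixed size, so it fits the xfarray. */
+	struct xattr_stage_rec {
+		xfblob_cookie		value_cookie;	/* where the value went */
+		uint32_t		valuelen;
+		uint32_t		hash;
+	};
+
+	static int
+	stage_one_attr(
+		struct xfarray		*entries,
+		struct xfblob		*values,
+		const void		*value,
+		uint32_t		valuelen,
+		uint32_t		hash)
+	{
+		struct xattr_stage_rec	rec = {
+			.valuelen	= valuelen,
+			.hash		= hash,
+		};
+		int			error;
+
+		/* Persist the (possibly large) value; remember its cookie. */
+		error = xfblob_store(values, &rec.value_cookie, value, valuelen);
+		if (error)
+			return error;
+
+		/* Only the small fixed-size record goes into the xfarray. */
+		return xfarray_append(entries, &rec);
+	}
+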
-The proposed patchset is at the start of the -`extended attribute repair -`_ series. - .. _xfbtree: In-Memory B+Trees @@ -2214,11 +2165,6 @@ xfiles enables reuse of the entire btree library. Btrees built atop an xfile are collectively known as ``xfbtrees``. The next few sections describe how they actually work. -The proposed patchset is the -`in-memory btree -`_ -series. - Using xfiles as a Buffer Cache Target ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -2459,14 +2405,6 @@ This enables the log to release the old EFI to keep the log moving forwards. EFIs have a role to play during the commit and reaping phases; please see the next section and the section about :ref:`reaping` for more details. -Proposed patchsets are the -`bitmap rework -`_ -and the -`preparation for bulk loading btrees -`_. - - Writing the New Tree ```````````````````` @@ -2623,11 +2561,6 @@ The number of records for the inode btree is the number of xfarray records, but the record count for the free inode btree has to be computed as inode chunk records are stored in the xfarray. -The proposed patchset is the -`AG btree repair -`_ -series. - Case Study: Rebuilding the Space Reference Counts ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -2716,11 +2649,6 @@ Reverse mappings are added to the bag using ``xfarray_store_anywhere`` and removed via ``xfarray_unset``. Bag members are examined through ``xfarray_iter`` loops. -The proposed patchset is the -`AG btree repair -`_ -series. - Case Study: Rebuilding File Fork Mapping Indices ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -2757,11 +2685,6 @@ EXTENTS format instead of BMBT, which may require a conversion. Third, the incore extent map must be reloaded carefully to avoid disturbing any delayed allocation extents. -The proposed patchset is the -`file mapping repair -`_ -series. - .. _reaping: Reaping Old Metadata Blocks @@ -2843,11 +2766,6 @@ blocks. As stated earlier, online repair functions use very large transactions to minimize the chances of this occurring. -The proposed patchset is the -`preparation for bulk loading btrees -`_ -series. - Case Study: Reaping After a Regular Btree Repair ```````````````````````````````````````````````` @@ -2943,11 +2861,6 @@ When the walk is complete, the bitmap disunion operation ``(ag_owner_bitmap & btrees. These blocks can then be reaped using the methods outlined above. -The proposed patchset is the -`AG btree repair -`_ -series. - .. _rmap_reap: Case Study: Reaping After Repairing Reverse Mapping Btrees @@ -2972,11 +2885,6 @@ methods outlined above. The rest of the process of rebuildng the reverse mapping btree is discussed in a separate :ref:`case study`. -The proposed patchset is the -`AG btree repair -`_ -series. - Case Study: Rebuilding the AGFL ``````````````````````````````` @@ -3024,11 +2932,6 @@ more complicated, because computing the correct value requires traversing the forks, or if that fails, leaving the fields invalid and waiting for the fork fsck functions to run. -The proposed patchset is the -`inode -`_ -repair series. - Quota Record Repairs -------------------- @@ -3045,11 +2948,6 @@ checking are obviously bad limits and timer values. Quota usage counters are checked, repaired, and discussed separately in the section about :ref:`live quotacheck `. -The proposed patchset is the -`quota -`_ -repair series. - .. _fscounters: Freezing to Fix Summary Counters @@ -3145,11 +3043,6 @@ long enough to check and correct the summary counters. | This bug was fixed in Linux 5.17. 
| +--------------------------------------------------------------------------+ -The proposed patchset is the -`summary counter cleanup -`_ -series. - Full Filesystem Scans --------------------- @@ -3277,15 +3170,6 @@ Second, if the incore inode is stuck in some intermediate state, the scan coordinator must release the AGI and push the main filesystem to get the inode back into a loadable state. -The proposed patches are the -`inode scanner -`_ -series. -The first user of the new functionality is the -`online quotacheck -`_ -series. - Inode Management ```````````````` @@ -3381,12 +3265,6 @@ To capture these nuances, the online fsck code has a separate ``xchk_irele`` function to set or clear the ``DONTCACHE`` flag to get the required release behavior. -Proposed patchsets include fixing -`scrub iget usage -`_ and -`dir iget usage -`_. - .. _ilocking: Locking Inodes @@ -3443,11 +3321,6 @@ If the dotdot entry changes while the directory is unlocked, then a move or rename operation must have changed the child's parentage, and the scan can exit early. -The proposed patchset is the -`directory repair -`_ -series. - .. _fshooks: Filesystem Hooks @@ -3594,11 +3467,6 @@ The inode scan APIs are pretty simple: - ``xchk_iscan_teardown`` to finish the scan -This functionality is also a part of the -`inode scanner -`_ -series. - .. _quotacheck: Case Study: Quota Counter Checking @@ -3686,11 +3554,6 @@ needing to hold any locks for a long duration. If repairs are desired, the real and shadow dquots are locked and their resource counts are set to the values in the shadow dquot. -The proposed patchset is the -`online quotacheck -`_ -series. - .. _nlinks: Case Study: File Link Count Checking @@ -3744,11 +3607,6 @@ shadow information. If no parents are found, the file must be :ref:`reparented ` to the orphanage to prevent the file from being lost forever. -The proposed patchset is the -`file link count repair -`_ -series. - .. _rmap_repair: Case Study: Rebuilding Reverse Mapping Records @@ -3828,11 +3686,6 @@ scan for reverse mapping records. 12. Free the xfbtree now that it not needed. -The proposed patchset is the -`rmap repair -`_ -series. - Staging Repairs with Temporary Files on Disk -------------------------------------------- @@ -3971,11 +3824,6 @@ Once a good copy of a data file has been constructed in a temporary file, it must be conveyed to the file being repaired, which is the topic of the next section. -The proposed patches are in the -`repair temporary files -`_ -series. - Logged File Content Exchanges ----------------------------- @@ -4025,11 +3873,6 @@ The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag in the superblock protects these new log item records from being replayed on old kernels. -The proposed patchset is the -`file contents exchange -`_ -series. - +--------------------------------------------------------------------------+ | **Sidebar: Using Log-Incompatible Feature Flags** | +--------------------------------------------------------------------------+ @@ -4323,11 +4166,6 @@ To repair the summary file, write the xfile contents into the temporary file and use atomic mapping exchange to commit the new contents. The temporary file is then reaped. -The proposed patchset is the -`realtime summary repair -`_ -series. - Case Study: Salvaging Extended Attributes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -4369,11 +4207,6 @@ Salvaging extended attributes is done as follows: 4. Reap the temporary file. -The proposed patchset is the -`extended attribute repair -`_ -series. 
- Fixing Directories ------------------ @@ -4448,11 +4281,6 @@ Unfortunately, the current dentry cache design doesn't provide a means to walk every child dentry of a specific directory, which makes this a hard problem. There is no known solution. -The proposed patchset is the -`directory repair -`_ -series. - Parent Pointers ``````````````` @@ -4612,11 +4440,6 @@ a :ref:`directory entry live update hook ` as follows: 7. Reap the temporary directory. -The proposed patchset is the -`parent pointers directory repair -`_ -series. - Case Study: Repairing Parent Pointers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -4662,11 +4485,6 @@ directory reconstruction: 8. Reap the temporary file. -The proposed patchset is the -`parent pointers repair -`_ -series. - Digression: Offline Checking of Parent Pointers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -4755,11 +4573,6 @@ connectivity checks: 4. Move on to examining link counts, as we do today. -The proposed patchset is the -`offline parent pointers repair -`_ -series. - Rebuilding directories from parent pointers in offline repair would be very challenging because xfs_repair currently uses two single-pass scans of the filesystem during phases 3 and 4 to decide which files are corrupt enough to be @@ -4903,12 +4716,6 @@ Repairing the directory tree works as follows: 6. If the subdirectory has zero paths, attach it to the lost and found. -The proposed patches are in the -`directory tree repair -`_ -series. - - .. _orphanage: The Orphanage @@ -4973,11 +4780,6 @@ Orphaned files are adopted by the orphanage as follows: 7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all resources. -The proposed patches are in the -`orphanage adoption -`_ -series. - 6. Userspace Algorithms and Data Structures =========================================== @@ -5091,14 +4893,6 @@ first workqueue's workers until the backlog eases. This doesn't completely solve the balancing problem, but reduces it enough to move on to more pressing issues. -The proposed patchsets are the scrub -`performance tweaks -`_ -and the -`inode scan rebalance -`_ -series. - .. _scrubrepair: Scheduling Repairs @@ -5179,20 +4973,6 @@ immediately. Corrupt file data blocks reported by phase 6 cannot be recovered by the filesystem. -The proposed patchsets are the -`repair warning improvements -`_, -refactoring of the -`repair data dependency -`_ -and -`object tracking -`_, -and the -`repair scheduling -`_ -improvement series. - Checking Names for Confusable Unicode Sequences ----------------------------------------------- @@ -5372,6 +5152,8 @@ The extra flexibility enables several new use cases: This emulates an atomic device write in software, and can support arbitrary scattered writes. +(This functionality was merged into mainline as of 2025) + Vectorized Scrub ---------------- @@ -5393,13 +5175,7 @@ It is hoped that ``io_uring`` will pick up enough of this functionality that online fsck can use that instead of adding a separate vectored scrub system call to XFS. -The relevant patchsets are the -`kernel vectorized scrub -`_ -and -`userspace vectorized scrub -`_ -series. 
+(This functionality was merged into mainline as of 2025) Quality of Service Targets for Scrub ------------------------------------ diff --git a/fs/xfs/libxfs/xfs_group.h b/fs/xfs/libxfs/xfs_group.h index 4423932a2313..4ae638f1c2c5 100644 --- a/fs/xfs/libxfs/xfs_group.h +++ b/fs/xfs/libxfs/xfs_group.h @@ -98,6 +98,15 @@ xfs_group_max_blocks( return xg->xg_mount->m_groups[xg->xg_type].blocks; } +static inline xfs_rfsblock_t +xfs_groups_to_rfsbs( + struct xfs_mount *mp, + uint32_t nr_groups, + enum xfs_group_type type) +{ + return (xfs_rfsblock_t)mp->m_groups[type].blocks * nr_groups; +} + static inline xfs_fsblock_t xfs_group_start_fsb( struct xfs_group *xg) diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 6c50cb2ece19..908e7060428c 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -31,6 +31,7 @@ typedef uint32_t xlog_tid_t; #define XLOG_BIG_RECORD_BSIZE (32*1024) /* 32k buffers */ #define XLOG_MAX_RECORD_BSIZE (256*1024) #define XLOG_HEADER_CYCLE_SIZE (32*1024) /* cycle data in header */ +#define XLOG_CYCLE_DATA_SIZE (XLOG_HEADER_CYCLE_SIZE / BBSIZE) #define XLOG_MIN_RECORD_BSHIFT 14 /* 16384 == 1 << 14 */ #define XLOG_BIG_RECORD_BSHIFT 15 /* 32k == 1 << 15 */ #define XLOG_MAX_RECORD_BSHIFT 18 /* 256k == 1 << 18 */ @@ -125,7 +126,17 @@ struct xlog_op_header { #define XLOG_FMT XLOG_FMT_LINUX_LE #endif -typedef struct xlog_rec_header { +struct xlog_rec_ext_header { + __be32 xh_cycle; /* write cycle of log */ + __be32 xh_cycle_data[XLOG_CYCLE_DATA_SIZE]; + __u8 xh_reserved[252]; +}; + +/* actual ext header payload size for checksumming */ +#define XLOG_REC_EXT_SIZE \ + offsetofend(struct xlog_rec_ext_header, xh_cycle_data) + +struct xlog_rec_header { __be32 h_magicno; /* log record (LR) identifier : 4 */ __be32 h_cycle; /* write cycle of log : 4 */ __be32 h_version; /* LR version : 4 */ @@ -135,7 +146,7 @@ typedef struct xlog_rec_header { __le32 h_crc; /* crc of log record : 4 */ __be32 h_prev_block; /* block number to previous LR : 4 */ __be32 h_num_logops; /* number of log operations in this LR : 4 */ - __be32 h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; + __be32 h_cycle_data[XLOG_CYCLE_DATA_SIZE]; /* fields added by the Linux port: */ __be32 h_fmt; /* format of log record : 4 */ @@ -160,30 +171,19 @@ typedef struct xlog_rec_header { * (little-endian) architectures. */ __u32 h_pad0; -} xlog_rec_header_t; + + __u8 h_reserved[184]; + struct xlog_rec_ext_header h_ext[]; +}; #ifdef __i386__ #define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_size) -#define XLOG_REC_SIZE_OTHER sizeof(struct xlog_rec_header) +#define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_pad0) #else -#define XLOG_REC_SIZE sizeof(struct xlog_rec_header) +#define XLOG_REC_SIZE offsetofend(struct xlog_rec_header, h_pad0) #define XLOG_REC_SIZE_OTHER offsetofend(struct xlog_rec_header, h_size) #endif /* __i386__ */ -typedef struct xlog_rec_ext_header { - __be32 xh_cycle; /* write cycle of log : 4 */ - __be32 xh_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE]; /* : 256 */ -} xlog_rec_ext_header_t; - -/* - * Quite misnamed, because this union lays out the actual on-disk log buffer. 
- */ -typedef union xlog_in_core2 { - xlog_rec_header_t hic_header; - xlog_rec_ext_header_t hic_xheader; - char hic_sector[XLOG_HEADER_SIZE]; -} xlog_in_core_2_t; - /* not an on-disk structure, but needed by log recovery in userspace */ struct xfs_log_iovec { void *i_addr; /* beginning address of region */ diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h index 7bfa3242e2c5..2e9715cc1641 100644 --- a/fs/xfs/libxfs/xfs_ondisk.h +++ b/fs/xfs/libxfs/xfs_ondisk.h @@ -174,9 +174,11 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(struct xfs_rud_log_format, 16); XFS_CHECK_STRUCT_SIZE(struct xfs_map_extent, 32); XFS_CHECK_STRUCT_SIZE(struct xfs_phys_extent, 16); - XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 328); - XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 260); + XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 512); + XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 512); + XFS_CHECK_OFFSET(struct xlog_rec_header, h_reserved, 328); + XFS_CHECK_OFFSET(struct xlog_rec_ext_header, xh_reserved, 260); XFS_CHECK_OFFSET(struct xfs_bui_log_format, bui_extents, 16); XFS_CHECK_OFFSET(struct xfs_cui_log_format, cui_extents, 16); XFS_CHECK_OFFSET(struct xfs_rui_log_format, rui_extents, 16); diff --git a/fs/xfs/libxfs/xfs_quota_defs.h b/fs/xfs/libxfs/xfs_quota_defs.h index 763d941a8420..551d7ae46c5c 100644 --- a/fs/xfs/libxfs/xfs_quota_defs.h +++ b/fs/xfs/libxfs/xfs_quota_defs.h @@ -29,11 +29,9 @@ typedef uint8_t xfs_dqtype_t; * flags for q_flags field in the dquot. */ #define XFS_DQFLAG_DIRTY (1u << 0) /* dquot is dirty */ -#define XFS_DQFLAG_FREEING (1u << 1) /* dquot is being torn down */ #define XFS_DQFLAG_STRINGS \ - { XFS_DQFLAG_DIRTY, "DIRTY" }, \ - { XFS_DQFLAG_FREEING, "FREEING" } + { XFS_DQFLAG_DIRTY, "DIRTY" } /* * We have the possibility of all three quota types being active at once, and diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index d4fcf591e63d..03f1e2493334 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -64,12 +64,6 @@ struct xfs_rtgroup { */ #define XFS_RTG_FREE XA_MARK_0 -/* - * For zoned RT devices this is set on groups that are fully written and that - * have unused blocks. Used by the garbage collection to pick targets. - */ -#define XFS_RTG_RECLAIMABLE XA_MARK_1 - static inline struct xfs_rtgroup *to_rtg(struct xfs_group *xg) { return container_of(xg, struct xfs_rtgroup, rtg_group); @@ -371,4 +365,12 @@ static inline int xfs_initialize_rtgroups(struct xfs_mount *mp, # define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP) #endif /* CONFIG_XFS_RT */ +static inline xfs_rfsblock_t +xfs_rtgs_to_rfsbs( + struct xfs_mount *mp, + uint32_t nr_groups) +{ + return xfs_groups_to_rfsbs(mp, nr_groups, XG_TYPE_RTG); +} + #endif /* __LIBXFS_RTGROUP_H */ diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c index 58d6d4ed2853..5c5374c44c5a 100644 --- a/fs/xfs/scrub/quota.c +++ b/fs/xfs/scrub/quota.c @@ -155,12 +155,9 @@ xchk_quota_item( * We want to validate the bmap record for the storage backing this * dquot, so we need to lock the dquot and the quota file. For quota * operations, the locking order is first the ILOCK and then the dquot. - * However, dqiterate gave us a locked dquot, so drop the dquot lock to - * get the ILOCK. 
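+	 * The dquot iterator now hands back an unlocked dquot reference, so
+	 * take the ILOCK first and then q_qlock directly, in that order.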
*/ - xfs_dqunlock(dq); xchk_ilock(sc, XFS_ILOCK_SHARED); - xfs_dqlock(dq); + mutex_lock(&dq->q_qlock); /* * Except for the root dquot, the actual dquot we got must either have @@ -251,6 +248,7 @@ xchk_quota_item( xchk_quota_item_timer(sc, offset, &dq->q_rtb); out: + mutex_unlock(&dq->q_qlock); if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) return -ECANCELED; @@ -330,7 +328,7 @@ xchk_quota( xchk_dqiter_init(&cursor, sc, dqtype); while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { error = xchk_quota_item(&sqi, dq); - xfs_qm_dqput(dq); + xfs_qm_dqrele(dq); if (error) break; } diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c index 8f4c8d41f308..b1d661aa5f06 100644 --- a/fs/xfs/scrub/quota_repair.c +++ b/fs/xfs/scrub/quota_repair.c @@ -184,17 +184,13 @@ xrep_quota_item( /* * We might need to fix holes in the bmap record for the storage * backing this dquot, so we need to lock the dquot and the quota file. - * dqiterate gave us a locked dquot, so drop the dquot lock to get the - * ILOCK_EXCL. */ - xfs_dqunlock(dq); xchk_ilock(sc, XFS_ILOCK_EXCL); - xfs_dqlock(dq); - + mutex_lock(&dq->q_qlock); error = xrep_quota_item_bmap(sc, dq, &dirty); xchk_iunlock(sc, XFS_ILOCK_EXCL); if (error) - return error; + goto out_unlock_dquot; /* Check the limits. */ if (dq->q_blk.softlimit > dq->q_blk.hardlimit) { @@ -246,7 +242,7 @@ xrep_quota_item( xrep_quota_item_timer(sc, &dq->q_rtb, &dirty); if (!dirty) - return 0; + goto out_unlock_dquot; trace_xrep_dquot_item(sc->mp, dq->q_type, dq->q_id); @@ -257,8 +253,10 @@ xrep_quota_item( xfs_qm_adjust_dqtimers(dq); } xfs_trans_log_dquot(sc->tp, dq); - error = xfs_trans_roll(&sc->tp); - xfs_dqlock(dq); + return xfs_trans_roll(&sc->tp); + +out_unlock_dquot: + mutex_unlock(&dq->q_qlock); return error; } @@ -513,7 +511,7 @@ xrep_quota_problems( xchk_dqiter_init(&cursor, sc, dqtype); while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { error = xrep_quota_item(&rqi, dq); - xfs_qm_dqput(dq); + xfs_qm_dqrele(dq); if (error) break; } diff --git a/fs/xfs/scrub/quotacheck.c b/fs/xfs/scrub/quotacheck.c index e4105aaafe84..d412a8359784 100644 --- a/fs/xfs/scrub/quotacheck.c +++ b/fs/xfs/scrub/quotacheck.c @@ -563,6 +563,7 @@ xqcheck_compare_dquot( return -ECANCELED; } + mutex_lock(&dq->q_qlock); mutex_lock(&xqc->lock); error = xfarray_load_sparse(counts, dq->q_id, &xcdq); if (error) @@ -589,7 +590,9 @@ xqcheck_compare_dquot( xchk_set_incomplete(xqc->sc); error = -ECANCELED; } +out_unlock: mutex_unlock(&xqc->lock); + mutex_unlock(&dq->q_qlock); if (error) return error; @@ -597,10 +600,6 @@ xqcheck_compare_dquot( return -ECANCELED; return 0; - -out_unlock: - mutex_unlock(&xqc->lock); - return error; } /* @@ -636,7 +635,7 @@ xqcheck_walk_observations( return error; error = xqcheck_compare_dquot(xqc, dqtype, dq); - xfs_qm_dqput(dq); + xfs_qm_dqrele(dq); if (error) return error; @@ -674,7 +673,7 @@ xqcheck_compare_dqtype( xchk_dqiter_init(&cursor, sc, dqtype); while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { error = xqcheck_compare_dquot(xqc, dqtype, dq); - xfs_qm_dqput(dq); + xfs_qm_dqrele(dq); if (error) break; } diff --git a/fs/xfs/scrub/quotacheck_repair.c b/fs/xfs/scrub/quotacheck_repair.c index dd8554c755b5..51be8d8d261b 100644 --- a/fs/xfs/scrub/quotacheck_repair.c +++ b/fs/xfs/scrub/quotacheck_repair.c @@ -52,13 +52,11 @@ xqcheck_commit_dquot( bool dirty = false; int error = 0; - /* Unlock the dquot just long enough to allocate a transaction. 
*/ - xfs_dqunlock(dq); error = xchk_trans_alloc(xqc->sc, 0); - xfs_dqlock(dq); if (error) return error; + mutex_lock(&dq->q_qlock); xfs_trans_dqjoin(xqc->sc->tp, dq); if (xchk_iscan_aborted(&xqc->iscan)) { @@ -115,23 +113,12 @@ xqcheck_commit_dquot( if (dq->q_id) xfs_qm_adjust_dqtimers(dq); xfs_trans_log_dquot(xqc->sc->tp, dq); - - /* - * Transaction commit unlocks the dquot, so we must re-lock it so that - * the caller can put the reference (which apparently requires a locked - * dquot). - */ - error = xrep_trans_commit(xqc->sc); - xfs_dqlock(dq); - return error; + return xrep_trans_commit(xqc->sc); out_unlock: mutex_unlock(&xqc->lock); out_cancel: xchk_trans_cancel(xqc->sc); - - /* Re-lock the dquot so the caller can put the reference. */ - xfs_dqlock(dq); return error; } @@ -156,7 +143,7 @@ xqcheck_commit_dqtype( xchk_dqiter_init(&cursor, sc, dqtype); while ((error = xchk_dquot_iter(&cursor, &dq)) == 1) { error = xqcheck_commit_dquot(xqc, dqtype, dq); - xfs_qm_dqput(dq); + xfs_qm_dqrele(dq); if (error) break; } @@ -187,7 +174,7 @@ xqcheck_commit_dqtype( return error; error = xqcheck_commit_dquot(xqc, dqtype, dq); - xfs_qm_dqput(dq); + xfs_qm_dqrele(dq); if (error) return error; diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 0bd8022e47b4..612ca682a513 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -31,7 +31,7 @@ * * ip->i_lock * qi->qi_tree_lock - * dquot->q_qlock (xfs_dqlock() and friends) + * dquot->q_qlock * dquot->q_flush (xfs_dqflock() and friends) * qi->qi_lru_lock * @@ -801,10 +801,11 @@ xfs_dq_get_next_id( static struct xfs_dquot * xfs_qm_dqget_cache_lookup( struct xfs_mount *mp, - struct xfs_quotainfo *qi, - struct radix_tree_root *tree, - xfs_dqid_t id) + xfs_dqid_t id, + xfs_dqtype_t type) { + struct xfs_quotainfo *qi = mp->m_quotainfo; + struct radix_tree_root *tree = xfs_dquot_tree(qi, type); struct xfs_dquot *dqp; restart: @@ -816,16 +817,12 @@ xfs_qm_dqget_cache_lookup( return NULL; } - xfs_dqlock(dqp); - if (dqp->q_flags & XFS_DQFLAG_FREEING) { - xfs_dqunlock(dqp); + if (!lockref_get_not_dead(&dqp->q_lockref)) { mutex_unlock(&qi->qi_tree_lock); trace_xfs_dqget_freeing(dqp); delay(1); goto restart; } - - dqp->q_nrefs++; mutex_unlock(&qi->qi_tree_lock); trace_xfs_dqget_hit(dqp); @@ -836,8 +833,7 @@ xfs_qm_dqget_cache_lookup( /* * Try to insert a new dquot into the in-core cache. If an error occurs the * caller should throw away the dquot and start over. Otherwise, the dquot - * is returned locked (and held by the cache) as if there had been a cache - * hit. + * is returned (and held by the cache) as if there had been a cache hit. * * The insert needs to be done under memalloc_nofs context because the radix * tree can do memory allocation during insert. The qi->qi_tree_lock is taken in @@ -848,11 +844,12 @@ xfs_qm_dqget_cache_lookup( static int xfs_qm_dqget_cache_insert( struct xfs_mount *mp, - struct xfs_quotainfo *qi, - struct radix_tree_root *tree, xfs_dqid_t id, + xfs_dqtype_t type, struct xfs_dquot *dqp) { + struct xfs_quotainfo *qi = mp->m_quotainfo; + struct radix_tree_root *tree = xfs_dquot_tree(qi, type); unsigned int nofs_flags; int error; @@ -860,14 +857,11 @@ xfs_qm_dqget_cache_insert( mutex_lock(&qi->qi_tree_lock); error = radix_tree_insert(tree, id, dqp); if (unlikely(error)) { - /* Duplicate found! Caller must try again. */ trace_xfs_dqget_dup(dqp); goto out_unlock; } - /* Return a locked dquot to the caller, with a reference taken. 
*/ - xfs_dqlock(dqp); - dqp->q_nrefs = 1; + lockref_init(&dqp->q_lockref); qi->qi_dquots++; out_unlock: @@ -903,7 +897,7 @@ xfs_qm_dqget_checks( /* * Given the file system, id, and type (UDQUOT/GDQUOT/PDQUOT), return a - * locked dquot, doing an allocation (if requested) as needed. + * dquot, doing an allocation (if requested) as needed. */ int xfs_qm_dqget( @@ -913,8 +907,6 @@ xfs_qm_dqget( bool can_alloc, struct xfs_dquot **O_dqpp) { - struct xfs_quotainfo *qi = mp->m_quotainfo; - struct radix_tree_root *tree = xfs_dquot_tree(qi, type); struct xfs_dquot *dqp; int error; @@ -923,28 +915,30 @@ xfs_qm_dqget( return error; restart: - dqp = xfs_qm_dqget_cache_lookup(mp, qi, tree, id); - if (dqp) { - *O_dqpp = dqp; - return 0; - } + dqp = xfs_qm_dqget_cache_lookup(mp, id, type); + if (dqp) + goto found; error = xfs_qm_dqread(mp, id, type, can_alloc, &dqp); if (error) return error; - error = xfs_qm_dqget_cache_insert(mp, qi, tree, id, dqp); + error = xfs_qm_dqget_cache_insert(mp, id, type, dqp); if (error) { - /* - * Duplicate found. Just throw away the new dquot and start - * over. - */ xfs_qm_dqdestroy(dqp); - XFS_STATS_INC(mp, xs_qm_dquot_dups); - goto restart; + if (error == -EEXIST) { + /* + * Duplicate found. Just throw away the new dquot and + * start over. + */ + XFS_STATS_INC(mp, xs_qm_dquot_dups); + goto restart; + } + return error; } trace_xfs_dqget_miss(dqp); +found: *O_dqpp = dqp; return 0; } @@ -999,15 +993,16 @@ xfs_qm_dqget_inode( struct xfs_inode *ip, xfs_dqtype_t type, bool can_alloc, - struct xfs_dquot **O_dqpp) + struct xfs_dquot **dqpp) { struct xfs_mount *mp = ip->i_mount; - struct xfs_quotainfo *qi = mp->m_quotainfo; - struct radix_tree_root *tree = xfs_dquot_tree(qi, type); struct xfs_dquot *dqp; xfs_dqid_t id; int error; + ASSERT(!*dqpp); + xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); + error = xfs_qm_dqget_checks(mp, type); if (error) return error; @@ -1019,11 +1014,9 @@ xfs_qm_dqget_inode( id = xfs_qm_id_for_quotatype(ip, type); restart: - dqp = xfs_qm_dqget_cache_lookup(mp, qi, tree, id); - if (dqp) { - *O_dqpp = dqp; - return 0; - } + dqp = xfs_qm_dqget_cache_lookup(mp, id, type); + if (dqp) + goto found; /* * Dquot cache miss. We don't want to keep the inode lock across @@ -1049,7 +1042,6 @@ xfs_qm_dqget_inode( if (dqp1) { xfs_qm_dqdestroy(dqp); dqp = dqp1; - xfs_dqlock(dqp); goto dqret; } } else { @@ -1058,21 +1050,26 @@ xfs_qm_dqget_inode( return -ESRCH; } - error = xfs_qm_dqget_cache_insert(mp, qi, tree, id, dqp); + error = xfs_qm_dqget_cache_insert(mp, id, type, dqp); if (error) { - /* - * Duplicate found. Just throw away the new dquot and start - * over. - */ xfs_qm_dqdestroy(dqp); - XFS_STATS_INC(mp, xs_qm_dquot_dups); - goto restart; + if (error == -EEXIST) { + /* + * Duplicate found. Just throw away the new dquot and + * start over. + */ + XFS_STATS_INC(mp, xs_qm_dquot_dups); + goto restart; + } + return error; } dqret: xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); trace_xfs_dqget_miss(dqp); - *O_dqpp = dqp; +found: + trace_xfs_dqattach_get(dqp); + *dqpp = dqp; return 0; } @@ -1098,45 +1095,21 @@ xfs_qm_dqget_next( else if (error != 0) break; + mutex_lock(&dqp->q_qlock); if (!XFS_IS_DQUOT_UNINITIALIZED(dqp)) { *dqpp = dqp; return 0; } - xfs_qm_dqput(dqp); + mutex_unlock(&dqp->q_qlock); + xfs_qm_dqrele(dqp); } return error; } /* - * Release a reference to the dquot (decrement ref-count) and unlock it. - * - * If there is a group quota attached to this dquot, carefully release that - * too without tripping over deadlocks'n'stuff. 
- */ -void -xfs_qm_dqput( - struct xfs_dquot *dqp) -{ - ASSERT(dqp->q_nrefs > 0); - ASSERT(XFS_DQ_IS_LOCKED(dqp)); - - trace_xfs_dqput(dqp); - - if (--dqp->q_nrefs == 0) { - struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo; - trace_xfs_dqput_free(dqp); - - if (list_lru_add_obj(&qi->qi_lru, &dqp->q_lru)) - XFS_STATS_INC(dqp->q_mount, xs_qm_dquot_unused); - } - xfs_dqunlock(dqp); -} - -/* - * Release a dquot. Flush it if dirty, then dqput() it. - * dquot must not be locked. + * Release a reference to the dquot. */ void xfs_qm_dqrele( @@ -1147,14 +1120,16 @@ xfs_qm_dqrele( trace_xfs_dqrele(dqp); - xfs_dqlock(dqp); - /* - * We don't care to flush it if the dquot is dirty here. - * That will create stutters that we want to avoid. - * Instead we do a delayed write when we try to reclaim - * a dirty dquot. Also xfs_sync will take part of the burden... - */ - xfs_qm_dqput(dqp); + if (lockref_put_or_lock(&dqp->q_lockref)) + return; + if (!--dqp->q_lockref.count) { + struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo; + + trace_xfs_dqrele_free(dqp); + if (list_lru_add_obj(&qi->qi_lru, &dqp->q_lru)) + XFS_STATS_INC(dqp->q_mount, xs_qm_dquot_unused); + } + spin_unlock(&dqp->q_lockref.lock); } /* diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index 61217adf5ba5..bbb824adca82 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -71,7 +71,7 @@ struct xfs_dquot { xfs_dqtype_t q_type; uint16_t q_flags; xfs_dqid_t q_id; - uint q_nrefs; + struct lockref q_lockref; int q_bufoffset; xfs_daddr_t q_blkno; xfs_fileoff_t q_fileoffset; @@ -121,21 +121,6 @@ static inline void xfs_dqfunlock(struct xfs_dquot *dqp) complete(&dqp->q_flush); } -static inline int xfs_dqlock_nowait(struct xfs_dquot *dqp) -{ - return mutex_trylock(&dqp->q_qlock); -} - -static inline void xfs_dqlock(struct xfs_dquot *dqp) -{ - mutex_lock(&dqp->q_qlock); -} - -static inline void xfs_dqunlock(struct xfs_dquot *dqp) -{ - mutex_unlock(&dqp->q_qlock); -} - static inline int xfs_dquot_type(const struct xfs_dquot *dqp) { @@ -233,7 +218,6 @@ int xfs_qm_dqget_next(struct xfs_mount *mp, xfs_dqid_t id, int xfs_qm_dqget_uncached(struct xfs_mount *mp, xfs_dqid_t id, xfs_dqtype_t type, struct xfs_dquot **dqpp); -void xfs_qm_dqput(struct xfs_dquot *dqp); void xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *); void xfs_dqlockn(struct xfs_dqtrx *q); @@ -246,9 +230,7 @@ void xfs_dquot_detach_buf(struct xfs_dquot *dqp); static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp) { - xfs_dqlock(dqp); - dqp->q_nrefs++; - xfs_dqunlock(dqp); + lockref_get(&dqp->q_lockref); return dqp; } diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 271b195ebb93..b374cd9f1900 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -132,7 +132,7 @@ xfs_qm_dquot_logitem_push( if (atomic_read(&dqp->q_pincount) > 0) return XFS_ITEM_PINNED; - if (!xfs_dqlock_nowait(dqp)) + if (!mutex_trylock(&dqp->q_qlock)) return XFS_ITEM_LOCKED; /* @@ -177,7 +177,7 @@ xfs_qm_dquot_logitem_push( out_relock_ail: spin_lock(&lip->li_ailp->ail_lock); out_unlock: - xfs_dqunlock(dqp); + mutex_unlock(&dqp->q_qlock); return rval; } @@ -195,7 +195,7 @@ xfs_qm_dquot_logitem_release( * transaction layer, within trans_commit. Hence, no LI_HOLD flag * for the logitem. 
*/ - xfs_dqunlock(dqp); + mutex_unlock(&dqp->q_qlock); } STATIC void diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index f3fc4d21bfe1..23a920437fe4 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -358,7 +358,7 @@ xfs_reinit_inode( static int xfs_iget_recycle( struct xfs_perag *pag, - struct xfs_inode *ip) __releases(&ip->i_flags_lock) + struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; struct inode *inode = VFS_I(ip); @@ -366,20 +366,6 @@ xfs_iget_recycle( trace_xfs_iget_recycle(ip); - if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) - return -EAGAIN; - - /* - * We need to make it look like the inode is being reclaimed to prevent - * the actual reclaim workers from stomping over us while we recycle - * the inode. We can't clear the radix tree tag yet as it requires - * pag_ici_lock to be held exclusive. - */ - ip->i_flags |= XFS_IRECLAIM; - - spin_unlock(&ip->i_flags_lock); - rcu_read_unlock(); - ASSERT(!rwsem_is_locked(&inode->i_rwsem)); error = xfs_reinit_inode(mp, inode); xfs_iunlock(ip, XFS_ILOCK_EXCL); @@ -576,10 +562,19 @@ xfs_iget_cache_hit( /* The inode fits the selection criteria; process it. */ if (ip->i_flags & XFS_IRECLAIMABLE) { - /* Drops i_flags_lock and RCU read lock. */ - error = xfs_iget_recycle(pag, ip); - if (error == -EAGAIN) + /* + * We need to make it look like the inode is being reclaimed to + * prevent the actual reclaim workers from stomping over us + * while we recycle the inode. We can't clear the radix tree + * tag yet as it requires pag_ici_lock to be held exclusive. + */ + if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) goto out_skip; + ip->i_flags |= XFS_IRECLAIM; + spin_unlock(&ip->i_flags_lock); + rcu_read_unlock(); + + error = xfs_iget_recycle(pag, ip); if (error) return error; } else { diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 603e85c1ab4c..a311385b23d8 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -534,8 +534,8 @@ xlog_state_release_iclog( */ if ((iclog->ic_state == XLOG_STATE_WANT_SYNC || (iclog->ic_flags & XLOG_ICL_NEED_FUA)) && - !iclog->ic_header.h_tail_lsn) { - iclog->ic_header.h_tail_lsn = + !iclog->ic_header->h_tail_lsn) { + iclog->ic_header->h_tail_lsn = cpu_to_be64(atomic64_read(&log->l_tail_lsn)); } @@ -1279,11 +1279,12 @@ xlog_get_iclog_buffer_size( log->l_iclog_size = mp->m_logbsize; /* - * # headers = size / 32k - one header holds cycles from 32k of data. + * Combined size of the log record headers. The first 32k cycles + * are stored directly in the xlog_rec_header, the rest in the + * variable number of xlog_rec_ext_headers at its end. */ - log->l_iclog_heads = - DIV_ROUND_UP(mp->m_logbsize, XLOG_HEADER_CYCLE_SIZE); - log->l_iclog_hsize = log->l_iclog_heads << BBSHIFT; + log->l_iclog_hsize = struct_size(log->l_iclog->ic_header, h_ext, + DIV_ROUND_UP(mp->m_logbsize, XLOG_HEADER_CYCLE_SIZE) - 1); } void @@ -1367,9 +1368,8 @@ xlog_alloc_log( int num_bblks) { struct xlog *log; - xlog_rec_header_t *head; - xlog_in_core_t **iclogp; - xlog_in_core_t *iclog, *prev_iclog=NULL; + struct xlog_in_core **iclogp; + struct xlog_in_core *iclog, *prev_iclog = NULL; int i; int error = -ENOMEM; uint log2_size = 0; @@ -1436,13 +1436,6 @@ xlog_alloc_log( init_waitqueue_head(&log->l_flush_wait); iclogp = &log->l_iclog; - /* - * The amount of memory to allocate for the iclog structure is - * rather funky due to the way the structure is defined. It is - * done this way so that we can use different sizes for machines - * with different amounts of memory. 
See the definition of - * xlog_in_core_t in xfs_log_priv.h for details. - */ ASSERT(log->l_iclog_size >= 4096); for (i = 0; i < log->l_iclog_bufs; i++) { size_t bvec_size = howmany(log->l_iclog_size, PAGE_SIZE) * @@ -1457,26 +1450,25 @@ xlog_alloc_log( iclog->ic_prev = prev_iclog; prev_iclog = iclog; - iclog->ic_data = kvzalloc(log->l_iclog_size, + iclog->ic_header = kvzalloc(log->l_iclog_size, GFP_KERNEL | __GFP_RETRY_MAYFAIL); - if (!iclog->ic_data) + if (!iclog->ic_header) goto out_free_iclog; - head = &iclog->ic_header; - memset(head, 0, sizeof(xlog_rec_header_t)); - head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); - head->h_version = cpu_to_be32( + iclog->ic_header->h_magicno = + cpu_to_be32(XLOG_HEADER_MAGIC_NUM); + iclog->ic_header->h_version = cpu_to_be32( xfs_has_logv2(log->l_mp) ? 2 : 1); - head->h_size = cpu_to_be32(log->l_iclog_size); - /* new fields */ - head->h_fmt = cpu_to_be32(XLOG_FMT); - memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t)); + iclog->ic_header->h_size = cpu_to_be32(log->l_iclog_size); + iclog->ic_header->h_fmt = cpu_to_be32(XLOG_FMT); + memcpy(&iclog->ic_header->h_fs_uuid, &mp->m_sb.sb_uuid, + sizeof(iclog->ic_header->h_fs_uuid)); + iclog->ic_datap = (void *)iclog->ic_header + log->l_iclog_hsize; iclog->ic_size = log->l_iclog_size - log->l_iclog_hsize; iclog->ic_state = XLOG_STATE_ACTIVE; iclog->ic_log = log; atomic_set(&iclog->ic_refcnt, 0); INIT_LIST_HEAD(&iclog->ic_callbacks); - iclog->ic_datap = (void *)iclog->ic_data + log->l_iclog_hsize; init_waitqueue_head(&iclog->ic_force_wait); init_waitqueue_head(&iclog->ic_write_wait); @@ -1504,7 +1496,7 @@ xlog_alloc_log( out_free_iclog: for (iclog = log->l_iclog; iclog; iclog = prev_iclog) { prev_iclog = iclog->ic_next; - kvfree(iclog->ic_data); + kvfree(iclog->ic_header); kfree(iclog); if (prev_iclog == log->l_iclog) break; @@ -1524,36 +1516,19 @@ xlog_pack_data( struct xlog_in_core *iclog, int roundoff) { - int i, j, k; - int size = iclog->ic_offset + roundoff; - __be32 cycle_lsn; - char *dp; + struct xlog_rec_header *rhead = iclog->ic_header; + __be32 cycle_lsn = CYCLE_LSN_DISK(rhead->h_lsn); + char *dp = iclog->ic_datap; + int i; - cycle_lsn = CYCLE_LSN_DISK(iclog->ic_header.h_lsn); - - dp = iclog->ic_datap; - for (i = 0; i < BTOBB(size); i++) { - if (i >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) - break; - iclog->ic_header.h_cycle_data[i] = *(__be32 *)dp; + for (i = 0; i < BTOBB(iclog->ic_offset + roundoff); i++) { + *xlog_cycle_data(rhead, i) = *(__be32 *)dp; *(__be32 *)dp = cycle_lsn; dp += BBSIZE; } - if (xfs_has_logv2(log->l_mp)) { - xlog_in_core_2_t *xhdr = iclog->ic_data; - - for ( ; i < BTOBB(size); i++) { - j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - xhdr[j].hic_xheader.xh_cycle_data[k] = *(__be32 *)dp; - *(__be32 *)dp = cycle_lsn; - dp += BBSIZE; - } - - for (i = 1; i < log->l_iclog_heads; i++) - xhdr[i].hic_xheader.xh_cycle = cycle_lsn; - } + for (i = 0; i < (log->l_iclog_hsize >> BBSHIFT) - 1; i++) + rhead->h_ext[i].xh_cycle = cycle_lsn; } /* @@ -1578,16 +1553,11 @@ xlog_cksum( /* ... then for additional cycle data for v2 logs ... 
*/ if (xfs_has_logv2(log->l_mp)) { - union xlog_in_core2 *xhdr = (union xlog_in_core2 *)rhead; - int i; - int xheads; + int xheads, i; - xheads = DIV_ROUND_UP(size, XLOG_HEADER_CYCLE_SIZE); - - for (i = 1; i < xheads; i++) { - crc = crc32c(crc, &xhdr[i].hic_xheader, - sizeof(struct xlog_rec_ext_header)); - } + xheads = DIV_ROUND_UP(size, XLOG_HEADER_CYCLE_SIZE) - 1; + for (i = 0; i < xheads; i++) + crc = crc32c(crc, &rhead->h_ext[i], XLOG_REC_EXT_SIZE); } /* ... and finally for the payload */ @@ -1671,11 +1641,11 @@ xlog_write_iclog( iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA); - if (is_vmalloc_addr(iclog->ic_data)) { - if (!bio_add_vmalloc(&iclog->ic_bio, iclog->ic_data, count)) + if (is_vmalloc_addr(iclog->ic_header)) { + if (!bio_add_vmalloc(&iclog->ic_bio, iclog->ic_header, count)) goto shutdown; } else { - bio_add_virt_nofail(&iclog->ic_bio, iclog->ic_data, count); + bio_add_virt_nofail(&iclog->ic_bio, iclog->ic_header, count); } /* @@ -1804,19 +1774,19 @@ xlog_sync( size = iclog->ic_offset; if (xfs_has_logv2(log->l_mp)) size += roundoff; - iclog->ic_header.h_len = cpu_to_be32(size); + iclog->ic_header->h_len = cpu_to_be32(size); XFS_STATS_INC(log->l_mp, xs_log_writes); XFS_STATS_ADD(log->l_mp, xs_log_blocks, BTOBB(count)); - bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header.h_lsn)); + bno = BLOCK_LSN(be64_to_cpu(iclog->ic_header->h_lsn)); /* Do we need to split this write into 2 parts? */ if (bno + BTOBB(count) > log->l_logBBsize) - xlog_split_iclog(log, &iclog->ic_header, bno, count); + xlog_split_iclog(log, iclog->ic_header, bno, count); /* calculcate the checksum */ - iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header, + iclog->ic_header->h_crc = xlog_cksum(log, iclog->ic_header, iclog->ic_datap, XLOG_REC_SIZE, size); /* * Intentionally corrupt the log record CRC based on the error injection @@ -1827,11 +1797,11 @@ xlog_sync( */ #ifdef DEBUG if (XFS_TEST_ERROR(log->l_mp, XFS_ERRTAG_LOG_BAD_CRC)) { - iclog->ic_header.h_crc &= cpu_to_le32(0xAAAAAAAA); + iclog->ic_header->h_crc &= cpu_to_le32(0xAAAAAAAA); iclog->ic_fail_crc = true; xfs_warn(log->l_mp, "Intentionally corrupted log record at LSN 0x%llx. Shutdown imminent.", - be64_to_cpu(iclog->ic_header.h_lsn)); + be64_to_cpu(iclog->ic_header->h_lsn)); } #endif xlog_verify_iclog(log, iclog, count); @@ -1843,10 +1813,10 @@ xlog_sync( */ STATIC void xlog_dealloc_log( - struct xlog *log) + struct xlog *log) { - xlog_in_core_t *iclog, *next_iclog; - int i; + struct xlog_in_core *iclog, *next_iclog; + int i; /* * Destroy the CIL after waiting for iclog IO completion because an @@ -1858,7 +1828,7 @@ xlog_dealloc_log( iclog = log->l_iclog; for (i = 0; i < log->l_iclog_bufs; i++) { next_iclog = iclog->ic_next; - kvfree(iclog->ic_data); + kvfree(iclog->ic_header); kfree(iclog); iclog = next_iclog; } @@ -1880,7 +1850,7 @@ xlog_state_finish_copy( { lockdep_assert_held(&log->l_icloglock); - be32_add_cpu(&iclog->ic_header.h_num_logops, record_cnt); + be32_add_cpu(&iclog->ic_header->h_num_logops, record_cnt); iclog->ic_offset += copy_bytes; } @@ -2303,7 +2273,7 @@ xlog_state_activate_iclog( * We don't need to cover the dummy. 
*/ if (*iclogs_changed == 0 && - iclog->ic_header.h_num_logops == cpu_to_be32(XLOG_COVER_OPS)) { + iclog->ic_header->h_num_logops == cpu_to_be32(XLOG_COVER_OPS)) { *iclogs_changed = 1; } else { /* @@ -2315,11 +2285,11 @@ xlog_state_activate_iclog( iclog->ic_state = XLOG_STATE_ACTIVE; iclog->ic_offset = 0; - iclog->ic_header.h_num_logops = 0; - memset(iclog->ic_header.h_cycle_data, 0, - sizeof(iclog->ic_header.h_cycle_data)); - iclog->ic_header.h_lsn = 0; - iclog->ic_header.h_tail_lsn = 0; + iclog->ic_header->h_num_logops = 0; + memset(iclog->ic_header->h_cycle_data, 0, + sizeof(iclog->ic_header->h_cycle_data)); + iclog->ic_header->h_lsn = 0; + iclog->ic_header->h_tail_lsn = 0; } /* @@ -2411,7 +2381,7 @@ xlog_get_lowest_lsn( iclog->ic_state == XLOG_STATE_DIRTY) continue; - lsn = be64_to_cpu(iclog->ic_header.h_lsn); + lsn = be64_to_cpu(iclog->ic_header->h_lsn); if ((lsn && !lowest_lsn) || XFS_LSN_CMP(lsn, lowest_lsn) < 0) lowest_lsn = lsn; } while ((iclog = iclog->ic_next) != log->l_iclog); @@ -2446,7 +2416,7 @@ xlog_state_iodone_process_iclog( * If this is not the lowest lsn iclog, then we will leave it * for another completion to process. */ - header_lsn = be64_to_cpu(iclog->ic_header.h_lsn); + header_lsn = be64_to_cpu(iclog->ic_header->h_lsn); lowest_lsn = xlog_get_lowest_lsn(log); if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0) return false; @@ -2609,9 +2579,9 @@ xlog_state_get_iclog_space( struct xlog_ticket *ticket, int *logoffsetp) { - int log_offset; - xlog_rec_header_t *head; - xlog_in_core_t *iclog; + int log_offset; + struct xlog_rec_header *head; + struct xlog_in_core *iclog; restart: spin_lock(&log->l_icloglock); @@ -2629,7 +2599,7 @@ xlog_state_get_iclog_space( goto restart; } - head = &iclog->ic_header; + head = iclog->ic_header; atomic_inc(&iclog->ic_refcnt); /* prevents sync */ log_offset = iclog->ic_offset; @@ -2794,7 +2764,7 @@ xlog_state_switch_iclogs( if (!eventual_size) eventual_size = iclog->ic_offset; iclog->ic_state = XLOG_STATE_WANT_SYNC; - iclog->ic_header.h_prev_block = cpu_to_be32(log->l_prev_block); + iclog->ic_header->h_prev_block = cpu_to_be32(log->l_prev_block); log->l_prev_block = log->l_curr_block; log->l_prev_cycle = log->l_curr_cycle; @@ -2838,7 +2808,7 @@ xlog_force_and_check_iclog( struct xlog_in_core *iclog, bool *completed) { - xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn); + xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header->h_lsn); int error; *completed = false; @@ -2850,7 +2820,7 @@ xlog_force_and_check_iclog( * If the iclog has already been completed and reused the header LSN * will have been rewritten by completion */ - if (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) + if (be64_to_cpu(iclog->ic_header->h_lsn) != lsn) *completed = true; return 0; } @@ -2983,7 +2953,7 @@ xlog_force_lsn( goto out_error; iclog = log->l_iclog; - while (be64_to_cpu(iclog->ic_header.h_lsn) != lsn) { + while (be64_to_cpu(iclog->ic_header->h_lsn) != lsn) { trace_xlog_iclog_force_lsn(iclog, _RET_IP_); iclog = iclog->ic_next; if (iclog == log->l_iclog) @@ -3249,7 +3219,7 @@ xlog_verify_dump_tail( { xfs_alert(log->l_mp, "ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x", - iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1, + iclog ? 
be64_to_cpu(iclog->ic_header->h_tail_lsn) : -1, atomic64_read(&log->l_tail_lsn), log->l_ailp->ail_head_lsn, log->l_curr_cycle, log->l_curr_block, @@ -3268,7 +3238,7 @@ xlog_verify_tail_lsn( struct xlog *log, struct xlog_in_core *iclog) { - xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn); + xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header->h_tail_lsn); int blocks; if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) { @@ -3322,13 +3292,12 @@ xlog_verify_iclog( struct xlog_in_core *iclog, int count) { - struct xlog_op_header *ophead; - xlog_in_core_t *icptr; - xlog_in_core_2_t *xhdr; - void *base_ptr, *ptr, *p; + struct xlog_rec_header *rhead = iclog->ic_header; + struct xlog_in_core *icptr; + void *base_ptr, *ptr; ptrdiff_t field_offset; uint8_t clientid; - int len, i, j, k, op_len; + int len, i, op_len; int idx; /* check validity of iclog pointers */ @@ -3342,11 +3311,10 @@ xlog_verify_iclog( spin_unlock(&log->l_icloglock); /* check log magic numbers */ - if (iclog->ic_header.h_magicno != cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) + if (rhead->h_magicno != cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) xfs_emerg(log->l_mp, "%s: invalid magic num", __func__); - base_ptr = ptr = &iclog->ic_header; - p = &iclog->ic_header; + base_ptr = ptr = rhead; for (ptr += BBSIZE; ptr < base_ptr + count; ptr += BBSIZE) { if (*(__be32 *)ptr == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) xfs_emerg(log->l_mp, "%s: unexpected magic num", @@ -3354,29 +3322,19 @@ xlog_verify_iclog( } /* check fields */ - len = be32_to_cpu(iclog->ic_header.h_num_logops); + len = be32_to_cpu(rhead->h_num_logops); base_ptr = ptr = iclog->ic_datap; - ophead = ptr; - xhdr = iclog->ic_data; for (i = 0; i < len; i++) { - ophead = ptr; + struct xlog_op_header *ophead = ptr; + void *p = &ophead->oh_clientid; /* clientid is only 1 byte */ - p = &ophead->oh_clientid; field_offset = p - base_ptr; if (field_offset & 0x1ff) { clientid = ophead->oh_clientid; } else { idx = BTOBBT((void *)&ophead->oh_clientid - iclog->ic_datap); - if (idx >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) { - j = idx / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - k = idx % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - clientid = xlog_get_client_id( - xhdr[j].hic_xheader.xh_cycle_data[k]); - } else { - clientid = xlog_get_client_id( - iclog->ic_header.h_cycle_data[idx]); - } + clientid = xlog_get_client_id(*xlog_cycle_data(rhead, idx)); } if (clientid != XFS_TRANSACTION && clientid != XFS_LOG) { xfs_warn(log->l_mp, @@ -3392,13 +3350,7 @@ xlog_verify_iclog( op_len = be32_to_cpu(ophead->oh_len); } else { idx = BTOBBT((void *)&ophead->oh_len - iclog->ic_datap); - if (idx >= (XLOG_HEADER_CYCLE_SIZE / BBSIZE)) { - j = idx / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - k = idx % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - op_len = be32_to_cpu(xhdr[j].hic_xheader.xh_cycle_data[k]); - } else { - op_len = be32_to_cpu(iclog->ic_header.h_cycle_data[idx]); - } + op_len = be32_to_cpu(*xlog_cycle_data(rhead, idx)); } ptr += sizeof(struct xlog_op_header) + op_len; } @@ -3529,19 +3481,19 @@ xlog_force_shutdown( STATIC int xlog_iclogs_empty( - struct xlog *log) + struct xlog *log) { - xlog_in_core_t *iclog; + struct xlog_in_core *iclog = log->l_iclog; - iclog = log->l_iclog; do { /* endianness does not matter here, zero is zero in * any language. 
*/ - if (iclog->ic_header.h_num_logops) + if (iclog->ic_header->h_num_logops) return 0; iclog = iclog->ic_next; } while (iclog != log->l_iclog); + return 1; } diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index f443757e93c2..778ac47adb8c 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -940,7 +940,7 @@ xlog_cil_set_ctx_write_state( struct xlog_in_core *iclog) { struct xfs_cil *cil = ctx->cil; - xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header.h_lsn); + xfs_lsn_t lsn = be64_to_cpu(iclog->ic_header->h_lsn); ASSERT(!ctx->commit_lsn); if (!ctx->start_lsn) { @@ -1458,9 +1458,9 @@ xlog_cil_push_work( */ spin_lock(&log->l_icloglock); if (ctx->start_lsn != ctx->commit_lsn) { - xfs_lsn_t plsn; + xfs_lsn_t plsn = be64_to_cpu( + ctx->commit_iclog->ic_prev->ic_header->h_lsn); - plsn = be64_to_cpu(ctx->commit_iclog->ic_prev->ic_header.h_lsn); if (plsn && XFS_LSN_CMP(plsn, ctx->commit_lsn) < 0) { /* * Waiting on ic_force_wait orders the completion of diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 0cfc654d8e87..0fe59f0525aa 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -158,10 +158,8 @@ struct xlog_ticket { }; /* - * - A log record header is 512 bytes. There is plenty of room to grow the - * xlog_rec_header_t into the reserved space. - * - ic_data follows, so a write to disk can start at the beginning of - * the iclog. + * In-core log structure. + * * - ic_forcewait is used to implement synchronous forcing of the iclog to disk. * - ic_next is the pointer to the next iclog in the ring. * - ic_log is a pointer back to the global log structure. @@ -183,7 +181,7 @@ struct xlog_ticket { * We'll put all the read-only and l_icloglock fields in the first cacheline, * and move everything else out to subsequent cachelines. */ -typedef struct xlog_in_core { +struct xlog_in_core { wait_queue_head_t ic_force_wait; wait_queue_head_t ic_write_wait; struct xlog_in_core *ic_next; @@ -198,8 +196,7 @@ typedef struct xlog_in_core { /* reference counts need their own cacheline */ atomic_t ic_refcnt ____cacheline_aligned_in_smp; - xlog_in_core_2_t *ic_data; -#define ic_header ic_data->hic_header + struct xlog_rec_header *ic_header; #ifdef DEBUG bool ic_fail_crc : 1; #endif @@ -207,7 +204,7 @@ typedef struct xlog_in_core { struct work_struct ic_end_io_work; struct bio ic_bio; struct bio_vec ic_bvec[]; -} xlog_in_core_t; +}; /* * The CIL context is used to aggregate per-transaction details as well be @@ -409,7 +406,6 @@ struct xlog { struct list_head *l_buf_cancel_table; struct list_head r_dfops; /* recovered log intent items */ int l_iclog_hsize; /* size of iclog header */ - int l_iclog_heads; /* # of iclog header sectors */ uint l_sectBBsize; /* sector size in BBs (2^n) */ int l_iclog_size; /* size of log in bytes */ int l_iclog_bufs; /* number of iclog buffers */ @@ -422,7 +418,7 @@ struct xlog { /* waiting for iclog flush */ int l_covered_state;/* state of "covering disk * log entries" */ - xlog_in_core_t *l_iclog; /* head log queue */ + struct xlog_in_core *l_iclog; /* head log queue */ spinlock_t l_icloglock; /* grab to change iclog state */ int l_curr_cycle; /* Cycle number of log writes */ int l_prev_cycle; /* Cycle number before last @@ -711,4 +707,21 @@ xlog_item_space( return round_up(nbytes, sizeof(uint64_t)); } +/* + * Cycles over XLOG_CYCLE_DATA_SIZE overflow into the extended header that was + * added for v2 logs. Addressing for the cycles array there is off by one, + * because the first batch of cycles is in the original header. 
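+ *
+ * For example, with 512-byte basic blocks XLOG_CYCLE_DATA_SIZE is 64:
+ * index 10 resolves to h_cycle_data[10] in the main record header, while
+ * index 100 resolves to h_ext[0].xh_cycle_data[36] (j = 1, k = 36) in the
+ * first extended header.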
+ */ +static inline __be32 *xlog_cycle_data(struct xlog_rec_header *rhead, unsigned i) +{ + if (i >= XLOG_CYCLE_DATA_SIZE) { + unsigned j = i / XLOG_CYCLE_DATA_SIZE; + unsigned k = i % XLOG_CYCLE_DATA_SIZE; + + return &rhead->h_ext[j - 1].xh_cycle_data[k]; + } + + return &rhead->h_cycle_data[i]; +} + #endif /* __XFS_LOG_PRIV_H__ */ diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 549d60959aee..03e42c7dab56 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -190,8 +190,8 @@ xlog_bwrite( */ STATIC void xlog_header_check_dump( - xfs_mount_t *mp, - xlog_rec_header_t *head) + struct xfs_mount *mp, + struct xlog_rec_header *head) { xfs_debug(mp, "%s: SB : uuid = %pU, fmt = %d", __func__, &mp->m_sb.sb_uuid, XLOG_FMT); @@ -207,8 +207,8 @@ xlog_header_check_dump( */ STATIC int xlog_header_check_recover( - xfs_mount_t *mp, - xlog_rec_header_t *head) + struct xfs_mount *mp, + struct xlog_rec_header *head) { ASSERT(head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)); @@ -238,8 +238,8 @@ xlog_header_check_recover( */ STATIC int xlog_header_check_mount( - xfs_mount_t *mp, - xlog_rec_header_t *head) + struct xfs_mount *mp, + struct xlog_rec_header *head) { ASSERT(head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)); @@ -400,7 +400,7 @@ xlog_find_verify_log_record( xfs_daddr_t i; char *buffer; char *offset = NULL; - xlog_rec_header_t *head = NULL; + struct xlog_rec_header *head = NULL; int error = 0; int smallmem = 0; int num_blks = *last_blk - start_blk; @@ -437,7 +437,7 @@ xlog_find_verify_log_record( goto out; } - head = (xlog_rec_header_t *)offset; + head = (struct xlog_rec_header *)offset; if (head->h_magicno == cpu_to_be32(XLOG_HEADER_MAGIC_NUM)) break; @@ -1237,7 +1237,7 @@ xlog_find_tail( xfs_daddr_t *head_blk, xfs_daddr_t *tail_blk) { - xlog_rec_header_t *rhead; + struct xlog_rec_header *rhead; char *offset = NULL; char *buffer; int error; @@ -1487,7 +1487,7 @@ xlog_add_record( int tail_cycle, int tail_block) { - xlog_rec_header_t *recp = (xlog_rec_header_t *)buf; + struct xlog_rec_header *recp = (struct xlog_rec_header *)buf; memset(buf, 0, BBSIZE); recp->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM); @@ -2863,23 +2863,12 @@ xlog_unpack_data( char *dp, struct xlog *log) { - int i, j, k; + int i; - for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)) && - i < (XLOG_HEADER_CYCLE_SIZE / BBSIZE); i++) { - *(__be32 *)dp = *(__be32 *)&rhead->h_cycle_data[i]; + for (i = 0; i < BTOBB(be32_to_cpu(rhead->h_len)); i++) { + *(__be32 *)dp = *xlog_cycle_data(rhead, i); dp += BBSIZE; } - - if (xfs_has_logv2(log->l_mp)) { - xlog_in_core_2_t *xhdr = (xlog_in_core_2_t *)rhead; - for ( ; i < BTOBB(be32_to_cpu(rhead->h_len)); i++) { - j = i / (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - k = i % (XLOG_HEADER_CYCLE_SIZE / BBSIZE); - *(__be32 *)dp = xhdr[j].hic_xheader.xh_cycle_data[k]; - dp += BBSIZE; - } - } } /* @@ -3008,7 +2997,7 @@ xlog_do_recovery_pass( int pass, xfs_daddr_t *first_bad) /* out: first bad log rec */ { - xlog_rec_header_t *rhead; + struct xlog_rec_header *rhead; xfs_daddr_t blk_no, rblk_no; xfs_daddr_t rhead_blk; char *offset; @@ -3045,7 +3034,7 @@ xlog_do_recovery_pass( if (error) goto bread_err1; - rhead = (xlog_rec_header_t *)offset; + rhead = (struct xlog_rec_header *)offset; /* * xfsprogs has a bug where record length is based on lsunit but @@ -3152,7 +3141,7 @@ xlog_do_recovery_pass( if (error) goto bread_err2; } - rhead = (xlog_rec_header_t *)offset; + rhead = (struct xlog_rec_header *)offset; error = xlog_valid_rec_header(log, rhead, split_hblks ? 
 			if (error)
 				goto bread_err2;
@@ -3234,7 +3223,7 @@ xlog_do_recovery_pass(
 		if (error)
 			goto bread_err2;
 
-		rhead = (xlog_rec_header_t *)offset;
+		rhead = (struct xlog_rec_header *)offset;
 		error = xlog_valid_rec_header(log, rhead, blk_no, h_size);
 		if (error)
 			goto bread_err2;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 23ba84ec919a..95be67ac6eb4 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -126,14 +126,16 @@ xfs_qm_dqpurge(
 	void *data)
 {
 	struct xfs_quotainfo *qi = dqp->q_mount->m_quotainfo;
-	int error = -EAGAIN;
 
-	xfs_dqlock(dqp);
-	if ((dqp->q_flags & XFS_DQFLAG_FREEING) || dqp->q_nrefs != 0)
-		goto out_unlock;
-
-	dqp->q_flags |= XFS_DQFLAG_FREEING;
+	spin_lock(&dqp->q_lockref.lock);
+	if (dqp->q_lockref.count > 0 || __lockref_is_dead(&dqp->q_lockref)) {
+		spin_unlock(&dqp->q_lockref.lock);
+		return -EAGAIN;
+	}
+	lockref_mark_dead(&dqp->q_lockref);
+	spin_unlock(&dqp->q_lockref.lock);
 
+	mutex_lock(&dqp->q_qlock);
 	xfs_qm_dqunpin_wait(dqp);
 	xfs_dqflock(dqp);
 
@@ -144,6 +146,7 @@ xfs_qm_dqpurge(
 	 */
 	if (XFS_DQ_IS_DIRTY(dqp)) {
 		struct xfs_buf *bp = NULL;
+		int error;
 
 		/*
 		 * We don't care about getting disk errors here. We need
@@ -151,9 +154,9 @@ xfs_qm_dqpurge(
 		 */
 		error = xfs_dquot_use_attached_buf(dqp, &bp);
 		if (error == -EAGAIN) {
-			xfs_dqfunlock(dqp);
-			dqp->q_flags &= ~XFS_DQFLAG_FREEING;
-			goto out_unlock;
+			/* resurrect the refcount from the dead. */
+			dqp->q_lockref.count = 0;
+			goto out_funlock;
 		}
 		if (!bp)
 			goto out_funlock;
@@ -177,7 +180,7 @@ xfs_qm_dqpurge(
 		!test_bit(XFS_LI_IN_AIL, &dqp->q_logitem.qli_item.li_flags));
 
 	xfs_dqfunlock(dqp);
-	xfs_dqunlock(dqp);
+	mutex_unlock(&dqp->q_qlock);
 
 	radix_tree_delete(xfs_dquot_tree(qi, xfs_dquot_type(dqp)), dqp->q_id);
 	qi->qi_dquots--;
@@ -192,10 +195,6 @@ xfs_qm_dqpurge(
 
 	xfs_qm_dqdestroy(dqp);
 	return 0;
-
-out_unlock:
-	xfs_dqunlock(dqp);
-	return error;
 }
 
 /*
@@ -288,51 +287,6 @@ xfs_qm_unmount_quotas(
 		xfs_qm_destroy_quotainos(mp->m_quotainfo);
 }
 
-STATIC int
-xfs_qm_dqattach_one(
-	struct xfs_inode *ip,
-	xfs_dqtype_t type,
-	bool doalloc,
-	struct xfs_dquot **IO_idqpp)
-{
-	struct xfs_dquot *dqp;
-	int error;
-
-	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
-	error = 0;
-
-	/*
-	 * See if we already have it in the inode itself. IO_idqpp is &i_udquot
-	 * or &i_gdquot. This made the code look weird, but made the logic a lot
-	 * simpler.
-	 */
-	dqp = *IO_idqpp;
-	if (dqp) {
-		trace_xfs_dqattach_found(dqp);
-		return 0;
-	}
-
-	/*
-	 * Find the dquot from somewhere. This bumps the reference count of
-	 * dquot and returns it locked. This can return ENOENT if dquot didn't
-	 * exist on disk and we didn't ask it to allocate; ESRCH if quotas got
-	 * turned off suddenly.
-	 */
-	error = xfs_qm_dqget_inode(ip, type, doalloc, &dqp);
-	if (error)
-		return error;
-
-	trace_xfs_dqattach_get(dqp);
-
-	/*
-	 * dqget may have dropped and re-acquired the ilock, but it guarantees
-	 * that the dquot returned is the one that should go in the inode.
-	 */
-	*IO_idqpp = dqp;
-	xfs_dqunlock(dqp);
-	return 0;
-}
-
 static bool
 xfs_qm_need_dqattach(
 	struct xfs_inode *ip)
@@ -372,7 +326,7 @@ xfs_qm_dqattach_locked(
 	ASSERT(!xfs_is_metadir_inode(ip));
 
 	if (XFS_IS_UQUOTA_ON(mp) && !ip->i_udquot) {
-		error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_USER,
+		error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_USER,
 				doalloc, &ip->i_udquot);
 		if (error)
 			goto done;
@@ -380,7 +334,7 @@ xfs_qm_dqattach_locked(
 	}
 
 	if (XFS_IS_GQUOTA_ON(mp) && !ip->i_gdquot) {
-		error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_GROUP,
+		error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_GROUP,
 				doalloc, &ip->i_gdquot);
 		if (error)
 			goto done;
@@ -388,7 +342,7 @@ xfs_qm_dqattach_locked(
 	}
 
 	if (XFS_IS_PQUOTA_ON(mp) && !ip->i_pdquot) {
-		error = xfs_qm_dqattach_one(ip, XFS_DQTYPE_PROJ,
+		error = xfs_qm_dqget_inode(ip, XFS_DQTYPE_PROJ,
 				doalloc, &ip->i_pdquot);
 		if (error)
 			goto done;
@@ -468,7 +422,7 @@ xfs_qm_dquot_isolate(
 	struct xfs_qm_isolate *isol = arg;
 	enum lru_status ret = LRU_SKIP;
 
-	if (!xfs_dqlock_nowait(dqp))
+	if (!spin_trylock(&dqp->q_lockref.lock))
 		goto out_miss_busy;
 
 	/*
@@ -476,7 +430,7 @@ xfs_qm_dquot_isolate(
 	 * from the LRU, leave it for the freeing task to complete the freeing
 	 * process rather than risk it being free from under us here.
 	 */
-	if (dqp->q_flags & XFS_DQFLAG_FREEING)
+	if (__lockref_is_dead(&dqp->q_lockref))
 		goto out_miss_unlock;
 
 	/*
@@ -485,16 +439,15 @@ xfs_qm_dquot_isolate(
 	 * again.
 	 */
 	ret = LRU_ROTATE;
-	if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0) {
+	if (XFS_DQ_IS_DIRTY(dqp) || atomic_read(&dqp->q_pincount) > 0)
 		goto out_miss_unlock;
-	}
 
 	/*
 	 * This dquot has acquired a reference in the meantime remove it from
 	 * the freelist and try again.
 	 */
-	if (dqp->q_nrefs) {
-		xfs_dqunlock(dqp);
+	if (dqp->q_lockref.count) {
+		spin_unlock(&dqp->q_lockref.lock);
 		XFS_STATS_INC(dqp->q_mount, xs_qm_dqwants);
 
 		trace_xfs_dqreclaim_want(dqp);
@@ -518,10 +471,9 @@ xfs_qm_dquot_isolate(
 	/*
 	 * Prevent lookups now that we are past the point of no return.
 	 */
-	dqp->q_flags |= XFS_DQFLAG_FREEING;
-	xfs_dqunlock(dqp);
+	lockref_mark_dead(&dqp->q_lockref);
+	spin_unlock(&dqp->q_lockref.lock);
 
-	ASSERT(dqp->q_nrefs == 0);
 	list_lru_isolate_move(lru, &dqp->q_lru, &isol->dispose);
 	XFS_STATS_DEC(dqp->q_mount, xs_qm_dquot_unused);
 	trace_xfs_dqreclaim_done(dqp);
@@ -529,7 +481,7 @@
 	return LRU_REMOVED;
 
 out_miss_unlock:
-	xfs_dqunlock(dqp);
+	spin_unlock(&dqp->q_lockref.lock);
 out_miss_busy:
 	trace_xfs_dqreclaim_busy(dqp);
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);
@@ -1316,9 +1268,10 @@ xfs_qm_quotacheck_dqadjust(
 		return error;
 	}
 
+	mutex_lock(&dqp->q_qlock);
 	error = xfs_dquot_attach_buf(NULL, dqp);
 	if (error)
-		return error;
+		goto out_unlock;
 
 	trace_xfs_dqadjust(dqp);
 
@@ -1348,8 +1301,10 @@ xfs_qm_quotacheck_dqadjust(
 	}
 
 	dqp->q_flags |= XFS_DQFLAG_DIRTY;
-	xfs_qm_dqput(dqp);
-	return 0;
+out_unlock:
+	mutex_unlock(&dqp->q_qlock);
+	xfs_qm_dqrele(dqp);
+	return error;
 }
 
 /*
@@ -1466,9 +1421,10 @@ xfs_qm_flush_one(
 	struct xfs_buf *bp = NULL;
 	int error = 0;
 
-	xfs_dqlock(dqp);
-	if (dqp->q_flags & XFS_DQFLAG_FREEING)
-		goto out_unlock;
+	if (!lockref_get_not_dead(&dqp->q_lockref))
+		return 0;
+
+	mutex_lock(&dqp->q_qlock);
 	if (!XFS_DQ_IS_DIRTY(dqp))
 		goto out_unlock;
 
@@ -1488,7 +1444,8 @@ xfs_qm_flush_one(
 	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
 out_unlock:
-	xfs_dqunlock(dqp);
+	mutex_unlock(&dqp->q_qlock);
+	xfs_qm_dqrele(dqp);
 	return error;
 }
 
@@ -1904,16 +1861,12 @@ xfs_qm_vop_dqalloc(
 	struct xfs_dquot *gq = NULL;
 	struct xfs_dquot *pq = NULL;
 	int error;
-	uint lockflags;
 
 	if (!XFS_IS_QUOTA_ON(mp))
 		return 0;
 
 	ASSERT(!xfs_is_metadir_inode(ip));
 
-	lockflags = XFS_ILOCK_EXCL;
-	xfs_ilock(ip, lockflags);
-
 	if ((flags & XFS_QMOPT_INHERIT) && XFS_INHERIT_GID(ip))
 		gid = inode->i_gid;
 
@@ -1922,38 +1875,22 @@ xfs_qm_vop_dqalloc(
 	 * if necessary. The dquot(s) will not be locked.
 	 */
 	if (XFS_NOT_DQATTACHED(mp, ip)) {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_qm_dqattach_locked(ip, true);
-		if (error) {
-			xfs_iunlock(ip, lockflags);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		if (error)
 			return error;
-		}
 	}
 
 	if ((flags & XFS_QMOPT_UQUOTA) && XFS_IS_UQUOTA_ON(mp)) {
 		ASSERT(O_udqpp);
 		if (!uid_eq(inode->i_uid, uid)) {
-			/*
-			 * What we need is the dquot that has this uid, and
-			 * if we send the inode to dqget, the uid of the inode
-			 * takes priority over what's sent in the uid argument.
-			 * We must unlock inode here before calling dqget if
-			 * we're not sending the inode, because otherwise
-			 * we'll deadlock by doing trans_reserve while
-			 * holding ilock.
-			 */
-			xfs_iunlock(ip, lockflags);
 			error = xfs_qm_dqget(mp, from_kuid(user_ns, uid),
 					XFS_DQTYPE_USER, true, &uq);
 			if (error) {
 				ASSERT(error != -ENOENT);
 				return error;
 			}
-			/*
-			 * Get the ilock in the right order.
-			 */
-			xfs_dqunlock(uq);
-			lockflags = XFS_ILOCK_SHARED;
-			xfs_ilock(ip, lockflags);
 		} else {
 			/*
 			 * Take an extra reference, because we'll return
@@ -1966,16 +1903,12 @@ xfs_qm_vop_dqalloc(
 	if ((flags & XFS_QMOPT_GQUOTA) && XFS_IS_GQUOTA_ON(mp)) {
 		ASSERT(O_gdqpp);
 		if (!gid_eq(inode->i_gid, gid)) {
-			xfs_iunlock(ip, lockflags);
 			error = xfs_qm_dqget(mp, from_kgid(user_ns, gid),
 					XFS_DQTYPE_GROUP, true, &gq);
 			if (error) {
 				ASSERT(error != -ENOENT);
 				goto error_rele;
 			}
-			xfs_dqunlock(gq);
-			lockflags = XFS_ILOCK_SHARED;
-			xfs_ilock(ip, lockflags);
 		} else {
 			ASSERT(ip->i_gdquot);
 			gq = xfs_qm_dqhold(ip->i_gdquot);
@@ -1984,16 +1917,12 @@ xfs_qm_vop_dqalloc(
 	if ((flags & XFS_QMOPT_PQUOTA) && XFS_IS_PQUOTA_ON(mp)) {
 		ASSERT(O_pdqpp);
 		if (ip->i_projid != prid) {
-			xfs_iunlock(ip, lockflags);
 			error = xfs_qm_dqget(mp, prid,
 					XFS_DQTYPE_PROJ, true, &pq);
 			if (error) {
 				ASSERT(error != -ENOENT);
 				goto error_rele;
 			}
-			xfs_dqunlock(pq);
-			lockflags = XFS_ILOCK_SHARED;
-			xfs_ilock(ip, lockflags);
 		} else {
 			ASSERT(ip->i_pdquot);
 			pq = xfs_qm_dqhold(ip->i_pdquot);
@@ -2001,7 +1930,6 @@ xfs_qm_vop_dqalloc(
 	}
 	trace_xfs_dquot_dqalloc(ip);
 
-	xfs_iunlock(ip, lockflags);
 	if (O_udqpp)
 		*O_udqpp = uq;
 	else
@@ -2078,7 +2006,7 @@ xfs_qm_vop_chown(
 	 * back now.
 	 */
 	tp->t_flags |= XFS_TRANS_DIRTY;
-	xfs_dqlock(prevdq);
+	mutex_lock(&prevdq->q_qlock);
 	if (isrt) {
 		ASSERT(prevdq->q_rtb.reserved >= ip->i_delayed_blks);
 		prevdq->q_rtb.reserved -= ip->i_delayed_blks;
@@ -2086,7 +2014,7 @@ xfs_qm_vop_chown(
 		ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks);
 		prevdq->q_blk.reserved -= ip->i_delayed_blks;
 	}
-	xfs_dqunlock(prevdq);
+	mutex_unlock(&prevdq->q_qlock);
 
 	/*
 	 * Take an extra reference, because the inode is going to keep
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index 35b64bc3a7a8..e88ed6ad0e65 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -57,7 +57,7 @@ struct xfs_quotainfo {
 	struct xfs_inode *qi_pquotaip;	/* project quota inode */
 	struct xfs_inode *qi_dirip;	/* quota metadir */
 	struct list_lru qi_lru;
-	int qi_dquots;
+	uint64_t qi_dquots;
 	struct mutex qi_quotaofflock;/* to serialize quotaoff */
 	xfs_filblks_t qi_dqchunklen;	/* # BBs in a chunk of dqs */
 	uint qi_dqperchunk;	/* # ondisk dq in above chunk */
diff --git a/fs/xfs/xfs_qm_bhv.c b/fs/xfs/xfs_qm_bhv.c
index 245d754f382a..edc0aef3cf34 100644
--- a/fs/xfs/xfs_qm_bhv.c
+++ b/fs/xfs/xfs_qm_bhv.c
@@ -73,8 +73,10 @@ xfs_qm_statvfs(
 	struct xfs_dquot *dqp;
 
 	if (!xfs_qm_dqget(mp, ip->i_projid, XFS_DQTYPE_PROJ, false, &dqp)) {
+		mutex_lock(&dqp->q_qlock);
 		xfs_fill_statvfs_from_dquot(statp, ip, dqp);
-		xfs_qm_dqput(dqp);
+		mutex_unlock(&dqp->q_qlock);
+		xfs_qm_dqrele(dqp);
 	}
 }
 
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
index 0c78f30fa4a3..022e2179c06b 100644
--- a/fs/xfs/xfs_qm_syscalls.c
+++ b/fs/xfs/xfs_qm_syscalls.c
@@ -303,13 +303,12 @@ xfs_qm_scall_setqlim(
 	}
 
 	defq = xfs_get_defquota(q, xfs_dquot_type(dqp));
-	xfs_dqunlock(dqp);
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_setqlim, 0, 0, 0, &tp);
 	if (error)
 		goto out_rele;
 
-	xfs_dqlock(dqp);
+	mutex_lock(&dqp->q_qlock);
 	xfs_trans_dqjoin(tp, dqp);
 
 	/*
@@ -459,6 +458,7 @@ xfs_qm_scall_getquota(
 	 * If everything's NULL, this dquot doesn't quite exist as far as
 	 * our utility programs are concerned.
 	 */
+	mutex_lock(&dqp->q_qlock);
 	if (XFS_IS_DQUOT_UNINITIALIZED(dqp)) {
 		error = -ENOENT;
 		goto out_put;
@@ -467,7 +467,8 @@ xfs_qm_scall_getquota(
 	xfs_qm_scall_getquota_fill_qc(mp, type, dqp, dst);
 
 out_put:
-	xfs_qm_dqput(dqp);
+	mutex_unlock(&dqp->q_qlock);
+	xfs_qm_dqrele(dqp);
 	return error;
 }
 
@@ -497,7 +498,8 @@ xfs_qm_scall_getquota_next(
 	*id = dqp->q_id;
 	xfs_qm_scall_getquota_fill_qc(mp, type, dqp, dst);
+	mutex_unlock(&dqp->q_qlock);
 
-	xfs_qm_dqput(dqp);
+	xfs_qm_dqrele(dqp);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_quotaops.c b/fs/xfs/xfs_quotaops.c
index 4c7f7ce4fd2f..94fbe3d99ec7 100644
--- a/fs/xfs/xfs_quotaops.c
+++ b/fs/xfs/xfs_quotaops.c
@@ -65,7 +65,7 @@ xfs_fs_get_quota_state(
 	memset(state, 0, sizeof(*state));
 	if (!XFS_IS_QUOTA_ON(mp))
 		return 0;
-	state->s_incoredqs = q->qi_dquots;
+	state->s_incoredqs = min_t(uint64_t, q->qi_dquots, UINT_MAX);
 	if (XFS_IS_UQUOTA_ON(mp))
 		state->s_state[USRQUOTA].flags |= QCI_ACCT_ENABLED;
 	if (XFS_IS_UQUOTA_ENFORCED(mp))
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 79b8641880ab..f70afbf3cb19 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1350,7 +1350,7 @@ DECLARE_EVENT_CLASS(xfs_dquot_class,
 		__entry->id = dqp->q_id;
 		__entry->type = dqp->q_type;
 		__entry->flags = dqp->q_flags;
-		__entry->nrefs = dqp->q_nrefs;
+		__entry->nrefs = data_race(dqp->q_lockref.count);
 		__entry->res_bcount = dqp->q_blk.reserved;
 		__entry->res_rtbcount = dqp->q_rtb.reserved;
 
@@ -1399,7 +1399,6 @@ DEFINE_DQUOT_EVENT(xfs_dqadjust);
 DEFINE_DQUOT_EVENT(xfs_dqreclaim_want);
 DEFINE_DQUOT_EVENT(xfs_dqreclaim_busy);
 DEFINE_DQUOT_EVENT(xfs_dqreclaim_done);
-DEFINE_DQUOT_EVENT(xfs_dqattach_found);
 DEFINE_DQUOT_EVENT(xfs_dqattach_get);
 DEFINE_DQUOT_EVENT(xfs_dqalloc);
 DEFINE_DQUOT_EVENT(xfs_dqtobp_read);
@@ -1409,9 +1408,8 @@ DEFINE_DQUOT_EVENT(xfs_dqget_hit);
 DEFINE_DQUOT_EVENT(xfs_dqget_miss);
 DEFINE_DQUOT_EVENT(xfs_dqget_freeing);
 DEFINE_DQUOT_EVENT(xfs_dqget_dup);
-DEFINE_DQUOT_EVENT(xfs_dqput);
-DEFINE_DQUOT_EVENT(xfs_dqput_free);
 DEFINE_DQUOT_EVENT(xfs_dqrele);
+DEFINE_DQUOT_EVENT(xfs_dqrele_free);
 DEFINE_DQUOT_EVENT(xfs_dqflush);
 DEFINE_DQUOT_EVENT(xfs_dqflush_force);
 DEFINE_DQUOT_EVENT(xfs_dqflush_done);
@@ -4934,7 +4932,7 @@ DECLARE_EVENT_CLASS(xlog_iclog_class,
 		__entry->refcount = atomic_read(&iclog->ic_refcnt);
 		__entry->offset = iclog->ic_offset;
 		__entry->flags = iclog->ic_flags;
-		__entry->lsn = be64_to_cpu(iclog->ic_header.h_lsn);
+		__entry->lsn = be64_to_cpu(iclog->ic_header->h_lsn);
 		__entry->caller_ip = caller_ip;
 	),
 	TP_printk("dev %d:%d state %s refcnt %d offset %u lsn 0x%llx flags %s caller %pS",
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 765456bf3428..c842ce06acd6 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -393,7 +393,7 @@ xfs_trans_dqlockedjoin(
 	unsigned int i;
 	ASSERT(q[0].qt_dquot != NULL);
 	if (q[1].qt_dquot == NULL) {
-		xfs_dqlock(q[0].qt_dquot);
+		mutex_lock(&q[0].qt_dquot->q_qlock);
 		xfs_trans_dqjoin(tp, q[0].qt_dquot);
 	} else if (q[2].qt_dquot == NULL) {
 		xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot);
@@ -693,7 +693,7 @@ xfs_trans_unreserve_and_mod_dquots(
 		locked = already_locked;
 		if (qtrx->qt_blk_res) {
 			if (!locked) {
-				xfs_dqlock(dqp);
+				mutex_lock(&dqp->q_qlock);
 				locked = true;
 			}
 			dqp->q_blk.reserved -=
@@ -701,7 +701,7 @@
 		}
 		if (qtrx->qt_ino_res) {
 			if (!locked) {
-				xfs_dqlock(dqp);
+				mutex_lock(&dqp->q_qlock);
 				locked = true;
 			}
 			dqp->q_ino.reserved -=
@@ -710,14 +710,14 @@
 
 		if (qtrx->qt_rtblk_res) {
 			if (!locked) {
-				xfs_dqlock(dqp);
+				mutex_lock(&dqp->q_qlock);
 				locked = true;
 			}
 			dqp->q_rtb.reserved -=
 				(xfs_qcnt_t)qtrx->qt_rtblk_res;
 		}
 
 		if (locked && !already_locked)
-			xfs_dqunlock(dqp);
+			mutex_unlock(&dqp->q_qlock);
 	}
 }
 
@@ -820,7 +820,7 @@ xfs_trans_dqresv(
 	struct xfs_dquot_res *blkres;
 	struct xfs_quota_limits *qlim;
 
-	xfs_dqlock(dqp);
+	mutex_lock(&dqp->q_qlock);
 
 	defq = xfs_get_defquota(q, xfs_dquot_type(dqp));
 
@@ -887,16 +887,16 @@ xfs_trans_dqresv(
 	    XFS_IS_CORRUPT(mp, dqp->q_ino.reserved < dqp->q_ino.count))
 		goto error_corrupt;
 
-	xfs_dqunlock(dqp);
+	mutex_unlock(&dqp->q_qlock);
 	return 0;
 
 error_return:
-	xfs_dqunlock(dqp);
+	mutex_unlock(&dqp->q_qlock);
 	if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
 		return -ENOSPC;
 	return -EDQUOT;
 error_corrupt:
-	xfs_dqunlock(dqp);
+	mutex_unlock(&dqp->q_qlock);
 	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
 	xfs_fs_mark_sick(mp, XFS_SICK_FS_QUOTACHECK);
 	return -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c
index 98f65d99b776..bbcf21704ea0 100644
--- a/fs/xfs/xfs_zone_alloc.c
+++ b/fs/xfs/xfs_zone_alloc.c
@@ -103,9 +103,6 @@ xfs_zone_account_reclaimable(
 		 */
 		trace_xfs_zone_emptied(rtg);
 
-		if (!was_full)
-			xfs_group_clear_mark(xg, XFS_RTG_RECLAIMABLE);
-
 		spin_lock(&zi->zi_used_buckets_lock);
 		if (!was_full)
 			xfs_zone_remove_from_bucket(zi, rgno, from_bucket);
@@ -127,7 +124,6 @@ xfs_zone_account_reclaimable(
 		xfs_zone_add_to_bucket(zi, rgno, to_bucket);
 		spin_unlock(&zi->zi_used_buckets_lock);
 
-		xfs_group_set_mark(xg, XFS_RTG_RECLAIMABLE);
 		if (zi->zi_gc_thread && xfs_zoned_need_gc(mp))
 			wake_up_process(zi->zi_gc_thread);
 	} else if (to_bucket != from_bucket) {
@@ -142,6 +138,28 @@ xfs_zone_account_reclaimable(
 	}
 }
 
+/*
+ * Check if we have any zones that can be reclaimed by looking at the entry
+ * counters for the zone buckets.
+ */
+bool
+xfs_zoned_have_reclaimable(
+	struct xfs_zone_info *zi)
+{
+	int i;
+
+	spin_lock(&zi->zi_used_buckets_lock);
+	for (i = 0; i < XFS_ZONE_USED_BUCKETS; i++) {
+		if (zi->zi_used_bucket_entries[i]) {
+			spin_unlock(&zi->zi_used_buckets_lock);
+			return true;
+		}
+	}
+	spin_unlock(&zi->zi_used_buckets_lock);
+
+	return false;
+}
+
 static void
 xfs_open_zone_mark_full(
 	struct xfs_open_zone *oz)
diff --git a/fs/xfs/xfs_zone_gc.c b/fs/xfs/xfs_zone_gc.c
index 4ade54445532..3c52cc1497d4 100644
--- a/fs/xfs/xfs_zone_gc.c
+++ b/fs/xfs/xfs_zone_gc.c
@@ -117,7 +117,6 @@ struct xfs_gc_bio {
 	struct xfs_rtgroup *victim_rtg;
 
 	/* Bio used for reads and writes, including the bvec used by it */
-	struct bio_vec bv;
 	struct bio bio;	/* must be last */
 };
 
@@ -175,14 +174,13 @@ xfs_zoned_need_gc(
 	s64 available, free, threshold;
 	s32 remainder;
 
-	if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE))
+	if (!xfs_zoned_have_reclaimable(mp->m_zone_info))
 		return false;
 
 	available = xfs_estimate_freecounter(mp, XC_FREE_RTAVAILABLE);
 
 	if (available <
-	    mp->m_groups[XG_TYPE_RTG].blocks *
-	    (mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
+	    xfs_rtgs_to_rfsbs(mp, mp->m_max_open_zones - XFS_OPEN_GC_ZONES))
 		return true;
 
 	free = xfs_estimate_freecounter(mp, XC_FREE_RTEXTENTS);
@@ -1184,16 +1182,16 @@ xfs_zone_gc_mount(
 		goto out_put_gc_zone;
 	}
 
-	mp->m_zone_info->zi_gc_thread = kthread_create(xfs_zoned_gcd, data,
+	zi->zi_gc_thread = kthread_create(xfs_zoned_gcd, data,
 			"xfs-zone-gc/%s", mp->m_super->s_id);
-	if (IS_ERR(mp->m_zone_info->zi_gc_thread)) {
+	if (IS_ERR(zi->zi_gc_thread)) {
 		xfs_warn(mp, "unable to create zone gc thread");
-		error = PTR_ERR(mp->m_zone_info->zi_gc_thread);
+		error = PTR_ERR(zi->zi_gc_thread);
 		goto out_free_gc_data;
 	}
 
 	/* xfs_zone_gc_start will unpark for rw mounts */
-	kthread_park(mp->m_zone_info->zi_gc_thread);
+	kthread_park(zi->zi_gc_thread);
 	return 0;
 
 out_free_gc_data:
diff --git a/fs/xfs/xfs_zone_priv.h b/fs/xfs/xfs_zone_priv.h
index 4322e26dd99a..ce7f0e2f4598 100644
--- a/fs/xfs/xfs_zone_priv.h
+++ b/fs/xfs/xfs_zone_priv.h
@@ -113,6 +113,7 @@ struct xfs_open_zone *xfs_open_zone(struct xfs_mount *mp,
 int xfs_zone_gc_reset_sync(struct xfs_rtgroup *rtg);
 bool xfs_zoned_need_gc(struct xfs_mount *mp);
+bool xfs_zoned_have_reclaimable(struct xfs_zone_info *zi);
 int xfs_zone_gc_mount(struct xfs_mount *mp);
 void xfs_zone_gc_unmount(struct xfs_mount *mp);
 
diff --git a/fs/xfs/xfs_zone_space_resv.c b/fs/xfs/xfs_zone_space_resv.c
index 9cd38716fd25..fc1a4d1ce10c 100644
--- a/fs/xfs/xfs_zone_space_resv.c
+++ b/fs/xfs/xfs_zone_space_resv.c
@@ -54,12 +54,10 @@ xfs_zoned_default_resblks(
 {
 	switch (ctr) {
 	case XC_FREE_RTEXTENTS:
-		return (uint64_t)XFS_RESERVED_ZONES *
-			mp->m_groups[XG_TYPE_RTG].blocks +
-			mp->m_sb.sb_rtreserved;
+		return xfs_rtgs_to_rfsbs(mp, XFS_RESERVED_ZONES) +
+			mp->m_sb.sb_rtreserved;
 	case XC_FREE_RTAVAILABLE:
-		return (uint64_t)XFS_GC_ZONES *
-			mp->m_groups[XG_TYPE_RTG].blocks;
+		return xfs_rtgs_to_rfsbs(mp, XFS_GC_ZONES);
 	default:
 		ASSERT(0);
 		return 0;
@@ -174,7 +172,7 @@ xfs_zoned_reserve_available(
 		 * processing a pending GC request give up as we're fully out
 		 * of space.
 		 */
-		if (!xfs_group_marked(mp, XG_TYPE_RTG, XFS_RTG_RECLAIMABLE) &&
+		if (!xfs_zoned_have_reclaimable(mp->m_zone_info) &&
 		    !xfs_is_zonegc_running(mp))
 			break;