linux

Commit Graph

Author	SHA1	Message	Date
Qu Wenruo	54df8b80cc	btrfs: scrub: always update btrfs_scrub_progress::last_physical [BUG] When a scrub failed immediately without any byte scrubbed, the returned btrfs_scrub_progress::last_physical will always be 0, even if there is a non-zero @start passed into btrfs_scrub_dev() for resume cases. This will reset the progress and make later scrub resume start from the beginning. [CAUSE] The function btrfs_scrub_dev() accepts a @progress parameter to copy its updated progress to the caller, there are cases where we either don't touch progress::last_physical at all or copy 0 into last_physical: - last_physical not updated at all If some error happened before scrubbing any super block or chunk, we will not copy the progress, leaving the @last_physical untouched. E.g. failed to allocate @sctx, scrubbing a missing device or even there is already a running scrub and so on. All those cases won't touch @progress at all, resulting the last_physical untouched and will be left as 0 for most cases. - Error out before scrubbing any bytes In those case we allocated @sctx, and sctx->stat.last_physical is all zero (initialized by kvzalloc()). Unfortunately some critical errors happened during scrub_enumerate_chunks() or scrub_supers() before any stripe is really scrubbed. In that case although we will copy sctx->stat back to @progress, since no byte is really scrubbed, last_physical will be overwritten to 0. [FIX] Make sure the parameter @progress always has its @last_physical member updated to @start parameter inside btrfs_scrub_dev(). At the very beginning of the function, set @progress->last_physical to @start, so that even if we error out without doing progress copying, last_physical is still at @start. Then after we got @sctx allocated, set sctx->stat.last_physical to @start, this will make sure even if we didn't get any byte scrubbed, at the progress copying stage the @last_physical is not left as zero. This should resolve the resume progress reset problem. 
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:42:26 +01:00
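The two steps of the fix described above can be modeled in user space. This is only a sketch under stated assumptions, not the kernel code: `struct scrub_progress`, `struct scrub_ctx`, `scrub_dev_model()` and the `fail_early` / `fail_before_io` switches are hypothetical stand-ins for `btrfs_scrub_progress`, the scrub context and `btrfs_scrub_dev()`, and the 4096-byte step is an arbitrary placeholder for real progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_scrub_progress. */
struct scrub_progress {
	uint64_t last_physical;
};

/* Hypothetical stand-in for the scrub context (@sctx). */
struct scrub_ctx {
	struct scrub_progress stat;
};

/*
 * Model of the fixed control flow. fail_early simulates an error before
 * the context exists; fail_before_io simulates an error after allocation
 * but before any byte is scrubbed.
 */
static int scrub_dev_model(uint64_t start, struct scrub_progress *progress,
			   int fail_early, int fail_before_io)
{
	struct scrub_ctx *sctx;
	int ret = 0;

	assert(progress != NULL);

	/* Step 1: record @start up front so early returns keep it. */
	progress->last_physical = start;
	if (fail_early)
		return -ENODEV;

	sctx = calloc(1, sizeof(*sctx));	/* zero-filled allocation */
	if (!sctx)
		return -ENOMEM;

	/* Step 2: seed the context so copying it back never yields 0. */
	sctx->stat.last_physical = start;
	if (fail_before_io)
		ret = -EIO;
	else
		sctx->stat.last_physical = start + 4096; /* pretend progress */

	*progress = sctx->stat;		/* progress copying stage */
	free(sctx);
	return ret;
}
```

With both assignments in place, a resumed run that fails on either error path still reports `start` rather than 0.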
Filipe Manana	d7fe41044b	btrfs: use bool type for btrfs_path members used as booleans Many fields of struct btrfs_path are used as booleans but their type is an unsigned int (of one 1 bit width to save space). Change the type to bool keeping the :1 suffix so that they combine with the previous u8 fields in order to save space. This makes the code more clear by using explicit true/false and more in line with the preferred style, preserving the size of the structure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:42:25 +01:00
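A small user-space sketch of the type change described above. The two structs are hypothetical miniatures, not the real `struct btrfs_path`, and the field names are only illustrative. One practical difference, which follows from C's conversion rules rather than from the commit message, is that a 1-bit `unsigned int` field keeps only the lowest bit of an assigned value, while a `bool` field normalizes any nonzero value to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of the layout before the change. */
struct path_uint {
	uint8_t reada;
	uint8_t lowest_level;
	unsigned int keep_locks:1;
	unsigned int skip_locking:1;
};

/* Same fields, with the flags typed as bool and still 1 bit wide. */
struct path_bool {
	uint8_t reada;
	uint8_t lowest_level;
	bool keep_locks:1;
	bool skip_locking:1;
};

/* Store v into the unsigned bit-field: only the lowest bit survives. */
static unsigned int store_uint(unsigned int v)
{
	struct path_uint p = { 0 };

	p.keep_locks = v;
	return p.keep_locks;
}

/* Store v into the bool bit-field: any nonzero value becomes true. */
static bool store_bool(unsigned int v)
{
	struct path_bool p = { 0 };

	p.keep_locks = v;
	assert(p.keep_locks == (v != 0));
	return p.keep_locks;
}
```

Struct layout with bit-fields is implementation-defined, so this sketch makes no claim about `sizeof` either struct.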
David Sterba	1c094e6cce	btrfs: make a few more ASSERTs verbose We have support for optional string to be printed in ASSERT() (added in `19468a623a` ("btrfs: enhance ASSERT() to take optional format string")), it's not yet everywhere it could be so add a few more files. Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:42:24 +01:00
David Sterba	4decf577fb	btrfs: move and rename CSUM_FMT definition Move the CSUM_FMT* definitions to fs.h, where BTRFS_KEY_FMT already lives, and add the prefix for consistency. Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:42:23 +01:00
Qu Wenruo	07166122b5	btrfs: scrub: factor out parity scrub code into a helper The function scrub_raid56_parity_stripe() is handling the parity stripe by the following steps: - Scrub each data stripes And make sure everything is fine in each data stripe - Cache the data stripe into the raid bio - Use the cached raid bio to scrub the target parity stripe Extract the last two steps into a new helper, scrub_raid56_cached_parity(), as a cleanup and make the error handling more straightforward. With the following minor cleanups: - Use on-stack bio structure The bio is always empty thus we do not need any bio vector nor the block device. Thus there is no need to allocate a bio, the on-stack one is more than enough to cut it. - Remove the unnecessary btrfs_put_bioc() call if btrfs_map_block() failed If btrfs_map_block() is failed, @bioc_ret will not be touched thus there is no need to call btrfs_put_bioc() in this case. - Use a proper out: tag to do the cleanup Now the error cleanup is much shorter and simpler, just btrfs_bio_counter_dec() and bio_uninit(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:42:22 +01:00
Qu Wenruo	d435c51365	btrfs: make sure extent and csum paths are always released in scrub_raid56_parity_stripe() Unlike queue_scrub_stripe() which uses the global sctx->extent_path and sctx->csum_path which are always released at the end of scrub_stripe(), scrub_raid56_parity_stripe() uses local extent_path and csum_path, as that function is going to handle the full stripe, whose bytenr may be smaller than the bytenr in the global sctx paths. However the cleanup of local extent/csum paths is only happening after we have successfully submitted an rbio. There are several error routes that we didn't release those two paths: - scrub_find_fill_first_stripe() errored out at csum tree search In that case extent_path is still valid, and that function itself will not release the extent_path passed in. And the function returns directly without releasing both paths. - The full stripe is empty - Some blocks failed to be recovered - btrfs_map_block() failed - raid56_parity_alloc_scrub_rbio() failed The function returns directly without releasing both paths. Fix it by covering btrfs_release_path() calls inside the out: tag. This is just a hot fix, in the long run we will go scoped based auto freeing for both local paths. Fixes: `1dc4888e72` ("btrfs: scrub: avoid unnecessary extent tree search preparing stripes") Fixes: `3c771c1944` ("btrfs: scrub: avoid unnecessary csum tree search preparing stripes") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:42:22 +01:00
Qu Wenruo	81cea6cd70	btrfs: remove btrfs_bio::fs_info by extracting it from btrfs_bio::inode Currently there is only one caller which doesn't populate btrfs_bio::inode, and that's scrub. The idea is scrub doesn't want any automatic csum verification nor read-repair, as everything will be handled by scrub itself. However that behavior is really no different than metadata inode, thus we can reuse btree_inode as btrfs_bio::inode for scrub. The only exception is in btrfs_submit_chunk() where if a bbio is from scrub or data reloc inode, we set rst_search_commit_root to true. This means we still need a way to distinguish scrub from metadata, but that can be done by a new flag inside btrfs_bio. Now btrfs_bio::inode is a mandatory parameter, we can extract fs_info from that inode thus can remove btrfs_bio::fs_info to save 8 bytes from btrfs_bio structure. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:40:16 +01:00
Miquel Sabaté Solà	285c3ab28e	btrfs: declare free_ipath() via DEFINE_FREE() The free_ipath() function was being used as a cleanup function everywhere. Declare it via DEFINE_FREE() so we can use this function with the __free() helper. The name has also been adjusted so it's closer to the type's name. Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:34:51 +01:00
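The scope-based cleanup pattern mentioned above can be imitated in user space with the GCC/Clang `cleanup` variable attribute. This is a hedged sketch: the `DEFINE_FREE()` macro below mirrors the general shape of the kernel helper, but `auto_free()` is a local name chosen here in place of the kernel's `__free()`, and `struct ipath`, `alloc_ipath()` and `use_ipath()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space imitation of the cleanup helpers (GCC/Clang only). */
#define DEFINE_FREE(_name, _type, _free) \
	static inline void auto_free_##_name(void *p) \
	{ _type _T = *(_type *)p; _free; }
#define auto_free(_name) __attribute__((__cleanup__(auto_free_##_name)))

/* Hypothetical stand-in for the real path container type. */
struct ipath {
	int dummy;
};

static int ipath_live;	/* live allocations, so leaks are observable */

static struct ipath *alloc_ipath(void)
{
	struct ipath *ip = calloc(1, sizeof(*ip));

	if (ip)
		ipath_live++;
	return ip;
}

static void free_ipath(struct ipath *ip)
{
	if (!ip)
		return;
	assert(ipath_live > 0);
	ipath_live--;
	free(ip);
}

/* Generates auto_free_ipath(), which calls free_ipath() on scope exit. */
DEFINE_FREE(ipath, struct ipath *, free_ipath(_T))

static int use_ipath(int fail)
{
	struct ipath *ip auto_free(ipath) = alloc_ipath();

	if (!ip)
		return -1;
	if (fail)
		return -2;	/* early return, no cleanup label needed */
	return 0;
}
```

Every return path in `use_ipath()` releases the allocation, which is what lets `goto out` style error handling collapse into plain returns.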
Qu Wenruo	937f99c736	btrfs: scrub: cancel the run if there is a pending signal Unlike relocation, scrub never checks pending signals, and even for relocation is only explicitly checking for fatal signal (SIGKILL), not for regular ones. Thankfully relocation can still be interrupted by regular signals by the usage of wait_on_bit(), which is called with TASK_INTERRUPTIBLE. Do the same for scrub/dev-replace, so that regular signals can also cancel the scrub/replace run, and more importantly handle v2 cgroup freezing which is based on signal handling code inside the kernel, and freezing() function will not return true for v2 cgroup freezing. This will address the problem that systemd slice freezing will timeout on long running scrub/dev-replace. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:34:32 +01:00
Qu Wenruo	c7b478504b	btrfs: scrub: cancel the run if the process or fs is being frozen It's a known bug that btrfs scrub/dev-replace can prevent the system from suspending. There are at least two factors involved: - Holding super_block::s_writers for the whole scrub/dev-replace duration We hold that percpu rw semaphore through mnt_want_write_file() for the whole scrub/dev-replace duration. That will prevent the fs being frozen, which can be initiated by either the user (e.g. fsfreeze) or power management suspend/hibernate. - Stuck in the kernel space for a long time During suspend all user processes (and some kernel threads) will be frozen. But if a user space process has fallen into kernel (scrub ioctl) and does not return for a long time, it will make process freezing time out. Unfortunately scrub/dev-replace is a long running ioctl, and it will prevent the btrfs process from returning to the user space, thus make PM suspend/hibernate time out. Address them in one go: - Introduce a new helper should_cancel_scrub() Which includes the existing cancel request and new fs/process freezing checks. Here we have to check both fs and process freezing for PM suspend/hibernate. PM can be configured to freeze filesystems before processes. (The current default is not to freeze filesystems, but planned to freeze the filesystems as the new default.) Checking only fs freezing will fail PM without fs freezing, as the process freezing will time out. Checking only process freezing will fail PM with fs freezing since the fs freezing happens before process freezing. And the return value will indicate the reason, -ECANCELED for the explicitly canceled runs, and -EINTR for fs freeze or PM reasons. - Cancel the run if should_cancel_scrub() is true Unfortunately canceling is the only feasible solution here, pausing is not possible as we will still stay in the kernel space thus will still prevent the process from being frozen. 
This will cause a user impacting behavior change: Dev-replace can be interrupted by PM, and there is no way to resume but start from the beginning again. This means dev-replace may fail on newer kernels, and end users will need extra steps like using systemd-inhibit to prevent suspend/hibernate, to get back the old uninterrupted behavior. This behavior change will need extra documentation updates and communication with projects involving scrub/dev-replace including btrfs-progs. Reviewed-by: Filipe Manana <fdmanana@suse.com> Link: https://lore.kernel.org/linux-btrfs/d93b2a2d-6ad9-4c49-809f-11d769a6f30a@app.fastmail.com/ Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:24:52 +01:00
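The decision logic of the helper described above can be sketched as a pure function. This is a user-space model under assumptions: `struct scrub_state` and `should_cancel_scrub_model()` are hypothetical, the three flags stand in for whatever the kernel helper actually inspects, and checking the explicit cancel request first is a choice made here, not something the commit message specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the conditions the helper looks at. */
struct scrub_state {
	bool cancel_requested;	/* explicit cancel request */
	bool fs_freezing;	/* filesystem is being frozen */
	bool task_freezing;	/* calling process is being frozen */
};

/* Returns 0 to continue, or a negative errno naming the reason to stop. */
static int should_cancel_scrub_model(const struct scrub_state *s)
{
	assert(s != NULL);

	if (s->cancel_requested)
		return -ECANCELED;	/* explicitly canceled run */
	if (s->fs_freezing || s->task_freezing)
		return -EINTR;		/* fs freeze or PM suspend/hibernate */
	return 0;
}
```

Both freezing conditions map to the same return value because, as the message explains, either one alone is enough to make suspend or hibernate time out.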
Qu Wenruo	02a7e90797	btrfs: scrub: add cancel/pause/removed bg checks for raid56 parity stripes For raid56, data and parity stripes are handled differently. For data stripes they are handled just like regular RAID1/RAID10 stripes, going through the regular scrub_simple_mirror(). But for parity stripes we have to read out all involved data stripes and do any needed verification and repair, then scrub the parity stripe. This process will take a much longer time than a regular stripe, but unlike scrub_simple_mirror(), we do not check if we should cancel/pause or the block group is already removed. Aligned the behavior of scrub_raid56_parity_stripe() to scrub_simple_mirror(), by adding: - Cancel check - Pause check - Removed block group check Since those checks are the same from the scrub_simple_mirror(), also update the comments of scrub_simple_mirror() by: - Remove too obvious comments We do not need extra comments on what we're checking, it's really too obvious. - Remove a stale comment about pausing Now the scrub is always queuing all involved stripes, and submit them in one go, there is no more submission part during pausing. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 22:21:38 +01:00
David Sterba	aebe2bb0b8	btrfs: fix trivial -Wshadow warnings When compiling with -Wshadow (also in 'make W=2' build) there are several reports of shadowed variables that seem to be harmless: - btrfs_do_encoded_write() - we can reuse 'ordered', there's no previous value that would need to be preserved - scrub_write_endio() - we need a standalone 'i' for bio iteration - scrub_stripe() - duplicate ret2 for errors that must not overwrite 'ret' - btrfs_subpage_set_writeback() - 'flags' is used for another irqsave lock but is not overwritten when reused for xarray due to scoping, but for clarity let's rename it - process_dir_items_leaf() - duplicate 'ret', used only for immediate checks Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-24 21:37:36 +01:00
Zilin Guan	5fea61aa1c	btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe() scrub_raid56_parity_stripe() allocates a bio with bio_alloc(), but fails to release it on some error paths, leading to a potential memory leak. Add the missing bio_put() calls to properly drop the bio reference in those error cases. Fixes: `1009254bf2` ("btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-11-05 20:01:12 +01:00
Qu Wenruo	42d3a055d9	btrfs: do not use folio_test_partial_kmap() in ASSERT()s [BUG] Syzbot reported an ASSERT() triggered inside scrub: BTRFS info (device loop0): scrub: started on devid 1 assertion failed: !folio_test_partial_kmap(folio) :: 0, in fs/btrfs/scrub.c:697 ------------[ cut here ]------------ kernel BUG at fs/btrfs/scrub.c:697! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6077 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:scrub_stripe_get_kaddr+0x1bb/0x1c0 fs/btrfs/scrub.c:697 Call Trace: <TASK> scrub_bio_add_sector fs/btrfs/scrub.c:932 [inline] scrub_submit_initial_read+0xf21/0x1120 fs/btrfs/scrub.c:1897 submit_initial_group_read+0x423/0x5b0 fs/btrfs/scrub.c:1952 flush_scrub_stripes+0x18f/0x1150 fs/btrfs/scrub.c:1973 scrub_stripe+0xbea/0x2a30 fs/btrfs/scrub.c:2516 scrub_chunk+0x2a3/0x430 fs/btrfs/scrub.c:2575 scrub_enumerate_chunks+0xa70/0x1350 fs/btrfs/scrub.c:2839 btrfs_scrub_dev+0x6e7/0x10e0 fs/btrfs/scrub.c:3153 btrfs_ioctl_scrub+0x249/0x4b0 fs/btrfs/ioctl.c:3163 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> ---[ end trace 0000000000000000 ]--- Which doesn't make much sense, as all the folios we allocated for scrub should not be highmem. [CAUSE] Thankfully syzbot has a detailed kernel config file, showing that CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is set to y. And that debug option will force all folio_test_partial_kmap() to return true, to improve coverage on highmem tests. But in our case we really just want to make sure the folios we allocated are not highmem (and they are indeed not). Such incorrect result from folio_test_partial_kmap() is just screwing up everything. 
[FIX] Replace folio_test_partial_kmap() with folio_test_highmem() so that we are not affected by those highmem-specific debugging options. Fixes: `5fbaae4b85` ("btrfs: prepare scrub to support bs > ps cases") Reported-by: syzbot+bde59221318c592e6346@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-10-13 22:31:36 +02:00
David Sterba	cc53bd2085	btrfs: add unlikely annotations to branches leading to EIO The unlikely() annotation is a static prediction hint that compiler may use to reorder code out of hot path. We use it elsewhere (namely tree-checker.c) for error branches that almost never happen, where EIO is one of them. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:26 +02:00
David Sterba	9264d004a6	btrfs: add unlikely annotations to branches leading to EUCLEAN The unlikely() annotation is a static prediction hint that compiler may use to reorder code out of hot path. We use it elsewhere (namely tree-checker.c) for error branches that almost never happen, where EUCLEAN (a corruption) is one of them. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:26 +02:00
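A minimal user-space illustration of the annotation used in the two commits above. The macro has the same shape as the kernel's `unlikely()` and relies on the GCC/Clang builtin `__builtin_expect`; `check_block_len()` is a hypothetical check invented for this sketch, and it returns `-EIO` for portability where kernel code would often use `-EUCLEAN` for corruption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Static branch prediction hint: the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical sanity check on a length read from disk. */
static int check_block_len(size_t len, size_t max_len)
{
	assert(max_len > 0);

	/* Error branch that should almost never be taken. */
	if (unlikely(len == 0 || len > max_len))
		return -EIO;
	return 0;
}
```

The hint does not change the result of the check; it only tells the compiler which branch to keep on the hot path.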
Sun YangKai	4ca6f24a52	btrfs: more trivial BTRFS_PATH_AUTO_FREE conversions Trivial pattern for the auto freeing with goto -> return conversions if possible. The following cases are considered trivial in this patch: 1. Cases where there are no operations between btrfs_free_path() and the function returns. 2. Cases where only simple cleanup operations (such as kfree(), kvfree(), clear_bit(), and fs_path_free()) are present between btrfs_free_path() and the function return. Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:26 +02:00
Qu Wenruo	5fbaae4b85	btrfs: prepare scrub to support bs > ps cases This involves: - Migrate scrub_stripe::pages[] to folios[] - Use btrfs_alloc_folio_array() and folio_put() to alloc above array. - Migrate scrub_stripe_get_kaddr() and scrub_stripe_get_paddr() to use folio interfaces - Migrate raid56_parity_cache_data_pages() to raid56_parity_cache_data_folios() Since scrub is the only caller still using pages. This helper will copy the folio array contents into rbio::stripe_pages, with sector uptodate flags updated. And a new ASSERT() to make sure bs > ps cases will not hit this path. Since most scrub code is based on kaddr/paddr, the migration itself is pretty straightforward. And since we're here, also move the loop to set the stripe_sectors[].uptodate out of the copy loop. As we always mark all the sectors as uptodate for the data stripe, it's easier to do in one go, other than doing it inside the copy loop. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:25 +02:00
Qu Wenruo	35aff706dc	btrfs: concentrate highmem handling for data verification Currently for btrfs checksum verification, we do it in the following pattern: kaddr = kmap_local_*(); ret = btrfs_check_csum_csum(kaddr); kunmap_local(kaddr); It's OK for now, but it's still not following the patterns of helpers inside linux/highmem.h, which never requires a virt memory address. In those highmem helpers, they mostly accept a folio, some offset/length inside the folio, and in the implementation they check if the folio needs partial kmap, and do the handling. Inspired by those formal highmem helpers, enhance the highmem handling of data checksum verification by: - Rename btrfs_check_sector_csum() to btrfs_check_block_csum() To follow the more common term "block" used in all other major filesystems. - Pass a physical address into btrfs_check_block_csum() and btrfs_data_csum_ok() The physical address is always available even for a highmem page. Since it's page frame number << PAGE_SHIFT + offset in page. And with that physical address, we can grab the folio covering the page, and do extra checks to ensure it covers at least one block. This also allows us to do the kmap inside btrfs_check_block_csum(). This means all the extra HIGHMEM handling will be concentrated into btrfs_check_block_csum(), and no callers will need to bother highmem by themselves. - Properly zero out the block if csum mismatch Since btrfs_data_csum_ok() only got a paddr, we can not and should not use memzero_bvec(), which only accepts single page bvec. Instead use paddr to grab the folio and call folio_zero_range() Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:16 +02:00
Thorsten Blum	a7f3dfb829	btrfs: scrub: replace max_t()/min_t() with clamp() in scrub_throttle_dev_io() Replace max_t() followed by min_t() with a single clamp(). As pointed out by David Laight in https://lore.kernel.org/linux-btrfs/20250906122458.75dfc8f0@pumpkin/ the calculation may overflow u32 when the input value is too large, so clamp_t() is not used. In practice the expected values are in the range of megabytes to gigabytes (throughput limit) so the bug would not happen. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: David Sterba <dsterba@suse.com> [ Use clamp() and add explanation. ] Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:16 +02:00
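The overflow concern above can be shown with a small user-space comparison. This is a sketch under assumptions: the 16 MiB divisor and the bounds 1 and 64 are illustrative constants rather than values taken from the commit message, and `div_old()`, `div_new()` and `clamp_u64()` are hypothetical helpers, not the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Old pattern: the quotient is truncated to 32 bits before bounding. */
static uint32_t div_old(uint64_t bwlimit)
{
	uint32_t div = (uint32_t)(bwlimit / (16 * MiB));

	if (div < 1)	/* lower bound, like max_t(u32, 1, ...) */
		div = 1;
	if (div > 64)	/* upper bound, like min_t(u32, 64, ...) */
		div = 64;
	return div;
}

/* Bound v to [lo, hi] in full 64-bit width. */
static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	assert(lo <= hi);
	return v < lo ? lo : (v > hi ? hi : v);
}

/* New pattern: clamp first, narrow afterwards. */
static uint32_t div_new(uint64_t bwlimit)
{
	return (uint32_t)clamp_u64(bwlimit / (16 * MiB), 1, 64);
}
```

For a limit of 2^56 bytes the quotient is exactly 2^32, which truncates to 0 in the old pattern and ends up at the lower bound, while the 64-bit clamp yields the upper bound.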
David Sterba	17dc82dc1e	btrfs: fix typos in comments and strings Annual typo fixing pass. Strangely codespell found only about 30% of what is in this patch, the rest was done manually using text spellchecker with a custom dictionary of acceptable terms. Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-23 08:49:16 +02:00
David Sterba	67e78f983e	btrfs: convert several int parameters to bool We're almost done cleaning misused int/bool parameters. Convert a bunch of them, found by manual grepping. Note that btrfs_sync_fs() needs an int as it's mandated by the struct super_operations prototype. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2025-09-22 10:54:32 +02:00
David Sterba	0fe04bf132	btrfs: switch RCU helper versions to btrfs_warn() The RCU protection is now done in the plain helpers, we can remove the "_in_rcu" and "_rl_in_rcu". Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-21 23:56:38 +02:00
David Sterba	9db18fe3ac	btrfs: switch RCU helper versions to btrfs_err() The RCU protection is now done in the plain helpers, we can remove the "_in_rcu" and "_rl_in_rcu". Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-21 23:56:38 +02:00
David Sterba	4013cde56e	btrfs: rename err to ret in scrub_submit_extent_sector_read() Unify naming of return value to the preferred way. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-07-21 23:53:29 +02:00
Linus Torvalds	5ca7fe213b	for-6.16-rc3-tag Merge tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fixes: - fix invalid inode pointer dereferences during log replay - fix a race between renames and directory logging - fix shutting down delayed iput worker - fix device byte accounting when dropping chunk - in zoned mode, fix offset calculations for DUP profile when conventional and sequential zones are used together Regression fixes: - fix possible double unlock of extent buffer tree (xarray conversion) - in zoned mode, fix extent buffer refcount when writing out extents (xarray conversion) Error handling fixes and updates: - handle unexpected extent type when replaying log - check and warn if there are remaining delayed inodes when putting a root - fix assertion when building free space tree - handle csum tree error with mount option 'rescue=ibadroot' Other: - error message updates: add prefix to all scrub related messages, include other information in messages" * tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix alloc_offset 
calculation for partly conventional block groups btrfs: handle csum tree error with rescue=ibadroots correctly btrfs: fix race between async reclaim worker and close_ctree() btrfs: fix assertion when building free space tree btrfs: don't silently ignore unexpected extent type when replaying log btrfs: fix invalid inode pointer dereferences during log replay btrfs: fix double unlock of buffer_tree xarray when releasing subpage eb btrfs: update superblock's device bytes_used when dropping chunk btrfs: fix a race between renames and directory logging btrfs: scrub: add prefix for the error messages btrfs: warn if leaking delayed_nodes in btrfs_put_root() btrfs: fix delayed ref refcount leak in debug assertion btrfs: include root in error message when unlinking inode btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails	2025-06-23 11:16:38 -07:00
Anand Jain	65d5112b4d	btrfs: scrub: add prefix for the error messages Add a "scrub: " prefix to all messages logged by scrub so that it's easy to filter them from dmesg for analysis. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-06-19 15:19:06 +02:00
Linus Torvalds	5e82ed5ca4	for-6.16-tag Merge tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Apart from numerous cleanups, there are some performance improvements and one minor mount option update. There's one more radix-tree conversion (one remaining), and continued work towards enabling large folios (almost finished). 
Performance: - extent buffer conversion to xarray gains throughput and runtime improvements on metadata heavy operations doing writeback (sample test shows +50% throughput, -33% runtime) - extent io tree cleanups lead to performance improvements by avoiding unnecessary searches or repeated searches - more efficient extent unpinning when committing transaction (estimated run time improvement 3-5%) User visible changes: - remove standalone mount option 'nologreplay', deprecated in 5.9, replacement is 'rescue=nologreplay' - in scrub, update reporting, add back device stats message after detected errors (accidentally removed during recent refactoring) Core: - convert extent buffer radix tree to xarray - in subpage mode, move block perfect compression out of experimental build - in zoned mode, introduce sub block groups to allow managing special block groups, like the one for relocation or tree-log, to handle some corner cases of ENOSPC - in scrub, simplify bitmaps for block tracking status - continued preparations for large folios: - remove assertions for folio order 0 - add support where missing: compression, buffered write, defrag, hole punching, subpage, send - fix fsync of files with no hard links not persisting deletion - reject tree blocks which are not nodesize aligned, a precaution from 4.9 times - move transaction abort calls closer to the error sites - remove usage of some struct bio_vec internals - simplifications in extent map - extent IO cleanups and optimizations - error handling improvements - enhanced ASSERT() macro with optional format strings - cleanups: - remove unused code - naming unifications, dropped __, added prefix - merge similar functions - use common helpers for various data structures" * tag 'for-6.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (198 commits) btrfs: move misplaced comment of btrfs_path::keep_locks btrfs: remove standalone "nologreplay" mount option btrfs: use a single variable to track return value at 
btrfs_page_mkwrite() btrfs: don't return VM_FAULT_SIGBUS on failure to set delalloc for mmap write btrfs: simplify early error checking in btrfs_page_mkwrite() btrfs: pass true to btrfs_delalloc_release_space() at btrfs_page_mkwrite() btrfs: fix wrong start offset for delalloc space release during mmap write btrfs: fix harmless race getting delayed ref head count when running delayed refs btrfs: log error codes during failures when writing super blocks btrfs: simplify error return logic when getting folio at prepare_one_folio() btrfs: return real error from __filemap_get_folio() calls btrfs: remove superfluous return value check at btrfs_dio_iomap_begin() btrfs: fix invalid data space release when truncating block in NOCOW mode btrfs: update Kconfig option descriptions btrfs: update list of features built under experimental config btrfs: send: remove btrfs_debug() calls btrfs: use boolean for delalloc argument to btrfs_free_reserved_extent() btrfs: use boolean for delalloc argument to btrfs_free_reserved_bytes() btrfs: fold error checks when allocating ordered extent and update comments btrfs: check we grabbed inode reference when allocating an ordered extent ...	2025-05-26 12:24:43 -07:00
Linus Torvalds	6f59de9bc0	for-6.16/block-20250523 Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - ublk updates: - Add support for updating the size of a ublk instance - Zero-copy improvements - Auto-registering of buffers for zero-copy - Series simplifying and improving GET_DATA and request lookup - Series adding quiesce support - Lots of selftests additions - Various cleanups - NVMe updates via Christoph: - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. 
Silva) - MD updates via Yu: - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters - Clean up brd, getting rid of atomic kmaps and bvec poking - Add loop driver specifically for zoned IO testing - Eliminate blk-rq-qos calls with a static key, if not enabled - Improve hctx locking for when a plug has IO for multiple queues pending - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well - Improve blk-throttle support - Improve delay support for blk-throttle - Improve brd discard support - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezeing - Add support for block write streams via FDP (flexible data placement) on NVMe - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code - Remove obsolete BLK_MQ pci and virtio Kconfig options - Add atomic/untorn write support to blktrace - Various little cleanups and fixes * tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits) selftests: ublk: add test for UBLK_F_QUIESCE ublk: add feature UBLK_F_QUIESCE selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE traceevent/block: Add REQ_ATOMIC flag to block trace events ublk: run auto buf unregisgering in same io_ring_ctx with registering io_uring: add helper io_uring_cmd_ctx_handle() ublk: remove io argument from ublk_auto_buf_reg_fallback() ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK selftests: ublk: support UBLK_F_AUTO_BUF_REG ublk: support UBLK_AUTO_BUF_REG_FALLBACK ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG ublk: prepare for supporting to register request buffer automatically ublk: convert to refcount_t selftests: ublk: make IO & 
device removal test more stressful nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev ...	2025-05-26 11:39:36 -07:00
Qu Wenruo	4ad57e1e22	btrfs: scrub: reduce memory usage of struct scrub_sector_verification That structure records needed info for block verification (either data checksum pointer, or expected tree block generation). But there is also a boolean to tell if this block belongs to a metadata or not, as the data checksum pointer and expected tree block generation is already a union, we need a dedicated bit to tell if this block is a metadata or not. However such layout means we're wasting 63 bits for x86_64, which is a huge memory waste. Thanks to the recent bitmap aggregation, we can easily move this single-bit-per-block member to a new sub-bitmap. And since we already have six 16 bits long bitmaps, adding another bitmap won't even increase any memory usage for x86_64, as we need two 64 bits long anyway. This will reduce the following memory usages: - sizeof(struct scrub_sector_verification) From 16 bytes to 8 bytes on x86_64. - scrub_stripe::sectors From 16 * 16 to 16 * 8 bytes. - Per-device scrub_ctx memory usage From 128 * (16 * 16) to 128 * (16 * 8), which saves 16KiB memory. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:56 +02:00
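
The per-device saving quoted in the commit above can be checked with a few lines of arithmetic. This is only a sketch: the constants are the figures from the commit message (128 stripes per device, 16 sectors per stripe, record size shrinking from 16 to 8 bytes on x86_64), and the helper names `verification_bytes()` and `verification_bytes_saved()` are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the commit message (x86_64). */
#define STRIPES_PER_DEVICE 128
#define SECTORS_PER_STRIPE 16

/*
 * Hypothetical helper: bytes used by all per-sector verification
 * records of one device, given the size of a single record.
 */
static size_t verification_bytes(size_t record_size)
{
	assert(record_size > 0);
	return (size_t)STRIPES_PER_DEVICE * SECTORS_PER_STRIPE * record_size;
}

/* Hypothetical helper: bytes saved when a record shrinks. */
static size_t verification_bytes_saved(size_t old_size, size_t new_size)
{
	assert(old_size >= new_size);
	return verification_bytes(old_size) - verification_bytes(new_size);
}
```

Calling `verification_bytes_saved(16, 8)` gives 128 * 16 * 8 bytes, which is the 16KiB stated in the message.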
Qu Wenruo	1b660424a6	btrfs: scrub: aggregate small bitmaps into a larger one Currently we have several small bitmaps inside scrub_stripe: - extent_sector_bitmap - error_bitmap - io_error_bitmap - csum_error_bitmap - meta_error_bitmap - meta_gen_error_bitmap All those bitmaps are at most 16 bits long, but unsigned long is either 32 or 64 (more common) bits. This means we're wasting 1/2 or 3/4 space for each bitmap. And we can have 128 scrub_stripe for each device, such wasted space adds up quickly. Instead of using a single unsigned long for each bitmap, aggregate them into a larger bitmap, just like what we're doing for subpage support. This reduces 24 bytes from each scrub_stripe structure on x86_64 systems. This will need a lot of macros converting direct bitmap/bit operations into our scrub_stripe specific helpers, but all those helpers are very small and can be inlined. So overall the overhead shouldn't be that huge, and we save quite some memory space. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:55 +02:00
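
The aggregation idea described above can be sketched in plain userspace C. This is an illustration under assumptions, not the kernel code: the enum values mirror the six bitmap names listed in the commit message, but `struct stripe_bitmaps`, `bit_index()`, `stripe_set_bit()` and `stripe_test_bit()` are hypothetical names, and fixed 64-bit words stand in for `unsigned long`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTORS_PER_STRIPE 16
#define BITS_PER_WORD 64

/* One entry per small bitmap named in the commit message. */
enum sub_bitmap {
	SUB_EXTENT_SECTOR,
	SUB_ERROR,
	SUB_IO_ERROR,
	SUB_CSUM_ERROR,
	SUB_META_ERROR,
	SUB_META_GEN_ERROR,
	SUB_NR,
};

#define TOTAL_BITS (SUB_NR * SECTORS_PER_STRIPE)
#define NR_WORDS ((TOTAL_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* All sub-bitmaps packed back to back: 6 * 16 = 96 bits, two words. */
struct stripe_bitmaps {
	uint64_t words[NR_WORDS];
};

/* Map (sub-bitmap, sector) to a bit position in the large bitmap. */
static unsigned int bit_index(enum sub_bitmap which, unsigned int sector)
{
	assert(which < SUB_NR && sector < SECTORS_PER_STRIPE);
	return (unsigned int)which * SECTORS_PER_STRIPE + sector;
}

static void stripe_set_bit(struct stripe_bitmaps *b, enum sub_bitmap which,
			   unsigned int sector)
{
	unsigned int nr = bit_index(which, sector);

	b->words[nr / BITS_PER_WORD] |= UINT64_C(1) << (nr % BITS_PER_WORD);
}

static bool stripe_test_bit(const struct stripe_bitmaps *b,
			    enum sub_bitmap which, unsigned int sector)
{
	unsigned int nr = bit_index(which, sector);

	return (b->words[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1;
}
```

Each sub-bitmap keeps its own 16-bit window, so setting a sector in one window never disturbs the same sector in another. The small accessors play the role of the "scrub_stripe specific helpers" the message mentions.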
Qu Wenruo	f2c19541e4	btrfs: scrub: fix a wrong error type when metadata bytenr mismatches When the bytenr doesn't match for a metadata tree block, we will report it as an csum error, which is incorrect and should be reported as a metadata error instead. Fixes: `a3ddbaebc7` ("btrfs: scrub: introduce a helper to verify one metadata block") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:55 +02:00
Qu Wenruo	ce6920dba8	btrfs: scrub: move error reporting members to stack Currently the following members of scrub_stripe are only utilized for error reporting: - init_error_bitmap - init_nr_io_errors - init_nr_csum_errors - init_nr_meta_errors - init_nr_meta_gen_errors There is no need to put all those members into scrub_stripe, which take 24 bytes for each stripe, and we have 128 stripes for each device. Instead introduce a structure, scrub_error_records, and move all above members into that structure. And allocate such structure from stack inside scrub_stripe_read_repair_worker(). Since that function is called from a workqueue context, we have more than enough stack space for just 24 bytes. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:54 +02:00
Qu Wenruo	ec1f3a207c	btrfs: scrub: update device stats when an error is detected [BUG] Since the migration to the new scrub_stripe interface, scrub no longer updates the device stats when hitting an error, no matter if it's a read or checksum mismatch error. E.g: BTRFS info (device dm-2): scrub: started on devid 1 BTRFS error (device dm-2): unable to fixup (regular) error at logical 13631488 on dev /dev/mapper/test-scratch1 physical 13631488 BTRFS warning (device dm-2): checksum error at logical 13631488 on dev /dev/mapper/test-scratch1, physical 13631488, root 5, inode 257, offset 0, length 4096, links 1 (path: file) BTRFS error (device dm-2): unable to fixup (regular) error at logical 13631488 on dev /dev/mapper/test-scratch1 physical 13631488 BTRFS warning (device dm-2): checksum error at logical 13631488 on dev /dev/mapper/test-scratch1, physical 13631488, root 5, inode 257, offset 0, length 4096, links 1 (path: file) BTRFS info (device dm-2): scrub: finished on devid 1 with status: 0 Note there is no line showing the device stats error update. [CAUSE] In the migration to the new scrub_stripe interface, we no longer call btrfs_dev_stat_inc_and_print(). [FIX] - Introduce a new bitmap for metadata generation errors * A new bitmap @meta_gen_error_bitmap is introduced to record which blocks have metadata generation mismatch errors. * A new counter for that bitmap @init_nr_meta_gen_errors, is also introduced to store the number of generation mismatch errors that are found during the initial read. This is for the error reporting at scrub_stripe_report_errors(). * New dedicated error message for unrepaired generation mismatches * Update @meta_gen_error_bitmap if a transid mismatch is hit - Add btrfs_dev_stat_inc_and_print() calls to the following call sites * scrub_stripe_report_errors() * scrub_write_endio() This is only for the write errors. 
This means there is a minor behavior change: - The timing of device stats error message Since we concentrate the error messages at scrub_stripe_report_errors(), the device stats error messages will all show up in one go, after the detailed scrub error messages: BTRFS error (device dm-2): unable to fixup (regular) error at logical 13631488 on dev /dev/mapper/test-scratch1 physical 13631488 BTRFS warning (device dm-2): checksum error at logical 13631488 on dev /dev/mapper/test-scratch1, physical 13631488, root 5, inode 257, offset 0, length 4096, links 1 (path: file) BTRFS error (device dm-2): unable to fixup (regular) error at logical 13631488 on dev /dev/mapper/test-scratch1 physical 13631488 BTRFS warning (device dm-2): checksum error at logical 13631488 on dev /dev/mapper/test-scratch1, physical 13631488, root 5, inode 257, offset 0, length 4096, links 1 (path: file) BTRFS error (device dm-2): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 BTRFS error (device dm-2): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0 Fixes: `e02ee89baa` ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:54 +02:00
Christoph Hellwig	adbfd189c4	btrfs: scrub: use virtual addresses directly Instead of the old @page and @page_offset pair inside scrub, here we can directly use the virtual address for a sector. This has the following benefit: - Simplified parameters A single @kaddr will repair @page and @page_offset. - No more unnecessary kmap/kunmap calls Since all pages utilized by scrub is allocated by scrub, and no highmem is allowed, we do not need to do any kmap/kunmap. And add an ASSERT() inside the new scrub_stripe_get_kaddr() to catch any unexpected highmem page. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:47 +02:00
Christoph Hellwig	959ddf2839	btrfs: move kmapping out of btrfs_check_sector_csum() Move kmapping the page out of btrfs_check_sector_csum(). This allows using bvec_kmap_local() where suitable and reduces the number of kmap*() calls in the raid56 code. This also means btrfs_check_sector_csum() will only accept a properly kmapped address. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-15 14:30:46 +02:00
Christoph Hellwig	760aa1818b	btrfs: use bdev_rw_virt in scrub_one_super Replace the code building a bio from a kernel direct map address and submitting it synchronously with the bdev_rw_virt helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Link: https://lore.kernel.org/r/20250507120451.4000627-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-07 07:31:07 -06:00
Qu Wenruo	f95d186255	btrfs: avoid NULL pointer dereference if no valid csum tree [BUG] When trying read-only scrub on a btrfs with rescue=idatacsums mount option, it will crash with the following call trace: BUG: kernel NULL pointer dereference, address: 0000000000000208 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G O 6.15.0-rc3-custom+ #236 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs] Call Trace: <TASK> scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs] scrub_simple_mirror+0x175/0x290 [btrfs] scrub_stripe+0x5f7/0x6f0 [btrfs] scrub_chunk+0x9a/0x150 [btrfs] scrub_enumerate_chunks+0x333/0x660 [btrfs] btrfs_scrub_dev+0x23e/0x600 [btrfs] btrfs_ioctl+0x1dcf/0x2f80 [btrfs] __x64_sys_ioctl+0x97/0xc0 do_syscall_64+0x4f/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e [CAUSE] Mount option "rescue=idatacsums" will completely skip loading the csum tree, so that any data read will not find any data csum thus we will ignore data checksum verification. Normally call sites utilizing csum tree will check the fs state flag NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all. This results in scrub to call btrfs_search_slot() on a NULL pointer and triggered above crash. [FIX] Check both extent and csum tree root before doing any tree search. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-05-02 13:20:11 +02:00
David Sterba	dba6ae0b43	btrfs: unify ordering of btrfs_key initializations The btrfs_key is defined as objectid/type/offset and the keys are also printed like that. For better readability, update all key initializations to match this order. Signed-off-by: David Sterba <dsterba@suse.com>	2025-03-18 20:35:42 +01:00
Qu Wenruo	6aecd91a5c	btrfs: avoid NULL pointer dereference if no valid extent tree [BUG] Syzbot reported a crash with the following call trace: BTRFS info (device loop0): scrub: started on devid 1 BUG: kernel NULL pointer dereference, address: 0000000000000208 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 106e70067 P4D 106e70067 PUD 107143067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 UID: 0 PID: 689 Comm: repro Kdump: loaded Tainted: G O 6.13.0-rc4-custom+ #206 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 RIP: 0010:find_first_extent_item+0x26/0x1f0 [btrfs] Call Trace: <TASK> scrub_find_fill_first_stripe+0x13d/0x3b0 [btrfs] scrub_simple_mirror+0x175/0x260 [btrfs] scrub_stripe+0x5d4/0x6c0 [btrfs] scrub_chunk+0xbb/0x170 [btrfs] scrub_enumerate_chunks+0x2f4/0x5f0 [btrfs] btrfs_scrub_dev+0x240/0x600 [btrfs] btrfs_ioctl+0x1dc8/0x2fa0 [btrfs] ? do_sys_openat2+0xa5/0xf0 __x64_sys_ioctl+0x97/0xc0 do_syscall_64+0x4f/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> [CAUSE] The reproducer is using a corrupted image where extent tree root is corrupted, thus forcing to use "rescue=all,ro" mount option to mount the image. Then it triggered a scrub, but since scrub relies on extent tree to find where the data/metadata extents are, scrub_find_fill_first_stripe() relies on an non-empty extent root. But unfortunately scrub_find_fill_first_stripe() doesn't really expect an NULL pointer for extent root, it use extent_root to grab fs_info and triggered a NULL pointer dereference. [FIX] Add an extra check for a valid extent root at the beginning of scrub_find_fill_first_stripe(). The new error path is introduced by `42437a6386` ("btrfs: introduce mount option rescue=ignorebadroots"), but that's pretty old, and later commit `b979547513` ("btrfs: scrub: introduce helper to find and fill sector info for a scrub_stripe") changed how we do scrub. 
So for kernels older than 6.6, the fix will need manual backport. Reported-by: syzbot+339e9dbe3a2ca419b85d@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/67756935.050a0220.25abdd.0a12.GAE@google.com/ Fixes: `42437a6386` ("btrfs: introduce mount option rescue=ignorebadroots") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2025-01-06 16:32:31 +01:00
David Sterba	887d417f0a	btrfs: drop unused parameter map from scrub_simple_mirror() The parameter map used to be passed to scrub_extent() until `e02ee89baa` ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure"), where the scrub implementation was completely reworked. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:16 +01:00
David Sterba	f2c144fba7	btrfs: scrub: drop unused parameter sctx from scrub_submit_extent_sector_read() The parameter is unused and we can reach sctx from scrub stripe if needed. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:15 +01:00
Johannes Thumshirn	9fde8a67b9	btrfs: scrub: skip initial RST lookup errors Performing the initial extent sector read on a RAID stripe-tree backed filesystem with pre-allocated extents will cause the RAID stripe-tree lookup code to return ENODATA, as pre-allocated extents do not have any on-disk bytes and thus no RAID stripe-tree entries. But the current scrub read code marks these extents as errors, because the lookup fails. If btrfs_map_block() returns -ENODATA, it means that the call to btrfs_get_raid_extent_offset() returned -ENODATA, because there is no entry for the corresponding range in the RAID stripe-tree. But as this range is in the extent tree it means we've hit a pre-allocated extent. In this case, don't mark the sector in the stripe's error bitmaps as faulty and carry on to the next. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:15 +01:00
Shen Lichuan	2144e1f23f	btrfs: correct typos in multiple comments across various files Fix some confusing spelling errors that were currently identified, the details are as follows: block-group.c: 2800: uncompressible ==> incompressible extent-tree.c: 3131: EXTEMT ==> EXTENT extent_io.c: 3124: utlizing ==> utilizing extent_map.c: 1323: ealier ==> earlier extent_map.c: 1325: possiblity ==> possibility fiemap.c: 189: emmitted ==> emitted fiemap.c: 197: emmitted ==> emitted fiemap.c: 203: emmitted ==> emitted transaction.h: 36: trasaction ==> transaction volumes.c: 5312: filesysmte ==> filesystem zoned.c: 1977: trasnsaction ==> transaction Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:14 +01:00
Riyan Dhiman	522945b342	btrfs: remove redundant stop_loop variable in scrub_stripe() The variable stop_loop was originally introduced in commit `625f1c8dc6` ("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in commit `3b080b2564` ("Btrfs: scrub raid56 stripes in the right way"). However, in a later commit `18d30ab961` ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code that modified stop_loop was removed, making the variable redundant. Currently, stop_loop is only initialized with 0 and is never used or modified within the scrub_stripe() function. As a result, this patch removes the stop_loop variable to clean up the code and eliminate unnecessary redundancy. This change has no impact on functionality, as stop_loop was never utilized in any meaningful way in the final version of the code. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Riyan Dhiman <riyandhiman14@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-11-11 14:34:13 +01:00
David Sterba	792e86ef31	btrfs: rename btrfs_submit_bio() to btrfs_submit_bbio() The function name is a bit misleading as it submits the btrfs_bio (bbio), rename it so we can use btrfs_submit_bio() when an actual bio is submitted. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:19 +02:00
Johannes Thumshirn	d6106f0dc5	btrfs: rename btrfs_io_stripe::is_scrub to rst_search_commit_root Rename 'btrfs_io_stripe::is_scrub' to 'rst_search_commit_root'. While 'is_scrub' describes the state of the io_stripe (it is a stripe submitted by scrub) it does not describe the purpose, namely looking at the commit root when searching RAID stripe-tree entries. Renaming the stripe to rst_search_commit_root describes this purpose. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-09-10 16:51:17 +02:00
Qu Wenruo	63447b7dd4	btrfs: scrub: update last_physical after scrubbing one stripe Currently sctx->stat.last_physical only got updated in the following cases: - When the last stripe of a non-RAID56 chunk is scrubbed This implies a pitfall, if the last stripe is at the chunk boundary, and we finished the scrub of the whole chunk, we won't update last_physical at all until the next chunk. - When a P/Q stripe of a RAID56 chunk is scrubbed This leads the following two problems: - sctx->stat.last_physical is not updated for a almost full chunk This is especially bad, affecting scrub resume, as the resume would start from last_physical, causing unnecessary re-scrub. - "btrfs scrub status" will not report any progress for a long time Fix the problem by properly updating @last_physical after each stripe is scrubbed. And since we're here, for the sake of consistency, use spin lock to protect the update of @last_physical, just like all the remaining call sites touching sctx->stat. Reported-by: Michel Palleau <michel.palleau@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAMFk-+igFTv2E8svg=cQ6o3e6CrR5QwgQ3Ok9EyRaEvvthpqCQ@mail.gmail.com/ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-01 17:15:07 +02:00
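
The effect of updating the resume point after every stripe can be modelled with a small loop. This is a simplified sketch, not the scrub code: `struct scrub_progress_model` and `scrub_range_model()` are hypothetical, the 64K stripe length is an assumption taken from the neighbouring commit, and the spin lock the message mentions is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN (64 * 1024ULL)

/* Hypothetical stand-in for the progress record reported to the caller. */
struct scrub_progress_model {
	uint64_t last_physical;
};

/*
 * Scrub [start, end) one stripe at a time and stop after max_stripes
 * stripes, as if the scrub had been interrupted. The resume point is
 * advanced after every stripe, so an interrupted scrub restarts at the
 * end of the last completed stripe rather than at an earlier position.
 */
static void scrub_range_model(struct scrub_progress_model *p, uint64_t start,
			      uint64_t end, unsigned int max_stripes)
{
	uint64_t cur = start;

	assert(start <= end);
	/* Nothing scrubbed yet: the resume point is the requested start. */
	p->last_physical = start;

	for (unsigned int done = 0; cur < end && done < max_stripes; done++) {
		uint64_t remaining = end - cur;
		uint64_t len = remaining < STRIPE_LEN ? remaining : STRIPE_LEN;

		cur += len;
		/* Per-stripe update of the resume point. */
		p->last_physical = cur;
	}
}
```

With a four-stripe range interrupted after three stripes, the model reports the end of the third stripe as the resume point, which avoids re-scrubbing the part that already finished.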
Qu Wenruo	33eb1e5db3	btrfs: factor out stripe length calculation into a helper Currently there are two locations which need to calculate the real length of a stripe (which can be at the end of a chunk, and the chunk size may not always be 64K aligned). Factor them into a helper as we're going to have a third user soon. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-08-01 17:15:05 +02:00
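
A helper of the kind described above might look like the following. It is a minimal sketch assuming a 64K stripe length and a chunk described by its logical start and length; the name `stripe_real_length()` and its parameters are hypothetical and do not reproduce the kernel helper's signature.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN (64 * 1024ULL)

/*
 * Hypothetical helper: real length of the stripe that begins at
 * stripe_logical. A stripe is normally STRIPE_LEN bytes, but the last
 * stripe of a chunk is cut short when the chunk size is not 64K aligned.
 */
static uint64_t stripe_real_length(uint64_t chunk_start, uint64_t chunk_len,
				   uint64_t stripe_logical)
{
	uint64_t chunk_end = chunk_start + chunk_len;
	uint64_t remaining;

	assert(stripe_logical >= chunk_start && stripe_logical < chunk_end);

	remaining = chunk_end - stripe_logical;
	return remaining < STRIPE_LEN ? remaining : STRIPE_LEN;
}
```

For a chunk of two full stripes plus 4096 bytes, the first two stripes report 65536 bytes and the trailing one reports 4096.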
Qu Wenruo	0fbf6cbd72	btrfs: rename the extra_gfp parameter of btrfs_alloc_page_array() There is only one caller utilizing the @extra_gfp parameter, alloc_eb_folio_array(). And in that case the extra_gfp is only assigned to __GFP_NOFAIL. Rename the @extra_gfp parameter to @nofail to indicate that. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-11 15:33:30 +02:00


580 Commits