Commit Graph

287 Commits

Author SHA1 Message Date
Kent Overstreet 77d63522f0 bcachefs: Make replicas_delta_list smaller
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet 43de7376f3 bcachefs: Fix erasure coding disk space accounting
Disk space accounting for erasure coding + compression was completely
broken - we need to calculate the parity sectors delta the same way we
calculate disk_sectors, by calculating the old and new usage and
subtracting to get the difference.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet 37954a275f bcachefs: Limit pointers to being in only one stripe
This make the disk accounting code saner, and it's not clear why we'd
ever want the same data to be in multiple stripes simultaneously.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet 332c6e5370 bcachefs: Fix bch2_mark_extent()
If an extent only contained cached or erasure coded pointers, there
won't be any devices in the normal dirty replicas list or an entry to
update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
Kent Overstreet 64bc001153 bcachefs: Rework btree iterator lifetimes
The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.

This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet a7199432c3 bcachefs: Kill deferred btree updates
Will be replaced by cached btree iterators

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet 6309589468 bcachefs: Improved bch2_fcollapse()
Move extents instead of copying them - this way, we can iterate over
only live extents, not the entire keyspace. Also, this means we can
mostly skip running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 36e9d69854 bcachefs: Do updates in order they were queued up in
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 4430ea7046 bcachefs: Kill BTREE_INSERT_NOMARK_INSERT
Was dead code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 78854fca28 bcachefs: Fix BTREE_INSERT_NOMARK_OVERWRITES
bch2_mark_update() was correct, but bch2_trans_mark_update() wasn't
respecting BTREE_INSERT_NOMARK_OVERWRITES - key marking/triggers really
need to be cleaned up.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 06ab329c15 bcachefs: Improve pointer marking checks and error messages
Importantly, we don't want to use bch2_fs_inconsistent_on() for errors
that fsck can repair, becuase that will just put us in RO mode and
prevent fsck from actually fixing stuff. Probably want to get rid of it
in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 9940a791ea bcachefs: Fix error message on bucket overflow
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet df5d4dae0b bcachefs: Fixes for replicas tracking
The continue statement in bch2_trans_mark_extent() was wrong - by
bailing out early, we'd be constructing the wrong replicas list to
update. Also, the assertion in update_replicas() was wrong - due to
rounding with compressed extents, it is possible for sectors to be 0
sometimes.

Also, change extent_to_replicas() in replicas.c to match the replicas
list we construct in buckets.c.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 6671a7089f bcachefs: Refactor bch2_alloc_write()
Major simplification - gets rid of the need for marking buckets as
dirty, instead we write buckets if the in memory mark is different from
what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 67163cded3 bcachefs: Trust in memory bucket mark
This fixes a bug in the journal replay -> extent_replay_key ->
split_compressed path, when we do an update that changes alloc info but
the alloc info in the btree isn't up to date yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet 41fcd62150 bcachefs: Fix faulty assertion
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00
Kent Overstreet 76426098e4 bcachefs: Reflink
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00
Kent Overstreet 3c7f3b7aeb bcachefs: Refactor bch2_extent_trim_atomic() for reflink
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00
Kent Overstreet 2cbe5cfe27 bcachefs: Rework calling convention for marking overwrites
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00
Kent Overstreet e3d3a9d91a bcachefs: trans_get_key() now works correctly for extents
More prep work for reflink: for extents, we're not looking for an exact
mach on pos, rather that the pos is within the range of the key the
iterator points to.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:24 -04:00
Kent Overstreet 0c04f5eb0d bcachefs: Don't overflow trans with iters from triggers
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:24 -04:00
Kent Overstreet 8d591d5da4 bcachefs: Convert some assertions to fsck errors
Actual repair code will come later, but this is a start

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:24 -04:00
Kent Overstreet 91052b9de8 bcachefs: Refactor trans_(get|update)_key
these are still pretty ugly...

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:23 -04:00
Kent Overstreet 88767d65d8 bcachefs: Update path now handles triggers that generate more triggers
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:23 -04:00
Kent Overstreet 6e738539cd bcachefs: Improve key marking interface
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet 4ee202e2b7 bcachefs: better BTREE_INSERT_NO_CLEAR_REPLICAS
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet 3838be7841 bcachefs: Don't use a fixed size buffer for fs_usage_deltas
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet 6fb076e60d bcachefs: Fix spurious inconsistency in recovery
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet 7cfac5f506 bcachefs: Fix for the stripes mark path and gc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet 460651ee86 bcachefs: Various improvements to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet 932aa83745 bcachefs: bch2_trans_mark_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet 5e82a9a1f4 bcachefs: Write out fs usage consistently
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet fca1223ccf bcachefs: Avoid write lock on mark_lock
mark_lock is a frequently taken lock, and there's also potential for
deadlocks since currently bch2_clear_page_bits which is called from
memory reclaim has to take it to drop disk reservations.

The disk reservation get path takes it when it recalculates the number
of sectors known to be available, but it's not really needed for
consistency.  We just want to make sure we only have one thread updating
the sectors_available count, which we can do with a dedicated mutex.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet 94f651e2c7 bcachefs: Return errors from for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet 201a4d4cbe bcachefs: fix triggers for stripes btree
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet c6dd04f8f5 bcachefs: Mark overwrites from journal replay in initial gc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet a1d58243f9 bcachefs: add ability to run gc on metadata only
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet 36e916e13b bcachefs: Caller now responsible for calling mark_key for gc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:19 -04:00
Kent Overstreet 3a0e06db71 bcachefs: Assorted preemption fixes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:19 -04:00
Kent Overstreet 4d8100daa9 bcachefs: Allocate fs_usage in do_btree_insert_at()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet 0dc17247f1 bcachefs: kill struct btree_insert
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet 59928c1220 bcachefs: Don't BUG_ON() on bucket sector count overflow
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:17 -04:00
Kent Overstreet ecf37a4a80 bcachefs: fs_usage_u64s()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 768ac63924 bcachefs: Add a mechanism for blocking the journal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 8fe826f90a bcachefs: Convert bucket invalidation to key marking path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 73c27c6095 bcachefs: fixes for cached data accounting
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 8777210b92 bcachefs: refactor key marking code a bit
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 2ecc6171a3 bcachefs: Fix double counting when gc is running
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 39fbc5a49f bcachefs: gc lock no longer needed for disk reservations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet 76f4c7b0c3 bcachefs: Fix oldest_gen handling
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet 3577df5f7f bcachefs: serialize persistent_reserved
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet 3e0745e283 bcachefs: initialize fs usage summary in recovery
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet 4c97e04aa8 bcachefs: percpu utility code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet bdba6c29ff bcachefs: fix inode counting
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet 61c8d7c8eb bcachefs: Persist stripe blocks_used
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet 430735cd1a bcachefs: Persist alloc info on clean shutdown
- Does not persist alloc info for stripes yet
 - Also does not yet include filesystem block/sector counts yet, from
struct fs_usage
 - Not made use of just yet

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet 7ef2a73a58 bcachefs: Fix check for if extent update is allocating
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet d0cc3defba bcachefs: More allocator startup improvements
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet 23f80d2b3b bcachefs: Factor out acc_u64s()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet 06b7345cc2 bcachefs: Include summarized counts in fs_usage
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:13 -04:00
Kent Overstreet 5663a41521 bcachefs: refactor bch_fs_usage
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:13 -04:00
Kent Overstreet 641ab73643 bcachefs: improve/clarify ptr_disk_sectors()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:13 -04:00
Kent Overstreet 9166b41db1 bcachefs: s/usage_lock/mark_lock
better describes what it's for, and we're going to call a new lock
usage_lock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:13 -04:00
Kent Overstreet 8eb7f3ee46 bcachefs: move dirty into bucket_mark
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet f0cfb963ec bcachefs: Track nr_inodes with the key marking machinery
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet 26609b619f bcachefs: Make bkey types globally unique
this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet eeb83e25bb bcachefs: Hold usage_lock over mark_key and fs_usage_apply
Fixes an inconsistency at the end of gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet dfe9bfb32e bcachefs: Stripes now properly subject to gc
gc now verifies the contents of the stripes radix tree, important for
persistent alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet 9ca53b55f7 bcachefs: gc now operates on second set of bucket marks
This means we can now use gc to verify the allocation information -
important for testing persistant alloc info

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet 61274e9d45 bcachefs: Allocator startup improvements
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet cd575ddf57 bcachefs: Erasure coding
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet b35b192583 bcachefs: Move key marking out of extents.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet 4628529f15 bcachefs: Disk usage in compressed sectors, not uncompressed
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet 8b335baef2 bcachefs: Assorted fixes for running on very small devices
It's now possible to create and use a filesystem on a 512k device with
4k buckets (though at that size we still waste almost half to internal
reserves)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet b092dadd55 bcachefs: Scale down number of writepoints when low on space
this means we don't have to reserve space for them when calculating
filesystem capacity

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet 47799326bc bcachefs: more key marking refactoring
prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
Kent Overstreet 1742237ba1 bcachefs: extent_for_each_ptr_decode()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
Kent Overstreet 7b3f84ea7d bcachefs: Split out alloc_background.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
Kent Overstreet 6eac2c2e24 bcachefs: Change how replicated data is accounted
Due to compression, the different replicas of a replicated extent don't
necessarily have to take up the same amount of space - so replicated
data sector counts shouldn't be stored divided by the number of
replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet 5b650fd11a bcachefs: Account for internal fragmentation better
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet 09f3297ac9 bcachefs: kill s_alloc, use bch_data_type
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet a7c7a3092e bcachefs: bch2_mark_key() now takes bch_data_type
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet 3142e7ef4b bcachefs: fix nbuckets usage on device resize
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet b29e197aaf bcachefs: Invalidate buckets when writing to alloc btree
Prep work for persistent alloc information. Refactoring also lets us
make free_inc much smaller, which means a lot fewer buckets stranded on
freelists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet b2be7c8b73 bcachefs: kill bucket mark sector count saturation
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet c692399529 bcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch
journal_buf_switch is called from the foreground when getting a journal
reservation and thus is somewhat latency sensitive;
bch2_bucket_seq_cleanup has to run infrequently but is a bit expensive
when it does run.

Call it from the journal write path instead, and punt the journal write
to worqueue context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet 1c6fdbd8f2 bcachefs: Initial commit
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:07 -04:00