This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.
This will make it easier to debug shutdown hangs due to refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree_node_write_blocked bit was a later addition to this code;
it only mirrors the state of the b->write_blocked list (empty or
nonempty) - unfortunately, when it was added it wasn't correctly kept
in sync - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In debug mode, we now track where btree iterators and paths are
initialized/allocated - helpful in tracking down btree path overflows.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly, these could probably be improved
further, but we already see a significant code size improvement with
this patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This improves the bkey_format calculation when splitting btree nodes.
Previously, we'd use a format calculated for the original node for the
lower of the two new nodes.
This was particularly bad on sequential insertions, where we iteratively
split the last btree node, whose format has to include KEY_MAX.
Now, we calculate formats precisely for the keys the two new nodes will
contain. This should also make splitting a bit more efficient, since
we're only copying keys once (from the original node to the new node,
instead of new node, replacement node, then upper split).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.
Btree writes that are too small are a source of inefficiency and of
excessive btree resort overhead - this will let us see what's causing
them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Use __func__ in error messages that refer to the function name, and do
so more uniformly - suggested by checkpatch.pl.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list (example invocation after the list):
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this
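For example, the list can be applied on the command line (an
illustrative invocation; the same list could also live in a config
file):

  ./scripts/checkpatch.pl -g HEAD --ignore \
          ASSIGN_IN_IF,UNSPECIFIED_INT,NEW_TYPEDEFS,FUNCTION_ARGUMENTS,MULTISTATEMENT_MACRO_USE_DO_WHILE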
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree nodes can be written by other threads (shrinker, journal reclaim)
with only a read lock, but brand new nodes should only be written by the
thread doing the split/interior update. bch2_btree_update_add_new_node()
sets btree node flags to indicate that this is a new node and should not
be written out by other threads, thus we need to call it before dropping
our write lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, bch2_btree_update_start() would always take all intent
locks, all the way up to the root.
We've finally got data from users where this became a scalability issue
- so, this patch fixes bch2_btree_update_start() to only take the locks
we need.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we have an error path plumbed through, there's no need to be
using bch2_btree_node_lock_write_nofail().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch in the series is (finally!) going to change btree splits
(and interior updates in general) to not take intent locks all the way
up to the root - instead only locking the nodes they'll need to modify.
However, this will be introducing a race since if we're not holding a
write lock on a btree node it can be written out by another thread, and
then we might not have enough space for a new bset entry.
We can handle this by retrying - we just need to introduce a new error
path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In order to avoid locking all btree nodes up to the root for btree node
splits, we're going to have to introduce a new error path into
bch2_btree_insert_node(); this means we can't have done any writes or
modified global state before that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_node_lock_nopath() is something we'd like to get rid of: because
it doesn't mark the lock it's taking in a path, it's always prone to
deadlocks if we're accidentally holding other locks. We'll want to
remove it in the future, but for now this patch works around the
problem by calling bch2_trans_unlock() first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is a major oopsy - we should always be unlocking before calling
closure_sync(), else we'll cause a deadlock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were checking for -EAGAIN, but that's not what's returned when we
didn't pass a closure to wait with - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.
This changes the btree split path to allocate btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.
All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_bkey_cached_common
too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the new cycle detector, taking a write lock will be able to fail -
unless we pass it nofail, which is possible but not preferred.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and
adds two new locking functions, sketched below:
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).
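A sketch of the two signatures (the parameter types here are an
assumption, not the exact interface):

  int btree_node_lock_nopath(struct btree_trans *trans,
                             struct btree_bkey_cached_common *b,
                             enum six_lock_type type);

  void btree_node_lock_nopath_nofail(struct btree_trans *trans,
                                     struct btree_bkey_cached_common *b,
                                     enum six_lock_type type);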
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On failure to get a journal pre-reservation because we're called from
journal reclaim, we're not supposed to return a transaction restart
error
- this fixes a livelock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It doesn't make any sense to set should_be_locked on btree_paths that
aren't locked, and is often a bug - this patch adds assertions and fixes
some of those bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.
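A sketch of the intended usage (helper name as in the new error-code
scheme; shown as an assumption):

  ret = bch2_btree_iter_traverse(&iter);
  if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) {
          /* matches any specific restart reason, e.g.
           * BCH_ERR_transaction_restart_would_deadlock -
           * unwind back to bch2_trans_begin() */
          return ret;
  }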
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need the caller name and a place to store our results; btree_trans provides this.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Can't take btree node locks while holding btree_reserve_cache_lock - it
would be nice if we could check this with lockdep.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We're seeing occasional firings of the assertion in the key cache
shutdown code that nr_dirty == 0, which means we must sometimes be doing
transaction commits after we've gone read only.
Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if
it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree
writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the
shutdown loop: btree write completion does a transaction update, to
update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, btree_update_interior.c passed keys to bch2_trans_mark_*
that hadn't been fully initialized - they didn't have the key field
filled out, just the value.
With backpointers, we need to make sure keys are fully initialized
before marking them - because the backpointer points back to the
original key.
This patch tweaks the interior update paths to fix this.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a new parameter to .key_invalid() methods for whether the key
is being read or written; the idea being that methods can do more
aggressive checks on a newly created key being written than on an
existing key being read, which we wouldn't want to delete just because
it fails those stricter checks.
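A sketch of what the method might look like with the new parameter (the
types and return convention here are assumptions):

  const char *(*key_invalid)(const struct bch_fs *c, struct bkey_s_c k,
                             int rw);   /* rw: READ or WRITE */

  /* e.g. a method could run its stricter checks only when rw == WRITE */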
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.
Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.
The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the write path, after the write to the block device(s) complete we
have to punt to process context to do the btree update.
Instead of using the work item embedded in op->cl, this patch switches
to a per write-point work item. This helps with two different issues:
- lock contention: btree updates to the same writepoint will (usually)
be updating the same alloc keys
- context switch overhead: when we're bottlenecked on btree updates,
having a thread (running out of a work item) checking the write point
for completed ops is cheaper than queueing up a new work item and
waking up a kworker.
In an arbitrary benchmark, 4k random writes with fio running inside a
VM, this patch resulted in a 10% improvement in total iops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This simplifies the logic in bch2_btree_update_start() a bit, handling
the unlock/block logic more locally.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.
This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.
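The watermarks look roughly like this (a sketch; names illustrative):

  enum journal_watermark {
          JOURNAL_WATERMARK_any,          /* normal updates */
          JOURNAL_WATERMARK_copygc,       /* only block copygc past here */
          JOURNAL_WATERMARK_reserved,     /* journal reclaim itself */
  };

  /* an update's watermark must be at least the journal's current
   * watermark for the update to proceed */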
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_iter_next_node() was mucking with other btree_path state
without setting path->uptodate to be consistent with the fact that the
path is very much no longer uptodate - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Six locks have a percpu mode, which we use for interior btree nodes, as
well as btree key cache keys for the subvolumes btree. We've been
switching locks back and forth between percpu and non percpu mode as
needed, but it turns out this is racy - when we're reusing an existing
node, other threads could be attempting to lock it while we're switching
it between modes.
This patch fixes this by never switching 'struct btree' between the two
modes, and instead segregating them between two different freed lists.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't need to pass the number of nodes required to
bch2_btree_update_start, just whether we're doing a split at @level.
This is prep work for a fix to our usage of six lock's percpu mode,
which is going to require us to count up and allocate interior nodes and
leaf nodes separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we do an interior btree update, we create new btree nodes and link
them into the btree in memory, but they don't become reachable on disk
until later, when btree_update_nodes_written_trans() runs.
Updates to the new nodes can thus happen before they're reachable on
disk, and if the updates to those new nodes are written before the nodes
become reachable, we would then drop the journal pin for those updates
before the btree has them.
This is what the journal pin in bch2_btree_update_start() was protecting
against. However, it's not actually needed because we don't allow
subsequent append writes to btree nodes until the node is reachable on
disk.
Dropping this unneeded pin also fixes a bug introduced by "bcachefs:
Journal seq now incremented at entry open, not close" - in the new code,
if the journal is completely empty a journal pin list for
journal_cur_seq() won't exist.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Checking btree_node_may_write() isn't atomic with the other btree flags,
dirty and need_write in particular. There was a rare race where we'd
unblock a node from writing while __btree_node_flush() was setting
need_write, and no thread would notice that the node was now both able
to write and needed to be written.
Fix this by adding btree node flags for will_make_reachable and
write_blocked that can be checked in the cmpxchg loop in
__bch2_btree_node_write.
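i.e. __bch2_btree_node_write() can now decide atomically whether the
node is writable - a sketch of the shape of the loop (flag names from
this series; the logic is simplified):

  unsigned long old, new;

  do {
          old = new = READ_ONCE(b->flags);

          if (!(old & (1 << BTREE_NODE_dirty)) ||
              (old & (1 << BTREE_NODE_write_blocked)) ||
              (old & (1 << BTREE_NODE_will_make_reachable)))
                  return;         /* not writable right now */

          new &= ~(1 << BTREE_NODE_dirty);
          new |=  1 << BTREE_NODE_write_in_flight;
  } while (cmpxchg_acquire(&b->flags, old, new) != old);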
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_node_write_if_need() kicks off a btree node write only if
need_write is set; this makes the locking easier to reason about by
moving the check into the cmpxchg loop in __bch2_btree_node_write().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In btree_update_interior.c, we were changing a path's level directly -
which affects path sort order - without re-sorting paths, leading to
assertions when bch2_path_get() verified paths were sorted correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch changes printbufs to dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.
The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().
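Typical usage now looks like this (a minimal sketch):

  struct printbuf buf = PRINTBUF;     /* no buffer allocated yet */

  pr_buf(&buf, "invalid bkey %s", reason);
  bch_err(c, "%s", buf.buf);
  printbuf_exit(&buf);                /* required: frees the allocation */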
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're now coming up with triggers that modify the update being done. A
bkey_s_c is const - bkey_i is the correct type to be using here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes
on disk, so that we can tell which btree node is newer if we ever have
to scan the entire device to find btree nodes.
The btree node merge path wasn't setting it correctly on the new node -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This consolidates some of the btree node lock path, so that when we're
blocked taking a write lock on a node it shows up in
bch2_btree_trans_to_text(), along with intent and read locks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These nodes aren't reachable by other threads, so there's no need to
keep them locked - and this fixes a bug with the assertion in
bch2_trans_unlock() firing on transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_btree_node_update_key() is used in the btree node write path -
before
delivering the completion we have to update the parent pointer with the
number of sectors written.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With BTREE_ITER_WITH_JOURNAL, there are no longer any restrictions on
the
order we have to replay keys from the journal in, and we can also start
up journal reclaim right away - and delete a bunch of code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.
This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.
Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).
Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a flag to not mark the initial btree_path as preserve, for
paths that we expect to be cheap to reconstitute if necessary - this
solves a btree_path overflow caused by need_whiteout_for_snapshot().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds more latency/event measurements and breaks some apart into
more events. Journal writes are broken apart into flush writes and
noflush writes, btree compactions are broken out from btree splits,
btree merges are added, as well as btree_interior_updates - foreground
and total.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We have been getting away from handling transaction restarts locally -
convert bch2_btree_node_rewrite() to the newer style.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes for_each_btree_node() to work like for_each_btree_key(),
and to that end bch2_btree_iter_peek_node() and next_node() also return
error ptrs.
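So callers check the returned pointer like any other error pointer - a
sketch:

  b = bch2_btree_iter_peek_node(iter);
  ret = PTR_ERR_OR_ZERO(b);
  if (ret)
          goto err;       /* e.g. a transaction restart */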
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- check for getting to the end of the btree in bch2_path_verify_locks
and __btree_path_traverse_all(), this fixes an infinite loop in
__btree_path_traverse_all().
- relax requirement in bch2_btree_node_upgrade() that we must want an
intent lock, this fixes bugs with paths that point to interior nodes
(nonzero level).
- bch2_btree_node_update_key(): fix it to upgrade the path to an intent
lock, if necessary
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We can end up in a strange situation where a btree_path points to a node
being freed even after pointers to it should have been replaced by
pointers to the new node - if the btree node has been reused since the
pointer to it was created.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
New rule is: if a btree path holds any locks it should be holding
precisely the locks wanted (according to path->level and
path->locks_want).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These utility functions are for managing btree node state within a
btree_trans - rename them for consistency, and drop some unneeded
arguments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With the recent transaction restart changes, it's no longer needed - all
transaction commits have BTREE_INSERT_NOUNLOCK semantics.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Btree node merging now happens prior to transaction commit, not after,
so we don't need to pay attention to BTREE_INSERT_NOUNLOCK.
Also, foreground_maybe_merge shouldn't be calling
bch2_btree_iter_traverse_all() - this is becoming private to the btree
iterator code and should only be called by bch2_trans_begin().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This closes a significant hole (and last known hole) in our ability to
verify metadata. Previously, since btree nodes are log structured, we
couldn't detect lost btree writes that weren't the first write to a
given node. Additionally, this seems to have led to some significant
metadata corruption on multi device filesystems with metadata
replication: since a write may have made it to one device and not
another, if we read that btree node back from the replica that did have
that write and started appending after that point, the other replica
would have a gap in the bset entries and reading from that replica
wouldn't find the rest of the bsets.
But, since updates to interior btree nodes are now journalled, we can
close this hole by updating pointers to btree nodes after every write
with the currently written number of sectors, without negatively
affecting performance. This means we will always detect lost or corrupt
metadata - it also means that our btree is now a curious hybrid of COW
and non COW btrees, with all the benefits of both (excluding
complexity).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
On fstest generic/388, we were seeing sporadic deadlocks in the
emergency shutdown, where we'd get stuck shutting down the allocator
because bch2_btree_update_start() -> bch2_btree_reserve_get() allocated
and then deallocated some btree nodes, putting them back on the
btree_reserve_cache, after the allocator shutdown code had already
cleared out that cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there are a few remaining paths
that this patch fixes.
This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.
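One way to express the split, as in later bcachefs code (shown here as
a sketch):

  /* external code: snapshot field 0 */
  #define POS_MAX         SPOS(KEY_INODE_MAX, KEY_OFFSET_MAX, 0)

  /* btree-internal code: all fields ~0 */
  #define SPOS_MAX        SPOS(KEY_INODE_MAX, KEY_OFFSET_MAX, U32_MAX)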
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We no longer mark subsets of extents, they're marked like regular
keys now - which means we can drop the offset & sectors arguments
to trigger functions
- Drop other arguments that are no longer needed anymore in various
places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
don't have an old key to mark
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.
This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Waiting on a btree node write with btree locks held can deadlock, if the
write errors: the write error path has to do a btree update to drop
the pointer to the replica that errored.
The interior update path has to wait on in flight btree writes before
freeing nodes on disk. Previously, this was done in
bch2_btree_interior_update_will_free_node(), and could deadlock; now, we
just stash a pointer to the node and do it in
btree_update_nodes_written(), just prior to the transactional part of
the update.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits out btree topology repair into a separate pass, and makes
some improvements:
- When we have to pick which of two overlapping nodes to drop keys
from, we use the btree node header sequence number to preserve the
newer node
- the gc code has been changed so that it doesn't bail out if we're
continuing/ignoring on fsck error - this way the dump tool can skip
running the repair pass but still walk all reachable metadata
- add a new superblock flag indicating when a filesystem is known to
have btree topology issues, and the topology repair pass should be
run
- changing the start/end of a node might mean keys in that node have to
be deleted: this patch handles that better by splitting it out into a
separate function and running it explicitly in the topology repair
code, previously those keys were only being dropped when the btree
node was read in.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Consolidate common parts of bch2_btree_insert_keys_interior() and
btree_split_insert_keys() - prep work for adding some new topology
assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds self healing functionality for btree nodes - if we
notice a problem when reading a btree node, we just rewrite it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This replaces an assertion in the btree merge path with a
bch2_inconsistent_error() - fsck will fix it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Turns out, we weren't waiting on in flight btree writes when freeing
existing btree nodes. This led to stray btree writes overwriting newly
allocated buckets, but only started showing itself with some of the
recent allocator work and another patch to move submitting of btree
writes to workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.
This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is an important cleanup, eliminating an unnecessary copy in the
transaction commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
JOURNAL_RES_GET_RESERVED should only be used for updates that need to be
done to free up space in the journal. In particular, when we're flushing
keys from the key cache, if we're flushing them out of order we
shouldn't be using it, since we're using up our remaining space in the
journal without dropping a pin that will let us make forward progress.
With this patch, BTREE_INSERT_JOURNAL_RECLAIM without
BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on
journal reclaim if we're already in journal reclaim.
This means we need to propagate these errors up to journal reclaim,
indicating that flushing a journal pin should be retried in the future.
This is prep work for a patch to change the way journal reclaim works,
to split out flushing key cache keys because the btree key cache is too
dirty from journal reclaim because we need space in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This means that btree node splits don't have to automatically trigger a
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch reworks the btree node merge path to use a second btree
iterator to get the sibling node - which means
bch2_btree_iter_get_sibling() can be deleted. Also, it uses
bch2_btree_iter_traverse_all() if necessary - which means it should be
more reliable. We don't currently even try to make it work when
trans->nounlock is set - i.e. after a BTREE_INSERT_NOUNLOCK transaction
commit; hopefully this will be a worthwhile tradeoff.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, we were doing btree node merging from
bch2_btree_insert_node() - but this is called from the split path, when
we're in the middle of creating new nodes and deleting old nodes, and
the
iterators are in a weird state.
Also, this means we're starting a new btree_update while in the middle
of an existing one, and that's asking for deadlocks.
Much simpler and saner to trigger btree node merging _after_ the whole
btree node split path is finished.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_update_start() is now responsible for taking gc_lock and
upgrading the iterator to lock parent nodes - greatly simplifying error
handling and all of the callers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch starts treating the bpos.snapshot field like part of the key
in the btree code:
* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX
The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.
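For example, bpos_successor() now rolls over through the snapshot field
first (a sketch matching the description above):

  static inline struct bpos bpos_successor(struct bpos p)
  {
          if (!++p.snapshot &&
              !++p.offset &&
              !++p.inode)
                  BUG();  /* overflow past the maximum position */
          return p;
  }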
We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).
This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.
Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.
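Conceptually (a sketch; the real implementations are more optimized):

  /* core btree code: snapshot field included */
  static inline int bpos_cmp(struct bpos l, struct bpos r)
  {
          return  cmp_int(l.inode,    r.inode) ?:
                  cmp_int(l.offset,   r.offset) ?:
                  cmp_int(l.snapshot, r.snapshot);
  }

  /* filesystem code: snapshot field ignored */
  static inline int bkey_cmp(struct bpos l, struct bpos r)
  {
          return  cmp_int(l.inode,  r.inode) ?:
                  cmp_int(l.offset, r.offset);
  }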
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On btree node split, we weren't ensuring the min_key of the new larger
node packs in the new format for this node. This triggers some painful
slowpaths in the bset.c aux search tree code - this patch fixes that by
calculating a new format for the new node with the new min_key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.
But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility
code
for old style extent btree nodes is gone, so we can completely drop this
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
KEY_TYPE_discard used to be used for extent whiteouts, but when handling
of overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs has been aggressively migrating filesystems and btree nodes to
the new format for quite some time - this shouldn't affect anyone
anymore, and lets us delete a _lot_ of code. Also, it frees up
KEY_TYPE_discard for a new whiteout key type for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new data job type to scan for btree nodes in the old extent
format, and rewrite them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also, make journal writes obey foreground_target and metadata_target.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is so that when we discover btree topology issues, we can just
update the pointer to a btree node and signal the btree read path that
the
min/max keys in the node header should be updated from the node pointer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A btree node merge operation deletes a key in the parent node; if
inserting into the parent node then causes it to split, we can end up
with a whiteout in the parent node that we don't want.
The existing code drops them before doing the split, because they can
screw up picking the pivot, but we forgot about the unwritten whiteouts
area - that needs to be cleared out too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This reverts part of the change from "bcachefs: Don't use
BTREE_INSERT_USE_RESERVE so much" - it turns out we still should be
reserving open buckets for btree node allocations, because otherwise
data bucket allocations (especially with erasure coding enabled) can use
up all our open buckets and we won't be able to do the metadata update
that lets us release those open bucket references. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.
In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.
- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.
- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.
This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.
Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is needed to fix a bug where we're overflowing iterators within a
btree transaction, because we're updating the stripes btree (to update
block counts) and the stripes btree trigger is unnecessarily updating
the alloc btree - it doesn't need to update the alloc btree when the
pointers within a stripe aren't changing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If we have an error in the btree interior update path that prevents us
from journalling the update, we can't issue the corresponding btree node
write - we didn't get a journal sequence number that would cause it to
be ignored in recovery.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For the new nodes an interior btree update makes reachable, updates to
those nodes may be journalled after the btree update starts but before
the transactional part - where we make those nodes reachable. Those
updates need to be kept in the journal until after the btree update
completes, hence we should always get a journal pin at the start of the
interior update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The interior btree node update path has changed, this is no longer
needed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is for tracking down a bug where we see a btree node pointer in
the wrong node.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This will reduce transaction restarts, based on observation of
tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This only did anything in two places, and those can just be replaced
with bkey_cmp_left_packed().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
printbufs know how big the buffer is that was allocated, so we can get
rid of the random PAGE_SIZEs all over the place.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were missing a 'goto retry' and continuing on with an error pointer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that updates to interior nodes are journalled, we shouldn't be
checking topology of interior nodes until we've finished replaying
updates to that node.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug where recovery fails when one of the devices is read
only.
Also - consolidate the "must rewrite this node to insert it" behind a
new btree node flag.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Whenever we're doing an update that has pointers, that generally means
we need to do the update in order to release open bucket references - so
we should be using the btree open bucket reserve.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.
Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.
These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).
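Usage is just a new iterator flag - a sketch, assuming the iterator-get
interface of this era:

  iter = bch2_trans_get_iter(&trans, BTREE_ID_ALLOC, pos,
                             BTREE_ITER_CACHED|BTREE_ITER_INTENT);

  /* updates are journalled as usual; journal reclaim later flushes the
   * cached key to the underlying btree */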
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Btree node lock ordering is based on the logical key. However, 'struct
btree' may be reused for a different btree node under memory pressure.
This patch uses the new six lock callback to check if a btree node is no
longer the node we wanted to lock before blocking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>