Commit Graph

882 Commits

Author SHA1 Message Date
Chao Yu 5bcde45578 f2fs: get rid of buffer_head use
Convert to use folio and related functionality.

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15 15:26:40 +00:00
Matthew Wilcox (Oracle) 1da86618bd
fs: Convert aops->write_begin to take a folio
Convert all callers from working on a page to working on one page
of a folio (support for working on an entire folio can come later).
Removes a lot of folio->page->folio conversions.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:33:21 +02:00
Matthew Wilcox (Oracle) a225800f32
fs: Convert aops->write_end to take a folio
Most callers have a folio, and most implementations operate on a folio,
so remove the conversion from folio->page->folio to fit through this
interface.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:32:02 +02:00
Matthew Wilcox (Oracle) dfd2e81d37
f2fs: Convert f2fs_write_begin() to use a folio
Fetch a folio from the page cache instead of a page and use it
throughout.  We still have to convert back to a page for calling
internal f2fs functions, but hopefully they will be converted soon.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:32:00 +02:00
Matthew Wilcox (Oracle) a0f858d450
f2fs: Convert f2fs_write_end() to use a folio
Convert the passed page to a folio and operate on that.
Replaces five calls to compound_head() with one.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07 11:32:00 +02:00
Chao Yu 1a0bd289a5 f2fs: atomic: fix to avoid racing w/ GC
Case #1:
SQLite App		GC Thread		Kworker		Shrinker
- f2fs_ioc_start_atomic_write

- f2fs_ioc_commit_atomic_write
 - f2fs_commit_atomic_write
  - filemap_write_and_wait_range
  : write atomic_file's data to cow_inode
								echo 3 > drop_caches
								to drop atomic_file's
								cache.
			- f2fs_gc
			 - gc_data_segment
			  - move_data_page
			   - set_page_dirty

						- writepages
						 - f2fs_do_write_data_page
						 : overwrite atomic_file's data
						   to cow_inode
  - f2fs_down_write(&fi->i_gc_rwsem[WRITE])
  - __f2fs_commit_atomic_write
  - f2fs_up_write(&fi->i_gc_rwsem[WRITE])

Case #2:
SQLite App		GC Thread		Kworker
- f2fs_ioc_start_atomic_write

						- __writeback_single_inode
						 - do_writepages
						  - f2fs_write_cache_pages
						   - f2fs_write_single_data_page
						    - f2fs_do_write_data_page
						    : write atomic_file's data to cow_inode
			- f2fs_gc
			 - gc_data_segment
			  - move_data_page
			   - set_page_dirty

						- writepages
						 - f2fs_do_write_data_page
						 : overwrite atomic_file's data to cow_inode
- f2fs_ioc_commit_atomic_write

In above cases racing in between atomic_write and GC, previous
data in atomic_file may be overwrited to cow_file, result in
data corruption.

This patch introduces PAGE_PRIVATE_ATOMIC_WRITE bit flag in page.private,
and use it to indicate that there is last dirty data in atomic file,
and the data should be writebacked into cow_file, if the flag is not
tagged in page, we should never write data across files.

Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-05 20:18:36 +00:00
Sunmin Jeong f18d007693 f2fs: use meta inode for GC of COW file
In case of the COW file, new updates and GC writes are already
separated to page caches of the atomic file and COW file. As some cases
that use the meta inode for GC, there are some race issues between a
foreground thread and GC thread.

To handle them, we need to take care when to invalidate and wait
writeback of GC pages in COW files as the case of using the meta inode.
Also, a pointer from the COW inode to the original inode is required to
check the state of original pages.

For the former, we can solve the problem by using the meta inode for GC
of COW files. Then let's get a page from the original inode in
move_data_block when GCing the COW file to avoid race condition.

Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10 22:48:20 +00:00
Sunmin Jeong b40a2b0037 f2fs: use meta inode for GC of atomic file
The page cache of the atomic file keeps new data pages which will be
stored in the COW file. It can also keep old data pages when GCing the
atomic file. In this case, new data can be overwritten by old data if a
GC thread sets the old data page as dirty after new data page was
evicted.

Also, since all writes to the atomic file are redirected to COW inodes,
GC for the atomic file is not working well as below.

f2fs_gc(gc_type=FG_GC)
  - select A as a victim segment
  do_garbage_collect
    - iget atomic file's inode for block B
    move_data_page
      f2fs_do_write_data_page
        - use dn of cow inode
        - set fio->old_blkaddr from cow inode
    - seg_freed is 0 since block B is still valid
  - goto gc_more and A is selected as victim again

To solve the problem, let's separate GC writes and updates in the atomic
file by using the meta inode for GC writes.

Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Cc: stable@vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10 22:48:04 +00:00
Daejun Park c82bc1ab2a f2fs: fix null reference error when checking end of zone
This patch fixes a potentially null pointer being accessed by
is_end_zone_blkaddr() that checks the last block of a zone
when f2fs is mounted as a single device.

Fixes: e067dc3c6b ("f2fs: maintain six open zones for zoned devices")
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10 22:47:57 +00:00
Jaegeuk Kim 6aeb084fa0 f2fs: clean up set REQ_RAHEAD given rac
Let's set REQ_RAHEAD per rac by single source.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-06-18 02:31:48 +00:00
Linus Torvalds 72ece20127 f2fs update for 6.10-rc1
In this round, we've tried to address some performance issues on zoned storage
 such as direct IO and write_hints. In addition, we've migrated some IO paths
 using folio. Meanwhile, there are multiple bug fixes in the compression paths,
 sanity check conditions, and error handlers.
 
 Enhancement:
  - allow direct io of pinned files for zoned storage
  - assign the write hint per stream by default
  - convert read paths and test_writeback to folio
  - avoid allocating WARM_DATA segment for direct IO
 
 Bug fix:
  - fix false alarm on invalid block address
  - fix to add missing iput() in gc_data_segment()
  - fix to release node block count in error path of f2fs_new_node_page()
  - compress: don't allow unaligned truncation on released compress inode
  - compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
  - compress: fix error path of inc_valid_block_count()
  - compress: fix to update i_compr_blocks correctly
  - fix block migration when section is not aligned to pow2
  - don't trigger OPU on pinfile for direct IO
  - fix to do sanity check on i_xattr_nid in sanity_check_inode()
  - write missing last sum blk of file pinning section
  - clear writeback when compression failed
  - fix to adjust appropirate defragment pg_end
 
 As usual, there are several minor code clean-ups, and fixes to manage missing
 corner cases in the error paths.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmZLpYcACgkQQBSofoJI
 UNJQTw/+NaY7a1EgkMUpBAzxrJMKHcuBtyG42QKqgk6new0XejQGjPHojL2nPrw/
 t5G9TsbZbkHNMuhAkkTZMH+DFg92QYhByJlq79fxzya0XyGH4OaY1i4u67FLu0Qz
 PS/UKRkEI2B9lH+bGwa//XNMDSnzcao46bNi1SFbCNPGzU1cS35uOy/YgAdFlqTM
 WKJmM/AcNir4xtL30tBCVU//0OTtzT8+5YFVyPTeFR4WACsF6eTJAre9938xw1Ef
 p6ed6Wl2GYehqgFrAdAF07veZ1hVDSRAAB/1Mu1WKnNp57VBRjJW3DFDyApf+fIe
 2KJIDJd9/ece3dycuiZP/LXPV0sODqOI1/5s9RbFVq/QAhTSME5xq8hNXTejdl28
 PV6M2tKcTKMRpykppQg/K/N9PaO5Q6oFz0xlrOsrGoAhT1YnZfJi/DmzCZCCwYxW
 jyZor/r+849yDDdjhB94ZaByvj5S3OVqgsaunnbMBcGy+DDe0rUMXvRzVK4gTcCF
 lSTSp895BggWXLyPuXVNTjC4GIbzVbEDaHILPicfbqi0h5OCXG8YybKHiRs+ss6z
 ZrKJQxSVVvhjyHTVcBhb/Nc1s7Fm7DkX+KjV9GV3gwzB+AlVIgPlwyMTc2fZp3ST
 dUbmBR5+g4UUz2v4v4ZStAGy9eUFktO89u/roet8/74ppklj73E=
 =3mwj
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've tried to address some performance issues on zoned
  storage such as direct IO and write_hints. In addition, we've migrated
  some IO paths using folio. Meanwhile, there are multiple bug fixes in
  the compression paths, sanity check conditions, and error handlers.

  Enhancements:
   - allow direct io of pinned files for zoned storage
   - assign the write hint per stream by default
   - convert read paths and test_writeback to folio
   - avoid allocating WARM_DATA segment for direct IO

  Bug fixes:
   - fix false alarm on invalid block address
   - fix to add missing iput() in gc_data_segment()
   - fix to release node block count in error path of
     f2fs_new_node_page()
   - compress:
       - don't allow unaligned truncation on released compress inode
       - cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
       - fix error path of inc_valid_block_count()
       - fix to update i_compr_blocks correctly
   - fix block migration when section is not aligned to pow2
   - don't trigger OPU on pinfile for direct IO
   - fix to do sanity check on i_xattr_nid in sanity_check_inode()
   - write missing last sum blk of file pinning section
   - clear writeback when compression failed
   - fix to adjust appropirate defragment pg_end

  As usual, there are several minor code clean-ups, and fixes to manage
  missing corner cases in the error paths"

* tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
  f2fs: initialize last_block_in_bio variable
  f2fs: Add inline to f2fs_build_fault_attr() stub
  f2fs: fix some ambiguous comments
  f2fs: fix to add missing iput() in gc_data_segment()
  f2fs: allow dirty sections with zero valid block for checkpoint disabled
  f2fs: compress: don't allow unaligned truncation on released compress inode
  f2fs: fix to release node block count in error path of f2fs_new_node_page()
  f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
  f2fs: compress: fix error path of inc_valid_block_count()
  f2fs: compress: fix typo in f2fs_reserve_compress_blocks()
  f2fs: compress: fix to update i_compr_blocks correctly
  f2fs: check validation of fault attrs in f2fs_build_fault_attr()
  f2fs: fix to limit gc_pin_file_threshold
  f2fs: remove unused GC_FAILURE_PIN
  f2fs: use f2fs_{err,info}_ratelimited() for cleanup
  f2fs: fix block migration when section is not aligned to pow2
  f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
  f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
  f2fs: fix to avoid allocating WARM_DATA segment for direct IO
  f2fs: remove redundant parameter in is_next_segment_free()
  ...
2024-05-20 13:23:43 -07:00
Wu Bo 16409fdbb8 f2fs: initialize last_block_in_bio variable
Initialize last_block_in_bio of struct f2fs_bio_info and clean up code.

Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-15 04:18:40 +00:00
Wu Bo aa4074e8fe f2fs: fix block migration when section is not aligned to pow2
As for zoned-UFS, f2fs section size is forced to zone size. And zone
size may not aligned to pow2.

Fixes: 859fca6b70 ("f2fs: swap: support migrating swapfile in aligned write mode")
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Wu Bo <bo.wu@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-05-09 00:43:03 +00:00
Matthew Wilcox (Oracle) 196ad49cd6 f2fs: convert f2fs_clear_page_cache_dirty_tag to use a folio
Removes uses of page_mapping() and page_index().

Link: https://lkml.kernel.org/r/20240423225552.4113447-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:47 -07:00
Chao Yu 48d180e2bf f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
Otherwise, it breaks pinfile's sematics.

Cc: Daeho Jeong <daeho43@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-29 17:41:17 +00:00
Chao Yu a320b2f08b f2fs: fix to avoid allocating WARM_DATA segment for direct IO
If active_log is not 6, we never use WARM_DATA segment, let's
avoid allocating WARM_DATA segment for direct IO.

Signed-off-by: Yunlei He <heyunlei@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-29 17:41:17 +00:00
Chao Yu 92f750d847 f2fs: convert f2fs__page tracepoint class to use folio
Convert f2fs__page tracepoint class() and its instances to use folio
and related functionality, and rename it to f2fs__folio().

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:57:10 +00:00
Chao Yu 96ea46f30b f2fs: convert f2fs_read_inline_data() to use folio
Convert f2fs_read_inline_data() to use folio and related
functionality, and also convert its caller to use folio.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:57:03 +00:00
Chao Yu ed54eed355 f2fs: convert f2fs_read_single_page() to use folio
Convert f2fs_read_single_page() to use folio and related
functionality.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:56:52 +00:00
Chao Yu db92e6c729 f2fs: convert f2fs_mpage_readpages() to use folio
Convert f2fs_mpage_readpages() to use folio and related
functionality.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:56:37 +00:00
Jaegeuk Kim 7643f3fe27 f2fs: assign the write hint per stream by default
This reverts commit 930e260763 ("f2fs: remove obsolete whint_mode"), as we
decide to pass write hints to the disk.

Cc: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:56:13 +00:00
Jaegeuk Kim 16778aea91 f2fs: use folio_test_writeback
Let's convert PageWriteback to folio_test_writeback.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Chao Yu 9f0f6bf427 f2fs: support to map continuous holes or preallocated address
This patch supports to map continuous holes or preallocated addresses
to improve performace of lookuping mapping info during read DIO.

[testcase 1]
xfs_io -f /mnt/f2fs/hole -c "truncate 1m" -c "fsync"
xfs_io -d /mnt/f2fs/hole -c "pread -b 1m 0 1m"

[before]
f2fs_direct_IO_enter: dev = (253,16), ino = 6 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 0, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 1, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 2, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 3, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 4, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 5, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 6, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 7, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 8, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 9, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 10, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 11, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 12, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 13, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 14, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 16, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
......
f2fs_direct_IO_exit: dev = (253,16), ino = 6 pos = 0 len = 1048576 rw = 0 ret = 1048576

[after]
f2fs_direct_IO_enter: dev = (253,16), ino = 6 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_direct_IO_exit: dev = (253,16), ino = 6 pos = 0 len = 1048576 rw = 0 ret = 1048576

[testcase 2]
xfs_io -f /mnt/f2fs/preallocated -c "falloc 0 1m" -c "fsync"
xfs_io -d /mnt/f2fs/preallocated -c "pread -b 1m 0 1m"

[before]
f2fs_direct_IO_enter: dev = (253,16), ino = 11 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 0, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 1, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 2, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 3, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 4, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 5, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 6, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 7, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 8, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 9, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 10, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 11, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 12, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 13, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 14, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 16, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
......
f2fs_direct_IO_exit: dev = (253,16), ino = 11 pos = 0 len = 1048576 rw = 0 ret = 1048576

[after]
f2fs_direct_IO_enter: dev = (253,16), ino = 11 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 0, start blkaddr = 0xffffffff, len = 0x100, flags = 4, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_direct_IO_exit: dev = (253,16), ino = 11 pos = 0 len = 1048576 rw = 0 ret = 1048576

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-29 23:54:42 +00:00
Chao Yu 33e62cd7b4 f2fs: multidev: fix to recognize valid zero block address
As reported by Yi Zhang in mailing list [1], kernel warning was catched
during zbd/010 test as below:

./check zbd/010
zbd/010 (test gap zone support with F2FS)                    [failed]
    runtime    ...  3.752s
    something found in dmesg:
    [ 4378.146781] run blktests zbd/010 at 2024-02-18 11:31:13
    [ 4378.192349] null_blk: module loaded
    [ 4378.209860] null_blk: disk nullb0 created
    [ 4378.413285] scsi_debug:sdebug_driver_probe: scsi_debug: trim
poll_queues to 0. poll_q/nr_hw = (0/1)
    [ 4378.422334] scsi host15: scsi_debug: version 0191 [20210520]
                     dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
    [ 4378.434922] scsi 15:0:0:0: Direct-Access-ZBC Linux
scsi_debug       0191 PQ: 0 ANSI: 7
    [ 4378.443343] scsi 15:0:0:0: Power-on or device reset occurred
    [ 4378.449371] sd 15:0:0:0: Attached scsi generic sg5 type 20
    [ 4378.449418] sd 15:0:0:0: [sdf] Host-managed zoned block device
    ...
    (See '/mnt/tests/gitlab.com/api/v4/projects/19168116/repository/archive.zip/storage/blktests/blk/blktests/results/nodev/zbd/010.dmesg'

WARNING: CPU: 22 PID: 44011 at fs/iomap/iter.c:51
CPU: 22 PID: 44011 Comm: fio Not tainted 6.8.0-rc3+ #1
RIP: 0010:iomap_iter+0x32b/0x350
Call Trace:
 <TASK>
 __iomap_dio_rw+0x1df/0x830
 f2fs_file_read_iter+0x156/0x3d0 [f2fs]
 aio_read+0x138/0x210
 io_submit_one+0x188/0x8c0
 __x64_sys_io_submit+0x8c/0x1a0
 do_syscall_64+0x86/0x170
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

Shinichiro Kawasaki helps to analyse this issue and proposes a potential
fixing patch in [2].

Quoted from reply of Shinichiro Kawasaki:

"I confirmed that the trigger commit is dbf8e63f48 as Yi reported. I took a
look in the commit, but it looks fine to me. So I thought the cause is not
in the commit diff.

I found the WARN is printed when the f2fs is set up with multiple devices,
and read requests are mapped to the very first block of the second device in the
direct read path. In this case, f2fs_map_blocks() and f2fs_map_blocks_cached()
modify map->m_pblk as the physical block address from each block device. It
becomes zero when it is mapped to the first block of the device. However,
f2fs_iomap_begin() assumes that map->m_pblk is the physical block address of the
whole f2fs, across the all block devices. It compares map->m_pblk against
NULL_ADDR == 0, then go into the unexpected branch and sets the invalid
iomap->length. The WARN catches the invalid iomap->length.

This WARN is printed even for non-zoned block devices, by following steps.

 - Create two (non-zoned) null_blk devices memory backed with 128MB size each:
   nullb0 and nullb1.
 # mkfs.f2fs /dev/nullb0 -c /dev/nullb1
 # mount -t f2fs /dev/nullb0 "${mount_dir}"
 # dd if=/dev/zero of="${mount_dir}/test.dat" bs=1M count=192
 # dd if="${mount_dir}/test.dat" of=/dev/null bs=1M count=192 iflag=direct

..."

So, the root cause of this issue is: when multi-devices feature is on,
f2fs_map_blocks() may return zero blkaddr in non-primary device, which is
a verified valid block address, however, f2fs_iomap_begin() treats it as
an invalid block address, and then it triggers the warning in iomap
framework code.

Finally, as discussed, we decide to use a more simple and direct way that
checking (map.m_flags & F2FS_MAP_MAPPED) condition instead of
(map.m_pblk != NULL_ADDR) to fix this issue.

Thanks a lot for the effort of Yi Zhang and Shinichiro Kawasaki on this
issue.

[1] https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/
[2] https://lore.kernel.org/linux-f2fs-devel/gngdj77k4picagsfdtiaa7gpgnup6fsgwzsltx6milmhegmjff@iax2n4wvrqye/

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 1517c1a7a4 ("f2fs: implement iomap operations")
Fixes: 8d3c1fa3fa ("f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-29 23:54:38 +00:00
Chao Yu ac7805be6c f2fs: introduce map_is_mergeable() for cleanup
No logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-26 16:56:44 +00:00
Zhiguo Niu 31f85ccc84 f2fs: unify the error handling of f2fs_is_valid_blkaddr
There are some cases of f2fs_is_valid_blkaddr not handled as
ERROR_INVALID_BLKADDR,so unify the error handling about all of
f2fs_is_valid_blkaddr.
Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-12 18:25:17 -07:00
Chao Yu f1e7646a8c f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate()
This patch exchangs position of f2fs_precache_extents() and
filemap_fdatawrite(), so that f2fs_precache_extents() can load
extent info after physical addresses of all data are fixed.

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-04 09:51:52 -08:00
Chao Yu 8249aac1b0 f2fs: fix blkofs_end correctly in f2fs_migrate_blocks()
In f2fs_migrate_blocks(), when traversing blocks in last section,
blkofs_end should be (start_blk + blkcnt - 1) % blk_per_sec, fix it.

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-04 09:51:52 -08:00
Chao Yu 7d009e048d f2fs: fix to handle segment allocation failure correctly
If CONFIG_F2FS_CHECK_FS is off, and for very rare corner case that
we run out of free segment, we should not panic kernel, instead,
let's handle such error correctly in its caller.

Signed-off-by: Chao Yu <chao@kernel.org>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29 08:34:34 -08:00
Daeho Jeong 9703d69d9d f2fs: support file pinning for zoned devices
Support file pinning with conventional storage area for zoned devices

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-27 09:41:15 -08:00
Jaegeuk Kim 87161a2b0a f2fs: deprecate io_bits
Let's deprecate an unused io_bits feature to save CPU cycles and memory.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-20 11:08:57 -08:00
Wenjie Qi c2034ef619 f2fs: fix NULL pointer dereference in f2fs_submit_page_write()
BUG: kernel NULL pointer dereference, address: 0000000000000014
RIP: 0010:f2fs_submit_page_write+0x6cf/0x780 [f2fs]
Call Trace:
<TASK>
? show_regs+0x6e/0x80
? __die+0x29/0x70
? page_fault_oops+0x154/0x4a0
? prb_read_valid+0x20/0x30
? __irq_work_queue_local+0x39/0xd0
? irq_work_queue+0x36/0x70
? do_user_addr_fault+0x314/0x6c0
? exc_page_fault+0x7d/0x190
? asm_exc_page_fault+0x2b/0x30
? f2fs_submit_page_write+0x6cf/0x780 [f2fs]
? f2fs_submit_page_write+0x736/0x780 [f2fs]
do_write_page+0x50/0x170 [f2fs]
f2fs_outplace_write_data+0x61/0xb0 [f2fs]
f2fs_do_write_data_page+0x3f8/0x660 [f2fs]
f2fs_write_single_data_page+0x5bb/0x7a0 [f2fs]
f2fs_write_cache_pages+0x3da/0xbe0 [f2fs]
...
It is possible that other threads have added this fio to io->bio
and submitted the io->bio before entering f2fs_submit_page_write().
At this point io->bio = NULL.
If is_end_zone_blkaddr(sbi, fio->new_blkaddr) of this fio is true,
then an NULL pointer dereference error occurs at bio_get(io->bio).
The original code for determining zone end was after "out:",
which would have missed some fio who is zone end. I've moved
 this code before "skip:" to make sure it's done for each fio.

Fixes: e067dc3c6b ("f2fs: maintain six open zones for zoned devices")
Signed-off-by: Wenjie Qi <qwjhust@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-05 18:58:40 -08:00
Chao Yu 536af82115 f2fs: zone: fix to wait completion of last bio in zone correctly
It needs to check last zone_pending_bio and wait IO completion before
traverse next fio in io->io_list, otherwise, bio in next zone may be
submitted before all IO completion in current zone.

Fixes: e067dc3c6b ("f2fs: maintain six open zones for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-05 18:58:40 -08:00
Chao Yu 5460749487 f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode
In reserve_compress_blocks(), we update blkaddrs of dnode in prior to
inc_valid_block_count(), it may cause inconsistent status bewteen
i_blocks and blkaddrs once inc_valid_block_count() fails.

To fix this issue, it needs to reverse their invoking order.

Fixes: c75488fb4d ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-05 18:58:39 -08:00
Chao Yu fd244524c2 f2fs: compress: fix to cover normal cluster write with cp_rwsem
When we overwrite compressed cluster w/ normal cluster, we should
not unlock cp_rwsem during f2fs_write_raw_pages(), otherwise data
will be corrupted if partial blocks were persisted before CP & SPOR,
due to cluster metadata wasn't updated atomically.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-05 18:58:38 -08:00
Chao Yu 8a430dd49e f2fs: compress: fix to guarantee persisting compressed blocks by CP
If data block in compressed cluster is not persisted with metadata
during checkpoint, after SPOR, the data may be corrupted, let's
guarantee to write compressed page by checkpoint.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-05 18:58:38 -08:00
Linus Torvalds 70d201a408 f2fs update for 6.8-rc1
In this series, we've some progress to support Zoned block device regarding to
 the power-cut recovery flow and enabling checkpoint=disable feature which is
 essential for Android OTA. Other than that, some patches touched sysfs entries
 and tracepoints which are minor, while several bug fixes on error handlers and
 compression flows are good to improve the overall stability.
 
 Enhancement:
  - enable checkpoint=disable for zoned block device
  - sysfs entries such as discard status, discard_io_aware, dir_level
  - tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(), f2fs_new_inode()
  - use shared inode lock during f2fs_fiemap() and f2fs_seek_block()
 
 Bug fix:
  - address some power-cut recovery issues on zoned block device
  - handle errors and logics on do_garbage_collect(), f2fs_reserve_new_block(),
    f2fs_move_file_range(), f2fs_recover_xattr_data()
  - don't set FI_PREALLOCATED_ALL for partial write
  - fix to update iostat correctly in f2fs_filemap_fault()
  - fix to wait on block writeback for post_read case
  - fix to tag gcing flag on page during block migration
  - restrict max filesize for 16K f2fs
  - fix to avoid dirent corruption
  - explicitly null-terminate the xattr list
 
 There are also several clean-up patches to remove dead codes and better
 readability.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmWgMYcACgkQQBSofoJI
 UNJShxAAiYOXP7LPOAbPS1251BBgl8AIfs6u96hGTZkxOYsLHrBBbPbkWf3+nVbC
 JsBsVOe9K50rssK9kPg6XHPbmFGC8ERlyYcZTpONLfjtHOaQicbRnc//2qOvnCx8
 JOKcMVkZyLU/HbOCoUW6mzNCQlOl0aAV8tRcb7jwAxT0HgpjHTHxej/62gRcPKzC
 1E5w4iNTY//R97YGB36jPeGlKhbBZ7Ox1NM6AWadgE7B0j9rcYiBnPQllyeyaVVo
 XMCWRdl42tNMks2zgvU+vC41OrZ55bwLTQmVj3P1wnyKXig5/ZLQsrEcIGE+b2tP
 Mx+imCIRNYZqLwv5KYl6FU+KuLQGuZT1AjpP70Cb95WLyiYvVE6+xeiZg0fVTCEF
 3Hg7lEqMtAEAh1NEmJyYmbiAm9KQ3vHyse9ix++tfm+Xvgqj8b2flmzAtIFKpCBV
 J+yFI+A55IYuYZt7gzPoZLkQL0tULPf80TKQrzwlnHNtZ6T6FK2Nunu+Urwf1/Th
 s5IulqHJZxHU/Bgd6yQZUVfDILcXTkqNCpO3+qLZMPZizlH1hXiJFTeVzS6mnGvZ
 sK2LL4rEJ8EhDHU1F0SJzCWJcuR8cQ/t2zKYUygo9LvHbtEM1bZwC1Bqfolt7NrU
 +pgiM2wnE9yjkPdfZN1JgYZDq0/lGvxPQ5NAc/5ERX71QonRyn8=
 =MQl3
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs update from Jaegeuk Kim:
 "In this series, we've some progress to support Zoned block device
  regarding to the power-cut recovery flow and enabling
  checkpoint=disable feature which is essential for Android OTA.

  Other than that, some patches touched sysfs entries and tracepoints
  which are minor, while several bug fixes on error handlers and
  compression flows are good to improve the overall stability.

  Enhancements:
   - enable checkpoint=disable for zoned block device
   - sysfs entries such as discard status, discard_io_aware, dir_level
   - tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(),
     f2fs_new_inode()
   - use shared inode lock during f2fs_fiemap() and f2fs_seek_block()

  Bug fixes:
   - address some power-cut recovery issues on zoned block device
   - handle errors and logics on do_garbage_collect(),
     f2fs_reserve_new_block(), f2fs_move_file_range(),
     f2fs_recover_xattr_data()
   - don't set FI_PREALLOCATED_ALL for partial write
   - fix to update iostat correctly in f2fs_filemap_fault()
   - fix to wait on block writeback for post_read case
   - fix to tag gcing flag on page during block migration
   - restrict max filesize for 16K f2fs
   - fix to avoid dirent corruption
   - explicitly null-terminate the xattr list

  There are also several clean-up patches to remove dead codes and
  better readability"

* tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (33 commits)
  f2fs: show more discard status by sysfs
  f2fs: Add error handling for negative returns from do_garbage_collect
  f2fs: Constrain the modification range of dir_level in the sysfs
  f2fs: Use wait_event_freezable_timeout() for freezable kthread
  f2fs: fix to check return value of f2fs_recover_xattr_data
  f2fs: don't set FI_PREALLOCATED_ALL for partial write
  f2fs: fix to update iostat correctly in f2fs_filemap_fault()
  f2fs: fix to check compress file in f2fs_move_file_range()
  f2fs: fix to wait on block writeback for post_read case
  f2fs: fix to tag gcing flag on page during block migration
  f2fs: add tracepoint for f2fs_vm_page_mkwrite()
  f2fs: introduce f2fs_invalidate_internal_cache() for cleanup
  f2fs: update blkaddr in __set_data_blkaddr() for cleanup
  f2fs: introduce get_dnode_addr() to clean up codes
  f2fs: delete obsolete FI_DROP_CACHE
  f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN
  f2fs: Restrict max filesize for 16K f2fs
  f2fs: let's finish or reset zones all the time
  f2fs: check write pointers when checkpoint=disable
  f2fs: fix write pointers on zoned device after roll forward
  ...
2024-01-11 20:39:15 -08:00
Christoph Hellwig 7437bb73f0 block: remove support for the host aware zone model
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):

 - host managed where non-conventional zones there is strict requirement
   to write at the write pointer, or else an error is returned
 - host aware where a write point is maintained if writes always happen
   at it, otherwise it is left in an under-defined state and the
   sequential write preferred zones behave like conventional zones
   (probably very badly performing ones, though)

Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it).  Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.

Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production.  Drop the support before it is too
late.  Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-19 20:17:43 -07:00
Chao Yu 55fdc1c24a f2fs: fix to wait on block writeback for post_read case
If inode is compressed, but not encrypted, it missed to call
f2fs_wait_on_block_writeback() to wait for GCed page writeback
in IPU write path.

Thread A				GC-Thread
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_block
					    - f2fs_submit_page_write
					     migrate normal cluster's block via
					     meta_inode's page cache
- f2fs_write_single_data_page
 - f2fs_do_write_data_page
  - f2fs_inplace_write_data
   - f2fs_submit_page_bio

IRQ
- f2fs_read_end_io
					IRQ
					old data overrides new data due to
					out-of-order GC and common IO.
					- f2fs_read_end_io

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 14:04:08 -08:00
Chao Yu 4e4f1eb994 f2fs: introduce f2fs_invalidate_internal_cache() for cleanup
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:34:55 -08:00
Chao Yu 59d0d4c3ea f2fs: update blkaddr in __set_data_blkaddr() for cleanup
This patch allows caller to pass blkaddr to f2fs_set_data_blkaddr()
and let __set_data_blkaddr() inside f2fs_set_data_blkaddr() to update
dn->data_blkaddr w/ last value of blkaddr.

Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:34:04 -08:00
Chao Yu 2020cd48e4 f2fs: introduce get_dnode_addr() to clean up codes
Just cleanup, no logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:32:01 -08:00
Chao Yu bb6e1c8fa5 f2fs: delete obsolete FI_DROP_CACHE
FI_DROP_CACHE was introduced in commit 1e84371ffe ("f2fs: change
atomic and volatile write policies") for volatile write feature,
after commit 7bc155fec5 ("f2fs: kill volatile write support"),
we won't support volatile write, let's delete related codes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:30:08 -08:00
Chao Yu a539363613 f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN
Commit 3c6c2bebef ("f2fs: avoid punch_hole overhead when releasing
volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason:

This patch is to avoid some punch_hole overhead when releasing volatile
data. If volatile data was not written yet, we just can make the first
page as zero.

After commit 7bc155fec5 ("f2fs: kill volatile write support"), we
won't support volatile write, but it missed to remove obsolete
FI_FIRST_BLOCK_WRITTEN, delete it in this patch.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11 13:29:34 -08:00
Chao Yu 9458915036 f2fs: use shared inode lock during f2fs_fiemap()
f2fs_fiemap() will only traverse metadata of inode, let's use shared
inode lock for it to avoid unnecessary race on inode lock.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-17 09:31:18 -08:00
Daniel Rosenberg d7e9a9037d f2fs: Support Block Size == Page Size
This allows f2fs to support cases where the block size = page size for
both 4K and 16K block sizes. Other sizes should work as well, should the
need arise. This does not currently support 4K Block size filesystems if
the page size is 16K.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-10-04 16:53:36 -07:00
Chao Yu 943f7c6f98 f2fs: compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file()
If file has both cold and compress flag, during f2fs_ioc_compress_file(),
f2fs will trigger IPU for non-compress cluster and OPU for compress
cluster, so that, data of the file may be fragmented.

Fix it by always triggering OPU for IOs from user mode compression.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-12 13:49:34 -07:00
Chao Yu 2aaea533bf f2fs: compress: do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on
This patch covers sanity check logic on cluster w/ CONFIG_F2FS_CHECK_FS,
otherwise, there will be performance regression while querying cluster
mapping info.

Callers of f2fs_is_compressed_cluster() only care about whether cluster
is compressed or not, rather than # of valid blocks in compressed cluster,
so, let's adjust f2fs_is_compressed_cluster()'s logic according to
caller's requirement.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-12 13:49:33 -07:00
Chao Yu b0327c84e9 f2fs: compress: fix to avoid use-after-free on dic
Call trace:
 __memcpy+0x128/0x250
 f2fs_read_multi_pages+0x940/0xf7c
 f2fs_mpage_readpages+0x5a8/0x624
 f2fs_readahead+0x5c/0x110
 page_cache_ra_unbounded+0x1b8/0x590
 do_sync_mmap_readahead+0x1dc/0x2e4
 filemap_fault+0x254/0xa8c
 f2fs_filemap_fault+0x2c/0x104
 __do_fault+0x7c/0x238
 do_handle_mm_fault+0x11bc/0x2d14
 do_mem_abort+0x3a8/0x1004
 el0_da+0x3c/0xa0
 el0t_64_sync_handler+0xc4/0xec
 el0t_64_sync+0x1b4/0x1b8

In f2fs_read_multi_pages(), once f2fs_decompress_cluster() was called if
we hit cached page in compress_inode's cache, dic may be released, it needs
break the loop rather than continuing it, in order to avoid accessing
invalid dic pointer.

Fixes: 6ce19aff0b ("f2fs: compress: add compress_inode to cache compressed blocks")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-12 13:49:33 -07:00
Chao Yu c5d3f9b764 f2fs: compress: fix deadloop in f2fs_write_cache_pages()
With below mount option and testcase, it hangs kernel.

1. mount -t f2fs -o compress_log_size=5 /dev/vdb /mnt/f2fs
2. touch /mnt/f2fs/file
3. chattr +c /mnt/f2fs/file
4. dd if=/dev/zero of=/mnt/f2fs/file bs=1MB count=1
5. sync
6. dd if=/dev/zero of=/mnt/f2fs/file bs=111 count=11 conv=notrunc
7. sync

INFO: task sync:4788 blocked for more than 120 seconds.
      Not tainted 6.5.0-rc1+ #322
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync            state:D stack:0     pid:4788  ppid:509    flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x335/0xf80
 schedule+0x6f/0xf0
 wb_wait_for_completion+0x5e/0x90
 sync_inodes_sb+0xd8/0x2a0
 sync_inodes_one_sb+0x1d/0x30
 iterate_supers+0x99/0xf0
 ksys_sync+0x46/0xb0
 __do_sys_sync+0x12/0x20
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

The reason is f2fs_all_cluster_page_ready() assumes that pages array should
cover at least one cluster, otherwise, it will always return false, result
in deadloop.

By default, pages array size is 16, and it can cover the case cluster_size
is equal or less than 16, for the case cluster_size is larger than 16, let's
allocate memory of pages array dynamically.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-12 13:49:32 -07:00
Chao Yu 5118697f72 f2fs: fix error path of f2fs_submit_page_read()
In error path of f2fs_submit_page_read(), it missed to call
iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-23 10:24:40 -07:00
Minjie Du a842a90926 f2fs: increase usage of folio_next_index() helper
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().

Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:09 -07:00
Jaegeuk Kim d2d9bb3b6d f2fs: get out of a repeat loop when getting a locked data page
https://bugzilla.kernel.org/show_bug.cgi?id=216050

Somehow we're getting a page which has a different mapping.
Let's avoid the infinite loop.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:07 -07:00
Christoph Hellwig 94c8431fb4 f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
Since commit a2ad63daa8 ("VFS: add FMODE_CAN_ODIRECT file flag") file
systems can just set the FMODE_CAN_ODIRECT flag at open time instead of
wiring up a dummy direct_IO method to indicate support for direct I/O.

Do that for f2fs so that noop_direct_IO can eventually be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26 06:07:09 -07:00
Chao Yu f082c6b205 f2fs: fix potential deadlock due to unpaired node_write lock use
If S_NOQUOTA is cleared from inode during data page writeback of quota
file, it may miss to unlock node_write lock, result in potential
deadlock, fix to use the lock in paired.

Kworker					Thread
- writepage
 if (IS_NOQUOTA())
   f2fs_down_read(&sbi->node_write);
					- vfs_cleanup_quota_inode
					 - inode->i_flags &= ~S_NOQUOTA;
 if (IS_NOQUOTA())
   f2fs_up_read(&sbi->node_write);

Fixes: 79963d967b ("f2fs: shrink node_write lock coverage")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12 13:04:07 -07:00
Daeho Jeong e067dc3c6b f2fs: maintain six open zones for zoned devices
To keep six open zone constraints, make them not to be open over six
open zones.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-05-23 18:37:38 -07:00
Li Zetao 1223e432d9 f2fs: remove redundant goto statement in f2fs_read_single_page()
After the commit "0a4ee518185", this "goto" statement was redundant,
remote it for clean code.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-05-08 11:18:04 -07:00
Chao Yu b62e71be21 f2fs: support errors=remount-ro|continue|panic mountoption
This patch supports errors=remount-ro|continue|panic mount option
for f2fs.

f2fs behaves as below in three different modes:
mode			continue	remount-ro	panic
access ops		normal		noraml		N/A
syscall errors		-EIO		-EROFS		N/A
mount option		rw		ro		N/A
pending dir write	keep		keep		N/A
pending non-dir write	drop		keep		N/A
pending node write	drop		keep		N/A
pending meta write	keep		keep		N/A

By default it uses "continue" mode.

[Yangtao helps to clean up function's name]
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-05-08 11:18:04 -07:00
Daeho Jeong 591fc34e1f f2fs: use cow inode data when updating atomic write
Need to use cow inode data content instead of the one in the original
inode, when we try to write the already updated atomic write files.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24 11:03:10 -07:00
Jaegeuk Kim bd90c5cd33 f2fs: relax sanity check if checkpoint is corrupted
1. extent_cache
 - let's drop the largest extent_cache
2. invalidate_block
 - don't show the warnings

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-18 09:05:54 -07:00
Chao Yu 635a52da86 f2fs: remove folio_detach_private() in .invalidate_folio and .release_folio
We have maintain PagePrivate and page_private and page reference
w/ {set,clear}_page_private_*, it doesn't need to call
folio_detach_private() in the end of .invalidate_folio and
.release_folio, remove it and use f2fs_bug_on instead.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-17 14:49:40 -07:00
Chao Yu c9b3649a93 f2fs: fix to drop all dirty pages during umount() if cp_error is set
xfstest generic/361 reports a bug as below:

f2fs_bug_on(sbi, sbi->fsync_node_num);

kernel BUG at fs/f2fs/super.c:1627!
RIP: 0010:f2fs_put_super+0x3a8/0x3b0
Call Trace:
 generic_shutdown_super+0x8c/0x1b0
 kill_block_super+0x2b/0x60
 kill_f2fs_super+0x87/0x110
 deactivate_locked_super+0x39/0x80
 deactivate_super+0x46/0x50
 cleanup_mnt+0x109/0x170
 __cleanup_mnt+0x16/0x20
 task_work_run+0x65/0xa0
 exit_to_user_mode_prepare+0x175/0x190
 syscall_exit_to_user_mode+0x25/0x50
 do_syscall_64+0x4c/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

During umount(), if cp_error is set, f2fs_wait_on_all_pages() should
not stop waiting all F2FS_WB_CP_DATA pages to be writebacked, otherwise,
fsync_node_num can be non-zero after f2fs_wait_on_all_pages() causing
this bug.

In this case, to avoid deadloop in f2fs_wait_on_all_pages(), it needs
to drop all dirty pages rather than redirtying them.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10 11:03:02 -07:00
Chao Yu 5cdb422c83 f2fs: fix to avoid use-after-free for cached IPU bio
xfstest generic/019 reports a bug:

kernel BUG at mm/filemap.c:1619!
RIP: 0010:folio_end_writeback+0x8a/0x90
Call Trace:
 end_page_writeback+0x1c/0x60
 f2fs_write_end_io+0x199/0x420
 bio_endio+0x104/0x180
 submit_bio_noacct+0xa5/0x510
 submit_bio+0x48/0x80
 f2fs_submit_write_bio+0x35/0x300
 f2fs_submit_merged_ipu_write+0x2a0/0x2b0
 f2fs_write_single_data_page+0x838/0x8b0
 f2fs_write_cache_pages+0x379/0xa30
 f2fs_write_data_pages+0x30c/0x340
 do_writepages+0xd8/0x1b0
 __writeback_single_inode+0x44/0x370
 writeback_sb_inodes+0x233/0x4d0
 __writeback_inodes_wb+0x56/0xf0
 wb_writeback+0x1dd/0x2d0
 wb_workfn+0x367/0x4a0
 process_one_work+0x21d/0x430
 worker_thread+0x4e/0x3c0
 kthread+0x103/0x130
 ret_from_fork+0x2c/0x50

The root cause is: after cp_error is set, f2fs_submit_merged_ipu_write()
in f2fs_write_single_data_page() tries to flush IPU bio in cache, however
f2fs_submit_merged_ipu_write() missed to check validity of @bio parameter,
result in submitting random cached bio which belong to other IO context,
then it will cause use-after-free issue, fix it by adding additional
validity check.

Fixes: 0b20fcec86 ("f2fs: cache global IPU bio")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10 11:01:49 -07:00
Yangtao Li c948be797d f2fs: remove else in f2fs_write_cache_pages()
As Christoph Hellwig point out:

	Please avoid the else by doing the goto in the branch.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-29 15:17:39 -07:00
Yangtao Li 447286ebad f2fs: convert to use bitmap API
Let's use BIT() and GENMASK() instead of open it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-29 15:17:37 -07:00
Linus Torvalds 103830683c f2fs-for-6.3-rc1
In this round, we've got a huge number of patches that improve code readability
 along with minor bug fixes, while we've mainly fixed some critical issues in
 recently-added per-block age-based extent_cache, atomic write support, and some
 folio cases.
 
 Enhancement:
  - add sysfs nodes to set last_age_weight and manage discard_io_aware_gran
  - show ipu policy in debugfs
  - reduce stack memory cost by using bitfield in struct f2fs_io_info
  - introduce trace_f2fs_replace_atomic_write_block
  - enhance iostat support and adds flush commands
 
 Bug fix:
  - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  - fix kernel crash on the atomic write abort flow
  - call clear_page_private_reference in .{release,invalid}_folio
  - support .migrate_folio for compressed inode
  - fix cgroup writeback accounting with fs-layer encryption
  - retry to update the inode page given data corruption
  - fix kernel crash due to null io->bio
  - fix some bugs in per-block age-based extent_cache:
     a. wrong calculation of block age
     b. update age extent in f2fs_do_zero_range()
     c. update age extent correctly during truncation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmP9M/cACgkQQBSofoJI
 UNIX1Q//Yp+nDeY91H3IO6aMSHPqRoBBnVTr8ERtLUF0fuQbBkzcQE+t8cMSoYDM
 88sxoC+F7UnovNr84VeuKlHN4hYyAXuxj8OZetXI3XX+yiO+auEPtljGA0BaGwqL
 93lIg8nIQ2ing/oZ/4+h4dvpYCPrhKOQS6h1sHhIWlql6Wxwxq01uA47i0Ni6y/o
 D23JPFaDQfumN8qy1bm5xfLRhTQmaE35n5NhcBJpUD/rGK92NPXv7RLKPgZc3tKN
 tJmL+NXa3NNEx5e5TSP7JX+rhD7KlL5XlB/m8LbpPIx338I0pt0uzAc7nTIOlzM3
 DI0q3HXe9U3+JBHi+rKxkIniiRDmvhPx3NzdgcsYg75EnwdrNazsRHulBUEAXB4v
 ghHbx53OxA2uSnUVbkY4HXNYf7cYjrx5vbX0oqEx48btBCC8KGFIcHtI72tIBee3
 xdCxoM3e2AWijkFBOCkThXNuNNbdifQzn2e7xR7W+o0L9hwdR5t7tHhHT+cqG9Ox
 6UKWoZZUjYUAV/YQT5Qh6570GsGncM8gHAUMz7DVTIOB9wYkHtb0Q9tUmoQccwf5
 46r84c4/jxUQHt8SkXBBl1aiqHmR7EF17YvM5AXIdG3DqqOy70NWkM48hMOyIy2G
 eY/wTVJwpd6oDveQPtxaHn7Wo3jSPNUlidOfAYQ7Itm34VXY/zc=
 =yXLl
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've got a huge number of patches that improve code
  readability along with minor bug fixes, while we've mainly fixed some
  critical issues in recently-added per-block age-based extent_cache,
  atomic write support, and some folio cases.

  Enhancements:

   - add sysfs nodes to set last_age_weight and manage
     discard_io_aware_gran

   - show ipu policy in debugfs

   - reduce stack memory cost by using bitfield in struct f2fs_io_info

   - introduce trace_f2fs_replace_atomic_write_block

   - enhance iostat support and adds flush commands

  Bug fixes:

   - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"

   - fix kernel crash on the atomic write abort flow

   - call clear_page_private_reference in .{release,invalid}_folio

   - support .migrate_folio for compressed inode

   - fix cgroup writeback accounting with fs-layer encryption

   - retry to update the inode page given data corruption

   - fix kernel crash due to NULL io->bio

   - fix some bugs in per-block age-based extent_cache:
       - wrong calculation of block age
       - update age extent in f2fs_do_zero_range()
       - update age extent correctly during truncation"

* tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
  f2fs: drop unnecessary arg for f2fs_ioc_*()
  f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  f2fs: synchronize atomic write aborts
  f2fs: fix wrong segment count
  f2fs: replace si->sbi w/ sbi in stat_show()
  f2fs: export ipu policy in debugfs
  f2fs: make kobj_type structures constant
  f2fs: fix to do sanity check on extent cache correctly
  f2fs: add missing description for ipu_policy node
  f2fs: fix to set ipu policy
  f2fs: fix typos in comments
  f2fs: fix kernel crash due to null io->bio
  f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()
  f2fs: add sysfs nodes to set last_age_weight
  f2fs: fix f2fs_show_options to show nogc_merge mount option
  f2fs: fix cgroup writeback accounting with fs-layer encryption
  f2fs: fix wrong calculation of block age
  f2fs: fix to update age extent in f2fs_do_zero_range()
  f2fs: fix to update age extent correctly during truncation
  f2fs: fix to avoid potential memory corruption in __update_iostat_latency()
  ...
2023-02-27 16:18:51 -08:00
Linus Torvalds 3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds 6639c3ce7f fsverity updates for 6.3
Fix the longstanding implementation limitation that fsverity was only
 supported when the Merkle tree block size, filesystem block size, and
 PAGE_SIZE were all equal.  Specifically, add support for Merkle tree
 block sizes less than PAGE_SIZE, and make ext4 support fsverity on
 filesystems where the filesystem block size is less than PAGE_SIZE.
 
 Effectively, this means that fsverity can now be used on systems with
 non-4K pages, at least on ext4.  These changes have been tested using
 the verity group of xfstests, newly updated to cover the new code paths.
 
 Also update fs/verity/ to support verifying data from large folios.
 There's also a similar patch for fs/crypto/, to support decrypting data
 from large folios, which I'm including in this pull request to avoid a
 merge conflict between the fscrypt and fsverity branches.
 
 There will be a merge conflict in fs/buffer.c with some of the foliation
 work in the mm tree.  Please use the merge resolution from linux-next.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCY/KJtRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK/A/AP0RUlCClBRuHwXPRG0we8R1L153ga4s
 Vl+xRpCr+SswXwEAiOEpYN5cXoVKzNgxbEXo2pQzxi5lrpjZgUI6CL3DuQs=
 =ZRFX
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity updates from Eric Biggers:
 "Fix the longstanding implementation limitation that fsverity was only
  supported when the Merkle tree block size, filesystem block size, and
  PAGE_SIZE were all equal.

  Specifically, add support for Merkle tree block sizes less than
  PAGE_SIZE, and make ext4 support fsverity on filesystems where the
  filesystem block size is less than PAGE_SIZE.

  Effectively, this means that fsverity can now be used on systems with
  non-4K pages, at least on ext4. These changes have been tested using
  the verity group of xfstests, newly updated to cover the new code
  paths.

  Also update fs/verity/ to support verifying data from large folios.

  There's also a similar patch for fs/crypto/, to support decrypting
  data from large folios, which I'm including in here to avoid a merge
  conflict between the fscrypt and fsverity branches"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fscrypt: support decrypting data from large folios
  fsverity: support verifying data from large folios
  fsverity.rst: update git repo URL for fsverity-utils
  ext4: allow verity with fs block size < PAGE_SIZE
  fs/buffer.c: support fsverity in block_read_full_folio()
  f2fs: simplify f2fs_readpage_limit()
  ext4: simplify ext4_readpage_limit()
  fsverity: support enabling with tree block size < PAGE_SIZE
  fsverity: support verification with tree block size < PAGE_SIZE
  fsverity: replace fsverity_hash_page() with fsverity_hash_block()
  fsverity: use EFBIG for file too large to enable verity
  fsverity: store log2(digest_size) precomputed
  fsverity: simplify Merkle tree readahead size calculation
  fsverity: use unsigned long for level_start
  fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG
  fsverity: pass pos and size to ->write_merkle_tree_block
  fsverity: optimize fsverity_cleanup_inode() on non-verity files
  fsverity: optimize fsverity_prepare_setattr() on non-verity files
  fsverity: optimize fsverity_file_open() on non-verity files
2023-02-20 12:33:41 -08:00
Jinyoung CHOI 146949defd f2fs: fix typos in comments
This patch is to fix typos in f2fs files.

Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07 10:39:28 -08:00
Jaegeuk Kim 267c159f9c f2fs: fix kernel crash due to null io->bio
We should return when io->bio is null before doing anything. Otherwise, panic.

BUG: kernel NULL pointer dereference, address: 0000000000000010
RIP: 0010:__submit_merged_write_cond+0x164/0x240 [f2fs]
Call Trace:
 <TASK>
 f2fs_submit_merged_write+0x1d/0x30 [f2fs]
 commit_checkpoint+0x110/0x1e0 [f2fs]
 f2fs_write_checkpoint+0x9f7/0xf00 [f2fs]
 ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs]
 __checkpoint_and_complete_reqs+0x84/0x190 [f2fs]
 ? preempt_count_add+0x82/0xc0
 ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs]
 issue_checkpoint_thread+0x4c/0xf0 [f2fs]
 ? __pfx_autoremove_wake_function+0x10/0x10
 kthread+0xff/0x130
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

Cc: stable@vger.kernel.org # v5.18+
Fixes: 64bf0eef01 ("f2fs: pass the bio operation to bio_alloc_bioset")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07 10:39:28 -08:00
Yangtao Li d9bac032ac f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()
Convert to use iostat_lat_type as parameter instead of raw number.
BTW, move NUM_PREALLOC_IOSTAT_CTXS to the header file, adjust
iostat_lat[{0,1,2}] to iostat_lat[{READ_IO,WRITE_SYNC_IO,WRITE_ASYNC_IO}]
in tracepoint function, and rename iotype to page_type to match the definition.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07 10:39:28 -08:00
Eric Biggers 844545c51a f2fs: fix cgroup writeback accounting with fs-layer encryption
When writing a page from an encrypted file that is using
filesystem-layer encryption (not inline encryption), f2fs encrypts the
pagecache page into a bounce page, then writes the bounce page.

It also passes the bounce page to wbc_account_cgroup_owner().  That's
incorrect, because the bounce page is a newly allocated temporary page
that doesn't have the memory cgroup of the original pagecache page.
This makes wbc_account_cgroup_owner() not account the I/O to the owner
of the pagecache page as it should.

Fix this by always passing the pagecache page to
wbc_account_cgroup_owner().

Fixes: 578c647879 ("f2fs: implement cgroup writeback support")
Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-05 19:34:21 -08:00
Vishal Moola (Oracle) 1cd98ee747 f2fs: convert f2fs_write_cache_pages() to use filemap_get_folios_tag()
Convert the function to use a folio_batch instead of pagevec.  This is in
preparation for the removal of find_get_pages_range_tag().

Also modified f2fs_all_cluster_page_ready to take in a folio_batch instead
of pagevec.  This does NOT support large folios.  The function currently
only utilizes folios of size 1 so this shouldn't cause any issues right
now.

This version of the patch limits the number of pages fetched to
F2FS_ONSTACK_PAGES.  If that ever happens, update the start index here
since filemap_get_folios_tag() updates the index to be after the last
found folio, not necessarily the last used page.

Link: https://lkml.kernel.org/r/20230104211448.4804-15-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:16 -08:00
Chao Yu 2eae077e6e f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_info
This patch tries to use bitfield in struct f2fs_io_info to improve
memory usage.

struct f2fs_io_info {
...
	unsigned int need_lock:8;	/* indicate we need to lock cp_rwsem */
	unsigned int version:8;		/* version of the node */
	unsigned int submitted:1;	/* indicate IO submission */
	unsigned int in_list:1;		/* indicate fio is in io_list */
	unsigned int is_por:1;		/* indicate IO is from recovery or not */
	unsigned int retry:1;		/* need to reallocate block address */
	unsigned int encrypted:1;	/* indicate file is encrypted */
	unsigned int post_read:1;	/* require post read */
...
};

After this patch, size of struct f2fs_io_info reduces from 136 to 120.

[Nathan: fix a compile warning (single-bit-bitfield-constant-conversion)]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-02 13:37:16 -08:00
Yangtao Li c40e15a9a5 f2fs: merge f2fs_show_injection_info() into time_to_inject()
There is no need to additionally use f2fs_show_injection_info()
to output information. Concatenate time_to_inject() and
__time_to_inject() via a macro. In the new __time_to_inject()
function, pass in the caller function name and parent function.

In this way, we no longer need the f2fs_show_injection_info() function,
and let's remove it.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-11 11:15:19 -08:00
Eric Biggers feb0576a36 f2fs: simplify f2fs_readpage_limit()
Now that the implementation of FS_IOC_ENABLE_VERITY has changed to not
involve reading back Merkle tree blocks that were previously written,
there is no need for f2fs_readpage_limit() to allow for this case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://lore.kernel.org/r/20221223203638.41293-10-ebiggers@kernel.org
2023-01-09 19:06:09 -08:00
Chao Yu 8358014d6b f2fs: avoid to check PG_error flag
After below changes:
commit 14db0b3c7b ("fscrypt: stop using PG_error to track error status")
commit 98dc08bae6 ("fsverity: stop using PG_error to track error status")

There is no place in f2fs we will set PG_error flag in page, let's remove
other PG_error usage in f2fs, as a step towards freeing the PG_error flag
for other uses.

Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:41 -08:00
Yangtao Li fdb7ccc3f9 f2fs: introduce IS_F2FS_IPU_* macro
IS_F2FS_IPU_* macro can be used to identify whether
f2fs ipu related policies are enabled.

BTW, convert to use BIT() instead of open code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:41 -08:00
Christoph Hellwig fdbf69a7f5 f2fs: refactor the hole reporting and allocation logic in f2fs_map_blocks
Add a is_hole local variable to figure out if the block number might need
allocation, and untangle to logic to report the hole or fill it with a
block allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:38 -08:00
Christoph Hellwig 817c968b79 f2fs: factor out a f2fs_map_no_dnode
Factor out a helper to return a hole when no dnode was found.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:36 -08:00
Christoph Hellwig 0094e98bd1 f2fs: factor a f2fs_map_blocks_cached helper
Add a helper to deal with everything needed to return a f2fs_map_blocks
structure based on a lookup in the extent cache.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:31 -08:00
Christoph Hellwig cd8fc5226b f2fs: remove the create argument to f2fs_map_blocks
The create argument is always identicaly to map->m_may_create, so use
that consistently.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:29 -08:00
Christoph Hellwig ffdeab71d5 f2fs: remove f2fs_get_block
Fold f2fs_get_block into the two remaining callers to simplify the
call chain a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:26 -08:00
Christoph Hellwig 3cf684f2f8 f2fs: simplify __allocate_data_block
Just use a simple if block for the conditional call to
inc_valid_block_count.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:13 -08:00
Christoph Hellwig 44b0dfebbd f2fs: reflow prepare_write_begin
Reflow prepare_write_begin so that it reads more straight forward,
and so that there is one place that does an extent cache lookup
instead of three, two of which are hidden in f2fs_get_block calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:10 -08:00
Christoph Hellwig 2f51ade952 f2fs: f2fs_do_map_lock
Split f2fs_do_map_lock into a lock and unlock helper to make the code
using it easier to read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:07 -08:00
Christoph Hellwig cf342d3bed f2fs: add a f2fs_get_block_locked helper
This allows to keep the f2fs_do_map_lock based locking scheme
private to data.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:03 -08:00
Christoph Hellwig 04a91ab016 f2fs: add a f2fs_lookup_extent_cache_block helper
All but three callers of f2fs_lookup_extent_cache just want the block
address.  Add a small helper to simplify them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:00 -08:00
Christoph Hellwig bc29835a9d f2fs: split __submit_bio
Split __submit_bio into one function each for reads and writes, and a
helper for aligning writes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:12:56 -08:00
Christoph Hellwig da8c7fecc9 f2fs: rename F2FS_MAP_UNWRITTEN to F2FS_MAP_DELALLOC
NEW_ADDR blocks are purely in-memory preallocated blocks, and thus
equivalent to what the core FS code calls delayed allocations, and not
unwritten extents which do have on-disk blocks allocated from which
reads always return zeroes until they are converted to written status.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:12:53 -08:00
Christoph Hellwig 8d3c1fa3fa f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin
When testing with a mixed zoned / convention device combination, there
are regular but not 100% reproducible failures in xfstests generic/113
where the __is_valid_data_blkaddr assert hits due to finding a hole.

This seems to be because f2fs_map_blocks can set this flag on a hole
when it was found in the extent cache.

Rework f2fs_iomap_begin to just check the special block numbers directly.
This has the added benefits of the WARN_ON showing which invalid block
address we found, and being properly error out on delalloc blocks that
are confusingly called unwritten but not actually suitable for direct
I/O.

Fixes: 1517c1a7a4 ("f2fs: implement iomap operations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:12:28 -08:00
Chao Yu 6779b5db90 f2fs: fix to call clear_page_private_reference in .{release,invalid}_folio
b763f3bedc ("f2fs: restructure f2fs page.private layout") missed
to call clear_page_private_reference() in .{release,invalid}_folio,
fix it, though it's not a big deal since folio_detach_private() was
called to clear all privae info and reference count in the page.

BTW, remove page_private_reference() definition as it never be used.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-04 12:56:04 -08:00
Jaegeuk Kim fe59109ae5 f2fs: initialize extent_cache parameter
This can avoid confusing tracepoint values.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03 08:59:05 -08:00
Linus Torvalds 041fae9c10 f2fs-for-6.2-rc1
In this round, we've added two features: 1) F2FS_IOC_START_ATOMIC_REPLACE and
 2) per-block age-based extent cache. 1) is a variant of the previous atomic
 write feature which guarantees a per-file atomicity. It would be more efficient
 than AtomicFile implementation in Android framework. 2) implements another type
 of extent cache in memory which keeps the per-block age in a file, so that block
 allocator could split the hot and cold data blocks more accurately.
 
 Enhancement:
  - introduce F2FS_IOC_START_ATOMIC_REPLACE
  - refactor extent_cache to add a new per-block-age-based extent cache support
  - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs
  - add proc entry to show discard_plist info
  - optimize iteration over sparse directories
  - add barrier mount option
 
 Bug fix
  - avoid victim selection from previous victim section
  - fix to enable compress for newly created file if extension matches
  - set zstd compress level correctly
  - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot
  - correct i_size change for atomic writes
  - allow to read node block after shutdown
  - allow to set compression for inlined file
  - fix gc mode when gc_urgent_high_remaining is 1
  - should put a page when checking the summary info
 
 Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and doc.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmOaTNUACgkQQBSofoJI
 UNIQnw//V7Q8DUHw5YNj04jutwXH2DNMLAmn/NJh5S6dIzy/LiywlSzVg53/0/FP
 4K577urUkIhgilRO+yncUMSnSQk7BluQvGSx4ja2AV+dpDomjxM3GwIacGzSvr7D
 VfVf8Vig10UEFrrtEEKtv1VFlYHAmo8lLpubzrZHV8aZFLHHYO2fakQhPu8BYsaz
 eGCJwxjvTZcQUPkaeG9tWto3ChI3F6PzreiQ5TztHhLWSEgw/o0qijpsc+2SthaV
 my7uGjeBY8EGPeSYbeCxRtdx8g8Qu11K3ISuDj8zBybmjG3IWOGt1CVcrY6tZbal
 aL70CMtHkMqMn03VqbpCTqBtdWNMrrw5sYSL3qXIUdXlX/2yJBh9fLAeNxKNs5Nu
 6veSb2WgYMHqIsClkAAcP0xJ8g6kodGoG60wVr4ek0Vdt4osaQqwq+bnffpwwxtQ
 F+7aRuinv+rdrHJ4CuFXAmHPKh2lBe2lTTWZEKg2RptTxZ5DhD2Qn6x1khPD2GFA
 mG2Aeiq6PVxxEeIO+w/VBCuAgpGTFV2N/ZIF8VfjFNdWiN5OGLWQNHC2KGj2G2uV
 +fA+B91txQWtjY9h72YJb2+aGIixcnLY24ni4mDgDItqtpCB4PW56W8cbnbv9Pl+
 aXAWdADqJdDyllHoVB/JQ24gr2fATJGRIDeYDnw+vPP4f5ZT5vg=
 =f00t
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE
  and a per-block age-based extent cache.

  F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic
  write feature which guarantees a per-file atomicity. It would be more
  efficient than AtomicFile implementation in Android framework.

  The per-block age-based extent cache implements another type of extent
  cache in memory which keeps the per-block age in a file, so that block
  allocator could split the hot and cold data blocks more accurately.

  Enhancements:
   - introduce F2FS_IOC_START_ATOMIC_REPLACE
   - refactor extent_cache to add a new per-block-age-based extent cache support
   - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs
   - add proc entry to show discard_plist info
   - optimize iteration over sparse directories
   - add barrier mount option

  Bug fixes:
   - avoid victim selection from previous victim section
   - fix to enable compress for newly created file if extension matches
   - set zstd compress level correctly
   - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot
   - correct i_size change for atomic writes
   - allow to read node block after shutdown
   - allow to set compression for inlined file
   - fix gc mode when gc_urgent_high_remaining is 1
   - should put a page when checking the summary info

  Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and
  doc"

* tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
  f2fs: reset wait_ms to default if any of the victims have been selected
  f2fs: fix some format WARNING in debug.c and sysfs.c
  f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()
  f2fs: fix iostat parameter for discard
  f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache
  f2fs: add block_age-based extent cache
  f2fs: allocate the extent_cache by default
  f2fs: refactor extent_cache to support for read and more
  f2fs: remove unnecessary __init_extent_tree
  f2fs: move internal functions into extent_cache.c
  f2fs: specify extent cache for read explicitly
  f2fs: introduce f2fs_is_readonly() for readability
  f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro
  f2fs: do some cleanup for f2fs module init
  MAINTAINERS: Add f2fs bug tracker link
  f2fs: remove the unused flush argument to change_curseg
  f2fs: open code allocate_segment_by_default
  f2fs: remove struct segment_allocation default_salloc_ops
  f2fs: introduce discard_urgent_util sysfs node
  f2fs: define MIN_DISCARD_GRANULARITY macro
  ...
2022-12-14 15:27:57 -08:00
Jaegeuk Kim e7547daccd f2fs: refactor extent_cache to support for read and more
This patch prepares extent_cache to be ready for addition.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12 14:53:56 -08:00
Yangtao Li 870af777da f2fs: do some cleanup for f2fs module init
Just for cleanup, no functional changes.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-08 09:32:20 -08:00
Eric Biggers 98dc08bae6 fsverity: stop using PG_error to track error status
As a step towards freeing the PG_error flag for other uses, change ext4
and f2fs to stop using PG_error to track verity errors.  Instead, if a
verity error occurs, just mark the whole bio as failed.  The coarser
granularity isn't really a problem since it isn't any worse than what
the block layer provides, and errors from a multi-page readahead aren't
reported to applications unless a single-page read fails too.

f2fs supports compression, which makes the f2fs changes a bit more
complicated than desired, but the basic premise still works.

Note: there are still a few uses of PageError in f2fs, but they are on
the write path, so they are unrelated and this patch doesn't touch them.

Reviewed-by: Chao Yu <chao@kernel.org>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
2022-11-28 23:15:10 -08:00
Daeho Jeong 41e8f85a75 f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACE
introduce a new ioctl to replace the whole content of a file atomically,
which means it induces truncate and content update at the same time.
We can start it with F2FS_IOC_START_ATOMIC_REPLACE and complete it with
F2FS_IOC_COMMIT_ATOMIC_WRITE. Or abort it with
F2FS_IOC_ABORT_ATOMIC_WRITE.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28 12:46:23 -08:00
Chao Yu 59237a2177 f2fs: optimize iteration over sparse directories
Wei Chen reports a kernel bug as blew:

INFO: task syz-executor.0:29056 blocked for more than 143 seconds.
      Not tainted 5.15.0-rc5 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0  state:D stack:14632 pid:29056 ppid:  6574 flags:0x00000004
Call Trace:
 __schedule+0x4a1/0x1720
 schedule+0x36/0xe0
 rwsem_down_write_slowpath+0x322/0x7a0
 fscrypt_ioctl_set_policy+0x11f/0x2a0
 __f2fs_ioctl+0x1a9f/0x5780
 f2fs_ioctl+0x89/0x3a0
 __x64_sys_ioctl+0xe8/0x140
 do_syscall_64+0x34/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Eric did some investigation on this issue, quoted from reply of Eric:

"Well, the quality of this bug report has a lot to be desired (not on
upstream kernel, reproducer is full of totally irrelevant stuff, not
sent to the mailing list of the filesystem whose disk image is being
fuzzed, etc.).  But what is going on is that f2fs_empty_dir() doesn't
consider the case of a directory with an extremely large i_size on a
malicious disk image.

Specifically, the reproducer mounts an f2fs image with a directory
that has an i_size of 14814520042850357248, then calls
FS_IOC_SET_ENCRYPTION_POLICY on it.

That results in a call to f2fs_empty_dir() to check whether the
directory is empty.  f2fs_empty_dir() then iterates through all
3616826182336513 blocks the directory allegedly contains to check
whether any contain anything.  i_rwsem is held during this, so
anything else that tries to take it will hang."

In order to solve this issue, let's use f2fs_get_next_page_offset()
to speed up iteration by skipping holes for all below functions:
- f2fs_empty_dir
- f2fs_readdir
- find_in_level

The way why we can speed up iteration was described in
'commit 3cf4574705 ("f2fs: introduce get_next_page_offset to speed
up SEEK_DATA")'.

Meanwhile, in f2fs_empty_dir(), let's use f2fs_find_data_page()
instead f2fs_get_lock_data_page(), due to i_rwsem was held in
caller of f2fs_empty_dir(), there shouldn't be any races, so it's
fine to not lock dentry page during lookuping dirents in the page.

Link: https://lore.kernel.org/lkml/536944df-a0ae-1dd8-148f-510b476e1347@kernel.org/T/
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11 09:48:24 -08:00
Linus Torvalds 5d170fe435 f2fs-for-6.1-rc1
This round looks fairly small comparing to the previous updates which includes
 mostly minor bug fixes. Nevertheless, as we've still interested in improving
 the stability, Chao added some debugging methods to diagnoze subtle runtime
 inconsistency problem.
 
 Enhancement
  - store all the corruption or failure reasons in superblock
  - detect meta inode, summary info, and block address inconsistency
  - increase the limit for reserve_root for low-end devices
  - add the number of compressed IO in iostat
 
 Bug fix
  - DIO write fix for zoned devices
  - do out-of-place writes for cold files
  - fix some stat updates (FS_CP_DATA_IO, dirty page count)
  - fix race condition on setting FI_NO_EXTENT flag
  - fix data races when freezing super
  - fix wrong continue condition check in GC
  - do not allow ATGC for LFS mode
 
 In addition, there're some code enhancement and clean-ups as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmNEVIkACgkQQBSofoJI
 UNL/Qg//eu7k196yIKflDZmp5aJbb5ybpFmh7XkPiqAV17ns+R2uLGq68BvTs+Tg
 rqCjB7j2kkBh1kN32R7aGcx6tcbHjWc94pi59YTGQ6+pwkop3KJxFHSwAaUw6y34
 8NZwmsnrm9rv0A0QPhQPK19yWmG/2smUE9b/u7M3+20I1WANaxdS/vOKbZz/amOu
 f/BvsIIGS7Zzm9OpBCvGmq9Qpd83jlH6PuYGTC/OVbCrUiAJEmwN8wGsKP/9qB/5
 KxVpdlh3vxulS6ixNbMu2qw9GBAQpAOz50+eDL5ZtGvGIQNHZRpGlfpJoW1lz0EO
 4fJtpf5OMGqUbNaPCTG4qQGYAtKWA9YnFeWSS7RViQ6MryRXZMK8ka5eIe5Qblcf
 AXD/eU2gKzOu0fuvdBRCt/wTSb4gY8sMNhe4psDsZxfhaYIpX8Ee/XVa4d+Z4frg
 irN9gid1k3laMTx9dwJL8m7gIFvy3pak6l3B0bA69fAXd3faI40enuyfubFxnDet
 OuRNxj8j3J5C140ag5KOuBCRub2/aPaj9YSQqUstf64d8FzN/Ypn5iVPTs2DP/3D
 bcAFBwCS2+MCsk9+ra0WldZ5awdd6CRHDkvaYeDEuLCaLHUCo6CXe3aIyWawJBvJ
 RnghKNv82RIV+rQlI1/sg8lseoDnEZTp5iwDGw/qZ+ZUyn05apM=
 =aZ9y
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This round looks fairly small comparing to the previous updates and
  includes mostly minor bug fixes. Nevertheless, as we've still
  interested in improving the stability, Chao added some debugging
  methods to diagnoze subtle runtime inconsistency problem.

  Enhancements:
   - store all the corruption or failure reasons in superblock
   - detect meta inode, summary info, and block address inconsistency
   - increase the limit for reserve_root for low-end devices
   - add the number of compressed IO in iostat

  Bug fixes:
   - DIO write fix for zoned devices
   - do out-of-place writes for cold files
   - fix some stat updates (FS_CP_DATA_IO, dirty page count)
   - fix race condition on setting FI_NO_EXTENT flag
   - fix data races when freezing super
   - fix wrong continue condition check in GC
   - do not allow ATGC for LFS mode

  In addition, there're some code enhancement and clean-ups as usual"

* tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
  f2fs: change to use atomic_t type form sbi.atomic_files
  f2fs: account swapfile inodes
  f2fs: allow direct read for zoned device
  f2fs: support recording errors into superblock
  f2fs: support recording stop_checkpoint reason into super_block
  f2fs: remove the unnecessary check in f2fs_xattr_fiemap
  f2fs: introduce cp_status sysfs entry
  f2fs: fix to detect corrupted meta ino
  f2fs: fix to account FS_CP_DATA_IO correctly
  f2fs: code clean and fix a type error
  f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file
  f2fs: fix to do sanity check on summary info
  f2fs: port to vfs{g,u}id_t and associated helpers
  f2fs: fix to do sanity check on destination blkaddr during recovery
  f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT
  f2fs: fix race condition on setting FI_NO_EXTENT flag
  f2fs: remove redundant check in f2fs_sanity_check_cluster
  f2fs: add static init_idisk_time function to reduce the code
  f2fs: fix typo
  f2fs: fix wrong dirty page count when race between mmap and fallocate.
  ...
2022-10-10 20:28:41 -07:00