Commit Graph

156 Commits

Author SHA1 Message Date
Gao Xiang d53cd891f0 erofs: limit the level of fs stacking for file-backed mounts
Otherwise, it could cause potential kernel stack overflow (e.g., EROFS
mounting itself).

Reviewed-by: Sheng Yong <shengyong1@xiaomi.com>
Fixes: fb17675026 ("erofs: add file-backed mount support")
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-11-24 14:17:29 +08:00
Gao Xiang ebe4f3f6eb erofs: correct FSDAX detection
The detection of the primary device is skipped incorrectly
if the multiple or flattened feature is enabled.

It also fixes the FSDAX misdetection for non-block extra blobs.

Fixes: c6993c4cb9 ("erofs: Fallback to normal access if DAX is not supported on extra device")
Reported-and-tested-by: syzbot+31b8fb02cb8a25bd5e78@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/691af9f6.a70a0220.3124cb.0097.GAE@google.com
Cc: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-11-24 14:16:30 +08:00
Bo Liu (OpenAnolis) 1cf12c7177 erofs: Add support for FS_IOC_GETFSLABEL
Add support for reading to the erofs volume label from the
FS_IOC_GETFSLABEL ioctls.

Signed-off-by: Bo Liu (OpenAnolis) <liubo03@inspur.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-09-25 11:26:20 +08:00
Yuezhang Mo 181993bb0d erofs: fix runtime warning on truncate_folio_batch_exceptionals()
Commit 0e2f80afcfa6("fs/dax: ensure all pages are idle prior to
filesystem unmount") introduced the WARN_ON_ONCE to capture whether
the filesystem has removed all DAX entries or not and applied the
fix to xfs and ext4.

Apply the missed fix on erofs to fix the runtime warning:

[  5.266254] ------------[ cut here ]------------
[  5.266274] WARNING: CPU: 6 PID: 3109 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xff/0x260
[  5.266294] Modules linked in:
[  5.266999] CPU: 6 UID: 0 PID: 3109 Comm: umount Tainted: G S                  6.16.0+ #6 PREEMPT(voluntary)
[  5.267012] Tainted: [S]=CPU_OUT_OF_SPEC
[  5.267017] Hardware name: Dell Inc. OptiPlex 5000/05WXFV, BIOS 1.5.1 08/24/2022
[  5.267024] RIP: 0010:truncate_folio_batch_exceptionals+0xff/0x260
[  5.267076] Code: 00 00 41 39 df 7f 11 eb 78 83 c3 01 49 83 c4 08 41 39 df 74 6c 48 63 f3 48 83 fe 1f 0f 83 3c 01 00 00 43 f6 44 26 08 01 74 df <0f> 0b 4a 8b 34 22 4c 89 ef 48 89 55 90 e8 ff 54 1f 00 48 8b 55 90
[  5.267083] RSP: 0018:ffffc900013f36c8 EFLAGS: 00010202
[  5.267095] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  5.267101] RDX: ffffc900013f3790 RSI: 0000000000000000 RDI: ffff8882a1407898
[  5.267108] RBP: ffffc900013f3740 R08: 0000000000000000 R09: 0000000000000000
[  5.267113] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  5.267119] R13: ffff8882a1407ab8 R14: ffffc900013f3888 R15: 0000000000000001
[  5.267125] FS:  00007aaa8b437800(0000) GS:ffff88850025b000(0000) knlGS:0000000000000000
[  5.267132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  5.267138] CR2: 00007aaa8b3aac10 CR3: 000000024f764000 CR4: 0000000000f52ef0
[  5.267144] PKRU: 55555554
[  5.267150] Call Trace:
[  5.267154]  <TASK>
[  5.267181]  truncate_inode_pages_range+0x118/0x5e0
[  5.267193]  ? save_trace+0x54/0x390
[  5.267296]  truncate_inode_pages_final+0x43/0x60
[  5.267309]  evict+0x2a4/0x2c0
[  5.267339]  dispose_list+0x39/0x80
[  5.267352]  evict_inodes+0x150/0x1b0
[  5.267376]  generic_shutdown_super+0x41/0x180
[  5.267390]  kill_block_super+0x1b/0x50
[  5.267402]  erofs_kill_sb+0x81/0x90 [erofs]
[  5.267436]  deactivate_locked_super+0x32/0xb0
[  5.267450]  deactivate_super+0x46/0x60
[  5.267460]  cleanup_mnt+0xc3/0x170
[  5.267475]  __cleanup_mnt+0x12/0x20
[  5.267485]  task_work_run+0x5d/0xb0
[  5.267499]  exit_to_user_mode_loop+0x144/0x170
[  5.267512]  do_syscall_64+0x2b9/0x7c0
[  5.267523]  ? __lock_acquire+0x665/0x2ce0
[  5.267535]  ? __lock_acquire+0x665/0x2ce0
[  5.267560]  ? lock_acquire+0xcd/0x300
[  5.267573]  ? find_held_lock+0x31/0x90
[  5.267582]  ? mntput_no_expire+0x97/0x4e0
[  5.267606]  ? mntput_no_expire+0xa1/0x4e0
[  5.267625]  ? mntput+0x24/0x50
[  5.267634]  ? path_put+0x1e/0x30
[  5.267647]  ? do_faccessat+0x120/0x2f0
[  5.267677]  ? do_syscall_64+0x1a2/0x7c0
[  5.267686]  ? from_kgid_munged+0x17/0x30
[  5.267703]  ? from_kuid_munged+0x13/0x30
[  5.267711]  ? __do_sys_getuid+0x3d/0x50
[  5.267724]  ? do_syscall_64+0x1a2/0x7c0
[  5.267732]  ? irqentry_exit+0x77/0xb0
[  5.267743]  ? clear_bhb_loop+0x30/0x80
[  5.267752]  ? clear_bhb_loop+0x30/0x80
[  5.267765]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  5.267772] RIP: 0033:0x7aaa8b32a9fb
[  5.267781] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 83 0d 00 f7 d8
[  5.267787] RSP: 002b:00007ffd7c4c9468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[  5.267796] RAX: 0000000000000000 RBX: 00005a61592a8b00 RCX: 00007aaa8b32a9fb
[  5.267802] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005a61592b2080
[  5.267806] RBP: 00007ffd7c4c9540 R08: 00007aaa8b403b20 R09: 0000000000000020
[  5.267812] R10: 0000000000000001 R11: 0000000000000246 R12: 00005a61592a8c00
[  5.267817] R13: 0000000000000000 R14: 00005a61592b2080 R15: 00005a61592a8f10
[  5.267849]  </TASK>
[  5.267854] irq event stamp: 4721
[  5.267859] hardirqs last  enabled at (4727): [<ffffffff814abf50>] __up_console_sem+0x90/0xa0
[  5.267873] hardirqs last disabled at (4732): [<ffffffff814abf35>] __up_console_sem+0x75/0xa0
[  5.267884] softirqs last  enabled at (3044): [<ffffffff8132adb3>] kernel_fpu_end+0x53/0x70
[  5.267895] softirqs last disabled at (3042): [<ffffffff8132b5f4>] kernel_fpu_begin_mask+0xc4/0x120
[  5.267905] ---[ end trace 0000000000000000 ]---

Fixes: bde708f1a6 ("fs/dax: always remove DAX page-cache entries when breaking layouts")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Friendy Su <friendy.su@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-09-10 14:11:06 +08:00
Gao Xiang 0b96d9bed3 erofs: fix block count report when 48-bit layout is on
Fix incorrect shift order when combining the 48-bit block count.

Fixes: 2e1473d519 ("erofs: implement 48-bit block addressing for unencoded inodes")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250807082019.3093539-1-hsiangkao@linux.alibaba.com
2025-08-11 06:31:19 +08:00
Yuezhang Mo c6993c4cb9 erofs: Fallback to normal access if DAX is not supported on extra device
If using multiple devices, we should check if the extra device support
DAX instead of checking the primary device when deciding if to use DAX
to access a file.

If an extra device does not support DAX we should fallback to normal
access otherwise the data on that device will be inaccessible.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Friendy Su <friendy.su@sony.com>
Reviewed-by: Jacky Cao <jacky.cao@sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer@sony.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250804082030.3667257-2-Yuezhang.Mo@sony.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-08-11 06:02:19 +08:00
Chao Yu df0ce6cefa erofs: support to readahead dirent blocks in erofs_readdir()
This patch supports to readahead more blocks in erofs_readdir(), it can
enhance readdir performance in large direcotry.

readdir test in a large directory which contains 12000 sub-files.

		files_per_second
Before:		926385.54
After:		2380435.562

Meanwhile, let's introduces a new sysfs entry to control readahead
bytes to provide more flexible policy for readahead of readdir().
- location: /sys/fs/erofs/<disk>/dir_ra_bytes
- default value: 16384
- disable readahead: set the value to 0

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250721021352.2495371-1-chao@kernel.org
[ Gao Xiang: minor styling adjustment. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-07-24 19:44:08 +08:00
Bo Liu (OpenAnolis) 414091322c erofs: implement metadata compression
Thanks to the meta buffer infrastructure, metadata-compressed inodes are
just read from the metabox inode instead of the blockdevice (or backing
file) inode.

The same is true for shared extended attributes.

When metadata compression is enabled, inode numbers are divided from
on-disk NIDs because of non-LTS 32-bit application compatibility.

Co-developed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Bo Liu (OpenAnolis) <liubo03@inspur.com>
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250722003229.2121752-1-hsiangkao@linux.alibaba.com
2025-07-24 19:43:31 +08:00
Gao Xiang 5e744cb615 erofs: remove need_kmap in erofs_read_metabuf()
- need_kmap is always true except for a ztailpacking case; thus, just
   open-code that one;

 - The upcoming metadata compression will add a new boolean, so simplify
   this first.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250714090907.4095645-1-hsiangkao@linux.alibaba.com
2025-07-24 19:42:05 +08:00
Sheng Yong c36ec00d7f erofs: add 'fsoffset' mount option to specify filesystem offset
When attempting to use an archive file, such as APEX on android,
as a file-backed mount source, it fails because EROFS image within
the archive file does not start at offset 0. As a result, a loop
or a dm device is still needed to attach the image file at an
appropriate offset first. Similarly, if an EROFS image within a
block device does not start at offset 0, it cannot be mounted
directly either.

To address this issue, this patch adds a new mount option `fsoffset=x'
to accept a start offset for the primary device. The offset should be
aligned to the block size. EROFS will add this offset before performing
read requests.

Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Signed-off-by: Wang Shuai <wangshuai12@xiaomi.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250517090544.2687651-1-shengyong1@xiaomi.com
[ Gao Xiang: minor update on documentation and the error message. ]
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-22 11:57:57 +08:00
Sheng Yong 9748f2f54f erofs: avoid using multiple devices with different type
For multiple devices, both primary and extra devices should be the
same type. `erofs_init_device` has already guaranteed that if the
primary is a file-backed device, extra devices should also be
regular files.

However, if the primary is a block device while the extra device
is a file-backed device, `erofs_init_device` will get an ENOTBLK,
which is not treated as an error in `erofs_fc_get_tree`, and that
leads to an UAF:

  erofs_fc_get_tree
    get_tree_bdev_flags(erofs_fc_fill_super)
      erofs_read_superblock
        erofs_init_device  // sbi->dif0 is not inited yet,
                           // return -ENOTBLK
      deactivate_locked_super
        free(sbi)
    if (err is -ENOTBLK)
      sbi->dif0.file = filp_open()  // sbi UAF

So if -ENOTBLK is hitted in `erofs_init_device`, it means the
primary device must be a block device, and the extra device
is not a block device. The error can be converted to -EINVAL.

Fixes: fb17675026 ("erofs: add file-backed mount support")
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://lore.kernel.org/r/20250515014837.3315886-1-shengyong1@xiaomi.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-15 11:20:25 +08:00
Hongbo Li 510de8363f erofs: fix file handle encoding for 64-bit NIDs
EROFS uses NID to indicate the on-disk inode offset, which can
exceed 32 bits. However, the default encode_fh uses the ino32,
thus it doesn't work if the image is larger than 128GiB.

Let's introduce our own helpers to encode file handles.

It's easy to reproduce:
  1. prepare an erofs image with nid bigger than U32_MAX
  2. mount -t erofs foo.img /mnt/erofs
  3. set exportfs with configuration: /mnt/erofs *(rw,sync,
     no_root_squash)
  4. mount -t nfs $IP:/mnt/erofs /mnt/nfs
  5. md5sum /mnt/nfs/foo # foo is the file which nid bigger
     than U32_MAX.  # you will get ESTALE error.

In the case of overlayfs, the underlying filesystem's file
handle is encoded in ovl_fb.fid, which is similar to NFS's
case. If the NID of file is larger than U32_MAX, the overlay
will get -ESTALE error when calls exportfs_decode_fh.

Fixes: 3e917cc305 ("erofs: make filesystem exportable")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250507094015.14007-1-lihongbo22@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-15 10:38:12 +08:00
Hongbo Li 9fcd53c320 erofs: remove unused enum type
Opt_err is not used in EROFS, we can remove it.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Reviewed-by: Gao Xiang <xiang@kernel.org>
Link: https://lore.kernel.org/r/20250429075056.689570-1-lihongbo22@huawei.com
Signed-off-by: Gao Xiang <xiang@kernel.org>
2025-04-30 12:13:32 +08:00
Gao Xiang 0f24e3c05a erofs: enable 48-bit layout support
Both 48-bit block addressing and encoded extents are implemented,
let's enable them formally.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250310095625.2623817-1-hsiangkao@linux.alibaba.com
2025-03-17 14:02:16 +08:00
Gao Xiang fe1e57d44d erofs: initialize decompression early
- Rename erofs_init_managed_cache() to z_erofs_init_super();
 - Move the initialization of managed_pslots into z_erofs_init_super() too;
 - Move z_erofs_init_super() and packed inode preparation upwards, before
   the root inode initialization.

Therefore, the root directory can also be compressible.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250317054840.3483000-1-hsiangkao@linux.alibaba.com
2025-03-17 14:02:08 +08:00
Gao Xiang 2e1473d519 erofs: implement 48-bit block addressing for unencoded inodes
It adapts the on-disk changes from the previous commit.  It also
supports EROFS_NULL_ADDR (all 1's) for EROFS_INODE_FLAT_PLAIN inodes
to indicate 0-filled inodes, as it's common for composefs use cases.
As a result, EROFS_INODE_CHUNK_BASED is no longer needed.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250310095459.2620647-5-hsiangkao@linux.alibaba.com
2025-03-17 01:25:26 +08:00
Gao Xiang 61ba89b579 erofs: add 48-bit block addressing on-disk support
The current 32-bit block addressing limits EROFS to a 16TiB maximum
volume size with 4KiB blocks.  However, several new use cases now
require larger capacity support:
 - Massive datasets for model training in order to boost random
   sampling performance for each epoch;

 - Object storage clients using EROFS direct passthrough.

This extends core on-disk structures to support 48-bit block addressing,
such as inodes, device slots, and inode chunks.

Additionally:
 - Expand superblock root NID to 8-byte `rootnid_8b` to enable full
   out-of-place update incremental builds;

 - Introduce `epoch` field in the superblock as well as add `mtime`
   field to 32-byte compact inodes for basic timestamp support.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250310095459.2620647-4-hsiangkao@linux.alibaba.com
2025-03-17 01:23:25 +08:00
Gao Xiang 5794502777 erofs: allow 16-byte volume name again
Actually, volume name doesn't need to include the NIL terminator if
the string length matches the on-disk field size as mentioned in [1].

I tend to relax it together with the upcoming 48-bit block addressing
(or stable kernels which backport this fix) so that we could have a
chance to record a 16-byte volume name like ext4.

Since in-memory `volume_name` has no user, just get rid of the unneeded
check for now.  `sbi->uuid` is useless and avoid it too.

[1] https://lore.kernel.org/r/96efe46b-dcce-4490-bba1-a0b00932d1cc@linux.alibaba.com

Fixes: a64d9493f5 ("staging: erofs: refuse to mount images with malformed volume name")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250225033934.2542635-1-hsiangkao@linux.alibaba.com
2025-03-17 01:21:36 +08:00
Bo Liu 706e50e46a erofs: get rid of erofs_kmap_type
Since EROFS_KMAP_ATOMIC is no longer valid, get rid of erofs_kmap_type too.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20250217093141.2659-1-liubo03@inspur.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-03-17 01:21:24 +08:00
Chen Linxuan 6e2c2342e2 erofs: remove dead code in erofs_fc_parse_param
If an option is unknown to erofs, which means that option is not in
`erofs_fs_parameters`, `fs_parse` will return -ENOPARAM, which makes
`erofs_fc_parse_param` returns earlier.

Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/DB86A4E2BB2BB44E+20250117100635.335963-2-chenlinxuan@uniontech.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-01-19 23:57:38 +08:00
Gao Xiang 54ab25d03e erofs: micro-optimize superblock checksum
Just verify the remaining unknown on-disk data instead of allocating a
temporary buffer for the whole superblock and zeroing out the checksum
field since .magic(EROFS_SUPER_MAGIC_V1) is verified and .checksum(0)
is fixed.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241212023948.1143038-1-hsiangkao@linux.alibaba.com
2025-01-17 03:17:32 +08:00
Gao Xiang 6422cde1b0 erofs: use buffered I/O for file-backed mounts by default
For many use cases (e.g. container images are just fetched from remote),
performance will be impacted if underlay page cache is up-to-date but
direct i/o flushes dirty pages first.

Instead, let's use buffered I/O by default to keep in sync with loop
devices and add a (re)mount option to explicitly give a try to use
direct I/O if supported by the underlying files.

The container startup time is improved as below:
[workload] docker.io/library/workpress:latest
                                     unpack        1st run  non-1st runs
EROFS snapshotter buffered I/O file  4.586404265s  0.308s   0.198s
EROFS snapshotter direct I/O file    4.581742849s  2.238s   0.222s
EROFS snapshotter loop               4.596023152s  0.346s   0.201s
Overlayfs snapshotter                5.382851037s  0.206s   0.214s

Fixes: fb17675026 ("erofs: add file-backed mount support")
Cc: Derek McGowan <derek@mcg.dev>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241212134336.2059899-1-hsiangkao@linux.alibaba.com
2024-12-16 21:02:07 +08:00
Gao Xiang 7b00af2c54 erofs: use `struct erofs_device_info` for the primary device
Instead of just listing each one directly in `struct erofs_sb_info`
except that we still use `sb->s_bdev` for the primary block device.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241216125310.930933-2-hsiangkao@linux.alibaba.com
2024-12-16 21:01:59 +08:00
Gao Xiang e2de3c1bf6 erofs: add erofs_sb_free() helper
Unify the common parts of erofs_fc_free() and erofs_kill_sb() as
erofs_sb_free().

Thus, fput() in erofs_fc_get_tree() is no longer needed, too.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241212133504.2047178-1-hsiangkao@linux.alibaba.com
2024-12-13 00:26:27 +08:00
Hongzhen Luo bae0854160 erofs: fix blksize < PAGE_SIZE for file-backed mounts
Adjust sb->s_blocksize{,_bits} directly for file-backed
mounts when the fs block size is smaller than PAGE_SIZE.

Previously, EROFS used sb_set_blocksize(), which caused
a panic if bdev-backed mounts is not used.

Fixes: fb17675026 ("erofs: add file-backed mount support")
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241015103836.3757438-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-11-18 18:50:14 +08:00
Gou Hao 90655ee279 erofs: simplify definition of the log functions
Use printk instead of pr_info/err to reduce
redundant code.

Signed-off-by: Gou Hao <gouhao@uniontech.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241114013247.30821-1-gouhao@uniontech.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-11-18 18:50:13 +08:00
Gao Xiang 14c2d97265
erofs: use get_tree_bdev_flags() to avoid misleading messages
Users can pass in an arbitrary source path for the proper type of
a mount then without "Can't lookup blockdev" error message.

Reported-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Closes: https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241009033151.2334888-2-hsiangkao@linux.alibaba.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21 14:30:27 +02:00
Gao Xiang 416a8b2c02 erofs: ensure regular inodes for file-backed mounts
Only regular inodes are allowed for file-backed mounts, not directories
(as seen in the original syzbot case) or special inodes.

Also ensure that .read_folio() is implemented on the underlying fs
for the primary device.

Fixes: fb17675026 ("erofs: add file-backed mount support")
Reported-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/00000000000011bdde0622498ee3@google.com
Tested-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240917130803.32418-1-hsiangkao@linux.alibaba.com
2024-10-11 13:36:41 +08:00
Gao Xiang 0d442ce0b3 erofs: mark experimental fscache backend deprecated
Although fscache is still described as "General Filesystem Caching" for
network filesystems and other things such as ISO9660 filesystems, it has
actually become a part of netfslib recently, which was unexpected at the
time when "EROFS over fscache" proposed (2021) since EROFS is entirely a
disk filesystem and the dependency is redundant.

Mark it deprecated and it will be removed after "fanotify pre-content
hooks" lands, which will provide the same functionality for EROFS.

Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240830032840.3783206-4-hsiangkao@linux.alibaba.com
2024-09-10 15:27:11 +08:00
Gao Xiang fb17675026 erofs: add file-backed mount support
It actually has been around for years: For containers and other sandbox
use cases, there will be thousands (and even more) of authenticated
(sub)images running on the same host, unlike OS images.

Of course, all scenarios can use the same EROFS on-disk format, but
bdev-backed mounts just work well for OS images since golden data is
dumped into real block devices.  However, it's somewhat hard for
container runtimes to manage and isolate so many unnecessary virtual
block devices safely and efficiently [1]: they just look like a burden
to orchestrators and file-backed mounts are preferred indeed.  There
were already enough attempts such as Incremental FS, the original
ComposeFS and PuzzleFS acting in the same way for immutable fses.  As
for current EROFS users, ComposeFS, containerd and Android APEXs will
be directly benefited from it.

On the other hand, previous experimental feature "erofs over fscache"
was once also intended to provide a similar solution (inspired by
Incremental FS discussion [2]), but the following facts show file-backed
mounts will be a better approach:
 - Fscache infrastructure has recently been moved into new Netfslib
   which is an unexpected dependency to EROFS really, although it
   originally claims "it could be used for caching other things such as
   ISO9660 filesystems too." [3]

 - It takes an unexpectedly long time to upstream Fscache/Cachefiles
   enhancements.  For example, the failover feature took more than
   one year, and the deamonless feature is still far behind now;

 - Ongoing HSM "fanotify pre-content hooks" [4] together with this will
   perfectly supersede "erofs over fscache" in a simpler way since
   developers (mainly containerd folks) could leverage their existing
   caching mechanism entirely in userspace instead of strictly following
   the predefined in-kernel caching tree hierarchy.

After "fanotify pre-content hooks" lands upstream to provide the same
functionality, "erofs over fscache" will be removed then (as an EROFS
internal improvement and EROFS will not have to bother with on-demand
fetching and/or caching improvements anymore.)

[1] https://github.com/containers/storage/pull/2039
[2] https://lore.kernel.org/r/CAOQ4uxjbVxnubaPjVaGYiSwoGDTdpWbB=w_AeM6YM=zVixsUfQ@mail.gmail.com
[3] https://docs.kernel.org/filesystems/caching/fscache.html
[4] https://lore.kernel.org/r/cover.1723670362.git.josef@toxicpanda.com

Closes: https://github.com/containers/composefs/issues/144
Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240830032840.3783206-1-hsiangkao@linux.alibaba.com
2024-09-10 15:26:35 +08:00
Gao Xiang 59aadaa7eb erofs: clean up erofs_register_sysfs()
After commit 684b290abc ("erofs: add support for
FS_IOC_GETFSSYSFSPATH"), `sb->s_sysfs_name` is now valid.

Just use it to get rid of duplicated logic.

Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240828095232.571946-1-hsiangkao@linux.alibaba.com
2024-09-10 00:46:34 +08:00
Hongzhen Luo 2c534624ae erofs: get rid of check_layout_compatibility()
Simple enough to just open-code it.

Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Reviewed-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240806112208.150323-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-08-19 11:06:20 +08:00
Huang Xiaojia 684b290abc erofs: add support for FS_IOC_GETFSSYSFSPATH
FS_IOC_GETFSSYSFSPATH ioctl exposes /sys/fs path of a given filesystem,
potentially standarizing sysfs reporting. This patch add support for
FS_IOC_GETFSSYSFSPATH for erofs, "erofs/<dev>" will be outputted for bdev
cases, "erofs/[domain_id,]<fs_id>" will be outputted for fscache cases.

Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com>
Link: https://lore.kernel.org/r/20240720082335.441563-1-huangxiaojia2@huawei.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-07-26 18:47:46 +08:00
Gao Xiang 5a7cce827e erofs: refine z_erofs_{init,exit}_subsystem()
Introduce z_erofs_{init,exit}_decompressor() to unexport
z_erofs_{deflate,lzma,zstd}_{init,exit}().

Besides, call them in z_erofs_{init,exit}_subsystem()
for simplicity.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240709094106.3018109-2-hsiangkao@linux.alibaba.com
2024-07-09 19:04:40 +08:00
Huang Xiaojia cc69a681b2 erofs: convert to use super_set_uuid to support for FS_IOC_GETFSUUID
FS_IOC_GETFSUUID ioctl exposes the uuid of a filesystem. To support
the ioctl, init sb->s_uuid with super_set_uuid().

Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240624063704.2476070-1-huangxiaojia2@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-06-26 17:02:28 +08:00
Al Viro 076d965eb8 erofs: don't align offset for erofs_read_metabuf() (simple cases)
Most of the callers of erofs_read_metabuf() have the following form:

	block = erofs_blknr(sb, offset);
	off = erofs_blkoff(sb, offset);
	p = erofs_read_metabuf(...., erofs_pos(sb, block), ...);
	if (IS_ERR(p))
		return PTR_ERR(p);
	q = p + off;
	// no further uses of p, block or off.

The value passed to erofs_read_metabuf() is offset rounded down to block
size, i.e. offset - off.  Passing offset as-is would increase the return
value by off in case of success and keep the return value unchanged in
in case of error.  In other words, the same could be achieved by

	q = erofs_read_metabuf(...., offset, ...);
	if (IS_ERR(q))
		return PTR_ERR(q);

This commit convert these simple cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240425195915.GD1031757@ZenIV
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-18 01:47:26 +08:00
Al Viro e09815446d erofs: mechanically convert erofs_read_metabuf() to offsets
just lift the call of erofs_pos() into the callers; it will
collapse in most of them, but that's better done caller-by-caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20240425195846.GC1031757@ZenIV
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-18 01:46:18 +08:00
Hongzhen Luo c34110e0fd erofs: clean up erofs_show_options()
Avoid unnecessary #ifdefs and simplify the code a bit.

Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240517095652.2282972-1-hongzhen@linux.alibaba.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-18 01:42:18 +08:00
Gao Xiang 20c02972ec Merge branch 'misc.erofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git
Al Viro has a series of "->bd_inode elimination" which touches several
subsystems, but he also has EROFS-specific further cleanup patches
which I tend to go with EROFS tree for more testing.

Let's merge "#misc.erofs" as Al suggested in the previous email [1]:

"#misc.erofs (the first two commits) is put into never-rebased mode;
 you pull it into your tree and do whatever's convenient with the rest.
 I merge the same branch into block_device work; that way it doesn't
 cause conflicts whatever else happens in our trees."

[1] https://lore.kernel.org/r/20240503041542.GV2118490@ZenIV

Signed-off-by: Gao Xiang <xiang@kernel.org>
2024-05-18 01:39:54 +08:00
Gao Xiang 7c35de4df1 erofs: Zstandard compression support
Add Zstandard compression as the 4th supported algorithm since it
becomes more popular now and some end users have asked this for
quite a while [1][2].

Each EROFS physical cluster contains only one valid standard
Zstandard frame as described in [3] so that decompression can be
performed on a per-pcluster basis independently.

Currently, it just leverages multi-call stream decompression APIs with
internal sliding window buffers.  One-shot or bufferless decompression
could be implemented later for even better performance if needed.

[1] https://github.com/erofs/erofs-utils/issues/6
[2] https://lore.kernel.org/r/Y08h+z6CZdnS1XBm@B-P7TQMD6M-0146.lan
[3] https://www.rfc-editor.org/rfc/rfc8478.txt

Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240508234453.17896-1-xiang@kernel.org
2024-05-09 07:46:56 +08:00
Hongzhen Luo 1872df8dcd erofs: derive fsid from on-disk UUID for .statfs() if possible
Use the superblock's UUID to generate the fsid when it's non-null.

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240409113022.74720-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08 17:12:51 +08:00
Chunhai Guo f36f3010f6 erofs: rename per-CPU buffers to global buffer pool and make it configurable
It will cost more time if compressed buffers are allocated on demand for
low-latency algorithms (like lz4) so EROFS uses per-CPU buffers to keep
compressed data if in-place decompression is unfulfilled.  While it is kind
of wasteful of memory for a device with hundreds of CPUs, and only a small
number of CPUs concurrently decompress most of the time.

This patch renames it as 'global buffer pool' and makes it configurable.
This allows two or more CPUs to share a common buffer to reduce memory
occupation.

Suggested-by: Gao Xiang <xiang@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Link: https://lore.kernel.org/r/20240402100036.2673604-1-guochunhai@vivo.com
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Link: https://lore.kernel.org/r/20240408215231.3376659-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08 17:12:49 +08:00
Christian Brauner 7af2ae1b15 erofs: reliably distinguish block based and fscache mode
When erofs_kill_sb() is called in block dev based mode, s_bdev may not
have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled,
it will be mistaken for fscache mode, and then attempt to free an anon_dev
that has never been allocated, triggering the following warning:

============================================
ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140
Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630
RIP: 0010:ida_free+0x134/0x140
Call Trace:
 <TASK>
 erofs_kill_sb+0x81/0x90
 deactivate_locked_super+0x35/0x80
 get_tree_bdev+0x136/0x1e0
 vfs_get_tree+0x2c/0xf0
 do_new_mount+0x190/0x2f0
 [...]
============================================

Now when erofs_kill_sb() is called, erofs_sb_info must have been
initialised, so use sbi->fsid to distinguish between the two modes.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-3-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-28 20:36:52 +08:00
Baokun Li 07abe43a28 erofs: get rid of erofs_fs_context
Instead of allocating the erofs_sb_info in fill_super() allocate it during
erofs_init_fs_context() and ensure that erofs can always have the info
available during erofs_kill_sb(). After this erofs_fs_context is no longer
needed, replace ctx with sbi, no functional changes.

Suggested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240419123611.947084-2-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-04-28 20:36:51 +08:00
Al Viro 469ad583c1 erofs: switch erofs_bread() to passing offset instead of block number
Callers are happier that way, especially since we no longer need to
play with splitting offset into block number and offset within block,
passing the former to erofs_bread(), then adding the latter...

erofs_bread() always reads entire pages, anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-07 03:04:50 -04:00
Gao Xiang a97b59ed79 erofs: drop experimental warning for FSDAX
As EXT4/XFS filesystems, FSDAX functionality is considered to be stable.
Let's drop this warning.

Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240325005116.106351-1-hsiangkao@linux.alibaba.com
2024-03-25 10:48:15 +08:00
Linus Torvalds f88c3fb81c mm, slab: remove last vestiges of SLAB_MEM_SPREAD
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own.  But let's just rip off
the band-aid and get it over and done with.  I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.

This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.

Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-12 20:32:19 -07:00
Linus Torvalds f153fbe1ea Changes since last update:
- Some folio conversions for compressed inodes;
 
  - Add compressed inode support over fscache;
 
  - Fix lockdep false positives of erofs_pseudo_mnt.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmXv5CwRHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qr5cQ/9EKPRY5R4qy+4AuH1vOsE+Gwl73fAYncR
 zwKeTQ5NXgCTbmZMPNSP9jikQaCy+tCnYVflD8PEeAiFqNaPTDjAGl+brTiQqgfJ
 RdvuqjgxhGvcBLNUkdwmV9TzWjPSjICuz1AjL1Lxvx0hBYxEtTEzRG06DHnZs1Hy
 qqfX6pp8uTUwk1fn+H94UGbdN7tSmiyJ18BmtfID7qYKa1/hkregcC3pLkcdPgGp
 bDFaf6NDpV2W8J+dxOIdFX+toi3Ssog3LY60uj2MBULqmGlB5JZw7HQua1Ol4qEi
 kS8ZWXPjHSAs4e5NdSe/lPybVpR+s72QQ7vyspIxdQFxm2wVWHpLqgRrdtNqySGt
 zrsIqfHzVCubfQA7dB1OJJHlJZEyMvprgfP4WmthHtekvaox6JaSx6Vydx1byTHS
 /2nBCkjNEgYYvUIcYLwjmcQvdvABGCp7IMv7h79NH1tNQDQL2mHvRw/WnkyYINpl
 qrZ3zQn2f2jlkjj0wUul3AGhMk91NvBmnfZz9+mn/jv2oZKn2v50mp2nuzSY7AeS
 /leqm9d42zQ0iWrXDc8OHaO1Qp2I3h4nET50KvD/pYveKI9C1/PUGuwd3jzsNF9M
 2f19ve2ovViQwLUp8ovt85Xcst8ALtaOgcQdLVFQAskaK8QClDoz6MCTMWQlbcOz
 9vW2k/9IZxU=
 =CXzx
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, we introduce compressed inode support over fscache
  since a lot of native EROFS images are explicitly compressed so that
  EROFS over fscache can be more widely used even without Dragonfly
  Nydus [1].

  Apart from that, there are some folio conversions for compressed
  inodes available as well as a lockdep false positive fix.

  Summary:

   - Some folio conversions for compressed inodes;

   - Add compressed inode support over fscache;

   - Fix lockdep false positives of erofs_pseudo_mnt"

Link: https://nydus.dev [1]

* tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: support compressed inodes over fscache
  erofs: make iov_iter describe target buffers over fscache
  erofs: fix lockdep false positives on initializing erofs_pseudo_mnt
  erofs: refine managed cache operations to folios
  erofs: convert z_erofs_submissionqueue_endio() to folios
  erofs: convert z_erofs_fill_bio_vec() to folios
  erofs: get rid of `justfound` debugging tag
  erofs: convert z_erofs_do_read_page() to folios
  erofs: convert z_erofs_onlinepage_.* to folios
2024-03-12 13:25:53 -07:00
Baokun Li 0f28be64d1 erofs: fix lockdep false positives on initializing erofs_pseudo_mnt
Lockdep reported the following issue when mounting erofs with a domain_id:

============================================
WARNING: possible recursive locking detected
6.8.0-rc7-xfstests #521 Not tainted
--------------------------------------------
mount/396 is trying to acquire lock:
ffff907a8aaaa0e0 (&type->s_umount_key#50/1){+.+.}-{3:3},
						at: alloc_super+0xe3/0x3d0

but task is already holding lock:
ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3},
						at: alloc_super+0xe3/0x3d0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&type->s_umount_key#50/1);
  lock(&type->s_umount_key#50/1);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by mount/396:
 #0: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3},
			at: alloc_super+0xe3/0x3d0
 #1: ffffffffc00e6f28 (erofs_domain_list_lock){+.+.}-{3:3},
			at: erofs_fscache_register_fs+0x3d/0x270 [erofs]

stack backtrace:
CPU: 1 PID: 396 Comm: mount Not tainted 6.8.0-rc7-xfstests #521
Call Trace:
 <TASK>
 dump_stack_lvl+0x64/0xb0
 validate_chain+0x5c4/0xa00
 __lock_acquire+0x6a9/0xd50
 lock_acquire+0xcd/0x2b0
 down_write_nested+0x45/0xd0
 alloc_super+0xe3/0x3d0
 sget_fc+0x62/0x2f0
 vfs_get_super+0x21/0x90
 vfs_get_tree+0x2c/0xf0
 fc_mount+0x12/0x40
 vfs_kern_mount.part.0+0x75/0x90
 kern_mount+0x24/0x40
 erofs_fscache_register_fs+0x1ef/0x270 [erofs]
 erofs_fc_fill_super+0x213/0x380 [erofs]

This is because the file_system_type of both erofs and the pseudo-mount
point of domain_id is erofs_fs_type, so two successive calls to
alloc_super() are considered to be using the same lock and trigger the
warning above.

Therefore add a nodev file_system_type called erofs_anon_fs_type in
fscache.c to silence this complaint. Because kern_mount() takes a
pointer to struct file_system_type, not its (string) name. So we don't
need to call register_filesystem(). In addition, call init_pseudo() in
erofs_anon_init_fs_context() as suggested by Al Viro, so that we can
remove erofs_fc_fill_pseudo_super(), erofs_fc_anon_get_tree(), and
erofs_anon_context_ops.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a9849560c5 ("erofs: introduce a pseudo mnt to manage shared cookies")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-and-tested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20240307101018.2021925-1-libaokun1@huawei.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-03-10 18:41:32 +08:00
Christian Brauner 87b355f448
erofs: port device access to file
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-20-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25 12:05:25 +01:00