mirror of https://github.com/torvalds/linux.git
391 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
8e2e0a79a3 |
quota: fix to propagate error of mark_dquot_dirty() to caller
in order to let caller be aware of failure of mark_dquot_dirty(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240412094942.2131243-1-chao@kernel.org> |
|
|
|
5a6cb47ef0 |
fs: quota: use group allocation of per-cpu counters API
Use group allocation of per-cpu counters api to accelerate dquot_init() and simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240325041240.53537-1-wangkefeng.wang@huawei.com> |
|
|
|
e29dd522c1 |
quota: remove SLAB_MEM_SPREAD flag usage
The SLAB_MEM_SPREAD flag is already a no-op after removal of SLAB allocator and in [1] it was fully deprecated. Remove its usage so we can delete it from slab. No functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/ Message-Id: <20240224135118.830073-1-chengming.zhou@linux.dev> |
|
|
|
a898cb621a |
quota: Detect loops in quota tree
Syzbot has found that when it creates corrupted quota files where the quota tree contains a loop, we will deadlock when tryling to insert a dquot. Add loop detection into functions traversing the quota tree. Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
ccb49011bb |
quota: Properly annotate i_dquot arrays with __rcu
Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.
Fixes:
|
|
|
|
179b8c97eb |
quota: Fix rcu annotations of inode dquot pointers
Dquot pointers in i_dquot array in the inode are protected by
dquot_srcu. Annotate the array pointers with __rcu, perform the locked
dereferences with srcu_dereference_check() instead of plain reads, and
set the array elements with rcu_assign_pointer().
Fixes:
|
|
|
|
d0aa72604f |
quota: Fix potential NULL pointer dereference
Below race may cause NULL pointer dereference P1 P2 dquot_free_inode quota_off drop_dquot_ref remove_dquot_ref dquots = i_dquot(inode) dquots = i_dquot(inode) srcu_read_lock dquots[cnt]) != NULL (1) dquots[type] = NULL (2) spin_lock(&dquots[cnt]->dq_dqb_lock) (3) .... If dquot_free_inode(or other routines) checks inode's quota pointers (1) before quota_off sets it to NULL(2) and use it (3) after that, NULL pointer dereference will be triggered. So let's fix it by using a temporary pointer to avoid this issue. Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240202081852.2514092-1-wangjianjian3@huawei.com> |
|
|
|
a1e1b2beca |
quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
Quota code acquires dquot->dq_lock whenever reading / writing dquot. When reading / writing quota info we hold dqio_sem. Since these locks can be acquired during inode reclaim (through dquot_drop() -> dqput() -> dquot_release()) we are setting nofs allocation context whenever acquiring these locks. Hence there's no need to use GFP_NOFS allocations in quota code doing IO. Just switch it to GFP_KERNEL. Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
6c5026c1ef |
quota: Set nofs allocation context when acquiring dqio_sem
dqio_sem can be acquired during inode reclaim through dquot_drop() -> dqput() -> dquot_release() -> write_file_info() context. In some cases (most notably through dquot_get_next_id()) it can be acquired without holding dquot->dq_lock (which already sets nofs allocation context). So we need to set nofs allocation context when acquiring it as well. Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
249f374eb9 |
quota: Remove BUG_ON from dqget()
dqget() checks whether dquot->dq_sb is set when returning it using BUG_ON. Firstly this doesn't work as an invalidation check for quite some time (we release dquot with dq_sb set these days), secondly using BUG_ON is quite harsh. Use WARN_ON_ONCE and check whether dquot is still hashed instead. Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
d44c576637 |
quota: Remove BUG_ON in dquot_load_quota_sb()
The case when someone passes DQUOT_SUSPENDED flag to dquot_load_quota_sb() is easy to handle. So just WARN_ON_ONCE and bail with error if that happens instead of calling BUG_ON which is likely to lockup the machine. Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
c8238508c8 |
quota: Replace BUG_ON in dqput()
The BUG_ON in dqput() will likely take the whole machine down when it hits. Replace it with WARN_ON_ONCE() instead and stop hiding that behind CONFIG_QUOTA_DEBUG. Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
a05aea98d4 |
sysctl-6.8-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size
penalty sysctl has been changed to allow us to not require the sentinel, the
final empty element on the sysctl array. Joel Granados has been doing all this
work. On the v6.6 kernel we got the major infrastructure changes required to
support this. For v6.7 we had all arch/ and drivers/ modified to remove
the sentinel. For v6.8-rc1 we get a few more updates for fs/ directory only.
The kernel/ directory is left but we'll save that for v6.9-rc1 as those patches
are still being reviewed. After that we then can expect also the removal of the
no longer needed check for procname == NULL.
Let us recap the purpose of this work:
- this helps reduce the overall build time size of the kernel and run time
memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move sysctls
out from kernel/sysctl.c to their own files
Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to see further
work by Thomas Weißschuh with the constificatin of the struct ctl_table.
Due to Joel Granados's work, and to help bring in new blood, I have suggested
for him to become a maintainer and he's accepted. So for v6.9-rc1 I look forward
to seeing him sent you a pull request for further sysctl changes. This also
removes Iurii Zaikin as a maintainer as he has moved on to other projects and
has had no time to help at all.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmWdWDESHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinjJAP/jTNNoyzWisvrrvmXqR5txFGLOE+wW6x
Xv9avuiM+DTHsH/wK8CkXEivwDqYNAZEHU7NEcolS5bJX/ddSRwN9b5aSVlCrUdX
Ab4rXmpeSCNFp9zNszWJsDuBKIqjvsKw7qGleGtgZ2qAUHbbH30VROLWCggaee50
wU3icDLdwkasxrcMXy4Sq5dT5wYC4j/QelqBGIkYPT14Arl1im5zqPZ95gmO/s/6
mdicTAmq+hhAUfUBJBXRKtsvxY6CItxe55Q4fjpncLUJLHUw+VPVNoBKFWJlBwlh
LO3liKFfakPSkil4/en+/+zuMByd0JBkIzIJa+Kk5kjpbHRhK0RkmU4+Y5G5spWN
jjLfiv6RxInNaZ8EWQBMfjE95A7PmYDQ4TOH08+OvzdDIi6B0BB5tBGQpG9BnyXk
YsLg1Uo4CwE/vn1/a9w0rhadjUInvmAryhb/uSJYFz/lmApLm2JUpY3/KstwGetb
z+HmLstJb24Djkr6pH8DcjhzRBHeWQ5p0b4/6B+v1HqAUuEhdbyw1F2GrDywyF3R
h/UOAaKLm1+ffdA246o9TejKiDU96qEzzXMaCzPKyestaRZuiyuYEMDhYbvtsMV5
zIdMJj5HQ+U1KHDv4IN99DEj7+/vjE3f4Sjo+POFpQeQ8/d+fxpFNqXVv449dgnb
6xEkkxsR0ElM
=2qBt
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"To help make the move of sysctls out of kernel/sysctl.c not incur a
size penalty sysctl has been changed to allow us to not require the
sentinel, the final empty element on the sysctl array. Joel Granados
has been doing all this work.
In the v6.6 kernel we got the major infrastructure changes required to
support this. For v6.7 we had all arch/ and drivers/ modified to
remove the sentinel. For v6.8-rc1 we get a few more updates for fs/
directory only.
The kernel/ directory is left but we'll save that for v6.9-rc1 as
those patches are still being reviewed. After that we then can expect
also the removal of the no longer needed check for procname == NULL.
Let us recap the purpose of this work:
- this helps reduce the overall build time size of the kernel and run
time memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move
sysctls out from kernel/sysctl.c to their own files
Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to
see further work by Thomas Weißschuh with the constificatin of the
struct ctl_table.
Due to Joel Granados's work, and to help bring in new blood, I have
suggested for him to become a maintainer and he's accepted. So for
v6.9-rc1 I look forward to seeing him sent you a pull request for
further sysctl changes. This also removes Iurii Zaikin as a maintainer
as he has moved on to other projects and has had no time to help at
all"
* tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: remove struct ctl_path
sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR
coda: Remove the now superfluous sentinel elements from ctl_table array
sysctl: Remove the now superfluous sentinel elements from ctl_table array
fs: Remove the now superfluous sentinel elements from ctl_table array
cachefiles: Remove the now superfluous sentinel element from ctl_table array
sysclt: Clarify the results of selftest run
sysctl: Add a selftest for handling empty dirs
sysctl: Fix out of bounds access for empty sysctl registers
MAINTAINERS: Add Joel Granados as co-maintainer for proc sysctl
MAINTAINERS: remove Iurii Zaikin from proc sysctl
|
|
|
|
9d5b947535 |
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
|
|
|
d1c371035c |
quota: convert dquot_claim_space_nodirty() to return void
dquot_claim_space_nodirty() always return zero, let's convert it to return void, then, its caller can get rid of handling failure case. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20231210025028.3262900-1-chao@kernel.org> |
|
|
|
ecae0bd517 |
Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Kemeng Shi has contributed some compation maintenance work in the
series "Fixes and cleanups to compaction".
- Joel Fernandes has a patchset ("Optimize mremap during mutual
alignment within PMD") which fixes an obscure issue with mremap()'s
pagetable handling during a subsequent exec(), based upon an
implementation which Linus suggested.
- More DAMON/DAMOS maintenance and feature work from SeongJae Park i the
following patch series:
mm/damon: misc fixups for documents, comments and its tracepoint
mm/damon: add a tracepoint for damos apply target regions
mm/damon: provide pseudo-moving sum based access rate
mm/damon: implement DAMOS apply intervals
mm/damon/core-test: Fix memory leaks in core-test
mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
- In the series "Do not try to access unaccepted memory" Adrian Hunter
provides some fixups for the recently-added "unaccepted memory' feature.
To increase the feature's checking coverage. "Plug a few gaps where
RAM is exposed without checking if it is unaccepted memory".
- In the series "cleanups for lockless slab shrink" Qi Zheng has done
some maintenance work which is preparation for the lockless slab
shrinking code.
- Qi Zheng has redone the earlier (and reverted) attempt to make slab
shrinking lockless in the series "use refcount+RCU method to implement
lockless slab shrink".
- David Hildenbrand contributes some maintenance work for the rmap code
in the series "Anon rmap cleanups".
- Kefeng Wang does more folio conversions and some maintenance work in
the migration code. Series "mm: migrate: more folio conversion and
unification".
- Matthew Wilcox has fixed an issue in the buffer_head code which was
causing long stalls under some heavy memory/IO loads. Some cleanups
were added on the way. Series "Add and use bdev_getblk()".
- In the series "Use nth_page() in place of direct struct page
manipulation" Zi Yan has fixed a potential issue with the direct
manipulation of hugetlb page frames.
- In the series "mm: hugetlb: Skip initialization of gigantic tail
struct pages if freed by HVO" has improved our handling of gigantic
pages in the hugetlb vmmemmep optimizaton code. This provides
significant boot time improvements when significant amounts of gigantic
pages are in use.
- Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
rationalization and folio conversions in the hugetlb code.
- Yin Fengwei has improved mlock()'s handling of large folios in the
series "support large folio for mlock"
- In the series "Expose swapcache stat for memcg v1" Liu Shixin has
added statistics for memcg v1 users which are available (and useful)
under memcg v2.
- Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
prctl so that userspace may direct the kernel to not automatically
propagate the denial to child processes. The series is named "MDWE
without inheritance".
- Kefeng Wang has provided the series "mm: convert numa balancing
functions to use a folio" which does what it says.
- In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
makes is possible for a process to propagate KSM treatment across
exec().
- Huang Ying has enhanced memory tiering's calculation of memory
distances. This is used to permit the dax/kmem driver to use "high
bandwidth memory" in addition to Optane Data Center Persistent Memory
Modules (DCPMM). The series is named "memory tiering: calculate
abstract distance based on ACPI HMAT"
- In the series "Smart scanning mode for KSM" Stefan Roesch has
optimized KSM by teaching it to retain and use some historical
information from previous scans.
- Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
series "mm: memcg: fix tracking of pending stats updates values".
- In the series "Implement IOCTL to get and optionally clear info about
PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
us to atomically read-then-clear page softdirty state. This is mainly
used by CRIU.
- Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
- a bunch of relatively minor maintenance tweaks to this code.
- Matthew Wilcox has increased the use of the VMA lock over file-backed
page faults in the series "Handle more faults under the VMA lock". Some
rationalizations of the fault path became possible as a result.
- In the series "mm/rmap: convert page_move_anon_rmap() to
folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
and folio conversions.
- In the series "various improvements to the GUP interface" Lorenzo
Stoakes has simplified and improved the GUP interface with an eye to
providing groundwork for future improvements.
- Andrey Konovalov has sent along the series "kasan: assorted fixes and
improvements" which does those things.
- Some page allocator maintenance work from Kemeng Shi in the series
"Two minor cleanups to break_down_buddy_pages".
- In thes series "New selftest for mm" Breno Leitao has developed
another MM self test which tickles a race we had between madvise() and
page faults.
- In the series "Add folio_end_read" Matthew Wilcox provides cleanups
and an optimization to the core pagecache code.
- Nhat Pham has added memcg accounting for hugetlb memory in the series
"hugetlb memcg accounting".
- Cleanups and rationalizations to the pagemap code from Lorenzo
Stoakes, in the series "Abstract vma_merge() and split_vma()".
- Audra Mitchell has fixed issues in the procfs page_owner code's new
timestamping feature which was causing some misbehaviours. In the
series "Fix page_owner's use of free timestamps".
- Lorenzo Stoakes has fixed the handling of new mappings of sealed files
in the series "permit write-sealed memfd read-only shared mappings".
- Mike Kravetz has optimized the hugetlb vmemmap optimization in the
series "Batch hugetlb vmemmap modification operations".
- Some buffer_head folio conversions and cleanups from Matthew Wilcox in
the series "Finish the create_empty_buffers() transition".
- As a page allocator performance optimization Huang Ying has added
automatic tuning to the allocator's per-cpu-pages feature, in the series
"mm: PCP high auto-tuning".
- Roman Gushchin has contributed the patchset "mm: improve performance
of accounted kernel memory allocations" which improves their performance
by ~30% as measured by a micro-benchmark.
- folio conversions from Kefeng Wang in the series "mm: convert page
cpupid functions to folios".
- Some kmemleak fixups in Liu Shixin's series "Some bugfix about
kmemleak".
- Qi Zheng has improved our handling of memoryless nodes by keeping them
off the allocation fallback list. This is done in the series "handle
memoryless nodes more appropriately".
- khugepaged conversions from Vishal Moola in the series "Some
khugepaged folio conversions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
FgeUPAD1oasg6CP+INZvCj34waNxwAc=
=E+Y4
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Kemeng Shi has contributed some compation maintenance work in the
series 'Fixes and cleanups to compaction'
- Joel Fernandes has a patchset ('Optimize mremap during mutual
alignment within PMD') which fixes an obscure issue with mremap()'s
pagetable handling during a subsequent exec(), based upon an
implementation which Linus suggested
- More DAMON/DAMOS maintenance and feature work from SeongJae Park i
the following patch series:
mm/damon: misc fixups for documents, comments and its tracepoint
mm/damon: add a tracepoint for damos apply target regions
mm/damon: provide pseudo-moving sum based access rate
mm/damon: implement DAMOS apply intervals
mm/damon/core-test: Fix memory leaks in core-test
mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
- In the series 'Do not try to access unaccepted memory' Adrian
Hunter provides some fixups for the recently-added 'unaccepted
memory' feature. To increase the feature's checking coverage. 'Plug
a few gaps where RAM is exposed without checking if it is
unaccepted memory'
- In the series 'cleanups for lockless slab shrink' Qi Zheng has done
some maintenance work which is preparation for the lockless slab
shrinking code
- Qi Zheng has redone the earlier (and reverted) attempt to make slab
shrinking lockless in the series 'use refcount+RCU method to
implement lockless slab shrink'
- David Hildenbrand contributes some maintenance work for the rmap
code in the series 'Anon rmap cleanups'
- Kefeng Wang does more folio conversions and some maintenance work
in the migration code. Series 'mm: migrate: more folio conversion
and unification'
- Matthew Wilcox has fixed an issue in the buffer_head code which was
causing long stalls under some heavy memory/IO loads. Some cleanups
were added on the way. Series 'Add and use bdev_getblk()'
- In the series 'Use nth_page() in place of direct struct page
manipulation' Zi Yan has fixed a potential issue with the direct
manipulation of hugetlb page frames
- In the series 'mm: hugetlb: Skip initialization of gigantic tail
struct pages if freed by HVO' has improved our handling of gigantic
pages in the hugetlb vmmemmep optimizaton code. This provides
significant boot time improvements when significant amounts of
gigantic pages are in use
- Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
rationalization and folio conversions in the hugetlb code
- Yin Fengwei has improved mlock()'s handling of large folios in the
series 'support large folio for mlock'
- In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
added statistics for memcg v1 users which are available (and
useful) under memcg v2
- Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
prctl so that userspace may direct the kernel to not automatically
propagate the denial to child processes. The series is named 'MDWE
without inheritance'
- Kefeng Wang has provided the series 'mm: convert numa balancing
functions to use a folio' which does what it says
- In the series 'mm/ksm: add fork-exec support for prctl' Stefan
Roesch makes is possible for a process to propagate KSM treatment
across exec()
- Huang Ying has enhanced memory tiering's calculation of memory
distances. This is used to permit the dax/kmem driver to use 'high
bandwidth memory' in addition to Optane Data Center Persistent
Memory Modules (DCPMM). The series is named 'memory tiering:
calculate abstract distance based on ACPI HMAT'
- In the series 'Smart scanning mode for KSM' Stefan Roesch has
optimized KSM by teaching it to retain and use some historical
information from previous scans
- Yosry Ahmed has fixed some inconsistencies in memcg statistics in
the series 'mm: memcg: fix tracking of pending stats updates
values'
- In the series 'Implement IOCTL to get and optionally clear info
about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
which permits us to atomically read-then-clear page softdirty
state. This is mainly used by CRIU
- Hugh Dickins contributed the series 'shmem,tmpfs: general
maintenance', a bunch of relatively minor maintenance tweaks to
this code
- Matthew Wilcox has increased the use of the VMA lock over
file-backed page faults in the series 'Handle more faults under the
VMA lock'. Some rationalizations of the fault path became possible
as a result
- In the series 'mm/rmap: convert page_move_anon_rmap() to
folio_move_anon_rmap()' David Hildenbrand has implemented some
cleanups and folio conversions
- In the series 'various improvements to the GUP interface' Lorenzo
Stoakes has simplified and improved the GUP interface with an eye
to providing groundwork for future improvements
- Andrey Konovalov has sent along the series 'kasan: assorted fixes
and improvements' which does those things
- Some page allocator maintenance work from Kemeng Shi in the series
'Two minor cleanups to break_down_buddy_pages'
- In thes series 'New selftest for mm' Breno Leitao has developed
another MM self test which tickles a race we had between madvise()
and page faults
- In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
and an optimization to the core pagecache code
- Nhat Pham has added memcg accounting for hugetlb memory in the
series 'hugetlb memcg accounting'
- Cleanups and rationalizations to the pagemap code from Lorenzo
Stoakes, in the series 'Abstract vma_merge() and split_vma()'
- Audra Mitchell has fixed issues in the procfs page_owner code's new
timestamping feature which was causing some misbehaviours. In the
series 'Fix page_owner's use of free timestamps'
- Lorenzo Stoakes has fixed the handling of new mappings of sealed
files in the series 'permit write-sealed memfd read-only shared
mappings'
- Mike Kravetz has optimized the hugetlb vmemmap optimization in the
series 'Batch hugetlb vmemmap modification operations'
- Some buffer_head folio conversions and cleanups from Matthew Wilcox
in the series 'Finish the create_empty_buffers() transition'
- As a page allocator performance optimization Huang Ying has added
automatic tuning to the allocator's per-cpu-pages feature, in the
series 'mm: PCP high auto-tuning'
- Roman Gushchin has contributed the patchset 'mm: improve
performance of accounted kernel memory allocations' which improves
their performance by ~30% as measured by a micro-benchmark
- folio conversions from Kefeng Wang in the series 'mm: convert page
cpupid functions to folios'
- Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
kmemleak'
- Qi Zheng has improved our handling of memoryless nodes by keeping
them off the allocation fallback list. This is done in the series
'handle memoryless nodes more appropriately'
- khugepaged conversions from Vishal Moola in the series 'Some
khugepaged folio conversions'"
[ bcachefs conflicts with the dynamically allocated shrinkers have been
resolved as per Stephen Rothwell in
https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/
with help from Qi Zheng.
The clone3 test filtering conflict was half-arsed by yours truly ]
* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
mm/damon/sysfs: update monitoring target regions for online input commit
mm/damon/sysfs: remove requested targets when online-commit inputs
selftests: add a sanity check for zswap
Documentation: maple_tree: fix word spelling error
mm/vmalloc: fix the unchecked dereference warning in vread_iter()
zswap: export compression failure stats
Documentation: ubsan: drop "the" from article title
mempolicy: migration attempt to match interleave nodes
mempolicy: mmap_lock is not needed while migrating folios
mempolicy: alloc_pages_mpol() for NUMA policy without vma
mm: add page_rmappable_folio() wrapper
mempolicy: remove confusing MPOL_MF_LAZY dead code
mempolicy: mpol_shared_policy_init() without pseudo-vma
mempolicy trivia: use pgoff_t in shared mempolicy tree
mempolicy trivia: slightly more consistent naming
mempolicy trivia: delete those ancient pr_debug()s
mempolicy: fix migrate_pages(2) syscall return nr_failed
kernfs: drop shared NUMA mempolicy hooks
hugetlbfs: drop shared NUMA mempolicy pretence
mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
...
|
|
|
|
5efad0a765 |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmVCKSwACgkQnJ2qBz9k QNn6jwgAjQOrUJkJvnqindNNAGvkwE5zph+MWJolbQ7gu3Im5jYLECD4huER3m6q j+tj9ya49iHuIJLEBSctb3NvnhwGgNkupeM9klHGWDtyJwiV5FiV1OYEusxjiq5G tuWCImEn/OoLckfdP4umKqNDMftDVoTUYMViUoS9KBNXVVz0Y4Ex0KwSZFSM47j8 yRlSCRe6CK4lNyiAxLsm8kSzPge/s+mwz7P7z3i/4IgCkvw3fnMBEulCSbMZjhHD WrdCmYsgOJtqj+/hSSKPIJ5W92fFjLY1yaxocsVIzRGHQXriBznix7OS4iRLXtzk 9eBAzvVlu/HXBZCF+ZRIUMG+2P17Dg== =0k3R -----END PGP SIGNATURE----- Merge tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, udf, and quota updates from Jan Kara: - conversion of ext2 directory code to use folios - cleanups in UDF declarations - bugfix for quota interaction with file encryption * tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Convert ext2_prepare_chunk and ext2_commit_chunk to folios ext2: Convert ext2_make_empty() to use a folio ext2: Convert ext2_unlink() and ext2_rename() to use folios ext2: Convert ext2_delete_entry() to use folios ext2: Convert ext2_empty_dir() to use a folio ext2: Convert ext2_add_link() to use a folio ext2: Convert ext2_readdir to use a folio ext2: Add ext2_get_folio() ext2: Convert ext2_check_page to ext2_check_folio highmem: Add folio_release_kmap() udf: Avoid unneeded variable length array in struct fileIdentDesc udf: Annotate struct udf_bitmap with __counted_by quota: explicitly forbid quota files from being encrypted |
|
|
|
869b6ea160 |
quota: Fix slow quotaoff
Eric has reported that commit |
|
|
|
eab477e883 |
quota: dynamically allocate the dquota-cache shrinker
Use new APIs to dynamically allocate the dquota-cache shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-14-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
d3cc1b0be2 |
quota: explicitly forbid quota files from being encrypted
Since commit
|
|
|
|
1500e7e072 |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmTvFssACgkQnJ2qBz9k QNl7HggAuY154urYJdh7M+mbKDSywhcK0YT5pNNkcXVpv/t2c073Ce57+ObDCBaS xetyFgH2XlvuAJ4dWmRDwBEzJ0jquKzvYJEMiXAexgy47ctnNPx5kLPsXpt3g+2q pro7sK1b5BmX/zrgOontbJ8/YAwX85XToD4Cv5XyNSx/ex6/zsd5FProfdiY/HAt qAcv7NkNTBbJBEBHhBNQSL2wOj3LzQV1U8v0XEcsBvTUxlX2jH8J4CsuFIotXqCF 37SNvZPk2c04HbaLgyU4Ura69qD0fn4vTMocuCoaf0CN2PL5jblRAwsAO2bfSqJE AxZFq3afI0YV3Y9OrVlzHtSALuiZMQ== =QPEQ -----END PGP SIGNATURE----- Merge tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, quota, and udf updates from Jan Kara: - fixes for possible use-after-free issues with quota when racing with chown - fixes for ext2 crashing when xattr allocation races with another block allocation to the same file from page writeback code - fix for block number overflow in ext2 - marking of reiserfs as obsolete in MAINTAINERS - assorted minor cleanups * tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix kernel-doc warnings ext2: improve consistency of ext2_fsblk_t datatype usage ext2: dump current reservation window info ext2: fix race between setxattr and write back ext2: introduce new flags argument for ext2_new_blocks() ext2: remove ext2_new_block() ext2: fix datatype of block number in ext2_xattr_set2() udf: Drop pointless aops assignment quota: use lockdep_assert_held_write in dquot_load_quota_sb MAINTAINERS: change reiserfs status to obsolete udf: Fix -Wstringop-overflow warnings quota: simplify drop_dquot_ref() quota: fix dqput() to follow the guarantees dquot_srcu should provide quota: add new helper dquot_active() quota: rename dquot_active() to inode_quota_active() quota: factor out dquot_write_dquot() ext2: remove redundant assignment to variable desc and variable best_desc |
|
|
|
86be6b8bd8 |
quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks
Currently we check whether superblock has ->quota_read and ->quota_write operations to check whether filesystem supports quotas. However for example for shmfs we will not read or write dquots so check whether quota operations are set in the superblock instead. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <20230725144510.253763-4-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
7a64774add |
quota: use lockdep_assert_held_write in dquot_load_quota_sb
Use lockdep_assert_held_write to assert and self-document the locking state in dquot_load_quota_sb instead of hand-crafting it with a trylock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230802115439.2145212-2-hch@lst.de> |
|
|
|
7bce48f0fe |
quota: simplify drop_dquot_ref()
As Honza said, remove_inode_dquot_ref() currently does not release the last dquot reference but instead adds the dquot to tofree_head list. This is because dqput() can sleep while dropping of the last dquot reference (writing back the dquot and calling ->release_dquot()) and that must not happen under dq_list_lock. Now that dqput() queues the final dquot cleanup into a workqueue, remove_inode_dquot_ref() can call dqput() unconditionally and we can significantly simplify it. Here we open code the simplified code of remove_inode_dquot_ref() into remove_dquot_ref() and remove the function put_dquot_list() which is no longer used. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-6-libaokun1@huawei.com> |
|
|
|
dabc8b2075 |
quota: fix dqput() to follow the guarantees dquot_srcu should provide
The dquot_mark_dquot_dirty() using dquot references from the inode
should be protected by dquot_srcu. quota_off code takes care to call
synchronize_srcu(&dquot_srcu) to not drop dquot references while they
are used by other users. But dquot_transfer() breaks this assumption.
We call dquot_transfer() to drop the last reference of dquot and add
it to free_dquots, but there may still be other users using the dquot
at this time, as shown in the function graph below:
cpu1 cpu2
_________________|_________________
wb_do_writeback CHOWN(1)
...
ext4_da_update_reserve_space
dquot_claim_block
...
dquot_mark_dquot_dirty // try to dirty old quota
test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
if (test_bit(DQ_MOD_B, &dquot->dq_flags))
// test no dirty, wait dq_list_lock
...
dquot_transfer
__dquot_transfer
dqput_all(transfer_from) // rls old dquot
dqput // last dqput
dquot_release
clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
atomic_dec(&dquot->dq_count)
put_dquot_last(dquot)
list_add_tail(&dquot->dq_free, &free_dquots)
// add the dquot to free_dquots
if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
add dqi_dirty_list // add released dquot to dirty_list
This can cause various issues, such as dquot being destroyed by
dqcache_shrink_scan() after being added to free_dquots, which can trigger
a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
and then to dirty_list, it is added to free_dquots again after
dquot_writeback_dquots() is executed, which causes the free_dquots list to
be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
freeing dquot twice.
As Honza said, we need to fix dquot_transfer() to follow the guarantees
dquot_srcu should provide. But calling synchronize_srcu() directly from
dquot_transfer() is too expensive (and mostly unnecessary). So we add
dquot whose last reference should be dropped to the new global dquot
list releasing_dquots, and then queue work item which would call
synchronize_srcu() and after that perform the final cleanup of all the
dquots on releasing_dquots.
Fixes:
|
|
|
|
33bcfafc48 |
quota: add new helper dquot_active()
Add new helper function dquot_active() to make the code more concise. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-4-libaokun1@huawei.com> |
|
|
|
4b9bdfa165 |
quota: rename dquot_active() to inode_quota_active()
Now we have a helper function dquot_dirty() to determine if dquot has DQ_MOD_B bit. dquot_active() can easily be misunderstood as a helper function to determine if dquot has DQ_ACTIVE_B bit. So we avoid this by renaming it to inode_quota_active() and later on we will add the helper function dquot_active() to determine if dquot has DQ_ACTIVE_B bit. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-3-libaokun1@huawei.com> |
|
|
|
0241284778 |
quota: factor out dquot_write_dquot()
Refactor out dquot_write_dquot() to reduce duplicate code. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-2-libaokun1@huawei.com> |
|
|
|
d6a95db3c7 |
quota: fix warning in dqgrab()
There's issue as follows when do fault injection: WARNING: CPU: 1 PID: 14870 at include/linux/quotaops.h:51 dquot_disable+0x13b7/0x18c0 Modules linked in: CPU: 1 PID: 14870 Comm: fsconfig Not tainted 6.3.0-next-20230505-00006-g5107a9c821af-dirty #541 RIP: 0010:dquot_disable+0x13b7/0x18c0 RSP: 0018:ffffc9000acc79e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88825e41b980 RDX: 0000000000000000 RSI: ffff88825e41b980 RDI: 0000000000000002 RBP: ffff888179f68000 R08: ffffffff82087ca7 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed102f3ed026 R12: ffff888179f68130 R13: ffff888179f68110 R14: dffffc0000000000 R15: ffff888179f68118 FS: 00007f450a073740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe96f2efd8 CR3: 000000025c8ad000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dquot_load_quota_sb+0xd53/0x1060 dquot_resume+0x172/0x230 ext4_reconfigure+0x1dc6/0x27b0 reconfigure_super+0x515/0xa90 __x64_sys_fsconfig+0xb19/0xd20 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happens as follows: ProcessA ProcessB ProcessC sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_suspend -> suspend all type quota sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_resume ret = dquot_load_quota_sb add_dquot_ref do_open -> open file O_RDWR vfs_open do_dentry_open get_write_access atomic_inc_unless_negative(&inode->i_writecount) ext4_file_open dquot_file_open dquot_initialize __dquot_initialize dqget atomic_inc(&dquot->dq_count); __dquot_initialize __dquot_initialize dqget if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) ext4_acquire_dquot -> Return error DQ_ACTIVE_B flag isn't set dquot_disable invalidate_dquots if (atomic_read(&dquot->dq_count)) dqgrab WARN_ON_ONCE(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) -> Trigger warning In the above scenario, 'dquot->dq_flags' has no DQ_ACTIVE_B is normal when dqgrab(). To solve above issue just replace the dqgrab() use in invalidate_dquots() with atomic_inc(&dquot->dq_count). Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-3-yebin10@huawei.com> |
|
|
|
6a4e336379 |
quota: Properly disable quotas when add_dquot_ref() fails
When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want to disable quotas we are trying to enable. However dquot_disable() call was passed just the flags we are enabling so in case flags == DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL instead of properly disabling quotas. Fix the problem by always passing DQUOT_LIMITS_ENABLED | DQUOT_USAGE_ENABLED to dquot_disable() in this case. Reported-and-tested-by: Ye Bin <yebin10@huawei.com> Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-2-yebin10@huawei.com> |
|
|
|
576215cffd |
fs: Drop wait_unfrozen wait queue
wait_unfrozen waitqueue is used only in quota code to wait for filesystem to become unfrozen. In that place we can just use sb_start_write() - sb_end_write() pair to achieve the same. So just remove the waitqueue. Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230525141710.7595-1-jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
36d532d713 |
quota: mark PRINT_QUOTA_WARNING as BROKEN
PRINT_QUOTA_WARNING is deprecated since commit
|
|
|
|
f4251e371d |
quota: update Kconfig comment
f2fs support quota since commit
|
|
|
|
9e1fb91bcb |
quota: Use register_sysctl_init() for registering fs_dqstats_table
register_sysctl_init() also prints information that may be useful for further debugging when we fail to register sysctl table. Use it when registering fs_dqstats_table. Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
c87d175d0a |
quota: make dquot_set_dqinfo return errors from ->write_info
dquot_set_dqinfo() ignores the return code from the ->write_info call, which means that quotacalls like Q_SETINFO never see the error. This doesn't seem right, so fix that. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230227120216.31306-2-frank.li@vivo.com> |
|
|
|
f8107c996f |
quota: fixup *_write_file_info() to return proper error code
For v1_write_file_info function, when quota_write() returns 0, it should be considered an EIO error. And for v2_write_file_info(), fix to proper error return code instead of raw number. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230227120216.31306-1-frank.li@vivo.com> |
|
|
|
dced733d7f |
quota: simplify two-level sysctl registration for fs_dqstats_table
There is no need to declare two tables to just create directories, this can be easily be done with a prefix path with register_sysctl(). Simplify this registration. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230310231206.3952808-4-mcgrof@kernel.org> |
|
|
|
8cc01d43f8 |
RCU pull request for v6.3
This pull request contains the following branches:
doc.2023.01.05a: Documentation updates.
fixes.2023.01.23a: Miscellaneous fixes, perhaps most notably:
o Throttling callback invocation based on the number of callbacks
that are now ready to invoke instead of on the total number
of callbacks.
o Several patches that suppress false-positive boot-time
diagnostics, for example, due to lockdep not yet being
initialized.
o Make expedited RCU CPU stall warnings dump stacks of any tasks
that are blocking the stalled grace period. (Normal RCU CPU
stall warnings have doen this for mnay years.)
o Lazy-callback fixes to avoid delays during boot, suspend, and
resume. (Note that lazy callbacks must be explicitly enabled,
so this should not (yet) affect production use cases.)
kvfree.2023.01.03a: Cause kfree_rcu() and friends to take advantage of
polled grace periods, thus reducing memory footprint by almost
two orders of magnitude, admittedly on a microbenchmark.
This series also begins the transition from kfree_rcu(p) to
kfree_rcu_mightsleep(p). This transition was motivated by bugs
where kfree_rcu(p), which can block, was typed instead of the
intended kfree_rcu(p, rh).
srcu.2023.01.03a: SRCU updates, perhaps most notably fixing a bug that
causes SRCU to fail when booted on a system with a non-zero boot
CPU. This surprising situation actually happens for kdump kernels
on the powerpc architecture. It also adds an srcu_down_read()
and srcu_up_read(), which act like srcu_read_lock() and
srcu_read_unlock(), but allow an SRCU read-side critical section
to be handed off from one task to another.
srcu-always.2023.02.02a: Cleans up the now-useless SRCU Kconfig option.
There are a few more commits that are not yet acked or pulled
into maintainer trees, and these will be in a pull request for
a later merge window.
tasks.2023.01.03a: RCU-tasks updates, perhaps most notably these fixes:
o A strange interaction between PID-namespace unshare and the
RCU-tasks grace period that results in a low-probability but
very real hang.
o A race between an RCU tasks rude grace period on a single-CPU
system and CPU-hotplug addition of the second CPU that can result
in a too-short grace period.
o A race between shrinking RCU tasks down to a single callback list
and queuing a new callback to some other CPU, but where that
queuing is delayed for more than an RCU grace period. This can
result in that callback being stranded on the non-boot CPU.
torture.2023.01.05a: Torture-test updates and fixes.
torturescript.2023.01.03a: Torture-test scripting updates and fixes.
stall.2023.01.09a: Provide additional RCU CPU stall-warning information
in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and
restore the full five-minute timeout limit for expedited RCU
CPU stall warnings.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmPq29UTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jAhVEACEAKJY1VJ9IUqz7CwzAYkzgRJfiygh
oDUXmlqtm6ew9pr2GdLUVCVsUSldzBc0K7Djb/G1niv4JPs+v7YwupIV33+UbStU
Qxt6ztTdxc4lKospLm1+2vF9ZdzVEmiP4wVCc4iDarv5FM3FpWSTNc8+L7qmlC+X
myjv+GqMTxkXZBvYJOgJGFjDwN8noTd7Fr3mCCVLFm3PXMDa7tcwD6HRP5AqD2N8
qC5M6LEqepKVGmz0mYMLlSN1GPaqIsEcexIFEazRsPEivPh/iafyQCQ/cqxwhXmV
vEt7u+dXGZT/oiDq9cJ+/XRDS2RyKIS6dUE14TiiHolDCn1ONESahfA/gXWKykC2
BaGPfjWXrWv/hwbeZ+8xEdkAvTIV92tGpXir9Fby1Z5PjP3balvrnn6hs5AnQBJb
NdhRPLzy/dCnEF+CweAYYm1qvTo8cd5nyiNwBZHn7rEAIu3Axrecag1rhFl3AJ07
cpVMQXZtkQVa2X8aIRTUC+ijX6yIqNaHlu0HqNXgIUTDzL4nv5cMjOMzpNQP9/dZ
FwAMZYNiOk9IlMiKJ8ZiVcxeiA8ouIBlkYM3k6vGrmiONZ7a/EV/mSHoJqI8bvqr
AxUIJ2Ayhg3bxPboL5oKgCiLql0A7ZVvz6quX6McitWGMgaSvel1fDzT3TnZd41e
4AFBFd/+VedUGg==
=bBYK
-----END PGP SIGNATURE-----
Merge tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes, perhaps most notably:
- Throttling callback invocation based on the number of callbacks
that are now ready to invoke instead of on the total number of
callbacks
- Several patches that suppress false-positive boot-time
diagnostics, for example, due to lockdep not yet being
initialized
- Make expedited RCU CPU stall warnings dump stacks of any tasks
that are blocking the stalled grace period. (Normal RCU CPU
stall warnings have done this for many years)
- Lazy-callback fixes to avoid delays during boot, suspend, and
resume. (Note that lazy callbacks must be explicitly enabled, so
this should not (yet) affect production use cases)
- Make kfree_rcu() and friends take advantage of polled grace periods,
thus reducing memory footprint by almost two orders of magnitude,
admittedly on a microbenchmark
This also begins the transition from kfree_rcu(p) to
kfree_rcu_mightsleep(p). This transition was motivated by bugs where
kfree_rcu(p), which can block, was typed instead of the intended
kfree_rcu(p, rh)
- SRCU updates, perhaps most notably fixing a bug that causes SRCU to
fail when booted on a system with a non-zero boot CPU. This
surprising situation actually happens for kdump kernels on the
powerpc architecture
This also adds an srcu_down_read() and srcu_up_read(), which act like
srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side
critical section to be handed off from one task to another
- Clean up the now-useless SRCU Kconfig option
There are a few more commits that are not yet acked or pulled into
maintainer trees, and these will be in a pull request for a later
merge window
- RCU-tasks updates, perhaps most notably these fixes:
- A strange interaction between PID-namespace unshare and the
RCU-tasks grace period that results in a low-probability but
very real hang
- A race between an RCU tasks rude grace period on a single-CPU
system and CPU-hotplug addition of the second CPU that can
result in a too-short grace period
- A race between shrinking RCU tasks down to a single callback
list and queuing a new callback to some other CPU, but where
that queuing is delayed for more than an RCU grace period. This
can result in that callback being stranded on the non-boot CPU
- Torture-test updates and fixes
- Torture-test scripting updates and fixes
- Provide additional RCU CPU stall-warning information in kernels built
with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute
timeout limit for expedited RCU CPU stall warnings
* tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep()
kernel/notifier: Remove CONFIG_SRCU
init: Remove "select SRCU"
fs/quota: Remove "select SRCU"
fs/notify: Remove "select SRCU"
fs/btrfs: Remove "select SRCU"
fs: Remove CONFIG_SRCU
drivers/pci/controller: Remove "select SRCU"
drivers/net: Remove "select SRCU"
drivers/md: Remove "select SRCU"
drivers/hwtracing/stm: Remove "select SRCU"
drivers/dax: Remove "select SRCU"
drivers/base: Remove CONFIG_SRCU
rcu: Disable laziness if lazy-tracking says so
rcu: Track laziness during boot and suspend
rcu: Remove redundant call to rcu_boost_kthread_setaffinity()
rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts
rcu: Align the output of RCU CPU stall warning messages
rcu: Add RCU stall diagnosis information
sched: Add helper nr_context_switches_cpu()
...
|
|
|
|
dbea8bcded |
fs/quota: Remove "select SRCU"
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jan Kara <jack@suse.com> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: John Ogness <john.ogness@linutronix.de> |
|
|
|
4d7ca40901
|
fs: port vfs{g,u}id helpers to mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
|
|
|
|
0dbe12f2e4
|
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
|
|
|
|
f861646a65
|
quota: port to mnt_idmap
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
|
|
|
|
d323877484 |
ext4: fix bug_on in __es_tree_search caused by bad quota inode
We got a issue as fllows: ================================================================== kernel BUG at fs/ext4/extents_status.c:202! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 1 PID: 810 Comm: mount Not tainted 6.1.0-rc1-next-g9631525255e3 #352 RIP: 0010:__es_tree_search.isra.0+0xb8/0xe0 RSP: 0018:ffffc90001227900 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 0000000077512a0f RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000002a10 RDI: ffff8881004cd0c8 RBP: ffff888177512ac8 R08: 47ffffffffffffff R09: 0000000000000001 R10: 0000000000000001 R11: 00000000000679af R12: 0000000000002a10 R13: ffff888177512d88 R14: 0000000077512a10 R15: 0000000000000000 FS: 00007f4bd76dbc40(0000)GS:ffff88842fd00000(0000)knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005653bf993cf8 CR3: 000000017bfdf000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ext4_es_cache_extent+0xe2/0x210 ext4_cache_extents+0xd2/0x110 ext4_find_extent+0x5d5/0x8c0 ext4_ext_map_blocks+0x9c/0x1d30 ext4_map_blocks+0x431/0xa50 ext4_getblk+0x82/0x340 ext4_bread+0x14/0x110 ext4_quota_read+0xf0/0x180 v2_read_header+0x24/0x90 v2_check_quota_file+0x2f/0xa0 dquot_load_quota_sb+0x26c/0x760 dquot_load_quota_inode+0xa5/0x190 ext4_enable_quotas+0x14c/0x300 __ext4_fill_super+0x31cc/0x32c0 ext4_fill_super+0x115/0x2d0 get_tree_bdev+0x1d2/0x360 ext4_get_tree+0x19/0x30 vfs_get_tree+0x26/0xe0 path_mount+0x81d/0xfc0 do_mount+0x8d/0xc0 __x64_sys_mount+0xc0/0x160 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> ================================================================== Above issue may happen as follows: ------------------------------------- ext4_fill_super ext4_orphan_cleanup ext4_enable_quotas ext4_quota_enable ext4_iget --> get error inode <5> ext4_ext_check_inode --> Wrong imode makes it escape inspection make_bad_inode(inode) --> EXT4_BOOT_LOADER_INO set imode dquot_load_quota_inode vfs_setup_quota_inode --> check pass dquot_load_quota_sb v2_check_quota_file v2_read_header ext4_quota_read ext4_bread ext4_getblk ext4_map_blocks ext4_ext_map_blocks ext4_find_extent ext4_cache_extents ext4_es_cache_extent __es_tree_search.isra.0 ext4_es_end --> Wrong extents trigger BUG_ON In the above issue, s_usr_quota_inum is set to 5, but inode<5> contains incorrect imode and disordered extents. Because 5 is EXT4_BOOT_LOADER_INO, the ext4_ext_check_inode check in the ext4_iget function can be bypassed, finally, the extents that are not checked trigger the BUG_ON in the __es_tree_search function. To solve this issue, check whether the inode is bad_inode in vfs_setup_quota_inode(). Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221026042310.3839669-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org |
|
|
|
191249f708 |
quota: Add more checking after reading from quota file
It would be better to do more sanity checking (eg. dqdh_entries, block no.) for the content read from quota file, which can prevent corrupting the quota file. Link: https://lore.kernel.org/r/20220923134555.2623931-4-chengzhihao1@huawei.com Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
3fc61e0e96 |
quota: Replace all block number checking with helper function
Cleanup all block checking places, replace them with helper function do_check_range(). Link: https://lore.kernel.org/r/20220923134555.2623931-3-chengzhihao1@huawei.com Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> |
|
|
|
6c8ea8b8cd |
quota: Check next/prev free block number after reading from quota file
Following process:
Init: v2_read_file_info: <3> dqi_free_blk 0 dqi_free_entry 5 dqi_blks 6
Step 1. chown bin f_a -> dquot_acquire -> v2_write_dquot:
qtree_write_dquot
do_insert_tree
find_free_dqentry
get_free_dqblk
write_blk(info->dqi_blocks) // info->dqi_blocks = 6, failure. The
content in physical block (corresponding to blk 6) is random.
Step 2. chown root f_a -> dquot_transfer -> dqput_all -> dqput ->
ext4_release_dquot -> v2_release_dquot -> qtree_delete_dquot:
dquot_release
remove_tree
free_dqentry
put_free_dqblk(6)
info->dqi_free_blk = blk // info->dqi_free_blk = 6
Step 3. drop cache (buffer head for block 6 is released)
Step 4. chown bin f_b -> dquot_acquire -> commit_dqblk -> v2_write_dquot:
qtree_write_dquot
do_insert_tree
find_free_dqentry
get_free_dqblk
dh = (struct qt_disk_dqdbheader *)buf
blk = info->dqi_free_blk // 6
ret = read_blk(info, blk, buf) // The content of buf is random
info->dqi_free_blk = le32_to_cpu(dh->dqdh_next_free) // random blk
Step 5. chown bin f_c -> notify_change -> ext4_setattr -> dquot_transfer:
dquot = dqget -> acquire_dquot -> ext4_acquire_dquot -> dquot_acquire ->
commit_dqblk -> v2_write_dquot -> dq_insert_tree:
do_insert_tree
find_free_dqentry
get_free_dqblk
blk = info->dqi_free_blk // If blk < 0 and blk is not an error
code, it will be returned as dquot
transfer_to[USRQUOTA] = dquot // A random negative value
__dquot_transfer(transfer_to)
dquot_add_inodes(transfer_to[cnt])
spin_lock(&dquot->dq_dqb_lock) // page fault
, which will lead to kernel page fault:
Quota error (device sda): qtree_write_dquot: Error -8000 occurred
while creating quota
BUG: unable to handle page fault for address: ffffffffffffe120
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
Oops: 0002 [#1] PREEMPT SMP
CPU: 0 PID: 5974 Comm: chown Not tainted 6.0.0-rc1-00004
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:_raw_spin_lock+0x3a/0x90
Call Trace:
dquot_add_inodes+0x28/0x270
__dquot_transfer+0x377/0x840
dquot_transfer+0xde/0x540
ext4_setattr+0x405/0x14d0
notify_change+0x68e/0x9f0
chown_common+0x300/0x430
__x64_sys_fchownat+0x29/0x40
In order to avoid accessing invalid quota memory address, this patch adds
block number checking of next/prev free block read from quota file.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216372
Fixes:
|
|
|
|
6614a3c316 |
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve latency
and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA
jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/
SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE=
=w/UH
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
|
|
|
|
e33c267ab7 |
mm: shrinkers: provide shrinkers with names
Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.
In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.
The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
b27c82e129
|
attr: port attribute changes to new types
Now that we introduced new infrastructure to increase the type safety
for filesystems supporting idmapped mounts port the first part of the
vfs over to them.
This ports the attribute changes codepaths to rely on the new better
helpers using a dedicated type.
Before this change we used to take a shortcut and place the actual
values that would be written to inode->i_{g,u}id into struct iattr. This
had the advantage that we moved idmappings mostly out of the picture
early on but it made reasoning about changes more difficult than it
should be.
The filesystem was never explicitly told that it dealt with an idmapped
mount. The transition to the value that needed to be stored in
inode->i_{g,u}id appeared way too early and increased the probability of
bugs in various codepaths.
We know place the same value in struct iattr no matter if this is an
idmapped mount or not. The vfs will only deal with type safe
vfs{g,u}id_t. This makes it massively safer to perform permission checks
as the type will tell us what checks we need to perform and what helpers
we need to use.
Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
inode->i_{g,u}id since they are different types. Instead they need to
use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
vfs{g,u}id into the filesystem.
The other nice effect is that filesystems like overlayfs don't need to
care about idmappings explicitly anymore and can simply set up struct
iattr accordingly directly.
Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
|
|
71e7b535b8
|
quota: port quota helpers mount ids
Port the is_quota_modification() and dqout_transfer() helper to type
safe vfs{g,u}id_t. Since these helpers are only called by a few
filesystems don't introduce a new helper but simply extend the existing
helpers to pass down the mount's idmapping.
Note, that this is a non-functional change, i.e. nothing will have
happened here or at the end of this series to how quota are done! This
a change necessary because we will at the end of this series make
ownership changes easier to reason about by keeping the original value
in struct iattr for both non-idmapped and idmapped mounts.
For now we always pass the initial idmapping which makes the idmapping
functions these helpers call nops.
This is done because we currently always pass the actual value to be
written to i_{g,u}id via struct iattr. While this allowed us to treat
the {g,u}id values in struct iattr as values that can be directly
written to inode->i_{g,u}id it also increases the potential for
confusion for filesystems.
Now that we are have dedicated types to prevent this confusion we will
ultimately only map the value from the idmapped mount into a filesystem
value that can be written to inode->i_{g,u}id when the filesystem
actually updates the inode. So pass down the initial idmapping until we
finished that conversion at which point we pass down the mount's
idmapping.
Since struct iattr uses an anonymous union with overlapping types as
supported by the C standard, filesystems that haven't converted to
ia_vfs{g,u}id won't see any difference and things will continue to work
as before. In other words, no functional changes intended with this
change.
Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|