Commit Graph

30 Commits

Author SHA1 Message Date
Liu Shixin 72a14e821c memcg: expose swapcache stat for memcg v1
Patch series "Expose swapcache stat for memcg v1", v2.

Since commit b603894248 ("mm: memcg: add swapcache stat for memcg v2")
adds swapcache stat for the cgroup v2, it seems there is no reason to hide
it in memcg v1.  Conversely, with swapcached it is more accurate to
evaluate the available memory for memcg.

Link: https://lkml.kernel.org/r/20230915105845.3199656-1-liushixin2@huawei.com
Link: https://lkml.kernel.org/r/20230915105845.3199656-2-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-06 14:44:10 -07:00
Michal Hocko 4597648fdd mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits 86327e8eb9 ("memcg: drop kmem.limit_in_bytes") and
partially reverts 58056f7750 ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit.  Unfortunately it has turned out that
there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1].  The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result.  The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.

Now, reverting alone 86327e8eb9 is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file.  Therefore we have to go and revisit 58056f7750 as
well.  There are two ways to go ahead.  Either we give up on the
deprecation and fully revert 58056f7750 as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact. 
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.

Note to backporters to stable trees.  a8c49af3be ("memcg: add per-memcg
total kernel memory stat") introduced in 4.18 has added memcg_account_kmem
so the accounting is not done by obj_cgroup_charge_pages directly for v1
anymore.  Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).

[akpm@linux-foundation.org: fix build - remove unused local]
Link: http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net [1]
Link: https://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz
Fixes: 86327e8eb9 ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f7750 ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-29 17:20:47 -07:00
Linus Torvalds 7716f383a5 cgroup: Changes for v6.6
* Per-cpu cpu usage stats are now tracked. This currently isn't printed out
   in the cgroupfs interface and can only be accessed through e.g. BPF.
   Should decide on a not-too-ugly way to show per-cpu stats in cgroupfs.
 
 * cpuset received some cleanups and prepatory patches for the pending
   cpus.exclusive patchset which will allow cpuset partitions to be created
   below non-partition parents, which should ease the management of partition
   cpusets.
 
 * A lot of code and documentation cleanup patches.
 
 * tools/testing/selftests/cgroup/test_cpuset.c is added. This causes trivial
   conflicts in .gitignore and Makefile under the directory against
   fe3b1bf19b ("selftests: cgroup: add test_zswap program"). They can be
   resolved by keeping lines from both branches.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZPENTg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGcyBAP44cHwpSFxXe3cehxAzb1l/2BZXtzU5l48OqUQd
 MwHyrwEAm7+MTVAR2xOF4f+oVM9KWmKj7oV7Clpixl1S7hHyjwE=
 =FCc9
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Per-cpu cpu usage stats are now tracked

   This currently isn't printed out in the cgroupfs interface and can
   only be accessed through e.g. BPF. Should decide on a not-too-ugly
   way to show per-cpu stats in cgroupfs

 - cpuset received some cleanups and prepatory patches for the pending
   cpus.exclusive patchset which will allow cpuset partitions to be
   created below non-partition parents, which should ease the management
   of partition cpusets

 - A lot of code and documentation cleanup patches

 - tools/testing/selftests/cgroup/test_cpuset.c added

* tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (32 commits)
  cgroup: Avoid -Wstringop-overflow warnings
  cgroup:namespace: Remove unused cgroup_namespaces_init()
  cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants
  cgroup: clean up if condition in cgroup_pidlist_start()
  cgroup: fix obsolete function name in cgroup_destroy_locked()
  Documentation: cgroup-v2.rst: Correct number of stats entries
  cgroup: fix obsolete function name above css_free_rwork_fn()
  cgroup/cpuset: fix kernel-doc
  cgroup: clean up printk()
  cgroup: fix obsolete comment above cgroup_create()
  docs: cgroup-v1: fix typo
  docs: cgroup-v1: correct the term of Page Cache organization in inode
  cgroup/misc: Store atomic64_t reads to u64
  cgroup/misc: Change counters to be explicit 64bit types
  cgroup/misc: update struct members descriptions
  cgroup: remove cgrp->kn check in css_populate_dir()
  cgroup: fix obsolete function name
  cgroup: use cached local variable parent in for loop
  cgroup: remove obsolete comment above struct cgroupstats
  cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED
  ...
2023-09-01 15:58:21 -07:00
Michal Hocko 86327e8eb9 memcg: drop kmem.limit_in_bytes
kmem.limit_in_bytes (v1 way to limit kernel memory usage) has been
deprecated since 58056f7750 ("memcg, kmem: further deprecate
kmem.limit_in_bytes") merged in 5.16.  We haven't heard about any serious
users since then but it seems that the mere presence of the file is
causing more harm thatn good.  We (SUSE) have had several bug reports from
customers where Docker based containers started to fail because a write to
kmem.limit_in_bytes has failed.

This was unexpected because runc code only expects ENOENT (kmem disabled)
or EBUSY (tasks already running within cgroup).  So a new error code was
unexpected and the whole container startup failed.  This has been later
addressed by
52390d6804
so current Docker runtimes do not suffer from the problem anymore.  There
are still older version of Docker in use and likely hard to get rid of
completely.

Address this by wiping out the file completely and effectively get back to
pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.

I would recommend backporting to stable trees which have picked up
58056f7750 ("memcg, kmem: further deprecate kmem.limit_in_bytes").

[mhocko@suse.com: restore _KMEM switch case]
  Link: https://lkml.kernel.org/r/ZKe5wxdbvPi5Cwd7@dhcp22.suse.cz
Link: https://lkml.kernel.org/r/20230704115240.14672-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:12:11 -07:00
Xiongwei Song ab8aebdc9f docs: cgroup-v1: fix typo
"listers" -> "listeners"

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-07-21 08:11:41 -10:00
Xiongwei Song fe9ebb8cec docs: cgroup-v1: correct the term of Page Cache organization in inode
The radix-tree for Page Cache has been replaced with xarray, see
commit eb797a8ee0 ("page cache: Rearrange address_space"), so move
"radix-tree" to "xarray".

Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-07-21 08:11:27 -10:00
Kefeng Wang 6c77b607ee mm: kill lock|unlock_page_memcg()
Since commit c7c3dec1c9 ("mm: rmap: remove lock_page_memcg()"),
no more user, kill lock_page_memcg() and unlock_page_memcg().

Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19 16:19:33 -07:00
Linus Torvalds 3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Johannes Weiner da34a8484d mm: memcontrol: deprecate charge moving
Charge moving mode in cgroup1 allows memory to follow tasks as they
migrate between cgroups.  This is, and always has been, a questionable
thing to do - for several reasons.

First, it's expensive.  Pages need to be identified, locked and isolated
from various MM operations, and reassigned, one by one.

Second, it's unreliable.  Once pages are charged to a cgroup, there isn't
always a clear owner task anymore.  Cache isn't moved at all, for example.
Mapped memory is moved - but if trylocking or isolating a page fails,
it's arbitrarily left behind.  Frequent moving between domains may leave a
task's memory scattered all over the place.

Third, it isn't really needed.  Launcher tasks can kick off workload tasks
directly in their target cgroup.  Using dedicated per-workload groups
allows fine-grained policy adjustments - no need to move tasks and their
physical pages between control domains.  The feature was never
forward-ported to cgroup2, and it hasn't been missed.

Despite it being a niche usecase, the maintenance overhead of supporting
it is enormous.  Because pages are moved while they are live and subject
to various MM operations, the synchronization rules are complicated. 
There are lock_page_memcg() in MM and FS code, which non-cgroup people
don't understand.  In some cases we've been able to shift code and cgroup
API calls around such that we can rely on native locking as much as
possible.  But that's fragile, and sometimes we need to hold MM locks for
longer than we otherwise would (pte lock e.g.).

Mark the feature deprecated. Hopefully we can remove it soon.

And backport into -stable kernels so that people who develop against
earlier kernels are warned about this deprecation as early as possible.

[akpm@linux-foundation.org: fix memory.rst underlining]
Link: https://lkml.kernel.org/r/Y5COd+qXwk/S+n8N@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18 17:12:42 -08:00
Bagas Sanjaya 980660cae7 docs: cgroup-v1: use numbered lists for user interface setup
Setup instructions for memory resource controller UI uses a mix of
section headings and normal paragraphs, whereas numbered lists are
better fit for this purpose.

While at it, also slightly reword the instructions and add reference to
"Why are cgroups needed?" in the main cgroups documentation.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:11 -10:00
Bagas Sanjaya da3ad2e14f docs: cgroup-v1: add internal cross-references
The documentation contains references to other sections in the doc
(internal). Add cross-references for them so that these can be accessed
without having to manually search for them.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya 5fa16afc4b docs: cgroup-v1: make swap extension subsections subsections
Subsections text of swap extension section is marked up as bold text,
whereas making them proper subsection is more appropriate.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya b9d2a17b32 docs: cgroup-v1: use bullet lists for list of stat file tables
The stat file section contains three tables, where the leading texts for
them are subsection heading. Organize them in the bullet list, while
demoting headings into normal text.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya f7423bb771 docs: cgroup-v1: move hierarchy of accounting caption
The caption for hierarchy of accounting figure is in the code block,
which is quite odd. Move the caption into :caption: option of
code-block:: directive instead.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya 71da431c30 docs: cgroup-v1: fix footnotes
The documentation contains external references, which some of them are
marked as footnotes. Fix the syntax for them to be properly rendered as
such.

Non-footnote references aren't affected since the text for these is
aligned the same to the footnotes.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya eb08489448 docs: cgroup-v1: use code block for locking order schema
The locking order schema is a figure (like diagram), which should have
been formatted with literal code block for consistency with other
figures.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya 4ddb1a2aa1 docs: cgroup-v1: wrap remaining admonitions in admonition blocks
Wrap two other admonitions in appropriate blocks in order for readers to
pay more attention to block contents:

  * hint:: for editor's note
  * warning:: for move charges deprecation

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Bagas Sanjaya 56eb276701 docs: cgroup-v1: replace custom note constructs with appropriate admonition blocks
Admonition constructs on the documentation use definition lists, which
isn't fit for the purpose. Replace them with appropriate blocks:

  * Use caution:: for outdated document notice
  * hint:: for memo
  * note:: for other constructs
  * warning:: for memory reclaim

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-05 07:38:10 -10:00
Jian Wen 9b34a307f3 docs: admin-guide: cgroup-v1: update description of inactive_file
MADV_FREE pages have been moved into the LRU_INACTIVE_FILE list by commit
f7ad2a6cb9 ("mm: move MADV_FREE pages into LRU_INACTIVE_FILE list").

Link: https://lkml.kernel.org/r/20221111034639.3593380-1-wenjian1@xiaomi.com
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:54 -08:00
Johannes Weiner e55b9f9686 mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
Since 2d1c498072 ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
  which hasn't been a user-visible symbol for over half a decade. ]

Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03 14:03:36 -07:00
Sebastian Andrzej Siewior 2343e88d23 mm/memcg: disable threshold event handlers on PREEMPT_RT
During the integration of PREEMPT_RT support, the code flow around
memcg_check_events() resulted in `twisted code'.  Moving the code around
and avoiding then would then lead to an additional local-irq-save
section within memcg_check_events().  While looking better, it adds a
local-irq-save section to code flow which is usually within an
local-irq-off block on non-PREEMPT_RT configurations.

The threshold event handler is a deprecated memcg v1 feature.  Instead
of trying to get it to work under PREEMPT_RT just disable it.  There
should be no users on PREEMPT_RT.  From that perspective it makes even
less sense to get it to work under PREEMPT_RT while having zero users.

Make memory.soft_limit_in_bytes and cgroup.event_control return
-EOPNOTSUPP on PREEMPT_RT.  Make an empty memcg_check_events() and
memcg_write_event_control() which return only -EOPNOTSUPP on PREEMPT_RT.
Document that the two knobs are disabled on PREEMPT_RT.

Link: https://lkml.kernel.org/r/20220226204144.1008339-3-bigeasy@linutronix.de
Suggested-by: Michal Hocko <mhocko@kernel.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:02 -07:00
Shakeel Butt 58056f7750 memcg, kmem: further deprecate kmem.limit_in_bytes
The deprecation process of kmem.limit_in_bytes started with the commit
0158115f70 ("memcg, kmem: deprecate kmem.limit_in_bytes") which also
explains in detail the motivation behind the deprecation.  To summarize,
it is the unexpected behavior on hitting the kmem limit.  This patch
moves the deprecation process to the next stage by disallowing to set
the kmem limit.  In future we might just remove the kmem.limit_in_bytes
file completely.

[akpm@linux-foundation.org: s/ENOTSUPP/EOPNOTSUPP/]
[arnd@arndb.de: mark cancel_charge() inline]
  Link: https://lkml.kernel.org/r/20211022070542.679839-1-arnd@kernel.org

Link: https://lkml.kernel.org/r/20211019153408.2916808-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:35 -07:00
Bhaskar Chowdhury fdebeae0d7 docs: admin-guide: cgroup-v1: Fix typos in the file memory.rst
s/overcommited/overcommitted/
s/Overcommiting/Overcommitting/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210313061029.28024-1-unixbhaskar@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-03-15 13:03:25 -06:00
Yang Shi 1eff491fc4 doc: memcontrol: add description for oom_kill
When debugging an oom issue, I found the oom_kill counter of memcg is
confusing.  At the first glance without checking document, I thought it
just counts for memcg oom, but it turns out it counts both global and
memcg oom.

The cgroup v2 documents it, but the description is missed for cgroup v1.

Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Chris Down <chris@chrisdown.name>
Link: https://lore.kernel.org/r/20210226021254.3980-1-shy828301@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-03-06 17:36:50 -07:00
Joe Perches 05a5f51ca5 Documentation: Replace lkml.org links with lore
Replace the lkml.org links with lore to better use a single source
that's more likely to stay available long-term.

Done by bash script:

cvt_lkml_to_lore ()
{
    tmpfile=$(mktemp ./.cvt_links.XXXXXXX)

    header=$(echo $1 | sed 's@/lkml/@/lkml/headers/@')

    wget -qO - $header > $tmpfile
    if [[ $? == 0 ]] ; then
	link=$(grep -i '^Message-Id:' $tmpfile | head -1 | \
		   sed -r -e 's/^\s*Message-Id:\s*<\s*//' -e  's/\s*>\s*$//' -e 's@^@https://lore.kernel.org/r/@')
	#    echo "testlink: $link"
	if [ -n "$link" ] ; then
	    wget -qO - $link > /dev/null
	    if [[ $? == 0 ]] ; then
		echo $link
	    fi
	fi
    fi

    rm -f $tmpfile
}

git grep -P -o "\bhttps?://(?:www.)?lkml.org/lkml[\/\w]+" $@ |
    while read line ; do
	echo $line
	file=$(echo $line | cut -f1 -d':')
	link=$(echo $line | cut -f2- -d':')
	newlink=$(cvt_lkml_to_lore $link)
	if [[ -n "$newlink" ]] ; then
	    sed -i -e "s#\b$link\b#$newlink#" $file
	fi
    done

Link: https://lore.kernel.org/patchwork/patch/1265849/#1462688
Signed-off-by: Joe Perches <joe@perches.com>
Link: https://lore.kernel.org/r/77cdb7f32cfb087955bfc3600b86c40bed5d4104.camel@perches.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-01-11 12:47:38 -07:00
Hugh Dickins 15b4473617 mm/lru: revise the comments of lru_lock
Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix
the incorrect comments in code.  Also fixed some zone->lru_lock comment
error from ancient time.  etc.

I struggled to understand the comment above move_pages_to_lru() (surely
it never calls page_referenced()), and eventually realized that most of
it had got separated from shrink_active_list(): move that comment back.

Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 14:48:04 -08:00
Roman Gushchin 184218639a docs: cgroup-v1: reflect the deprecation of the non-hierarchical mode
Update cgroup v1 docs after the deprecation of the non-hierarchical mode
of the memory controller.

Link: https://lkml.kernel.org/r/20201110220800.929549-3-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 12:13:40 -08:00
Alex Shi 0a27cae138 mm: memcontrol: document the new swap control behavior
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Link: http://lkml.kernel.org/r/20200508183105.225460-18-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:48 -07:00
Michal Hocko 0158115f70 memcg, kmem: deprecate kmem.limit_in_bytes
Cgroup v1 memcg controller has exposed a dedicated kmem limit to users
which turned out to be really a bad idea because there are paths which
cannot shrink the kernel memory usage enough to get below the limit (e.g.
because the accounted memory is not reclaimable).  There are cases when
the failure is even not allowed (e.g.  __GFP_NOFAIL).  This means that the
kmem limit is in excess to the hard limit without any way to shrink and
thus completely useless.  OOM killer cannot be invoked to handle the
situation because that would lead to a premature oom killing.

As a result many places might see ENOMEM returning from kmalloc and result
in unexpected errors.  E.g.  a global OOM killer when there is a lot of
free memory because ENOMEM is translated into VM_FAULT_OOM in #PF path and
therefore pagefault_out_of_memory would result in OOM killer.

Please note that the kernel memory is still accounted to the overall limit
along with the user memory so removing the kmem specific limit should
still allow to contain kernel memory consumption.  Unlike the kmem one,
though, it invokes memory reclaim and targeted memcg oom killing if
necessary.

Start the deprecation process by crying to the kernel log.  Let's see
whether there are relevant usecases and simply return to EINVAL in the
second stage if nobody complains in few releases.

[akpm@linux-foundation.org: tweak documentation text]
Link: http://lkml.kernel.org/r/20190911151612.GI4023@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Thomas Lindroth <thomas.lindroth@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 15:54:10 -07:00
Mauro Carvalho Chehab da82c92f11 docs: cgroup-v1: add it to the admin-guide book
Those files belong to the admin guide, so add them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15 11:03:02 -03:00