Commit Graph

24189 Commits

Author SHA1 Message Date
Linus Torvalds 6f110a5e4f Disable SLUB_TINY for build testing
... and don't error out so hard on missing module descriptions.

Before commit 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnigns (ie 'W=1').

After that commit the warning became an unconditional hard error.

And it turns out not all modules have been converted despite the claims
to the contrary.  As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.

The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself.  And so anybody doing full build tests
didn't actually see this failre.

So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage.  Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.

Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-06 10:00:04 -07:00
Thomas Gleixner 8fa7292fee treewide: Switch/rename to timer_delete[_sync]()
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-05 10:30:12 +02:00
Linus Torvalds 8c7c1b5506 - The 2 patch series "mm: fixes for fallouts from mem_init() cleanup"
from Mike Rapoport fixes a couple of issues with the just-merged "arch,
   mm: reduce code duplication in mem_init()" series.
 
 - The 4 patch series "MAINTAINERS: add my isub-entries to MM part." from
   Mike Rapoport does some maintenance on MAINTAINERS.
 
 - The 6 patch series "remove tlb_remove_page_ptdesc()" from Qi Zheng
   does some cleanup work to the page mapping code.
 
 - The 7 patch series "mseal system mappings" from Jeff Xu permits
   sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors
   (arm compat-mode), sigpage (arm compat-mode).
 
 - Plus the usual shower of singleton patches.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+4XpgAKCRDdBJ7gKXxA
 jnwtAP43Rp3zyWf034fEypea36xQqcsy4I7YUTdZEgnFS7LCZwEApM97JvGHsYEr
 Ns9Zhnh+E3RWASfOAzJoVZVrAaMovg4=
 =MyVR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

 - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
   Rapoport fixes a couple of issues with the just-merged "arch, mm:
   reduce code duplication in mem_init()" series

 - The series "MAINTAINERS: add my isub-entries to MM part." from Mike
   Rapoport does some maintenance on MAINTAINERS

 - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
   cleanup work to the page mapping code

 - The series "mseal system mappings" from Jeff Xu permits sealing of
   "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
   compat-mode), sigpage (arm compat-mode)

 - Plus the usual shower of singleton patches

* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
  mseal sysmap: add arch-support txt
  mseal sysmap: enable s390
  selftest: test system mappings are sealed
  mseal sysmap: update mseal.rst
  mseal sysmap: uprobe mapping
  mseal sysmap: enable arm64
  mseal sysmap: enable x86-64
  mseal sysmap: generic vdso vvar mapping
  selftests: x86: test_mremap_vdso: skip if vdso is msealed
  mseal sysmap: kernel config and header change
  mm: pgtable: remove tlb_remove_page_ptdesc()
  x86: pgtable: convert to use tlb_remove_ptdesc()
  riscv: pgtable: unconditionally use tlb_remove_ptdesc()
  mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
  mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
  mm: pgtable: make generic tlb_remove_table() use struct ptdesc
  microblaze/mm: put mm_cmdline_setup() in .init.text section
  mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
  MAINTAINERS: mm: add entry for secretmem
  MAINTAINERS: mm: add entry for numa memblocks and numa emulation
  ...
2025-04-03 11:10:00 -07:00
Linus Torvalds 204e9a18f1 5 hotfixes. 3 are cc:stable and the remainder address post-6.14 issues
or aren't considered necessary for -stable kernels.
 
 All 5 patches are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+4VZAAKCRDdBJ7gKXxA
 jqptAQDhx4JF+Q7pLzf3CQoMkTvg68mbS9lfItb9JP4IhGu/xgD8CfZAy33sJKLg
 1cHzWI+fdltyq+Nev+kO3oqawTGpyQE=
 =Tm8z
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM hotfixes from Andrew Morton:
 "Five hotfixes. Three are cc:stable and the remainder address post-6.14
  issues or aren't considered necessary for -stable kernels.

  All patches are for MM"

* tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
  mm/hugetlb: move hugetlb_sysctl_init() to the __init section
  mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
  mm/hugetlb_vmemmap: fix memory loads ordering
  mm/userfaultfd: fix release hang over concurrent GUP
2025-04-03 10:47:47 -07:00
Alexei Starovoitov 2985dae1e5 mm/page_alloc: Fix try_alloc_pages
Fix an obvious bug. try_alloc_pages() should set_page_refcounted.

[ Not so obvious: it was probably correct at the time it was written but
  was at some point then rebased on top of v6.14-rc1.

  And at that point there was a semantic conflict with commit
  efabfe1420 ("mm/page_alloc: move set_page_refcounted() to callers
  of get_page_from_freelist()") and became buggy.
							- Linus ]

Fixes: 97769a53f1 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil BAbka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-02 09:34:31 -07:00
Linus Torvalds 3491aa0478 VFIO updates for v6.15-rc1
- Relax IGD support code to match display class device rather than
    specifically requiring a VGA device. (Tomita Moeko)
 
  - Accelerate DMA mapping of device MMIO by iterating at PMD and PUD
    levels to take advantage of huge pfnmap support added in v6.12.
    (Alex Williamson)
 
  - Extend virtio vfio-pci variant driver to include migration support
    for block devices where enabled by the PF. (Yishai Hadas)
 
  - Virtualize INTx PIN register for devices where the platform does
    not route legacy PCI interrupts for the device and the interrupt
    is reported as IRQ_NOTCONNECTED. (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmfq5nEbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsi3KAP/2MQcQaKTZ6/+dG6YdKT
 ZFaY4+xJ14DnUN/z96UlIWLk8bWgSyDFxdoFMbtFGENKRslEWxZ7In9Caow7f6ux
 7/usBjSvJa5Yx9YWRGsblrx7IyYfSW6R1V+jH3xPd+K8Ir4K7SUvb1CJLVPdfEYh
 OWer8eRpZ5tw3R2X4o+QxZ+H4Fx1zVQourW35h4daqrjnn7kOQMJIzGYOwHSDlCy
 lW0X0yD3sGgw9w7qAmEDmw9UbKGf245AVylIl5T1a7c3RaO+eKdKPZfNa18g0J/Q
 5pRMK+2PvZ+S0OTYxotcF9GtEJ3iBxY8W4QnlLiyTs9XyZ7tLMzGvLEKmCDKA0U8
 yAtoJ5T00PVXjMxkZx1+oMGja9Hx+b7gABTYpbf5wRtab6EdNWln++I1HCLKgZZ+
 yStvQNsMYGbJsLfwiGouMwD24JT+xg3A+Dv2Cx+Ai4NVJebxTD8Lhc0lz2I6IpOh
 wFBpBzBIPpcG53oQ1Syb9GLESQ0Acb4LUMjsSxIg7QFSrWgAAlq/PiLXv852S3xJ
 pUEh7r/YByQytUsQajgE7ekKqyXw0gn99Z+UTk0LUIq/y7SxrIPeqzq2qRf490RV
 wnkOrMxrAWj84lkIv8hLCiLsXMmvV4rsMJgV+s8KQZ+hPv38mPbkFYGdHj4h5RPA
 5J5h32dDkHLK+X5u//gBY8xh
 =fJPO
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Relax IGD support code to match display class device rather than
   specifically requiring a VGA device (Tomita Moeko)

 - Accelerate DMA mapping of device MMIO by iterating at PMD and PUD
   levels to take advantage of huge pfnmap support added in v6.12
   (Alex Williamson)

 - Extend virtio vfio-pci variant driver to include migration support
   for block devices where enabled by the PF (Yishai Hadas)

 - Virtualize INTx PIN register for devices where the platform does not
   route legacy PCI interrupts for the device and the interrupt is
   reported as IRQ_NOTCONNECTED (Alex Williamson)

* tag 'vfio-v6.15-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Handle INTx IRQ_NOTCONNECTED
  vfio/virtio: Enable support for virtio-block live migration
  vfio/type1: Use mapping page mask for pfnmaps
  mm: Provide address mask in struct follow_pfnmap_args
  vfio/type1: Use consistent types for page counts
  vfio/type1: Use vfio_batch for vaddr_get_pfns()
  vfio/type1: Convert all vaddr_get_pfns() callers to use vfio_batch
  vfio/type1: Catch zero from pin_user_pages_remote()
  vfio/pci: match IGD devices in display controller class
2025-04-01 19:35:19 -07:00
Jinjiang Tu 9342bc134a mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
We triggered the below BUG:

 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x2 pfn:0x240402
 head: order:9 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x1ffffe0000000040(head|node=1|zone=3|lastcpupid=0x1ffff)
 page_type: f4(hugetlb)
 page dumped because: VM_BUG_ON_PAGE(page->compound_head & 1)
 ------------[ cut here ]------------
 kernel BUG at ./include/linux/page-flags.h:310!
 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
 Modules linked in:
 CPU: 7 UID: 0 PID: 166 Comm: sh Not tainted 6.14.0-rc7-dirty #374
 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : const_folio_flags+0x3c/0x58
 lr : const_folio_flags+0x3c/0x58
 Call trace:
  const_folio_flags+0x3c/0x58 (P)
  do_migrate_range+0x164/0x720
  offline_pages+0x63c/0x6fc
  memory_subsys_offline+0x190/0x1f4
  device_offline+0xc0/0x13c
  state_store+0x90/0xd8
  dev_attr_store+0x18/0x2c
  sysfs_kf_write+0x44/0x54
  kernfs_fop_write_iter+0x120/0x1cc
  vfs_write+0x240/0x378
  ksys_write+0x70/0x108
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x48/0x10c
  el0_svc_common.constprop.0+0x40/0xe0

When allocating a hugetlb folio, between the folio is taken from buddy and
prep_compound_page() is called, start_isolate_page_range() and
do_migrate_range() is called.  When do_migrate_range() scans the head page
of the hugetlb folio, the compound_head field isn't set, so scans the tail
page next.  And at this time, the compound_head field of tail page is set,
folio_test_large() is called by tail page, thus triggers VM_BUG_ON().

To fix it, get folio refcount before calling folio_test_large().

Link: https://lkml.kernel.org/r/20250324131750.1551884-1-tujinjiang@huawei.com
Fixes: 8135d8926c ("mm: memory_hotplug: memory hotremove supports thp migration")
Fixes: b62b51d2d1 ("mm: memory_hotplug: remove head variable in do_migrate_range()")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:12 -07:00
Mike Rapoport (Microsoft) 7790c9c926 memblock: don't release high memory to page allocator when HIGHMEM is off
Nathan Chancellor reports the following crash on a MIPS system with
CONFIG_HIGHMEM=n:

  Linux version 6.14.0-rc6-00359-g6faea3422e3b (nathan@ax162) (mips-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Fri Mar 21 08:12:02 MST 2025
  earlycon: uart8250 at I/O port 0x3f8 (options '38400n8')
  printk: legacy bootconsole [uart8250] enabled
  Config serial console: console=ttyS0,38400n8r
  CPU0 revision is: 00019300 (MIPS 24Kc)
  FPU revision is: 00739300
  MIPS: machine is mti,malta
  Software DMA cache coherency enabled
  Initial ramdisk at: 0x8fad0000 (5360128 bytes)
  OF: reserved mem: Reserved memory: No reserved-memory node in the DT
  Primary instruction cache 2kB, VIPT, 2-way, linesize 16 bytes.
  Primary data cache 2kB, 2-way, VIPT, no aliases, linesize 16 bytes
  Zone ranges:
    DMA      [mem 0x0000000000000000-0x0000000000ffffff]
    Normal   [mem 0x0000000001000000-0x000000001fffffff]
  Movable zone start for each node
  Early memory node ranges
    node   0: [mem 0x0000000000000000-0x000000000fffffff]
    node   0: [mem 0x0000000090000000-0x000000009fffffff]
  Initmem setup node 0 [mem 0x0000000000000000-0x000000009fffffff]
  On node 0, zone Normal: 16384 pages in unavailable ranges
  random: crng init done
  percpu: Embedded 3 pages/cpu s18832 r8192 d22128 u49152
  Kernel command line: rd_start=0xffffffff8fad0000 rd_size=5360128  console=ttyS0,38400n8r
  printk: log buffer data + meta data: 32768 + 102400 = 135168 bytes
  Dentry cache hash table entries: 65536 (order: 4, 262144 bytes, linear)
  Inode-cache hash table entries: 32768 (order: 3, 131072 bytes, linear)
  Writing ErrCtl register=00000000
  Readback ErrCtl register=00000000
  Built 1 zonelists, mobility grouping on.  Total pages: 16384
  mem auto-init: stack:all(zero), heap alloc:off, heap free:off
  Unhandled kernel unaligned access[#1]:
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00359-g6faea3422e3b #1
  Hardware name: mti,malta
  $ 0   : 00000000 00000001 81cb0880 00129027
  $ 4   : 00000001 0000000a 00000002 00129026
  $ 8   : ffffdfff 80101e00 00000002 00000000
  $12   : 81c9c224 81c63e68 00000002 00000000
  $16   : 805b1e00 00025800 81cb0880 00000002
  $20   : 00000000 81c63e64 0000000a 81f10000
  $24   : 81c63e64 81c63e60
  $28   : 81c60000 81c63de0 00000001 81cc9d20
  Hi    : 00000000
  Lo    : 00000000
  epc   : 814a227c __free_pages_ok+0x144/0x3c0
  ra    : 81cc9d20 memblock_free_all+0x1d4/0x27c
  Status: 10000002        KERNEL EXL
  Cause : 00800410 (ExcCode 04)
  BadVA : 00129026
  PrId  : 00019300 (MIPS 24Kc)
  Modules linked in:
  Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 81f10000 805a9e00 81c80000 00000000 00000002 814aa240 000003ff 00000400
          00000000 81f10000 81c9c224 00003b1f 81c80000 81c63e60 81ca0000 81c63e64
          81f10000 0000000a 0000001f 81cc9d20 81f10000 81cc96d8 00000000 81c80000
          81c9c224 81c63e60 81c63e64 00000000 81f10000 00024000 00028000 00025c00
          90000000 a0000000 00000002 00000017 00000000 00000000 81f10000 81f10000
          ...
  Call Trace:
  [<814a227c>] __free_pages_ok+0x144/0x3c0
  [<81cc9d20>] memblock_free_all+0x1d4/0x27c
  [<81cc6764>] mm_core_init+0x100/0x138
  [<81cb4ba4>] start_kernel+0x4a0/0x6e4

  Code: 1080ffd5  02003825  2467ffff <8ce30000> 7c630500  1060ffd4  00000000  8ce30000  7c630180

The crash happens because commit 6faea3422e ("arch, mm: streamline
HIGHMEM freeing") too eagerly frees high memory to the page allocator even
when HIGHMEM is disabled.

Make sure that when CONFIG_HIGHMEM=n the high memory is not released to the
page allocator.

Link: https://lore.kernel.org/all/20250323190647.GA1009914@ax162
Link: https://lkml.kernel.org/r/20250325114928.1791109-3-rppt@kernel.org
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: 6faea3422e ("arch, mm: streamline HIGHMEM freeing")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Mike Rapoport (Microsoft) 2ebc3b68ac mm/mm_init: init holes in the end of the memory map for FLATMEM
Patch series "mm: fixes for fallouts from mem_init() cleanup".

These are the fixes for fallouts from mem_init() cleanup reported by
Nathan Chancellor and kbuild.  The details are in the commit messages.


This patch (of 2):

Kernel test robot reports the following crash on 32-bit system with
FLATMEM and DEBUG_VM_PGFLAGS enabled:

[    0.478822][    T0] kernel BUG at include/linux/page-flags.h:536!
[    0.479312][    T0] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
[    0.479768][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-rc6-00357-g8268af309d07 #1
[    0.480470][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.481260][ T0] EIP: reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.481683][ T0] Code: 5d c3 01 f1 89 c8 ba e1 38 f4 c3 e8 1e 37 8e fc 0f 0b b8 90 e2 62 c4 e8 e2 05 5e fc 01 f1 89 c8 ba be 85 f7 c3 e8 04 37 8e fc <0f> 0b b8 80 e2 62 c4 e8 c8 05 5e fc 55 89 e5 53 57 56 83 ec 10 89
[    0.483177][    T0] EAX: 00000000 EBX: c425df50 ECX: 00000000 EDX: 00000000
[    0.483712][    T0] ESI: 017ffc00 EDI: ffffffff EBP: c425df34 ESP: c425df2c
[    0.484248][    T0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
[    0.484846][    T0] CR0: 80050033 CR2: 00000000 CR3: 04b48000 CR4: 00000090
[    0.485376][    T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.485907][    T0] DR6: fffe0ff0 DR7: 00000400
[    0.486253][    T0] Call Trace:
[ 0.486494][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:478)
[ 0.486822][ T0] ? die (arch/x86/kernel/dumpstack.c:?)
[ 0.487099][ T0] ? do_trap (arch/x86/kernel/traps.c:? arch/x86/kernel/traps.c:197)
[ 0.487409][ T0] ? do_error_trap (arch/x86/kernel/traps.c:217)
[ 0.487752][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.488153][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.488490][ T0] ? handle_invalid_op (arch/x86/kernel/traps.c:254)
[ 0.488869][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.489271][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:316)
[ 0.489619][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1055)
[ 0.489996][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.490332][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.490733][ T0] ? exc_overflow (arch/x86/kernel/traps.c:301)
[ 0.491068][ T0] ? reserve_bootmem_region (include/linux/page-flags.h:536)
[ 0.491470][ T0] memmap_init_reserved_pages (mm/memblock.c:2203)
[ 0.491887][ T0] free_low_memory_core_early (mm/memblock.c:?)
[ 0.492302][ T0] memblock_free_all (mm/memblock.c:2272 include/linux/atomic/atomic-arch-fallback.h:546 include/linux/atomic/atomic-long.h:123 include/linux/atomic/atomic-instrumented.h:3261 include/linux/mm.h:67 mm/memblock.c:2273)
[ 0.492659][ T0] mem_init (arch/x86/mm/init_32.c:735)
[ 0.492952][ T0] mm_core_init (mm/mm_init.c:2730)
[ 0.493271][ T0] start_kernel (init/main.c:958)
[ 0.493604][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79)
[ 0.493969][ T0] startup_32_smp (arch/x86/kernel/head_32.S:292)

The crash happens because after commit 8268af309d ("arch, mm: set
max_mapnr when allocating memory map for FLATMEM") max_mapnr is rounded up
to MAX_ORDER_NR_PAGES and the pages in the end of the memory map are
passing pfn_valid() check in reserve_bootmem_region().

Make sure that that pages in the end of the memory map are initialized,
just like the pages in the end of the last section for SPARSEMEM.

Link: https://lkml.kernel.org/r/20250325114928.1791109-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250325114928.1791109-2-rppt@kernel.org
Fixes: 8268af309d ("arch, mm: set max_mapnr when allocating memory map for FLATMEM")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503241424.d16223ec-lkp@intel.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:11 -07:00
Ye Liu bd145bdd26 mm/page_alloc: replace flag check with PageHWPoison() in check_new_page_bad()
This patch replaces the direct check for the __PG_HWPOISON flag with the
PageHWPoison() macro, improving code readability and maintaining
consistency with other parts of the memory management code.

Link: https://lkml.kernel.org/r/20250320063346.489030-1-ye.liu@linux.dev
Signed-off-by: Ye Liu <liuye@kylinos.cn>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:10 -07:00
Taotao Chen 7f29070f4c mm/damon/core: simplify control flow in damon_register_ops()
The function logic is not complex, so using goto is unnecessary.  Replace
it with a straightforward if-else to simplify control flow and improve
readability.

Link: https://lkml.kernel.org/r/Z9vxcPCw8tDsjKw1@OneApple
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:10 -07:00
Harry Yoo 7fa46cdfff mm/kasan: use SLAB_NO_MERGE flag instead of an empty constructor
Use SLAB_NO_MERGE flag to prevent merging instead of providing an empty
constructor.  Using an empty constructor in this manner is an abuse of
slab interface.

The SLAB_NO_MERGE flag should be used with caution, but in this case, it
is acceptable as the cache is intended solely for debugging purposes.

No functional changes intended.

Link: https://lkml.kernel.org/r/20250318015926.1629748-1-harry.yoo@oracle.com
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:10 -07:00
Johannes Weiner 7a95a05f15 mm: page_alloc: fix defrag_mode's retry & OOM path
Brendan points out that defrag_mode doesn't properly clear
ALLOC_NOFRAGMENT on its last-ditch attempt to allocate.  But looking
closer, the problem is actually more severe: it doesn't actually *check*
whether it's already retried, and keeps looping.  This means the OOM path
is never taken, and the thread can loop indefinitely.

This is verified with an intentional OOM test on defrag_mode=1, which
results in the machine hanging.  After this patch, it triggers the OOM
kill reliably and recovers.

Clear ALLOC_NOFRAGMENT properly, and only retry once.

Link: https://lkml.kernel.org/r/20250401041231.GA2117727@cmpxchg.org
Fixes: e3aa7df331 ("mm: page_alloc: defrag_mode")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:09 -07:00
Lorenzo Stoakes 36eed54008 mm/mremap: do not set vrm->vma NULL immediately prior to checking it
This seems rather unwise.  If we cannot merge, extend, then we need to
recall the original VMA to see if we need to uncharge.

If we do need to, do so.

Link: https://lkml.kernel.org/r/b2fb6b9c-376d-4e9b-905e-26d847fd3865@lucifer.local
Fixes: d5c8aec054 ("mm/mremap: initial refactor of move_vma()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-=by: "Lai, Yi" <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/linux-mm/Z+lcvEIHMLiKVR1i@ly-workstation/
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:17:09 -07:00
Yosry Ahmed c11bcbc0a5 mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex.  crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).

On the other hand, crypto_alloc_acomp_node() holds the scomp_lock (through
crypto_scomp_init_tfm()), and then allocates memory.  If the allocation
results in reclaim, we may attempt to hold the per-CPU acomp_ctx mutex.

The above dependencies can cause an ABBA deadlock.  For example in the
following scenario:

(1) Task A running on CPU #1:
    crypto_alloc_acomp_node()
      Holds scomp_lock
      Enters reclaim
      Reads per_cpu_ptr(pool->acomp_ctx, 1)

(2) Task A is descheduled

(3) CPU #1 goes offline
    zswap_cpu_comp_dead(CPU #1)
      Holds per_cpu_ptr(pool->acomp_ctx, 1))
      Calls crypto_free_acomp()
      Waits for scomp_lock

(4) Task A running on CPU #2:
      Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
      DEADLOCK

Since there is no requirement to call crypto_free_acomp() with the per-CPU
acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the mutex is
unlocked.  Also move the acomp_request_free() and kfree() calls for
consistency and to avoid any potential sublte locking dependencies in the
future.

With this, only setting acomp_ctx fields to NULL occurs with the mutex
held.  This is similar to how zswap_cpu_comp_prepare() only initializes
acomp_ctx fields with the mutex held, after performing all allocations
before holding the mutex.

Opportunistically, move the NULL check on acomp_ctx so that it takes place
before the mutex dereference.

Link: https://lkml.kernel.org/r/20250226185625.2672936-1-yosry.ahmed@linux.dev
Fixes: 12dcb0ef54 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Co-developed-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reported-by: syzbot+1a517ccfcbc6a7ab0f82@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcea51.050a0220.bbfd1.0096.GAE@google.com/
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Nhat Pham <nphamcs@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Chris Murphy <lists@colorremedies.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:14:43 -07:00
Marc Herbert 1ca77ff183 mm/hugetlb: move hugetlb_sysctl_init() to the __init section
hugetlb_sysctl_init() is only invoked once by an __init function and is
merely a wrapper around another __init function so there is not reason to
keep it.

Fixes the following warning when toning down some GCC inline options:

 WARNING: modpost: vmlinux: section mismatch in reference:
   hugetlb_sysctl_init+0x1b (section: .text) ->
     __register_sysctl_init (section: .init.text)

Link: https://lkml.kernel.org/r/20250319060041.2737320-1-marc.herbert@linux.intel.com
Signed-off-by: Marc Herbert <Marc.Herbert@linux.intel.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:14:43 -07:00
Liu Shixin a0a9f2180b mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
I found a NULL pointer dereference as followed:

 BUG: kernel NULL pointer dereference, address: 0000000000000028
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP PTI
 CPU: 5 UID: 0 PID: 5964 Comm: sh Kdump: loaded Not tainted 6.13.0-dirty #20
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.
 RIP: 0010:has_unmovable_pages+0x184/0x360
 ...
 Call Trace:
  <TASK>
  set_migratetype_isolate+0xd1/0x180
  start_isolate_page_range+0xd2/0x170
  alloc_contig_range_noprof+0x101/0x660
  alloc_contig_pages_noprof+0x238/0x290
  alloc_gigantic_folio.isra.0+0xb6/0x1f0
  only_alloc_fresh_hugetlb_folio.isra.0+0xf/0x60
  alloc_pool_huge_folio+0x80/0xf0
  set_max_huge_pages+0x211/0x490
  __nr_hugepages_store_common+0x5f/0xe0
  nr_hugepages_store+0x77/0x80
  kernfs_fop_write_iter+0x118/0x200
  vfs_write+0x23c/0x3f0
  ksys_write+0x62/0xe0
  do_syscall_64+0x5b/0x170
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

As has_unmovable_pages() call folio_hstate() without hugetlb_lock, there
is a race to free the HugeTLB page between PageHuge() and folio_hstate(). 
There is no need to add hugetlb_lock here as the HugeTLB page can be freed
in lot of places.  So it's enough to unfold folio_hstate() and add a check
to avoid NULL pointer dereference for hugepage_migration_supported().

Link: https://lkml.kernel.org/r/20250122061151.578768-1-liushixin2@huawei.com
Fixes: 464c7ffbcb ("mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-01 15:14:43 -07:00
Linus Torvalds eb0ece1602 - The 6 patch series "Enable strict percpu address space checks" from
Uros Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.
 
   This has caused a small amount of fallout - two or three issues were
   reported.  In all cases the calling code was founf to be incorrect.
 
 - The 4 patch series "Some cleanup for memcg" from Chen Ridong
   implements some relatively monir cleanups for the memcontrol code.
 
 - The 17 patch series "mm: fixes for device-exclusive entries (hmm)"
   from David Hildenbrand fixes a boatload of issues which David found then
   using device-exclusive PTE entries when THP is enabled.  More work is
   needed, but this makes thins better - our own HMM selftests now succeed.
 
 - The 2 patch series "mm: zswap: remove z3fold and zbud" from Yosry
   Ahmed remove the z3fold and zbud implementations.  They have been
   deprecated for half a year and nobody has complained.
 
 - The 5 patch series "mm: further simplify VMA merge operation" from
   Lorenzo Stoakes implements numerous simplifications in this area.  No
   runtime effects are anticipated.
 
 - The 4 patch series "mm/madvise: remove redundant mmap_lock operations
   from process_madvise()" from SeongJae Park rationalizes the locking in
   the madvise() implementation.  Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.
 
 - The 12 patch series "Tiny cleanup and improvements about SWAP code"
   from Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.
 
 - The 2 patch series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak user-visible
   output.
 
 - The 2 patch series "mm/damon/paddr: fix large folios access and
   schemes handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.
 
 - The 3 patch series "mm/damon/core: fix wrong and/or useless
   damos_walk() behaviors" from SeongJae Park fixes a few issues with the
   accuracy of kdamond's walking of DAMON regions.
 
 - The 3 patch series "expose mapping wrprotect, fix fb_defio use" from
   Lorenzo Stoakes changes the interaction between framebuffer deferred-io
   and core MM.  No functional changes are anticipated - this is
   preparatory work for the future removal of page structure fields.
 
 - The 4 patch series "mm/damon: add support for hugepage_size DAMOS
   filter" from Usama Arif adds a DAMOS filter which permits the filtering
   by huge page sizes.
 
 - The 4 patch series "mm: permit guard regions for file-backed/shmem
   mappings" from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state.  The feature now covers shmem and
   file-backed mappings.
 
 - The 4 patch series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping for
   pte-mapped large folios.
 
 - The 18 patch series "reimplement per-vma lock as a refcount" from
   Suren Baghdasaryan puts the vm_lock back into the vma.  Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy.  This patchset provides small (0-10%) improvements on one
   microbenchmark.
 
 - The 5 patch series "Docs/mm/damon: misc DAMOS filters documentation
   fixes and improves" from SeongJae Park does some maintenance work on the
   DAMON docs.
 
 - The 27 patch series "hugetlb/CMA improvements for large systems" from
   Frank van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for unmapped
   pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.
 
 - The 19 patch series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.
 
 - The 12 patch series "selftests/mm: Some cleanups from trying to run
   them" from Brendan Jackman fixes a pile of unrelated issues which
   Brendan encountered while runnimg our selftests.
 
 - The 2 patch series "fs/proc/task_mmu: add guard region bit to pagemap"
   from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.
 
 - The 7 patch series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply wasn't
   being effective.
 
 - The 5 patch series "mm: cleanups for device-exclusive entries (hmm)"
   from David Hildenbrand implements a number of unrelated cleanups in this
   code.
 
 - The 5 patch series "mm: Rework generic PTDUMP configs" from Anshuman
   Khandual implements a number of preparatoty cleanups to the
   GENERIC_PTDUMP Kconfig logic.
 
 - The 8 patch series "mm/damon: auto-tune aggregation interval" from
   SeongJae Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.
 
 - The 5 patch series "Fix lazy mmu mode" from Ryan Roberts fixes some
   issues in powerpc, sparc and x86 lazy MMU implementations.  Ryan did
   this in preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.
 
 - The 2 patch series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the code
   easier to follow.
 
 - The 3 patch series "page_counter cleanup and size reduction" from
   Shakeel Butt cleans up the page_counter code and fixes a size increase
   which we accidentally added late last year.
 
 - The 3 patch series "Add a command line option that enables control of
   how many threads should be used to allocate huge pages" from Thomas
   Prescher does that.  It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.
 
 - The 3 patch series "Fix calculations in trace_balance_dirty_pages()
   for cgwb" from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.
 
 - The 9 patch series "mm/damon: make allow filters after reject filters
   useful and intuitive" from SeongJae Park improves the handling of allow
   and reject filters.  Behaviour is made more consistent and the
   documention is updated accordingly.
 
 - The 5 patch series "Switch zswap to object read/write APIs" from Yosry
   Ahmed updates zswap to the new object read/write APIs and thus permits
   the removal of some legacy code from zpool and zsmalloc.
 
 - The 6 patch series "Some trivial cleanups for shmem" from Baolin Wang
   does as it claims.
 
 - The 20 patch series "fs/dax: Fix ZONE_DEVICE page reference counts"
   from Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.
 
 - The 4 patch series "refactor mremap and fix bug" from Lorenzo Stoakes
   is a preparatoty refactoring and cleanup of the mremap() code.
 
 - The 20 patch series "mm: MM owner tracking for large folios (!hugetlb)
   + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.
 
 - The 8 patch series "mm/damon: add sysfs dirs for managing DAMOS
   filters based on handling layers" from SeongJae Park adds a couple of
   new sysfs directories to ease the management of DAMON/DAMOS filters.
 
 - The 13 patch series "arch, mm: reduce code duplication in mem_init()"
   from Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.
 
 - The 13 patch series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.
 
 - The 3 patch series "mm: page_ext: Introduce new iteration API" from
   Luiz Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.
 
 - The 8 patch series "Buddy allocator like (or non-uniform) folio split"
   from Zi Yan reworks the code to split a folio into smaller folios.  The
   main benefit is lessened memory consumption: fewer post-split folios are
   generated.
 
 - The 2 patch series "Minimize xa_node allocation during xarry split"
   from Zi Yan reduces the number of xarray xa_nodes which are generated
   during an xarray split.
 
 - The 2 patch series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.
 
 - The 3 patch series "Add tracepoints for lowmem reserves, watermarks
   and totalreserve_pages" from Martin Liu adds some more tracepoints to
   the page allocator code.
 
 - The 4 patch series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which SeongJae
   observed during his earlier madvise work.
 
 - The 3 patch series "mm/hwpoison: Fix regressions in memory failure
   handling" from Shuai Xue addresses two quite serious regressions which
   Shuai has observed in the memory-failure implementation.
 
 - The 5 patch series "mm: reliable huge page allocator" from Johannes
   Weiner makes huge page allocations cheaper and more reliable by reducing
   fragmentation.
 
 - The 5 patch series "Minor memcg cleanups & prep for memdescs" from
   Matthew Wilcox is preparatory work for the future implementation of
   memdescs.
 
 - The 4 patch series "track memory used by balloon drivers" from Nico
   Pache introduces a way to track memory used by our various balloon
   drivers.
 
 - The 2 patch series "mm/damon: introduce DAMOS filter type for active
   pages" from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.
 
 - The 2 patch series "Adding Proactive Memory Reclaim Statistics" from
   Hao Jia separates the proactive reclaim statistics from the direct
   reclaim statistics.
 
 - The 2 patch series "mm/vmscan: don't try to reclaim hwpoison folio"
   from Jinjiang Tu fixes our handling of hwpoisoned pages within the
   reclaim code.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ+nZaAAKCRDdBJ7gKXxA
 jsOWAPiP4r7CJHMZRK4eyJOkvS1a1r+TsIarrFZtjwvf/GIfAQCEG+JDxVfUaUSF
 Ee93qSSLR1BkNdDw+931Pu0mXfbnBw==
 =Pn2K
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "Enable strict percpu address space checks" from Uros
   Bizjak uses x86 named address space qualifiers to provide
   compile-time checking of percpu area accesses.

   This has caused a small amount of fallout - two or three issues were
   reported. In all cases the calling code was found to be incorrect.

 - The series "Some cleanup for memcg" from Chen Ridong implements some
   relatively monir cleanups for the memcontrol code.

 - The series "mm: fixes for device-exclusive entries (hmm)" from David
   Hildenbrand fixes a boatload of issues which David found then using
   device-exclusive PTE entries when THP is enabled. More work is
   needed, but this makes thins better - our own HMM selftests now
   succeed.

 - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
   remove the z3fold and zbud implementations. They have been deprecated
   for half a year and nobody has complained.

 - The series "mm: further simplify VMA merge operation" from Lorenzo
   Stoakes implements numerous simplifications in this area. No runtime
   effects are anticipated.

 - The series "mm/madvise: remove redundant mmap_lock operations from
   process_madvise()" from SeongJae Park rationalizes the locking in the
   madvise() implementation. Performance gains of 20-25% were observed
   in one MADV_DONTNEED microbenchmark.

 - The series "Tiny cleanup and improvements about SWAP code" from
   Baoquan He contains a number of touchups to issues which Baoquan
   noticed when working on the swap code.

 - The series "mm: kmemleak: Usability improvements" from Catalin
   Marinas implements a couple of improvements to the kmemleak
   user-visible output.

 - The series "mm/damon/paddr: fix large folios access and schemes
   handling" from Usama Arif provides a couple of fixes for DAMON's
   handling of large folios.

 - The series "mm/damon/core: fix wrong and/or useless damos_walk()
   behaviors" from SeongJae Park fixes a few issues with the accuracy of
   kdamond's walking of DAMON regions.

 - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
   Stoakes changes the interaction between framebuffer deferred-io and
   core MM. No functional changes are anticipated - this is preparatory
   work for the future removal of page structure fields.

 - The series "mm/damon: add support for hugepage_size DAMOS filter"
   from Usama Arif adds a DAMOS filter which permits the filtering by
   huge page sizes.

 - The series "mm: permit guard regions for file-backed/shmem mappings"
   from Lorenzo Stoakes extends the guard region feature from its
   present "anon mappings only" state. The feature now covers shmem and
   file-backed mappings.

 - The series "mm: batched unmap lazyfree large folios during
   reclamation" from Barry Song cleans up and speeds up the unmapping
   for pte-mapped large folios.

 - The series "reimplement per-vma lock as a refcount" from Suren
   Baghdasaryan puts the vm_lock back into the vma. Our reasons for
   pulling it out were largely bogus and that change made the code more
   messy. This patchset provides small (0-10%) improvements on one
   microbenchmark.

 - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
   improves" from SeongJae Park does some maintenance work on the DAMON
   docs.

 - The series "hugetlb/CMA improvements for large systems" from Frank
   van der Linden addresses a pile of issues which have been observed
   when using CMA on large machines.

 - The series "mm/damon: introduce DAMOS filter type for unmapped pages"
   from SeongJae Park enables users of DMAON/DAMOS to filter my the
   page's mapped/unmapped status.

 - The series "zsmalloc/zram: there be preemption" from Sergey
   Senozhatsky teaches zram to run its compression and decompression
   operations preemptibly.

 - The series "selftests/mm: Some cleanups from trying to run them" from
   Brendan Jackman fixes a pile of unrelated issues which Brendan
   encountered while runnimg our selftests.

 - The series "fs/proc/task_mmu: add guard region bit to pagemap" from
   Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
   determine whether a particular page is a guard page.

 - The series "mm, swap: remove swap slot cache" from Kairui Song
   removes the swap slot cache from the allocation path - it simply
   wasn't being effective.

 - The series "mm: cleanups for device-exclusive entries (hmm)" from
   David Hildenbrand implements a number of unrelated cleanups in this
   code.

 - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
   implements a number of preparatoty cleanups to the GENERIC_PTDUMP
   Kconfig logic.

 - The series "mm/damon: auto-tune aggregation interval" from SeongJae
   Park implements a feedback-driven automatic tuning feature for
   DAMON's aggregation interval tuning.

 - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
   powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
   preparation for implementing lazy mmu mode for arm64 to optimize
   vmalloc.

 - The series "mm/page_alloc: Some clarifications for migratetype
   fallback" from Brendan Jackman reworks some commentary to make the
   code easier to follow.

 - The series "page_counter cleanup and size reduction" from Shakeel
   Butt cleans up the page_counter code and fixes a size increase which
   we accidentally added late last year.

 - The series "Add a command line option that enables control of how
   many threads should be used to allocate huge pages" from Thomas
   Prescher does that. It allows the careful operator to significantly
   reduce boot time by tuning the parallalization of huge page
   initialization.

 - The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
   from Tang Yizhou fixes the tracing output from the dirty page
   balancing code.

 - The series "mm/damon: make allow filters after reject filters useful
   and intuitive" from SeongJae Park improves the handling of allow and
   reject filters. Behaviour is made more consistent and the documention
   is updated accordingly.

 - The series "Switch zswap to object read/write APIs" from Yosry Ahmed
   updates zswap to the new object read/write APIs and thus permits the
   removal of some legacy code from zpool and zsmalloc.

 - The series "Some trivial cleanups for shmem" from Baolin Wang does as
   it claims.

 - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
   Alistair Popple regularizes the weird ZONE_DEVICE page refcount
   handling in DAX, permittig the removal of a number of special-case
   checks.

 - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
   preparatoty refactoring and cleanup of the mremap() code.

 - The series "mm: MM owner tracking for large folios (!hugetlb) +
   CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
   which we determine whether a large folio is known to be mapped
   exclusively into a single MM.

 - The series "mm/damon: add sysfs dirs for managing DAMOS filters based
   on handling layers" from SeongJae Park adds a couple of new sysfs
   directories to ease the management of DAMON/DAMOS filters.

 - The series "arch, mm: reduce code duplication in mem_init()" from
   Mike Rapoport consolidates many per-arch implementations of
   mem_init() into code generic code, where that is practical.

 - The series "mm/damon/sysfs: commit parameters online via
   damon_call()" from SeongJae Park continues the cleaning up of sysfs
   access to DAMON internal data.

 - The series "mm: page_ext: Introduce new iteration API" from Luiz
   Capitulino reworks the page_ext initialization to fix a boot-time
   crash which was observed with an unusual combination of compile and
   cmdline options.

 - The series "Buddy allocator like (or non-uniform) folio split" from
   Zi Yan reworks the code to split a folio into smaller folios. The
   main benefit is lessened memory consumption: fewer post-split folios
   are generated.

 - The series "Minimize xa_node allocation during xarry split" from Zi
   Yan reduces the number of xarray xa_nodes which are generated during
   an xarray split.

 - The series "drivers/base/memory: Two cleanups" from Gavin Shan
   performs some maintenance work on the drivers/base/memory code.

 - The series "Add tracepoints for lowmem reserves, watermarks and
   totalreserve_pages" from Martin Liu adds some more tracepoints to the
   page allocator code.

 - The series "mm/madvise: cleanup requests validations and
   classifications" from SeongJae Park cleans up some warts which
   SeongJae observed during his earlier madvise work.

 - The series "mm/hwpoison: Fix regressions in memory failure handling"
   from Shuai Xue addresses two quite serious regressions which Shuai
   has observed in the memory-failure implementation.

 - The series "mm: reliable huge page allocator" from Johannes Weiner
   makes huge page allocations cheaper and more reliable by reducing
   fragmentation.

 - The series "Minor memcg cleanups & prep for memdescs" from Matthew
   Wilcox is preparatory work for the future implementation of memdescs.

 - The series "track memory used by balloon drivers" from Nico Pache
   introduces a way to track memory used by our various balloon drivers.

 - The series "mm/damon: introduce DAMOS filter type for active pages"
   from Nhat Pham permits users to filter for active/inactive pages,
   separately for file and anon pages.

 - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
   separates the proactive reclaim statistics from the direct reclaim
   statistics.

 - The series "mm/vmscan: don't try to reclaim hwpoison folio" from
   Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
   code.

* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
  mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
  x86/mm: restore early initialization of high_memory for 32-bits
  mm/vmscan: don't try to reclaim hwpoison folio
  mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
  cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
  mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
  selftests/mm: speed up split_huge_page_test
  selftests/mm: uffd-unit-tests support for hugepages > 2M
  docs/mm/damon/design: document active DAMOS filter type
  mm/damon: implement a new DAMOS filter type for active pages
  fs/dax: don't disassociate zero page entries
  MM documentation: add "Unaccepted" meminfo entry
  selftests/mm: add commentary about 9pfs bugs
  fork: use __vmalloc_node() for stack allocation
  docs/mm: Physical Memory: Populate the "Zones" section
  xen: balloon: update the NR_BALLOON_PAGES state
  hv_balloon: update the NR_BALLOON_PAGES state
  balloon_compaction: update the NR_BALLOON_PAGES state
  meminfo: add a per node counter for balloon drivers
  mm: remove references to folio in __memcg_kmem_uncharge_page()
  ...
2025-04-01 09:29:18 -07:00
Linus Torvalds 46d29f23a7 ring-buffer updates for v6.15
- Restructure the persistent memory to have a "scratch" area
 
   Instead of hard coding the KASLR offset in the persistent memory
   by the ring buffer, push that work up to the callers of the persistent
   memory as they are the ones that need this information. The offsets
   and such is not important to the ring buffer logic and it should
   not be part of that.
 
   A scratch pad is now created when the caller allocates a ring buffer
   from persistent memory by stating how much memory it needs to save.
 
 - Allow where modules are loaded to be saved in the new scratch pad
 
   Save the addresses of modules when they are loaded into the persistent
   memory scratch pad.
 
 - A new module_for_each_mod() helper function was created
 
   With the acknowledgement of the module maintainers a new module helper
   function was created to iterate over all the currently loaded modules.
   This has a callback to be called for each module. This is needed for
   when tracing is started in the persistent buffer and the currently loaded
   modules need to be saved in the scratch area.
 
 - Expose the last boot information where the kernel and modules were loaded
 
   The last_boot_info file is updated to print out the addresses of where
   the kernel "_text" location was loaded from a previous boot, as well
   as where the modules are loaded. If the buffer is recording the current
   boot, it only prints "# Current" so that it does not expose the KASLR
   offset of the currently running kernel.
 
 - Allow the persistent ring buffer to be released (freed)
 
   To have this in production environments, where the kernel command line can
   not be changed easily, the ring buffer needs to be freed when it is not
   going to be used. The memory for the buffer will always be allocated at
   boot up, but if the system isn't going to enable tracing, the memory needs
   to be freed. Allow it to be freed and added back to the kernel memory
   pool.
 
 - Allow stack traces to print the function names in the persistent buffer
 
   Now that the modules are saved in the persistent ring buffer, if the same
   modules are loaded, the printing of the function names will examine the
   saved modules. If the module is found in the scratch area and is also
   loaded, then it will do the offset shift and use kallsyms to display the
   function name. If the address is not found, it simply displays the address
   from the previous boot in hex.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+cUERQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrAsAQCFt2nfzxoe3wtF5EqIT1VHp/8bQVjG
 gBe8B6ouboreogD/dS7yK8MRy24ZAmObGwYG0RbVicd50S7P8Rf7+823ng8=
 =OJKk
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ring-buffer updates from Steven Rostedt:

 - Restructure the persistent memory to have a "scratch" area

   Instead of hard coding the KASLR offset in the persistent memory by
   the ring buffer, push that work up to the callers of the persistent
   memory as they are the ones that need this information. The offsets
   and such is not important to the ring buffer logic and it should not
   be part of that.

   A scratch pad is now created when the caller allocates a ring buffer
   from persistent memory by stating how much memory it needs to save.

 - Allow where modules are loaded to be saved in the new scratch pad

   Save the addresses of modules when they are loaded into the
   persistent memory scratch pad.

 - A new module_for_each_mod() helper function was created

   With the acknowledgement of the module maintainers a new module
   helper function was created to iterate over all the currently loaded
   modules. This has a callback to be called for each module. This is
   needed for when tracing is started in the persistent buffer and the
   currently loaded modules need to be saved in the scratch area.

 - Expose the last boot information where the kernel and modules were
   loaded

   The last_boot_info file is updated to print out the addresses of
   where the kernel "_text" location was loaded from a previous boot, as
   well as where the modules are loaded. If the buffer is recording the
   current boot, it only prints "# Current" so that it does not expose
   the KASLR offset of the currently running kernel.

 - Allow the persistent ring buffer to be released (freed)

   To have this in production environments, where the kernel command
   line can not be changed easily, the ring buffer needs to be freed
   when it is not going to be used. The memory for the buffer will
   always be allocated at boot up, but if the system isn't going to
   enable tracing, the memory needs to be freed. Allow it to be freed
   and added back to the kernel memory pool.

 - Allow stack traces to print the function names in the persistent
   buffer

   Now that the modules are saved in the persistent ring buffer, if the
   same modules are loaded, the printing of the function names will
   examine the saved modules. If the module is found in the scratch area
   and is also loaded, then it will do the offset shift and use kallsyms
   to display the function name. If the address is not found, it simply
   displays the address from the previous boot in hex.

* tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Use _text and the kernel offset in last_boot_info
  tracing: Show last module text symbols in the stacktrace
  ring-buffer: Remove the unused variable bmeta
  tracing: Skip update_last_data() if cleared and remove active check for save_mod()
  tracing: Initialize scratch_size to zero to prevent UB
  tracing: Fix a compilation error without CONFIG_MODULES
  tracing: Freeable reserved ring buffer
  mm/memblock: Add reserved memory release function
  tracing: Update modules to persistent instances when loaded
  tracing: Show module names and addresses of last boot
  tracing: Have persistent trace instances save module addresses
  module: Add module_for_each_mod() function
  tracing: Have persistent trace instances save KASLR offset
  ring-buffer: Add ring_buffer_meta_scratch()
  ring-buffer: Add buffer meta data for persistent ring buffer
  ring-buffer: Use kaslr address instead of text delta
  ring-buffer: Fix bytes_dropped calculation issue
2025-03-31 13:37:22 -07:00
Linus Torvalds 7405c0f01a Miscellaneous x86 fixes and updates:
- Fix a large number of x86 Kconfig dependency and help text accuracy
    bugs/problems, by Mateusz Jończyk and David Heideberg.
 
  - Fix a VM_PAT interaction with fork() crash. This also touches
    core kernel code.
 
  - Fix an ORC unwinder bug for interrupt entries
 
  - Fixes and cleanups.
 
  - Fix an AMD microcode loader bug that can promote verification failures
    into success.
 
  - Add early-printk support for MMIO based UARTs on an x86 board that
    had no other serial debugging facility and also experienced early
    boot crashes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfnFBERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iVDxAAmiB4soT3/WbaWJJdeVyxEL7sOmUNOm04
 5kAVHJVK8QGdje0eWa6h7xmuQD3UOxafE2coCrOxHhZi2qpAAY6CPIIy6oIBRwZK
 gLgT5xn1CHojfm4UFC3YUOyecBRPUF2C5jfkajWdZHumyPP/sOObqvGanpQRAYd5
 bfPHEvrBpeEeS7WkATCdyF2j+I5xYflD4g/MDAsMmqasQHOnjBuFX5VBeVxxkysC
 dMsFkFpxqcA95MnnyOnxXzgOtRTY0UystX07D3Bk1pqhG9zor+mp8OynsTRCU87T
 ZPPbUr2qACNmCqEEXl+F1mAkgj5H66xE2gaJdYx0/jBAIbX8Nwih7mMxhJShVU07
 Lhc0tukmVrDoDaVIr2HsxqI8iokuYLszUjDAqEQmQDrgelL6usPYghN1b2bDSJ9r
 0hCO/s79024H/U9oMrC+CF52D5UH/fE98ipigrbKRIO/hOsoxiiniF3DG2NVWZM2
 n5nPnOdbperqjCEteN1nxQfr7XZkvP95Bwmuqqc90XH+tzKJdHruUkbm4ua7NEEz
 WKgsUIYFjeN5ZrHbJaNtHlQueTyvsyGmL1nlaLi/MaJbSXPsM/WfwvHsaKTh3NrE
 BFwEAhMZVLDHEfnFT0Ev7Mm1MGpW8MbHoRBR1+E5FWWNS4X0yGLKXWRp8diw25Tm
 W3ZVsn65E6U=
 =/qKX
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes and updates from Ingo Molnar:

 - Fix a large number of x86 Kconfig dependency and help text accuracy
   bugs/problems, by Mateusz Jończyk and David Heideberg

 - Fix a VM_PAT interaction with fork() crash. This also touches core
   kernel code

 - Fix an ORC unwinder bug for interrupt entries

 - Fixes and cleanups

 - Fix an AMD microcode loader bug that can promote verification
   failures into success

 - Add early-printk support for MMIO based UARTs on an x86 board that
   had no other serial debugging facility and also experienced early
   boot crashes

* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
  x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
  x86/fpu: Update the outdated comment above fpstate_init_user()
  x86/early_printk: Add support for MMIO-based UARTs
  x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
  x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
  x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
  x86/Kconfig: Correct X86_X2APIC help text
  x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
  x86/Kconfig: Document release year of glibc 2.3.3
  x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
  x86/Kconfig: Document CONFIG_PCI_MMCONFIG
  x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
  x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
  x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
  x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-30 15:25:15 -07:00
Linus Torvalds aa918db707 bpf_try_alloc_pages
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfkHCQACgkQ6rmadz2v
 bTrzWhAAnDcJsGgSJ9EbElpTfgBWE7aijXo/MsPxxRhORc0uR6MnhPx1iADP4KYj
 lTGEIBgRuDG3qaM4EXpPd32rUJJHv8hot7z9zfvUgSuFNLZEHWXJtz/i4ileOxin
 08zV+zA5WL2fqamAmMRFMI37DeSWy3xU0/qlbWgNnURjPjRri6CF4rVFUWq+QMY+
 XP8ITD/6nOLUR6Bq2M18aHnk2VJWkxVP9Oi+vz1VHbOjKaJC7ATa1+Q4qMqWyTb1
 8IAYWiZR1ZPc214ITaspVzLoLb/wxHxy3QMrdAWAL6sjp0B4J8YxIq1qsBuR1FN7
 TxTRQND/+LjqrAgs5AmFqz3ndKmahjGQWnQEh/rDYJtx+sLJk9hfsMIDF8Wmxuwl
 RftdV0g9bPljR5Qgc9i8DNtEjoAbNjoP8xLjt9HfQakVl8V9jPe0bxZ5tJDf+T0M
 n/VgEjaRzdXqFOLal6Z5wl/jkIn1l1kWQuCMI2z5Z0Ls+PlYX56xdZxfK2Rh3m+e
 3W89vqj9ytJ3rZKG8DRsxukuHwnJ+Gia3XI2h/5cc8kEM5ss1Ase8oIkmrwaLd9x
 +zVXNoDCCPRQgTStwItW+2YdFmE9uijhEZh9yPwT1/rtFuKd0oSebVIpjih/bGqH
 mMN9gYO4+ArSbqku9X2lP3VjMOf6M6SZGm+PzG25PAMGzjqGqwk=
 =AHTr
 -----END PGP SIGNATURE-----

Merge tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf try_alloc_pages() support from Alexei Starovoitov:
 "The pull includes work from Sebastian, Vlastimil and myself with a lot
  of help from Michal and Shakeel.

  This is a first step towards making kmalloc reentrant to get rid of
  slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc. These patches
  make page allocator safe from any context.

  Vlastimil kicked off this effort at LSFMM 2024:

    https://lwn.net/Articles/974138/

  and we continued at LSFMM 2025:

    https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/

  Why:

  SLAB wrappers bind memory to a particular subsystem making it
  unavailable to the rest of the kernel. Some BPF maps in production
  consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G,
  1.1G, 300M, 200M. Once we have kmalloc that works in any context BPF
  map preallocation won't be necessary.

  How:

  Synchronous kmalloc/page alloc stack has multiple stages going from
  fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
  rmqueue_pcplist -> __rmqueue, where rmqueue_pcplist was already
  relying on trylock.

  This set changes rmqueue_bulk/rmqueue_buddy to attempt a trylock and
  return ENOMEM if alloc_flags & ALLOC_TRYLOCK. It then wraps this
  functionality into try_alloc_pages() helper. We make sure that the
  logic is sane in PREEMPT_RT.

  End result: try_alloc_pages()/free_pages_nolock() are safe to call
  from any context.

  try_kmalloc() for any context with similar trylock approach will
  follow. It will use try_alloc_pages() when slab needs a new page.
  Though such try_kmalloc/page_alloc() is an opportunistic allocator,
  this design ensures that the probability of successful allocation of
  small objects (up to one page in size) is high.

  Even before we have try_kmalloc(), we already use try_alloc_pages() in
  BPF arena implementation and it's going to be used more extensively in
  BPF"

* tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  mm: Fix the flipped condition in gfpflags_allow_spinning()
  bpf: Use try_alloc_pages() to allocate pages for bpf needs.
  mm, bpf: Use memcg in try_alloc_pages().
  memcg: Use trylock to access memcg stock_lock.
  mm, bpf: Introduce free_pages_nolock()
  mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
  locking/local_lock: Introduce localtry_lock_t
2025-03-30 13:45:28 -07:00
Linus Torvalds fa593d0f96 bpf-next-6.15
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmfi6ZAACgkQ6rmadz2v
 bTpLOg/+J7xUddPMhlpFAUlifQEadE5hmw6v1tXpM3zyKHzUWJiv/qsx3j8/ckgD
 D+d4P8bqIbI9SSuIS4oZ0+D9pr/g7GYztnoYZmPiYJ7v2AijPuof5dsagFQE8E2y
 rhfbt9KHTMzzkdkTvaAZaITS/HWAoJ2YVRB6gfLex2ghcXYHcgmtKRZniQrbBiFZ
 MIXBN8Rg6HP+pUdIVllSXFcQCb3XIgjPONRAos4hr5tIm+3Ku7Jvkgk2H/9vUcoF
 bdXAcg8xygyH7eY+1l3e7nEPQlG0jUZEsL+tq+vpdoLRLqlIpAUYmwUvqcmq4dPS
 QGFjiUcpDbXlxsUFpzjXHIFto7fXCfND7HEICQPwAncdflIIfYaATSQUfkEexn0a
 wBCFlAChrEzAmg2vFl4EeEr0fdSe/3jswrgKx0m6ctKieMjgloBUeeH4fXOpfkhS
 9tvhuduVFuronlebM8ew4w9T/mBgbyxkE5KkvP4hNeB3ni3N0K6Mary5/u2HyN1e
 lqTlnZxRA4p6lrvxce/mDrR4VSwlKLcSeQVjxAL1afD5KRkuZJnUv7bUhS361vkG
 IjNrQX30EisDAz+X7tMn3ndBf9vVatwFT4+c3yaxlQRor1WofhDfT88HPiyB4QqQ
 Kdx2EHgbQxJp4vkzhp4/OXlTfkihsMEn8egzZuphdPEQ9Y+Jdwg=
 =aN/V
 -----END PGP SIGNATURE-----

Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:
 "For this merge window we're splitting BPF pull request into three for
  higher visibility: main changes, res_spin_lock, try_alloc_pages.

  These are the main BPF changes:

   - Add DFA-based live registers analysis to improve verification of
     programs with loops (Eduard Zingerman)

   - Introduce load_acquire and store_release BPF instructions and add
     x86, arm64 JIT support (Peilin Ye)

   - Fix loop detection logic in the verifier (Eduard Zingerman)

   - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)

   - Add kfunc for populating cpumask bits (Emil Tsalapatis)

   - Convert various shell based tests to selftests/bpf/test_progs
     format (Bastien Curutchet)

   - Allow passing referenced kptrs into struct_ops callbacks (Amery
     Hung)

   - Add a flag to LSM bpf hook to facilitate bpf program signing
     (Blaise Boscaccy)

   - Track arena arguments in kfuncs (Ihor Solodrai)

   - Add copy_remote_vm_str() helper for reading strings from remote VM
     and bpf_copy_from_user_task_str() kfunc (Jordan Rome)

   - Add support for timed may_goto instruction (Kumar Kartikeya
     Dwivedi)

   - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)

   - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
     local storage (Martin KaFai Lau)

   - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)

   - Allow retrieving BTF data with BTF token (Mykyta Yatsenko)

   - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
     (Song Liu)

   - Reject attaching programs to noreturn functions (Yafang Shao)

   - Introduce pre-order traversal of cgroup bpf programs (Yonghong
     Song)"

* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
  selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
  bpf: Fix out-of-bounds read in check_atomic_load/store()
  libbpf: Add namespace for errstr making it libbpf_errstr
  bpf: Add struct_ops context information to struct bpf_prog_aux
  selftests/bpf: Sanitize pointer prior fclose()
  selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
  selftests/bpf: test_xdp_vlan: Rename BPF sections
  bpf: clarify a misleading verifier error message
  selftests/bpf: Add selftest for attaching fexit to __noreturn functions
  bpf: Reject attaching fexit/fmod_ret to __noreturn functions
  bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
  bpf: Make perf_event_read_output accessible in all program types.
  bpftool: Using the right format specifiers
  bpftool: Add -Wformat-signedness flag to detect format errors
  selftests/bpf: Test freplace from user namespace
  libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
  bpf: Return prog btf_id without capable check
  bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
  bpf, x86: Fix objtool warning for timed may_goto
  bpf: Check map->record at the beginning of check_and_free_fields()
  ...
2025-03-30 12:43:03 -07:00
Linus Torvalds 0c86b42439 drm for 6.15-rc1
uapi:
 - add mediatek tiled fourcc
 - add support for notifying userspace on device wedged
 
 new driver:
 - appletbdrm: support for Apple Touchbar displays on m1/m2
 - nova-core: skeleton rust driver to develop nova inside off
 
 firmware:
 - add some rust firmware pieces
 
 rust:
 - add 'LocalModule' type alias
 
 component:
 - add helper to query bound status
 
 fbdev:
 - fbtft: remove access to page->index
 
 media:
 - cec: tda998x: import driver from drm
 
 dma-buf:
 - add fast path for single fence merging
 
 tests:
 - fix lockdep warnings
 
 atomic:
 - allow full modeset on connector changes
 - clarify semantics of allow_modeset and drm_atomic_helper_check
 - async-flip: support on arbitary planes
 - writeback: fix UAF
 - Document atomic-state history
 
 format-helper:
 - support ARGB8888 to ARGB4444 conversions
 
 buddy:
 - fix multi-root cleanup
 
 ci:
 - update IGT
 
 dp:
 - support extended wake timeout
 - mst: fix RAD to string conversion
 - increase DPCD eDP control CAP size to 5 bytes
 - add DPCD eDP v1.5 definition
 - add helpers for LTTPR transparent mode
 
 panic:
 - encode QR code according to Fido 2.2
 
 scheduler:
 - add parameter struct for init
 - improve job peek/pop operations
 - optimise drm_sched_job struct layout
 
 ttm:
 - refactor pool allocation
 - add helpers for TTM shrinker
 
 panel-orientation:
 - add a bunch of new quirks
 
 panel:
 - convert panels to multi-style functions
 - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
   LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006,
   Lenovo T14s Gen6 Snapdragon
 - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
   kd110n11-51ie, Starry 2082109qfh040022-50e
 - visionox-r66451: use multi-style MIPI-DSI functions
 - raydium-rm67200: Add driver for Raydium RM67200
 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
 - sony-td4353-jdi: Use MIPI-DSI multi-func interface
 - summit: Add driver for Apple Summit display panel
 - visionox-rm692e5: Add driver for Visionox RM692E5
 
 bridge:
 - pass full atomic state to various callbacks
 - adv7511: Report correct capabilities
 - it6505: Fix HDCP V compare
 - snd65dsi86: fix device IDs
 - nwl-dsi: set bridge type
 - ti-sn65si83: add error recovery and set bridge type
 - synopsys: add HDMI audio support
 
 xe:
 - support device-wedged event
 - add mmap support for PCI memory barrier
 - perf pmu integration and expose per-engien activity
 - add EU stall sampling support
 - GPU SVM and Xe SVM implementation
 - use TTM shrinker
 - add survivability mode to allow the driver to do
   firmware updates in critical failure states
 - PXP HWDRM support for MTL and LNL
 - expose package/vram temps over hwmon
 - enable DP tunneling
 - drop mmio_ext abstraction
 - Reject BO evcition if BO is bound to current VM
 - Xe suballocator improvements
 - re-use display vmas when possible
 - add GuC Buffer Cache abstraction
 - PCI ID update for Panther Lake and Battlemage
 - Enable SRIOV for Panther Lake
 - Refactor VRAM manager location
 
 i915:
 - enable extends wake timeout
 - support device-wedged event
 - Enable DP 128b/132b SST DSC
 - FBC dirty rectangle support for display version 30+
 - convert i915/xe to drm client setup
 - Compute HDMI PLLS for rates not in fixed tables
 - Allow DSB usage when PSR is enabled on LNL+
 - Enable panel replay without full modeset
 - Enable async flips with compressed buffers on ICL+
 - support luminance based brightness via DPCD for eDP
 - enable VRR enable/disable without full modeset
 - allow GuC SLPC default strategies on MTL+ for performance
 - lots of display refactoring in move to struct intel_display
 
 amdgpu:
 - add device wedged event
 - support async page flips on overlay planes
 - enable broadcast RGB drm property
 - add info ioctl for virt mode
 - OEM i2c support for RGB lights
 - GC 11.5.2 + 11.5.3 support
 - SDMA 6.1.3 support
 - NBIO 7.9.1 + 7.11.2 support
 - MMHUB 1.8.1 + 3.3.2 support
 - DCN 3.6.0 support
 - Add dynamic workload profile switching for GC 10-12
 - support larger VBIOS sizes
 - Mark gttsize parameters as deprecated
 - Initial JPEG queue resset support
 
 amdkfd:
 - add KFD per process flags for setting precision
 - sync pasid values between KGD and KFD
 - improve GTT/VRAM handling for APUs
 - fix user queue validation on GC7/8
 - SDMA queue reset support
 
 raedeon:
 - rs400 hyperz fix
 
 i2c:
 - td998x: drop platform_data, split driver into media and bridge
 
 ast:
 - transmitter chip detection refactoring
 - vbios display mode refactoring
 - astdp: fix connection status and filter unsupported modes
 - cursor handling refactoring
 
 imagination:
 - check job dependencies with sched helper
 
 ivpu:
 - improve command queue handling
 - use workqueue for IRQ handling
 - add support HW fault injection
 - locking fixes
 
 mgag200:
 - add support for G200eH5
 
 msm:
 - dpu: add concurrent writeback support for DPU 10.x+
 - use LTTPR helpers
 - GPU:
   - Fix obscure GMU suspend failure
   - Expose syncobj timeline support
   - Extend GPU devcoredump with pagetable info
   - a623 support
   - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump
 - Display:
   - Add cpu-cfg interconnect paths on SM8560 and SM8650
   - Introduce KMS OMMU fault handler, causing devcoredump snapshot
   - Fixed error pointer dereference in msm_kms_init_aspace()
 - DPU:
   - Fix mode_changing handling
   - Add writeback support on SM6150 (QCS615)
   - Fix DSC programming in 1:1:1 topology
   - Reworked hardware resource allocation, moving it to the CRTC code
   - Enabled support for Concurrent WriteBack (CWB) on SM8650
   - Enabled CDM blocks on all relevant platforms
   - Reworked debugfs interface for BW/clocks debugging
   - Clear perf params before calculating bw
   - Support YUV formats on writeback
   - Fixed double inclusion
   - Fixed writeback in YUV formats when using cloned output, Dropped
     wb2_formats_rgb
   - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
     kerneldocs
   - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
 - DSI:
   - DSC-related fixes
   - Rework clock programming
 - DSI PHY:
   - Fix 7nm (and lower) PHY programming
   - Add proper DT schema definitions for DSI PHY clocks
 - HDMI:
   - Rework the driver, enabling the use of the HDMI Connector framework
 - Bindings:
   - Added eDP PHY on SA8775P
 
 nouveau:
 - move drm_slave_encoder interface into driver
 - nvkm: refactor GSP RPC
 - use LTTPR helpers
 
 mediatek:
 - HDMI fixup and refinement
 - add MT8188 dsc compatible
 - MT8365 SoC support
 
 panthor:
 - Expose sizes of intenral BOs via fdinfo
 - Fix race between reset and suspend
 - Improve locking
 
 qaic:
 - Add support for AIC200
 
 renesas:
 - Fix limits in DT bindings
 
 rockchip:
 - support rk3562-mali
 - rk3576: Add HDMI support
 - vop2: Add new display modes on RK3588 HDMI0 up to 4K
 - Don't change HDMI reference clock rate
 - Fix DT bindings
 - analogix_dp: add eDP support
 - fix shutodnw
 
 solomon:
 - Set SPI device table to silence warnings
 - Fix pixel and scanline encoding
 
 v3d:
 - handle clock
 
 vc4:
 - Use drm_exec
 - Use dma-resv for wait-BO ioctl
 - Remove seqno infrastructure
 
 virtgpu:
 - Support partial mappings of GEM objects
 - Reserve VGA resources during initialization
 - Fix UAF in virtgpu_dma_buf_free_obj()
 - Add panic support
 
 vkms:
 - Switch to a managed modesetting pipeline
 - Add support for ARGB8888
 - fix UAf
 
 xlnx:
 - Set correct DMA segment size
 - use mutex guards
 - Fix error handling
 - Fix docs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmfmA2cACgkQDHTzWXnE
 hr7lKQ/+I7gmln3ka8FyKnpwG5KusDpxz3OzgUKHpzOTkXL1+vPMt0vjCKRJKE3D
 zfpUTWNbwlVN0krqmUGyFIeCt8wmevBF6HvQ+GsWbiWltUj3xIlnkV0TVH84XTUo
 0evrXNG9K8sLjMKjrf7yrGL53Ayoaq9IO9wtOws+FCgtykAsMR/IWLrYLpj21ZQ6
 Oclhq5Cz21WRoQpzySR23s3DLi4LHri26RGKbGNh2PzxYwyP/euGW6O+ncEduNmg
 vQLgUfptaM/EubJFG6jxDWZJ2ChIAUUxQuhZwt7DKqRsYIcJKcfDSUzqL95t6SYU
 zewlYmeslvusoesCeTJzHaUj34yqJGsvjFPsHFUEvAy8BVncsqS40D6mhrRMo5nD
 chnSJu+IpDOEEqcdbb1J73zKLw6X3GROM8qUQEThJBD2yTsOdw9d0HXekMDMBXSi
 NayKvXfsBV19rI74FYnRCzHt8BVVANh5qJNnR5RcnPZ2KzHQbV0JFOA9YhxE8vFU
 GWkFWKlpAQUQ+yoTy1kuO9dcUxLIC7QseMeS5BYhcJBMEV78xQuRYRxgsL8YS4Yg
 rIhcb3mZwMFj7jBAqfpLKWiI+GTup+P9vcz7Bvm5iIf8gEhZLUTwqBeAYXQkcWd4
 i1AqDuFR0//sAgODoeU2sg1Q3yTL/i/DhcwvCIPgcDZ/x4Eg868=
 =oK/N
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "Outside of drm there are some rust patches from Danilo who maintains
  that area in here, and some pieces for drm header check tests.

  The major things in here are a new driver supporting the touchbar
  displays on M1/M2, the nova-core stub driver which is just the vehicle
  for adding rust abstractions and start developing a real driver inside
  of.

  xe adds support for SVM with a non-driver specific SVM core
  abstraction that will hopefully be useful for other drivers, along
  with support for shrinking for TTM devices. I'm sure xe and AMD
  support new devices, but the pipeline depth on these things is hard to
  know what they end up being in the marketplace!

  uapi:
   - add mediatek tiled fourcc
   - add support for notifying userspace on device wedged

  new driver:
   - appletbdrm: support for Apple Touchbar displays on m1/m2
   - nova-core: skeleton rust driver to develop nova inside off

  firmware:
   - add some rust firmware pieces

  rust:
   - add 'LocalModule' type alias

  component:
   - add helper to query bound status

  fbdev:
   - fbtft: remove access to page->index

  media:
   - cec: tda998x: import driver from drm

  dma-buf:
   - add fast path for single fence merging

  tests:
   - fix lockdep warnings

  atomic:
   - allow full modeset on connector changes
   - clarify semantics of allow_modeset and drm_atomic_helper_check
   - async-flip: support on arbitary planes
   - writeback: fix UAF
   - Document atomic-state history

  format-helper:
   - support ARGB8888 to ARGB4444 conversions

  buddy:
   - fix multi-root cleanup

  ci:
   - update IGT

  dp:
   - support extended wake timeout
   - mst: fix RAD to string conversion
   - increase DPCD eDP control CAP size to 5 bytes
   - add DPCD eDP v1.5 definition
   - add helpers for LTTPR transparent mode

  panic:
   - encode QR code according to Fido 2.2

  scheduler:
   - add parameter struct for init
   - improve job peek/pop operations
   - optimise drm_sched_job struct layout

  ttm:
   - refactor pool allocation
   - add helpers for TTM shrinker

  panel-orientation:
   - add a bunch of new quirks

  panel:
   - convert panels to multi-style functions
   - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3,
     LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry
     116KHD024006, Lenovo T14s Gen6 Snapdragon
   - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay
     kd110n11-51ie, Starry 2082109qfh040022-50e
   - visionox-r66451: use multi-style MIPI-DSI functions
   - raydium-rm67200: Add driver for Raydium RM67200
   - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17
   - sony-td4353-jdi: Use MIPI-DSI multi-func interface
   - summit: Add driver for Apple Summit display panel
   - visionox-rm692e5: Add driver for Visionox RM692E5

  bridge:
   - pass full atomic state to various callbacks
   - adv7511: Report correct capabilities
   - it6505: Fix HDCP V compare
   - snd65dsi86: fix device IDs
   - nwl-dsi: set bridge type
   - ti-sn65si83: add error recovery and set bridge type
   - synopsys: add HDMI audio support

  xe:
   - support device-wedged event
   - add mmap support for PCI memory barrier
   - perf pmu integration and expose per-engien activity
   - add EU stall sampling support
   - GPU SVM and Xe SVM implementation
   - use TTM shrinker
   - add survivability mode to allow the driver to do firmware updates
     in critical failure states
   - PXP HWDRM support for MTL and LNL
   - expose package/vram temps over hwmon
   - enable DP tunneling
   - drop mmio_ext abstraction
   - Reject BO evcition if BO is bound to current VM
   - Xe suballocator improvements
   - re-use display vmas when possible
   - add GuC Buffer Cache abstraction
   - PCI ID update for Panther Lake and Battlemage
   - Enable SRIOV for Panther Lake
   - Refactor VRAM manager location

  i915:
   - enable extends wake timeout
   - support device-wedged event
   - Enable DP 128b/132b SST DSC
   - FBC dirty rectangle support for display version 30+
   - convert i915/xe to drm client setup
   - Compute HDMI PLLS for rates not in fixed tables
   - Allow DSB usage when PSR is enabled on LNL+
   - Enable panel replay without full modeset
   - Enable async flips with compressed buffers on ICL+
   - support luminance based brightness via DPCD for eDP
   - enable VRR enable/disable without full modeset
   - allow GuC SLPC default strategies on MTL+ for performance
   - lots of display refactoring in move to struct intel_display

  amdgpu:
   - add device wedged event
   - support async page flips on overlay planes
   - enable broadcast RGB drm property
   - add info ioctl for virt mode
   - OEM i2c support for RGB lights
   - GC 11.5.2 + 11.5.3 support
   - SDMA 6.1.3 support
   - NBIO 7.9.1 + 7.11.2 support
   - MMHUB 1.8.1 + 3.3.2 support
   - DCN 3.6.0 support
   - Add dynamic workload profile switching for GC 10-12
   - support larger VBIOS sizes
   - Mark gttsize parameters as deprecated
   - Initial JPEG queue resset support

  amdkfd:
   - add KFD per process flags for setting precision
   - sync pasid values between KGD and KFD
   - improve GTT/VRAM handling for APUs
   - fix user queue validation on GC7/8
   - SDMA queue reset support

  raedeon:
   - rs400 hyperz fix

  i2c:
   - td998x: drop platform_data, split driver into media and bridge

  ast:
   - transmitter chip detection refactoring
   - vbios display mode refactoring
   - astdp: fix connection status and filter unsupported modes
   - cursor handling refactoring

  imagination:
   - check job dependencies with sched helper

  ivpu:
   - improve command queue handling
   - use workqueue for IRQ handling
   - add support HW fault injection
   - locking fixes

  mgag200:
   - add support for G200eH5

  msm:
   - dpu: add concurrent writeback support for DPU 10.x+
   - use LTTPR helpers
   - GPU:
     - Fix obscure GMU suspend failure
     - Expose syncobj timeline support
     - Extend GPU devcoredump with pagetable info
     - a623 support
     - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot /
       devcoredump
   - Display:
     - Add cpu-cfg interconnect paths on SM8560 and SM8650
     - Introduce KMS OMMU fault handler, causing devcoredump snapshot
     - Fixed error pointer dereference in msm_kms_init_aspace()
   - DPU:
     - Fix mode_changing handling
     - Add writeback support on SM6150 (QCS615)
     - Fix DSC programming in 1:1:1 topology
     - Reworked hardware resource allocation, moving it to the CRTC code
     - Enabled support for Concurrent WriteBack (CWB) on SM8650
     - Enabled CDM blocks on all relevant platforms
     - Reworked debugfs interface for BW/clocks debugging
     - Clear perf params before calculating bw
     - Support YUV formats on writeback
     - Fixed double inclusion
     - Fixed writeback in YUV formats when using cloned output, Dropped
       wb2_formats_rgb
     - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt
       kerneldocs
     - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode()
   - DSI:
     - DSC-related fixes
     - Rework clock programming
   - DSI PHY:
     - Fix 7nm (and lower) PHY programming
     - Add proper DT schema definitions for DSI PHY clocks
   - HDMI:
     - Rework the driver, enabling the use of the HDMI Connector
       framework
   - Bindings:
     - Added eDP PHY on SA8775P

  nouveau:
   - move drm_slave_encoder interface into driver
   - nvkm: refactor GSP RPC
   - use LTTPR helpers

  mediatek:
   - HDMI fixup and refinement
   - add MT8188 dsc compatible
   - MT8365 SoC support

  panthor:
   - Expose sizes of intenral BOs via fdinfo
   - Fix race between reset and suspend
   - Improve locking

  qaic:
   - Add support for AIC200

  renesas:
   - Fix limits in DT bindings

  rockchip:
   - support rk3562-mali
   - rk3576: Add HDMI support
   - vop2: Add new display modes on RK3588 HDMI0 up to 4K
   - Don't change HDMI reference clock rate
   - Fix DT bindings
   - analogix_dp: add eDP support
   - fix shutodnw

  solomon:
   - Set SPI device table to silence warnings
   - Fix pixel and scanline encoding

  v3d:
   - handle clock

  vc4:
   - Use drm_exec
   - Use dma-resv for wait-BO ioctl
   - Remove seqno infrastructure

  virtgpu:
   - Support partial mappings of GEM objects
   - Reserve VGA resources during initialization
   - Fix UAF in virtgpu_dma_buf_free_obj()
   - Add panic support

  vkms:
   - Switch to a managed modesetting pipeline
   - Add support for ARGB8888
   - fix UAf

  xlnx:
   - Set correct DMA segment size
   - use mutex guards
   - Fix error handling
   - Fix docs"

* tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits)
  drm/amd/pm: Update feature list for smu_v13_0_6
  drm/amdgpu: Add parameter documentation for amdgpu_sync_fence
  drm/amdgpu/discovery: optionally use fw based ip discovery
  drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics
  drm/amdgpu/discovery: check ip_discovery fw file available
  drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
  drm/amd/pm: Remove unnecessay UQ10 to UINT conversion
  drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
  drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush
  drm/amd/amdgpu: Increase max rings to enable SDMA page ring
  drm/amdgpu: Decode deferred error type in gfx aca bank parser
  drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs
  drm/amdgpu/mes: clean up SDMA HQD loop
  drm/amdgpu/mes: enable compute pipes across all MEC
  drm/amdgpu/mes: drop MES 10.x leftovers
  drm/amdgpu/mes: optimize compute loop handling
  drm/amdgpu/sdma: guilty tracking is per instance
  drm/amdgpu/sdma: fix engine reset handling
  drm/amdgpu: remove invalid usage of sched.ready
  drm/amdgpu: add cleaner shader trace point
  ...
2025-03-28 17:44:52 -07:00
Masami Hiramatsu (Google) 74e2498ccf mm/memblock: Add reserved memory release function
Add reserve_mem_release_by_name() to release a reserved memory region
with a given name. This allows us to release reserved memory which is
defined by kernel cmdline, after boot.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/173989133862.230693.14094993331347437600.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-28 08:39:28 -04:00
Linus Torvalds 81d8e5e213 f2fs-for-6.15-rc1
In this round, there are three major updates: 1) folio conversion, 2) refactor
 for mount API conversion, 3) some performance improvement such as direct IO,
 checkpoint speed, and IO priority hints. For stability, there are patches
 which add more sanity checks and fixes some major issues like i_size in
 atomic write operations and write pointer recovery in zoned devices.
 
 Enhancement:
  - huge folio converion work by Matthew Wilcox
  - clean up for mount API conversion by Eric Sandeen
  - improve direct IO speed in the overwrite case
  - add some sanity check on node consistency
  - set highest IO priority for checkpoint thread
  - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
  - add ioctl to get IO priority hint
  - add carve_out sysfs node for fsstat
 
 Bug fix:
  - disable nat_bits during umount to avoid potential nat entry corruption
  - fix missing i_size update on atomic writes
  - fix missing discard for active segments
  - fix running out of free segments
  - fix out-of-bounds access in f2fs_truncate_inode_blocks()
  - call f2fs_recover_quota_end() correctly
  - fix potential deadloop in prepare_compress_overwrite()
  - fix the missing write pointer correction for zoned device
  - fix to avoid panic once fallocation fails for pinfile
  - don't retry IO for corrupted data scenario
 
 There are many other clean up patches and minor bug fixes as usual.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmfhjuYACgkQQBSofoJI
 UNJF+hAAip0Sf6bXwt43KR3lmbXg/YBTtPDACOs375Pn8pfRVsAPIAf5e1AnlBET
 rA0KpqUrEZHXenQFF9n4LIoHYqfuUw3EMempuvp4qgQcx15Kaajw+EGWVLYstNy2
 9dELc9DA55f/i1uvHezGfQFy6hMfXasf+tkaYk0z0ZYDStYboLgkVAY+F869q1Dl
 D16Y7Lna12h4eCxDdssIPDsLjFH/2LDn7SmhXsnpZxwK0Zx8JSo83WxXHnzXHxz4
 vUkukInKgqcBDvf6ufW/YF3/tqSs20XEXNK3cI1vyHx1dwij6Us+G0n0WJHL30QE
 zsliecR0X28JP1rJ3ldCjr+4Kxm6/u/Uwpinm2Fm1jL67UY4TZcaARBE8k20I6ND
 j/L1+sXrIdZ1aILM9/bwCjgXiVFdZbvlfGfpTj1duAkQgRd+/s9cjPlo5j5HTad5
 XJmzJz6YOaUNarhP/E31Z9SV9M2kEcmoDOTxKBg6ZcMWXZ27B6Z0ag92prd0GWk6
 rWDuVj0eP/LDY1QedbHRPbg1D84jVgcnldfPaf9ptln993skJGdgS0dkegTqMR0L
 H8RgpWOzWZI53gnQdePdej8diHmD8uRTrLf/oABm//GzTHC5BdwVpArOl1DCuLFF
 YkMtVEicgnWRmL3PgODAdTJXaDi3uGvT116i+lfMlUn9p+t93r0=
 =E+TE
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, there are three major updates: (1) folio conversion,
  (2) refactoring for mount API conversion, (3) some performance
  improvement such as direct IO, checkpoint speed, and IO priority
  hints.

  For stability, there are patches which add more sanity checks and
  fixes some major issues like i_size in atomic write operations and
  write pointer recovery in zoned devices.

  Enhancements:
   - huge folio converion work by Matthew Wilcox
   - clean up for mount API conversion by Eric Sandeen
   - improve direct IO speed in the overwrite case
   - add some sanity check on node consistency
   - set highest IO priority for checkpoint thread
   - keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
   - add ioctl to get IO priority hint
   - add carve_out sysfs node for fsstat

  Bug fixes:
   - disable nat_bits during umount to avoid potential nat entry corruption
   - fix missing i_size update on atomic writes
   - fix missing discard for active segments
   - fix running out of free segments
   - fix out-of-bounds access in f2fs_truncate_inode_blocks()
   - call f2fs_recover_quota_end() correctly
   - fix potential deadloop in prepare_compress_overwrite()
   - fix the missing write pointer correction for zoned device
   - fix to avoid panic once fallocation fails for pinfile
   - don't retry IO for corrupted data scenario

  There are many other clean up patches and minor bug fixes as usual"

* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: fix missing discard for active segments
  f2fs: optimize f2fs DIO overwrites
  f2fs: fix to avoid atomicity corruption of atomic file
  f2fs: pass sbi rather than sb to parse_options()
  f2fs: pass sbi rather than sb to quota qf_name helpers
  f2fs: defer readonly check vs norecovery
  f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
  f2fs: make LAZYTIME a mount option flag
  f2fs: make INLINECRYPT a mount option flag
  f2fs: factor out an f2fs_default_check function
  f2fs: consolidate unsupported option handling errors
  f2fs: use f2fs_sb_has_device_alias during option parsing
  f2fs: add carve_out sysfs node
  f2fs: fix to avoid running out of free segments
  f2fs: Remove f2fs_write_node_page()
  f2fs: Remove f2fs_write_meta_page()
  f2fs: Remove f2fs_write_data_page()
  f2fs: Remove check for ->writepage
  Revert "f2fs: rebuild nat_bits during umount"
  f2fs: fix to avoid accessing uninitialized curseg
  ...
2025-03-27 12:55:54 -07:00
Linus Torvalds 592329e5e9 Summary
* Move vm_table members out of kernel/sysctl.c
 
   All vm_table array members have moved to their respective subsystems leading
   to the removal of vm_table from kernel/sysctl.c. This increases modularity by
   placing the ctl_tables closer to where they are actually used and at the same
   time reducing the chances of merge conflicts in kernel/sysctl.c.
 
 * ctl_table range fixes
 
   Replace the proc_handler function that checks variable ranges in
   coredump_sysctls and vdso_table with the one that actually uses the extra{1,2}
   pointers as min/max values. This tightens the range of the values that users
   can pass into the kernel effectively preventing {under,over}flows.
 
 * Misc fixes
 
   Correct grammar errors and typos in test messages. Update sysctl files in
   MAINTAINERS. Constified and removed array size in declaration for
   alignment_tbl
 
 * Testing
 
   - These have all been in linux-next for at least 1 month
   - They have gone through 0-day
   - Ran all these through sysctl selftests in x86_64
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmfhV8EACgkQupfNUreW
 QU/udAv/VCXGkndQsJ5biXpXYFnokX0gIEaYzzHiqrFycZqr8ys0/wWzc+ar1LjF
 Jvanl2uKB0mUviLKt7Gk0+Hri+PJlYIrbx+5K5eo2wsKUUxFykqLLm59y/orPODl
 gyPQjKNpHJb7COsnEc3Lrq/fvol4NPHlcBPXG8NwehccTeBHZ1ninfo+pSnxh3o8
 kI3GSLLxD4K9AgBl5QuVWH4gU7o//u7lUkKzy03NW+2jmuRv3dRcYF7IdgMINNee
 AeXnygdSBxLzECBvmkfNdyg+AmL8hdsmzbsIh7UuJDvxLlQOInVLZa+sXBotCOIc
 TImCrr1Ws1OuGrD0kpH+21tJvc8pNFWt61QlulObQdrLndWHdZEGyGOusLpXTwbn
 jIWZmMvzk1foSwdgzwPFzUqPEpW3FrBVDo4Z4kenBDrCp56QTX7hGRvkNYJNKvot
 Ue+i8BeHR/Gm/p+UMqgsSTOaNJXTqZhFqwJQVzxU/9LN/vkS0On6fbjgBd5X6Pn+
 a5dlc9gy
 =0bcX
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:

 - Move vm_table members out of kernel/sysctl.c

   All vm_table array members have moved to their respective subsystems
   leading to the removal of vm_table from kernel/sysctl.c. This
   increases modularity by placing the ctl_tables closer to where they
   are actually used and at the same time reducing the chances of merge
   conflicts in kernel/sysctl.c.

 - ctl_table range fixes

   Replace the proc_handler function that checks variable ranges in
   coredump_sysctls and vdso_table with the one that actually uses the
   extra{1,2} pointers as min/max values. This tightens the range of the
   values that users can pass into the kernel effectively preventing
   {under,over}flows.

 - Misc fixes

   Correct grammar errors and typos in test messages. Update sysctl
   files in MAINTAINERS. Constified and removed array size in
   declaration for alignment_tbl

* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
  selftests/sysctl: fix wording of help messages
  selftests: fix spelling/grammar errors in sysctl/sysctl.sh
  MAINTAINERS: Update sysctl file list in MAINTAINERS
  sysctl: Fix underflow value setting risk in vm_table
  coredump: Fixes core_pipe_limit sysctl proc_handler
  sysctl: remove unneeded include
  sysctl: remove the vm_table
  sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
  x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
  fs: dcache: move the sysctl to fs/dcache.c
  sunrpc: simplify rpcauth_cache_shrink_count()
  fs: drop_caches: move sysctl to fs/drop_caches.c
  fs: fs-writeback: move sysctl to fs/fs-writeback.c
  mm: nommu: move sysctl to mm/nommu.c
  security: min_addr: move sysctl to security/min_addr.c
  mm: mmap: move sysctl to mm/mmap.c
  mm: util: move sysctls to mm/util.c
  mm: vmscan: move vmscan sysctls to mm/vmscan.c
  mm: swap: move sysctl to mm/swap.c
  mm: filemap: move sysctl to mm/filemap.c
  ...
2025-03-26 21:02:05 -07:00
David Hildenbrand dc84bc2aba x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
If track_pfn_copy() fails, we already added the dst VMA to the maple
tree. As fork() fails, we'll cleanup the maple tree, and stumble over
the dst VMA for which we neither performed any reservation nor copied
any page tables.

Consequently untrack_pfn() will see VM_PAT and try obtaining the
PAT information from the page table -- which fails because the page
table was not copied.

The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
if track_pfn_copy() fails. However, the whole thing is about "simply"
clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy()
and performed a reservation, but copying the page tables fails, we'll
simply clear the VM_PAT flag, not properly undoing the reservation ...
which is also wrong.

So let's fix it properly: set the VM_PAT flag only if the reservation
succeeded (leaving it clear initially), and undo the reservation if
anything goes wrong while copying the page tables: clearing the VM_PAT
flag after undoing the reservation.

Note that any copied page table entries will get zapped when the VMA will
get removed later, after copy_page_range() succeeded; as VM_PAT is not set
then, we won't try cleaning VM_PAT up once more and untrack_pfn() will be
happy. Note that leaving these page tables in place without a reservation
is not a problem, as we are aborting fork(); this process will never run.

A reproducer can trigger this usually at the first try:

  https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c

  WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
  Modules linked in: ...
  CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:get_pat_info+0xf6/0x110
  ...
  Call Trace:
   <TASK>
   ...
   untrack_pfn+0x52/0x110
   unmap_single_vma+0xa6/0xe0
   unmap_vmas+0x105/0x1f0
   exit_mmap+0xf6/0x460
   __mmput+0x4b/0x120
   copy_process+0x1bf6/0x2aa0
   kernel_clone+0xab/0x440
   __do_sys_clone+0x66/0x90
   do_syscall_64+0x95/0x180

Likely this case was missed in:

  d155df53f3 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")

... and instead of undoing the reservation we simply cleared the VM_PAT flag.

Keep the documentation of these functions in include/linux/pgtable.h,
one place is more than sufficient -- we should clean that up for the other
functions like track_pfn_remap/untrack_pfn separately.

Fixes: d155df53f3 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
Fixes: 2ab640379a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yuxin wang <wang1315768607@163.com>
Reported-by: Marius Fleischer <fleischermarius@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20250321112323.153741-1-david@redhat.com
Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/
Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/
2025-03-25 22:35:14 +01:00
Linus Torvalds a50b4fe095 A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
   the callback pointer. This turned out to be suboptimal for the upcoming
   Rust integration and is obviously a silly implementation to begin with.
 
   This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
   with hrtimer_setup(T, cb);
 
   The conversion was done with Coccinelle and a few manual fixups.
 
   Once the conversion has completely landed in mainline, hrtimer_init()
   will be removed and the hrtimer::function becomes a private member.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5jQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVvRD/wKtuwmiA66NJFgXC0qVq82A6fO3bY8
 GBdbfysDJIbqGu5PTcULTbJ8qkqv3jeLUv6CcXvS4sZ7y/uJQl2lzf8yrD/0bbwc
 rLI6sHiPSZmK93kNVN4X5H7kvt7cE/DYC9nnEOgK3BY5FgKc4n9887d4aVBhL8Lv
 ODwVXvZ+xi351YCj7qRyPU24zt/p4tkkT1o2k4a0HBluqLI0D+V20fke9IERUL8r
 d1uWKlcn0TqYDesE8HXKIhbst3gx52rMJrXBJDHwFmG6v8Pj1fkTXCVpPo8QcBz8
 OTVkpomN9f/Tx4+GZwhZOF86LhLL3OhxD6pT7JhFCXdmSGv+Ez8uyk1YZysM/XpV
 Juy/1yAcBpDIDkmhMFGdAAn48Nn9Fotty0r4je60zSEp1d/4QMXcFme29qr2JTUE
 iWnQ/HD6DxUjVHqy7CYvvo26Xegg1C7qgyOVt4PYZwAM1VKF5P3kzYTb4SAdxtop
 Tpji1sfW9QV08jqMNo6XntD32DSP9S2HqjO9LwBw700jnx2jjJ35fcJs6iodMOUn
 gckIZLMn3L0OoglPdyA5O7SNTbKE7aFiRKdnT/cJtR3Fa39Qu27CwC5gfiyuie9I
 Q+LG8GLuYSBHXAR+PBK4GWlzJ7Dn8k3eqmbnLeKpRMsU6ZzcttgA64xhaviN2wN0
 iJbvLJeisXr3GA==
 =bYAX
 -----END PGP SIGNATURE-----

Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer cleanups from Thomas Gleixner:
 "A treewide hrtimer timer cleanup

  hrtimers are initialized with hrtimer_init() and a subsequent store to
  the callback pointer. This turned out to be suboptimal for the
  upcoming Rust integration and is obviously a silly implementation to
  begin with.

  This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
  with hrtimer_setup(T, cb);

  The conversion was done with Coccinelle and a few manual fixups.

  Once the conversion has completely landed in mainline, hrtimer_init()
  will be removed and the hrtimer::function becomes a private member"

* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
  wifi: rt2x00: Switch to use hrtimer_update_function()
  io_uring: Use helper function hrtimer_update_function()
  serial: xilinx_uartps: Use helper function hrtimer_update_function()
  ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
  RDMA: Switch to use hrtimer_setup()
  virtio: mem: Switch to use hrtimer_setup()
  drm/vmwgfx: Switch to use hrtimer_setup()
  drm/xe/oa: Switch to use hrtimer_setup()
  drm/vkms: Switch to use hrtimer_setup()
  drm/msm: Switch to use hrtimer_setup()
  drm/i915/request: Switch to use hrtimer_setup()
  drm/i915/uncore: Switch to use hrtimer_setup()
  drm/i915/pmu: Switch to use hrtimer_setup()
  drm/i915/perf: Switch to use hrtimer_setup()
  drm/i915/gvt: Switch to use hrtimer_setup()
  drm/i915/huc: Switch to use hrtimer_setup()
  drm/amdgpu: Switch to use hrtimer_setup()
  stm class: heartbeat: Switch to use hrtimer_setup()
  i2c: Switch to use hrtimer_setup()
  iio: Switch to use hrtimer_setup()
  ...
2025-03-25 10:54:15 -07:00
Linus Torvalds e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Linus Torvalds 23608993bb Locking changes for v6.15:
Locking primitives:
 
     - Micro-optimize percpu_{,try_}cmpxchg{64,128}_op() and {,try_}cmpxchg{64,128}
       on x86 (Uros Bizjak)
 
     - mutexes: extend debug checks in mutex_lock() (Yunhui Cui)
 
     - Misc cleanups (Uros Bizjak)
 
   Lockdep:
 
     - Fix might_fault() lockdep check of current->mm->mmap_lock (Peter Zijlstra)
 
     - Don't disable interrupts on RT in disable_irq_nosync_lockdep.*()
       (Sebastian Andrzej Siewior)
 
     - Disable KASAN instrumentation of lockdep.c (Waiman Long)
 
     - Add kasan_check_byte() check in lock_acquire() (Waiman Long)
 
     - Misc cleanups (Sebastian Andrzej Siewior)
 
   Rust runtime integration:
 
     - Use Pin for all LockClassKey usages (Mitchell Levy)
     - sync: Add accessor for the lock behind a given guard (Alice Ryhl)
     - sync: condvar: Add wait_interruptible_freezable() (Alice Ryhl)
     - sync: lock: Add an example for Guard:: Lock_ref() (Boqun Feng)
 
   Split-lock detection feature (x86):
 
     - Fix warning mode with disabled mitigation mode (Maksim Davydov)
 
   Locking events:
 
     - Add locking events for rtmutex slow paths (Waiman Long)
     - Add locking events for lockdep (Waiman Long)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfeeMARHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j+DQ/7BUherLWPuvGCx+0+FwBG9T2XFKm8cy8r
 8p7UM/0gOzMn+EvBz+LFL2/b5BWu2tjB2Qyen2tvq8NcDQdS8GpFA+u9rxeXQzQY
 tu5LxsnQbvQe0UXl5aJX1D8ft2xmRSU+a/2uQC3/PAbXByTwN/dkEqDoxJQG6GuP
 0mpULlbG0D5j2YiiaQyG2+3xKj+fd1mg/aEoG5lx88ko6Bgoguj8b+tX/4f70YWl
 igNxWoJ8CZxCBbd7+o8vFFvvYpk1sj6Ni3LyTs658t5deJpfxOu9xkrmlxGm/d7q
 IryuiQC7yYwWWFF96W3yJ13lyojKZTVCYr50hzMd88HE/NGJawZZQJMtyeRGS2r9
 7wNZDl0JiPRUgl8bTFOHZUgVU5IIgTSGpgv4XHvUFF0+QtZ91IqB+/fcMIpdEBV9
 K02wOfqIb3uUsCXGmNfFVi1E7TeXWUDudqHN7rosxOpFDSm1PvGI4rnnaNjddVr3
 kerNfRSyoBaj5Ff1zr59yM8XZVBPmY8MrruwoODMxxcfasM6vllEjv9McBRSoxlb
 HC3+wXaadWlUnaitaVU6Xak9qIj0djaSgQfQ9nS48XuN4EfztepLYM9OEPAsNWXh
 5NZDdYXB1ndYsDTlCLiEl2c0831duJpy2kpVOkaCqC3hu+JjVt82ZeeBhOZeAXQK
 glwrSkq0FiU=
 =33q7
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "Locking primitives:
   - Micro-optimize percpu_{,try_}cmpxchg{64,128}_op() and
     {,try_}cmpxchg{64,128} on x86 (Uros Bizjak)
   - mutexes: extend debug checks in mutex_lock() (Yunhui Cui)
   - Misc cleanups (Uros Bizjak)

  Lockdep:
   - Fix might_fault() lockdep check of current->mm->mmap_lock (Peter
     Zijlstra)
   - Don't disable interrupts on RT in disable_irq_nosync_lockdep.*()
     (Sebastian Andrzej Siewior)
   - Disable KASAN instrumentation of lockdep.c (Waiman Long)
   - Add kasan_check_byte() check in lock_acquire() (Waiman Long)
   - Misc cleanups (Sebastian Andrzej Siewior)

  Rust runtime integration:
   - Use Pin for all LockClassKey usages (Mitchell Levy)
   - sync: Add accessor for the lock behind a given guard (Alice Ryhl)
   - sync: condvar: Add wait_interruptible_freezable() (Alice Ryhl)
   - sync: lock: Add an example for Guard:: Lock_ref() (Boqun Feng)

  Split-lock detection feature (x86):
   - Fix warning mode with disabled mitigation mode (Maksim Davydov)

  Locking events:
   - Add locking events for rtmutex slow paths (Waiman Long)
   - Add locking events for lockdep (Waiman Long)"

* tag 'locking-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lockdep: Remove disable_irq_lockdep()
  lockdep: Don't disable interrupts on RT in disable_irq_nosync_lockdep.*()
  rust: lockdep: Use Pin for all LockClassKey usages
  rust: sync: condvar: Add wait_interruptible_freezable()
  rust: sync: lock: Add an example for Guard:: Lock_ref()
  rust: sync: Add accessor for the lock behind a given guard
  locking/lockdep: Add kasan_check_byte() check in lock_acquire()
  locking/lockdep: Disable KASAN instrumentation of lockdep.c
  locking/lock_events: Add locking events for lockdep
  locking/lock_events: Add locking events for rtmutex slow paths
  x86/split_lock: Fix the delayed detection logic
  lockdep/mm: Fix might_fault() lockdep check of current->mm->mmap_lock
  x86/locking: Remove semicolon from "lock" prefix
  locking/mutex: Add MUTEX_WARN_ON() into fast path
  x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations
  x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op()
2025-03-24 20:55:03 -07:00
Linus Torvalds bcb044256d sched_ext: Changes for v6.15
- Add mechanism to count and report internal events. This significantly
   improves visibility on subtle corner conditions.
 
 - The default idle CPU selection logic is revamped and improved in multiple
   ways including being made topology aware.
 
 - sched_ext was disabling ttwu_queue for simplicity, which can be costly
   when hardware topology is more complex. Implement
   SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
   enable ttwu_queue.
 
 - tools/sched_ext updates to improve compatibility among others.
 
 - Other misc updates and fixes.
 
 - sched_ext/for-6.14-fixes were pulled a few times to receive prerequisite
   fixes and resolve conflicts.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ999Sg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGf/KAQCoMTVOBpQT9gCaCKDOmrVJTwi6boEoV5WnGZzw
 PDr0vwEAq36iz4no6Y5THcN/DCx+52IiS0zuhPy3rBZVo11TMgU=
 =iQ+A
 -----END PGP SIGNATURE-----

Merge tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Add mechanism to count and report internal events. This significantly
   improves visibility on subtle corner conditions.

 - The default idle CPU selection logic is revamped and improved in
   multiple ways including being made topology aware.

 - sched_ext was disabling ttwu_queue for simplicity, which can be
   costly when hardware topology is more complex. Implement
   SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
   enable ttwu_queue.

 - tools/sched_ext updates to improve compatibility among others.

 - Other misc updates and fixes.

 - sched_ext/for-6.14-fixes were pulled a few times to receive
   prerequisite fixes and resolve conflicts.

* tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits)
  sched_ext: idle: Refactor scx_select_cpu_dfl()
  sched_ext: idle: Honor idle flags in the built-in idle selection policy
  sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()
  sched_ext: Add trace point to track sched_ext core events
  sched_ext: Change the event type from u64 to s64
  sched_ext: Documentation: add task lifecycle summary
  tools/sched_ext: Provide a compatible helper for scx_bpf_events()
  selftests/sched_ext: Add NUMA-aware scheduler test
  tools/sched_ext: Provide consistent access to scx flags
  sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior
  sched_ext: idle: Introduce scx_bpf_nr_node_ids()
  sched_ext: idle: Introduce node-aware idle cpu kfunc helpers
  sched_ext: idle: Per-node idle cpumasks
  sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE
  sched_ext: idle: Make idle static keys private
  sched/topology: Introduce for_each_node_numadist() iterator
  mm/numa: Introduce nearest_node_nodemask()
  nodemask: numa: reorganize inclusion path
  nodemask: add nodes_copy()
  tools/sched_ext: Sync with scx repo
  ...
2025-03-24 17:23:48 -07:00
Linus Torvalds 94dc216ad8 cgroup: Changes for v6.15
- Add deprecation info messages to cgroup1-only features.
 
 - rstat updates including a bug fix and breaking up a critical section to
   reduce interrupt latency impact.
 
 - Other misc and doc updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZ9xO2g4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGQz4AQDeWKmngRsnddEMkqOV1ArwXSr+8xUQrvCBx0RL
 vcjOQQEAusGCTeGXWJ96kw+N9BXvGwFsfSeoxjOqAnvrBS1EgAc=
 =WvJg
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Add deprecation info messages to cgroup1-only features

 - rstat updates including a bug fix and breaking up a critical section
   to reduce interrupt latency impact

 - Other misc and doc updates

* tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rstat: Cleanup flushing functions and locking
  cgroup/rstat: avoid disabling irqs for O(num_cpu)
  mm: Fix a build breakage in memcontrol-v1.c
  blk-cgroup: Simplify policy files registration
  cgroup: Update file naming comment
  cgroup: Add deprecation message to legacy freezer controller
  mm: Add transformation message for per-memcg swappiness
  RFC cgroup/cpuset-v1: Add deprecation messages to sched_relax_domain_level
  cgroup/cpuset-v1: Add deprecation messages to memory_migrate
  cgroup/cpuset-v1: Add deprecation messages to mem_exclusive and mem_hardwall
  cgroup: Print message when /proc/cgroups is read on v2-only system
  cgroup/blkio: Add deprecation messages to reset_stats
  cgroup/cpuset-v1: Add deprecation messages to memory_spread_page and memory_spread_slab
  cgroup/cpuset-v1: Add deprecation messages to sched_load_balance and memory_pressure_enabled
  cgroup, docs: Be explicit about independence of RT_GROUP_SCHED and non-cpu controllers
  cgroup/rstat: Fix forceidle time in cpu.stat
  cgroup/misc: Remove unused misc_cg_res_total_usage
  cgroup/cpuset: Move procfs cpuset attribute under cgroup-v1.c
  cgroup: update comment about dropping cgroup kn refs
2025-03-24 16:49:40 -07:00
Linus Torvalds 05b00ffd7a slab updates for 6.15
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmfb4r0ACgkQu+CwddJF
 iJq6NQf/WNEQAoRY1DEeQiBAvixTYry0j/w1dumpValvt/lybccMwwhWho5i17/o
 2J4nif5L5O6D+jZWyz76fx2bcn7GjhteiKtzuVI0mSdDXyYLBLVGa9dMrE1/0kxy
 51HnldCLfNmC3qp0pG2E7j2chsxDbTwz4ZPiEAW9kzpvgfEWmfydejzv5+ROFQm7
 gH3vRJ7H5enxp2a52DovBN1JllYK9uxMTM3Pq1L37n9Hm1zIR+swbI/3VhklRN4C
 nrO6my6GU2+bMQTvPKwuHBIHUH7yS6Z411wCotPmRO0jfLMq/UY5lthgWpqvsC+o
 XtgULoikQbcd8kts9g71bHSEinwlGw==
 =whkW
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

 - Move the TINY_RCU kvfree_rcu() implementation from RCU to SLAB
   subsystem and cleanup its integration (Vlastimil Babka)

   Following the move of the TREE_RCU batching kvfree_rcu()
   implementation in 6.14, move also the simpler TINY_RCU variant.
   Refactor the #ifdef guards so that the simple implementation is also
   used with SLUB_TINY.

   Remove the need for RCU to recognize fake callback function pointers
   (__is_kvfree_rcu_offset()) when handling call_rcu() by implementing a
   callback that calculates the object's address from the embedded
   rcu_head address without knowing its offset.

 - Improve kmalloc cache randomization in kvmalloc (GONG Ruiqi)

   Due to an extra layer of function call, all kvmalloc() allocations
   used the same set of random caches. Thanks to moving the kvmalloc()
   implementation to slub.c, this is improved and randomization now
   works for kvmalloc.

 - Various improvements to debugging, testing and other cleanups (Hyesoo
   Yu, Lilith Gkini, Uladzislau Rezki, Matthew Wilcox, Kevin Brodsky, Ye
   Bin)

* tag 'slab-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slub: Handle freelist cycle in on_freelist()
  mm/slab: call kmalloc_noprof() unconditionally in kmalloc_array_noprof()
  slab: Mark large folios for debugging purposes
  kunit, slub: Add test_kfree_rcu_wq_destroy use case
  mm, slab: cleanup slab_bug() parameters
  mm: slub: call WARN() when detecting a slab corruption
  mm: slub: Print the broken data before restoring them
  slab: Achieve better kmalloc caches randomization in kvmalloc
  slab: Adjust placement of __kvmalloc_node_noprof
  mm/slab: simplify SLAB_* flag handling
  slab: don't batch kvfree_rcu() with SLUB_TINY
  rcu, slab: use a regular callback function for kvfree_rcu
  rcu: remove trace_rcu_kvfree_callback
  slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
2025-03-24 16:15:47 -07:00
Linus Torvalds fc13a78e1f hardening updates for v6.15-rc1
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan Vadivel)
 
 - samples/check-exec: Fix script name (Mickaël Salaün)
 
 - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
 
 - lib/string_choices: Sort by function name (R Sundar)
 
 - hardening: Allow default HARDENED_USERCOPY to be set at compile time
   (Mel Gorman)
 
 - uaccess: Split out compile-time checks into ucopysize.h
 
 - kbuild: clang: Support building UM with SUBARCH=i386
 
 - x86: Enable i386 FORTIFY_SOURCE on Clang 16+
 
 - ubsan/overflow: Rework integer overflow sanitizer option
 
 - Add missing __nonstring annotations for callers of memtostr*()/strtomem*()
 
 - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for it
 
 - Introduce __nonstring_array for silencing future GCC 15 warnings
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZ9hGrgAKCRA2KwveOeQk
 u1WvAQC3ZxFu3b0Omfmht2pPqCltf2UOQNvUx3egjoeXpUaNSgD+Lxr/T4xksy7E
 jHh7rCYDkruOWs3DHA5JjRQcf0BBLQo=
 =FTQp
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "As usual, it's scattered changes all over. Patches touching things
  outside of our traditional areas in the tree have been Acked by
  maintainers or were trivial changes:

   - loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
     Vadivel)

   - samples/check-exec: Fix script name (Mickaël Salaün)

   - yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)

   - lib/string_choices: Sort by function name (R Sundar)

   - hardening: Allow default HARDENED_USERCOPY to be set at compile
     time (Mel Gorman)

   - uaccess: Split out compile-time checks into ucopysize.h

   - kbuild: clang: Support building UM with SUBARCH=i386

   - x86: Enable i386 FORTIFY_SOURCE on Clang 16+

   - ubsan/overflow: Rework integer overflow sanitizer option

   - Add missing __nonstring annotations for callers of
     memtostr*()/strtomem*()

   - Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
     it

   - Introduce __nonstring_array for silencing future GCC 15 warnings"

* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
  compiler_types: Introduce __nonstring_array
  hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
  x86/build: Remove -ffreestanding on i386 with GCC
  ubsan/overflow: Enable ignorelist parsing and add type filter
  ubsan/overflow: Enable pattern exclusions
  ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
  samples/check-exec: Fix script name
  yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
  kbuild: clang: Support building UM with SUBARCH=i386
  loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
  lib/string_choices: Rearrange functions in sorted order
  string.h: Validate memtostr*()/strtomem*() arguments more carefully
  compiler.h: Introduce __must_be_noncstr()
  nilfs2: Mark on-disk strings as nonstring
  uapi: stddef.h: Introduce __kernel_nonstring
  x86/tdx: Mark message.bytes as nonstring
  string: kunit: Mark nonstring test strings as __nonstring
  scsi: qla2xxx: Mark device strings as nonstring
  scsi: mpt3sas: Mark device strings as nonstring
  scsi: mpi3mr: Mark device strings as nonstring
  ...
2025-03-24 15:18:08 -07:00
Linus Torvalds 26d8e43079 vfs-6.15-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc
 onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q
 N8YtzXArHWt7u0YbcVpy9WK3F72BdwU=
 =VJgY
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs async dir updates from Christian Brauner:
 "This contains cleanups that fell out of the work from async directory
  handling:

   - Change kern_path_locked() and user_path_locked_at() to never return
     a negative dentry. This simplifies the usability of these helpers
     in various places

   - Drop d_exact_alias() from the remaining place in NFS where it is
     still used. This also allows us to drop the d_exact_alias() helper
     completely

   - Drop an unnecessary call to fh_update() from nfsd_create_locked()

   - Change i_op->mkdir() to return a struct dentry

     Change vfs_mkdir() to return a dentry provided by the filesystems
     which is hashed and positive. This allows us to reduce the number
     of cases where the resulting dentry is not positive to very few
     cases. The code in these places becomes simpler and easier to
     understand.

   - Repack DENTRY_* and LOOKUP_* flags"

* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: fix inline emphasis warning
  VFS: Change vfs_mkdir() to return the dentry.
  nfs: change mkdir inode_operation to return alternate dentry if needed.
  fuse: return correct dentry for ->mkdir
  ceph: return the correct dentry on mkdir
  hostfs: store inode in dentry after mkdir if possible.
  Change inode_operations.mkdir to return struct dentry *
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()
  VFS: add common error checks to lookup_one_qstr_excl()
  VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  VFS: repack LOOKUP_ bit flags.
  VFS: repack DENTRY_ flags.
2025-03-24 10:47:14 -07:00
Linus Torvalds 99c21beaab vfs-6.15-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90p4AAKCRCRxhvAZXjc
 ojMIAP9atkG3u7+490+NGWLdulQlaHnD51Owa9MiW87UfKpsTQEArwi/NrJqXJNT
 PFQ2xIa5TxG+9haChR89w3kjZ6b/hgs=
 =iDkx
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Add CONFIG_DEBUG_VFS infrastucture:
      - Catch invalid modes in open
      - Use the new debug macros in inode_set_cached_link()
      - Use debug-only asserts around fd allocation and install

   - Place f_ref to 3rd cache line in struct file to resolve false
     sharing

Cleanups:

   - Start using anon_inode_getfile_fmode() helper in various places

   - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
     f_pos_lock

   - Add unlikely() to kcmp()

   - Remove legacy ->remount_fs method from ecryptfs after port to the
     new mount api

   - Remove invalidate_inodes() in favour of evict_inodes()

   - Simplify ep_busy_loopER by removing unused argument

   - Avoid mmap sem relocks when coredumping with many missing pages

   - Inline getname()

   - Inline new_inode_pseudo() and de-staticize alloc_inode()

   - Dodge an atomic in putname if ref == 1

   - Consistently deref the files table with rcu_dereference_raw()

   - Dedup handling of struct filename init and refcounts bumps

   - Use wq_has_sleeper() in end_dir_add()

   - Drop the lock trip around I_NEW wake up in evict()

   - Load the ->i_sb pointer once in inode_sb_list_{add,del}

   - Predict not reaching the limit in alloc_empty_file()

   - Tidy up do_sys_openat2() with likely/unlikely

   - Call inode_sb_list_add() outside of inode hash lock

   - Sort out fd allocation vs dup2 race commentary

   - Turn page_offset() into a wrapper around folio_pos()

   - Remove locking in exportfs around ->get_parent() call

   - try_lookup_one_len() does not need any locks in autofs

   - Fix return type of several functions from long to int in open

   - Fix return type of several functions from long to int in ioctls

  Fixes:

   - Fix watch queue accounting mismatch"

* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
  fs: sort out fd allocation vs dup2 race commentary, take 2
  fs: call inode_sb_list_add() outside of inode hash lock
  fs: tidy up do_sys_openat2() with likely/unlikely
  fs: predict not reaching the limit in alloc_empty_file()
  fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
  fs: drop the lock trip around I_NEW wake up in evict()
  fs: use wq_has_sleeper() in end_dir_add()
  VFS/autofs: try_lookup_one_len() does not need any locks
  fs: dedup handling of struct filename init and refcounts bumps
  fs: consistently deref the files table with rcu_dereference_raw()
  exportfs: remove locking around ->get_parent() call.
  fs: use debug-only asserts around fd allocation and install
  fs: dodge an atomic in putname if ref == 1
  vfs: Remove invalidate_inodes()
  ecryptfs: remove NULL remount_fs from super_operations
  watch_queue: fix pipe accounting mismatch
  fs: place f_ref to 3rd cache line in struct file to resolve false sharing
  epoll: simplify ep_busy_loop by removing always 0 argument
  fs: Turn page_offset() into a wrapper around folio_pos()
  kcmp: improve performance adding an unlikely hint to task comparisons
  ...
2025-03-24 09:13:50 -07:00
Liu Ye 0a1e082b64 mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
The `movable` variable is always used when `CONFIG_TRANSPARENT_HUGEPAGE`
is enabled, so the `__maybe_unused` attribute is not necessary.  This
patch removes it and keeps the variable declaration within the `#ifdef`
block for better clarity.

Link: https://lkml.kernel.org/r/20250319091726.401158-1-liuyerd@163.com
Signed-off-by: Liu Ye<liuye@kylinos.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:17 -07:00
Jinjiang Tu 1b0449544c mm/vmscan: don't try to reclaim hwpoison folio
Syzkaller reports a bug as follows:

Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
memcg:ffff0000dd6d9000
anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
------------[ cut here ]------------
kernel BUG at mm/swap_state.c:184!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in:
CPU: 0 PID: 60 Comm: kswapd0 Not tainted 6.6.0-gcb097e7de84e #3
Hardware name: linux,dummy-virt (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : add_to_swap+0xbc/0x158
lr : add_to_swap+0xbc/0x158
sp : ffff800087f37340
x29: ffff800087f37340 x28: fffffc00052c0380 x27: ffff800087f37780
x26: ffff800087f37490 x25: ffff800087f37c78 x24: ffff800087f377a0
x23: ffff800087f37c50 x22: 0000000000000000 x21: fffffc00052c03b4
x20: 0000000000000000 x19: fffffc00052c0380 x18: 0000000000000000
x17: 296f696c6f662865 x16: 7461646f7470755f x15: 747365745f6f696c
x14: 6f6621284f494c4f x13: 0000000000000001 x12: ffff600036d8b97b
x11: 1fffe00036d8b97a x10: ffff600036d8b97a x9 : dfff800000000000
x8 : 00009fffc9274686 x7 : ffff0001b6c5cbd3 x6 : 0000000000000001
x5 : ffff0000c25896c0 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0000c25896c0 x0 : 0000000000000000
Call trace:
 add_to_swap+0xbc/0x158
 shrink_folio_list+0x12ac/0x2648
 shrink_inactive_list+0x318/0x948
 shrink_lruvec+0x450/0x720
 shrink_node_memcgs+0x280/0x4a8
 shrink_node+0x128/0x978
 balance_pgdat+0x4f0/0xb20
 kswapd+0x228/0x438
 kthread+0x214/0x230
 ret_from_fork+0x10/0x20

I can reproduce this issue with the following steps:

1) When a dirty swapcache page is isolated by reclaim process and the
   page isn't locked, inject memory failure for the page. 
   me_swapcache_dirty() clears uptodate flag and tries to delete from lru,
   but fails.  Reclaim process will put the hwpoisoned page back to lru.

2) The process that maps the hwpoisoned page exits, the page is deleted
   the page will never be freed and will be in the lru forever.

3) If we trigger a reclaim again and tries to reclaim the page,
   add_to_swap() will trigger VM_BUG_ON_FOLIO due to the uptodate flag is
   cleared.

To fix it, skip the hwpoisoned page in shrink_folio_list().  Besides, the
hwpoison folio may not be unmapped by hwpoison_user_mappings() yet, unmap
it in shrink_folio_list(), otherwise the folio will fail to be unmaped by
hwpoison_user_mappings() since the folio isn't in lru list.

Link: https://lkml.kernel.org/r/20250318083939.987651-3-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger,kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:17 -07:00
Jinjiang Tu 5f5ee52d4f mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
Patch series "mm/vmscan: don't try to reclaim hwpoison folio".

Fix a bug during memory reclaim if folio is hwpoisoned.


This patch (of 2):

Introduce helper folio_contain_hwpoisoned_page() to check if the entire
folio is hwpoisoned or it contains hwpoisoned pages.

Link: https://lkml.kernel.org/r/20250318083939.987651-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20250318083939.987651-2-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger,kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:16 -07:00
Hao Jia e452872b40 mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
Patch series "Adding Proactive Memory Reclaim Statistics".

These two patches are related to proactive memory reclaim.

Patch 1 Split proactive reclaim statistics from direct reclaim counters
and introduces new counters: pgsteal_proactive, pgdemote_proactive,
and pgscan_proactive.

Patch 2 Adds pswpin and pswpout items to the cgroup-v2 documentation.


This patch (of 2):

In proactive memory reclaim scenarios, it is necessary to accurately track
proactive reclaim statistics to dynamically adjust the frequency and
amount of memory being reclaimed proactively.  Currently, proactive
reclaim is included in direct reclaim statistics, which can make these
direct reclaim statistics misleading.

Therefore, separate proactive reclaim memory from the direct reclaim
counters by introducing new counters: pgsteal_proactive,
pgdemote_proactive, and pgscan_proactive, to avoid confusion with direct
reclaim.

Link: https://lkml.kernel.org/r/20250318075833.90615-1-jiahao.kernel@gmail.com
Link: https://lkml.kernel.org/r/20250318075833.90615-2-jiahao.kernel@gmail.com
Signed-off-by: Hao Jia <jiahao1@lixiang.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:16 -07:00
Nhat Pham 3b23a44f1f mm/damon: implement a new DAMOS filter type for active pages
Patch series "mm/damon: introduce DAMOS filter type for active pages".

The memory reclaim algorithm categorizes pages into active and inactive
lists, separately for file and anon pages.  The system's performance
relies heavily on the (relative and absolute) accuracy of this
categorization.

This patch series add a new DAMOS filter for pages' activeness, giving us
visibility into the access frequency of the pages on each list.  This
insight can help us diagnose issues with the active-inactive balancing
dynamics, and make decisions to optimize reclaim efficiency and memory
utilization.

For instance, we might decide to enable DAMON_LRU_SORT, if we find that
there are pages on the active list that are infrequently accessed, or less
frequently accessed than pages on the inactive list.


This patch (of 2):

Implement a DAMOS filter type for active pages on DAMON kernel API, and
add support of it from the physical address space DAMON operations set
(paddr).

Link: https://lkml.kernel.org/r/20250318183029.2062917-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20250318183029.2062917-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:15 -07:00
Nico Pache 4d689474e1 balloon_compaction: update the NR_BALLOON_PAGES state
Update the NR_BALLOON_PAGES counter when pages are added or removed using
the balloon compaction interface.

The virtio, Vmware, and pseries-cmm balloon drivers utilize the
balloon_compaction interface to allocate and free balloon pages.  Other
balloon drivers will have to maintain this counter manually.

Link: https://lkml.kernel.org/r/20250314213757.244258-3-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juegren Gross <jgross@suse.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:13 -07:00
Nico Pache 835de37603 meminfo: add a per node counter for balloon drivers
Patch series "track memory used by balloon drivers", v2.

This series introduces a way to track memory used by balloon drivers.

Add a NR_BALLOON_PAGES counter to track how many pages are reclaimed by
the balloon drivers.  First add the accounting, then updates the balloon
drivers (virtio, Hyper-V, VMware, Pseries-cmm, and Xen) to maintain this
counter.  The virtio, Vmware, and pseries-cmm balloon drivers utilize the
balloon_compaction interface to allocate and free balloon pages.  Other
balloon drivers will have to maintain this counter manually.

This makes the information visible in memory reporting interfaces like
/proc/meminfo, show_mem, and OOM reporting.

This provides admins visibility into their VM balloon sizes without
requiring different virtualization tooling.  Furthermore, this information
is helpful when debugging an OOM inside a VM.


This patch (of 4):

Add NR_BALLOON_PAGES counter to track memory used by balloon drivers and
expose it through /proc/meminfo and other memory reporting interfaces.

[npache@redhat.com: document Balloon Meminfo entry]
  Link: https://lkml.kernel.org/r/a0315ccf-f244-460e-8643-fd7388724fe5@redhat.com
Link: https://lkml.kernel.org/r/20250314213757.244258-1-npache@redhat.com
Link: https://lkml.kernel.org/r/20250314213757.244258-2-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Juegren Gross <jgross@suse.com>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:13 -07:00
Matthew Wilcox (Oracle) 0d2a260523 mm: remove references to folio in __memcg_kmem_uncharge_page()
This use of folios is misleading because these pages are not part of
a folio.  Remove an unnecessary call to page_folio(), saving 58 bytes
of text in a Debian kernel build.

Link: https://lkml.kernel.org/r/20250314133617.138071-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:12 -07:00
Matthew Wilcox (Oracle) 7cc57ecae4 mm: remove references to folio in split_page_memcg()
We know that the passed in page is not part of a folio (it's a plain page
allocated with GFP_ACCOUNT), so we should get rid of the misleading
references to folios.

Introduce page_objcg() and page_set_objcg() helpers to make things more
clear.

Link: https://lkml.kernel.org/r/20250314133617.138071-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:12 -07:00
Matthew Wilcox (Oracle) 1506c25508 mm: simplify split_page_memcg()
The last argument to split_page_memcg() is now always 0, so remove it,
effectively reverting commit b8791381d7.

Link: https://lkml.kernel.org/r/20250314133617.138071-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:12 -07:00
Matthew Wilcox (Oracle) fa23a338de mm: separate folio_split_memcg_refs() from split_page_memcg()
Patch series "Minor memcg cleanups & prep for memdescs", v2.

Separate the handling of accounted folios and GFP_ACCOUNT pages for easier
to understand code.  For more detail, see
https://lore.kernel.org/linux-mm/Z9LwTOudOlCGny3f@casper.infradead.org/


This patch (of 5):

Folios always use memcg_data to refer to the mem_cgroup while pages
allocated with GFP_ACCOUNT have a pointer to the obj_cgroup.  Since the
caller already knows what it has, split the function into two and then we
don't need to check.

Move the assignment of split folio memcg_data to the point where we set up
the other parts of the new folio.  That leaves folio_split_memcg_refs()
just handling the memcg accounting.

Link: https://lkml.kernel.org/r/20250314133617.138071-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20250314133617.138071-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:11 -07:00
Shakeel Butt cb44821e1f memcg: move do_memsw_account() to CONFIG_MEMCG_V1
The do_memsw_account() is used to enable or disable legacy memory+swap
accounting in memory cgroup.  However with disabled CONFIG_MEMCG_V1, we
don't need to keep checking it.  So, let's always return false for
!CONFIG_MEMCG_V1 configs.

Before the patch:

$ size mm/memcontrol.o
   text    data     bss     dec     hex filename
  49928   10736    4172   64836    fd44 mm/memcontrol.o

After the patch:

$ size mm/memcontrol.o
   text    data     bss     dec     hex filename
  49430   10480    4172   64082    fa52 mm/memcontrol.o

Link: https://lkml.kernel.org/r/20250312222552.3284173-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:11 -07:00
Shakeel Butt 20d6c17252 memcg: avoid refill_stock for root memcg
We never charge the page counters of root memcg, so there is no need to
put root memcg in the memcg stock.  At the moment, refill_stock() can be
called from try_charge_memcg(), obj_cgroup_uncharge_pages() and
mem_cgroup_uncharge_skmem().

The try_charge_memcg() and mem_cgroup_uncharge_skmem() are never called
with root memcg, so those are fine.  However obj_cgroup_uncharge_pages()
can potentially call refill_stock() with root memcg if the objcg object
has been reparented over to the root memcg.  Let's just avoid
refill_stock() from obj_cgroup_uncharge_pages() for root memcg.

Link: https://lkml.kernel.org/r/20250313054812.2185900-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhockoc@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:11 -07:00
Mike Rapoport (Microsoft) b4f65dbdf8 mm/mm_init: rename init_reserved_page to init_deferred_page
When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, init_reserved_page()
function performs initialization of a struct page that would have been
deferred normally.

Rename it to init_deferred_page() to better reflect what the function does.

Link: https://lkml.kernel.org/r/20250225083017.567649-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21 22:03:11 -07:00