Commit Graph

49558 Commits

Author SHA1 Message Date
Linus Torvalds 5fee0dafba - Restore the original buslock locking in a couple of places in the irq core
subsystem after a rework
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj+EwoACgkQEsHwGGHe
 VUqm6xAAuPDn4E0wuxgD5l6gXYDWXx7xoHEDT0KuL2J9OsfbWoHl8OwObBRmD7ls
 au/SuuJUSs3NEntQwLfTklyi7UignkTzcyOYLqb2fMYPFLk+nRXWSjvxsQMQV/u3
 wwSXyK1YaZ4qaEKqIAPm5Uvs4E1DQFu6zzBdjVTKB+w1n0Lh9P4xBdDaHgwc/dV/
 8jKt39JsInLzCy+8aDLeabeU5X5qDscnbpJ3LEHf/6scMBCAvQbnfeICvDijzLgf
 FF4qw+O7qGzFQTKRB2B4pymoFhKGOnGR4jtygejjm3wDO/k2QKS3OwoJo8mzIM3S
 p/HimQ7Uy0KEU11Vo37ANdE8XErkeoj7meoBNGFiU4KZzRU99CnRz0EDap9RUvlx
 clat0CC/3NSGau2hcbYDrTSsjkoWVbEtQJ2XbvHavnE0MscHUMIf1vIQjWzvVG06
 0u5R1OPD+0czeCIXKZQVDGyRcRmmAF1+na3AuBUDq1h0i+KT4V/Y1vX64IFkDdd3
 NaMk6GVmQu3bDpJ4LBpdhVl7cGV50kAbGl77VHST4pERvWQ1EWwwutDp0CK96zo0
 WQnQfjF4/5Ja9l5nCLK7kffQtjdFg/jY/wyixwASEWJDM81T+fZSf32VGkP6Wf0N
 tQYfjKOEj1l/ilRRarSxW8opazhZuN7t7k5e8IxPYP9LmbgAbnQ=
 =oIwu
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Restore the original buslock locking in a couple of places in the irq
   core subsystem after a rework

* tag 'irq_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/manage: Add buslock back in to enable_irq()
  genirq/manage: Add buslock back in to __disable_irq_nosync()
  genirq/chip: Add buslock back in to irq_set_handler()
2025-10-26 09:54:36 -07:00
Linus Torvalds 1bc9743b64 - Make sure a CFS runqueue on a throttled hierarchy has its PELT clock
throttled otherwise task movement and manipulation would lead to dangling
   cfs_rq references and an eventual crash
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj+Cf0ACgkQEsHwGGHe
 VUqkvA/8D1ItoOslMeTpD6YtcaNN9oxzQ7Zow1QaWaPqirUsc+2l/zZ/3R5s0Zlt
 9n0mUNdZ6EC03ZGPwYCNVLk2PvTywmMdwXOypya303PXLez2bPigekJIyXJeW5FV
 YuJWTJBQWtZwiFf2ekP1OmHRceOA4KuBIwmWvfW4YwdXlUGfDLn+X6a4z8GsH/z+
 ss8iUTfbEraBoFFaF16xq1zxrvRDw5vZpX2HkcHADiTVdkHcuXrf+33AeW/URWKz
 FrwimiW+HJdue9trFNwLKUggHCPDoUpHLPA/kmWFiGCZWRXBPpmZ56NGRgfoadGa
 4/Hb9ASMjMFl8Y9gnkOqLyomhQ8vJ8LkNqDChiJ5AiQQFYRekrPuZw+zuCENtzVZ
 miAmp/kXCGSCWTMNZKlztxJGhmn/yiH+sVegmyHyDqGfqnuEBF3sebkf/DDkDAvu
 88SG1YB8OlgmDIxShhfHQqw1nZa7BshLkViak6110n4fP6fbZrbY0MwBLHX2VVpQ
 jJeFuvQ2pZuEl1LKVDsy+ROIShkQITZ8IOeabnm6vAeHEpjomDvmlZOmc5f9NfHV
 wH6SmrHzSaEam70EJflzoglujYy+JMtVIUd7QC/jYXtPOYj1fcHPgwqlnv25uW9e
 4IrwjFNwc2u0MAemKcqRO4DUEwAczD0y+dL/6eVKK8niVmat4f8=
 =8MbE
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

 - Make sure a CFS runqueue on a throttled hierarchy has its PELT clock
   throttled otherwise task movement and manipulation would lead to
   dangling cfs_rq references and an eventual crash

* tag 'sched_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Start a cfs_rq on throttled hierarchy with PELT clock throttled
2025-10-26 09:42:19 -07:00
Linus Torvalds 7ea5092f52 - Do not create more than eight (max supported) AUX clocks sysfs hierarchies
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj+BzAACgkQEsHwGGHe
 VUrGkg//aG7SmeAZiKL1SCyEZgNitZRzuxD7lHS+gky7qsjZu8ieGwV4WvcPoG2B
 wQq7dfH7ZiafiOV/IjGaRwWDbg90934QBZNW9hswMyNsg5ci8vhDCRu6TSWhZN5O
 HSezJNElq9t3QK1+40qY8m5Zw7DlnJJOQhOyDlkc/tdzZjJs+KSDctEytQ1RjKn6
 kR1x7GE6UUJpdKP0/fvlvJVALSrR7hyzqmv+G9HBLhuz8E/lj6IUhRpeouQ3u4tn
 7TNYHi5/HEIUE/T87YEsIIYeAVe6hQ33M73rJ6UMYDhuwfLmePVYtm5gsPt8D6Zi
 9z73CTZKhry1fpMR0X8pow+zHsyMyYtErX93mFmmvCiPtC8FlvUSLWTVx0n8itBQ
 jyGZOVPAJWiZ1FCuSaZeaBB3s4/AGDFAOOZIS+l0oRzZHE4xU23y1cgyNbf1pp6H
 3i2UR0UZc8D5YVaT6LnmhVDosjZrV7V+GPCqcLNxb8QCqr+dkHjb8fv/G2b0RB34
 YUg798hulYOYhU+mKr/qOJs9SPztx8VgmirURU/4wU8aM7vpPPdn5IsgeohzBmSA
 2LLE3M2KFCtDem5O5I2HUrBsJBr85om+XMAP2O7ctoWzl3GC4j0Fqr4dK2M0UUfD
 sTMamGyci7QfiUvzfe/Qh+T1dp6b/4kmU/1rsoyA1fn8V2q+khA=
 =PFVB
 -----END PGP SIGNATURE-----

Merge tag 'timers_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Do not create more than eight (max supported) AUX clocks sysfs
   hierarchies

* tag 'timers_urgent_for_v6.18_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix aux clocks sysfs initialization loop bound
2025-10-26 09:40:16 -07:00
Andy Shevchenko 53abe3e1c1 sched: Remove never used code in mm_cid_get()
Clang is not happy with set but unused variable (this is visible
with `make W=1` build:

  kernel/sched/sched.h:3744:18: error: variable 'cpumask' set but not used [-Werror,-Wunused-but-set-variable]

It seems like the variable was never used along with the assignment
that does not have side effects as far as I can see.  Remove those
altogether.

Fixes: 223baf9d17 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-10-24 16:55:46 -07:00
Charles Keepax ef3330b99c genirq/manage: Add buslock back in to enable_irq()
The locking was changed from a buslock to a plain lock, but the patch
description states there was no functional change. Assuming this was
accidental so reverting to using the buslock.

Fixes: bddd10c554 ("genirq/manage: Rework enable_irq()")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251023154901.1333755-4-ckeepax@opensource.cirrus.com
2025-10-24 11:38:39 +02:00
Charles Keepax 56363e25f7 genirq/manage: Add buslock back in to __disable_irq_nosync()
The locking was changed from a buslock to a plain lock, but the patch
description states there was no functional change. Assuming this was
accidental so reverting to using the buslock.

Fixes: 1b74444467 ("genirq/manage: Rework __disable_irq_nosync()")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251023154901.1333755-3-ckeepax@opensource.cirrus.com
2025-10-24 11:38:39 +02:00
Charles Keepax 5d7e45dd67 genirq/chip: Add buslock back in to irq_set_handler()
The locking was changed from a buslock to a plain lock, but the patch
description states there was no functional change. Assuming this was
accidental so reverting to using the buslock.

Fixes: 5cd05f3e23 ("genirq/chip: Rework irq_set_handler() variants")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251023154901.1333755-2-ckeepax@opensource.cirrus.com
2025-10-24 11:38:39 +02:00
Linus Torvalds 5121062e83 A couple of fixes for Runtime Verification
- A bug caused a kernel panic when reading enabled_monitors was reported.
   Change callbacks functions to always use list_head iterators and by
   doing so, fix the wrong pointer that was leading to the panic.
 
 - The rtapp/pagefault monitor relies on the MMU to be present
   (pagefaults exist) but that was not enforced via kconfig, leading to
   potential build errors on systems without an MMU. Add that kconfig
   dependency.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaPqYBRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrZjAQC02FScGQM+TTQ2kvIFAscKEfZddks9
 iiodkKWxMTZoxwD/VxXKQUD8CP2HW9uSpJw/O3Zv+NAU80Eq8V2f7/0d9gw=
 =AnZN
 -----END PGP SIGNATURE-----

Merge tag 'trace-rv-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A couple of fixes for Runtime Verification:

   - A bug caused a kernel panic when reading enabled_monitors was
     reported.

     Change callback functions to always use list_head iterators and by
     doing so, fix the wrong pointer that was leading to the panic.

   - The rtapp/pagefault monitor relies on the MMU to be present
     (pagefaults exist) but that was not enforced via kconfig, leading
     to potential build errors on systems without an MMU.

     Add that kconfig dependency"

* tag 'trace-rv-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Make rtapp/pagefault monitor depends on CONFIG_MMU
  rv: Fully convert enabled_monitors to use list_head as iterator
2025-10-23 16:50:25 -07:00
Linus Torvalds 0f3ad9c610 17 hotfixes. 12 are cc:stable and 14 are for MM.
There's a two-patch DAMON series from SeongJae Park which addresses a
 missed check and possible memory leak.  Apart from that it's all
 singletons - please see the changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaPkz/QAKCRDdBJ7gKXxA
 js6eAQCdnA10LouzzVdqA+HuYh206z8qE2KGsGKpUGDfJv40uAEA4ZbxYrMJmwhU
 MXFn7Czphh/NOfFFCnrDnOlAFH7MmQc=
 =hc/P
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "17 hotfixes. 12 are cc:stable and 14 are for MM.

  There's a two-patch DAMON series from SeongJae Park which addresses a
  missed check and possible memory leak. Apart from that it's all
  singletons - please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  csky: abiv2: adapt to new folio flags field
  mm/damon/core: use damos_commit_quota_goal() for new goal commit
  mm/damon/core: fix potential memory leak by cleaning ops_filter in damon_destroy_scheme
  hugetlbfs: move lock assertions after early returns in huge_pmd_unshare()
  vmw_balloon: indicate success when effectively deflating during migration
  mm/damon/core: fix list_add_tail() call on damon_call()
  mm/mremap: correctly account old mapping after MREMAP_DONTUNMAP remap
  mm: prevent poison consumption when splitting THP
  ocfs2: clear extent cache after moving/defragmenting extents
  mm: don't spin in add_stack_record when gfp flags don't allow
  dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC
  mm/damon/sysfs: dealloc commit test ctx always
  mm/damon/sysfs: catch commit test ctx alloc failure
  hung_task: fix warnings caused by unaligned lock pointers
2025-10-22 14:57:35 -10:00
K Prateek Nayak 0e4a169d1a sched/fair: Start a cfs_rq on throttled hierarchy with PELT clock throttled
Matteo reported hitting the assert_list_leaf_cfs_rq() warning from
enqueue_task_fair() post commit fe8d238e64 ("sched/fair: Propagate
load for throttled cfs_rq") which transitioned to using
cfs_rq_pelt_clock_throttled() check for leaf cfs_rq insertions in
propagate_entity_cfs_rq().

The "cfs_rq->pelt_clock_throttled" flag is used to indicate if the
hierarchy has its PELT frozen. If a cfs_rq's PELT is marked frozen, all
its descendants should have their PELT frozen too or weird things can
happen as a result of children accumulating PELT signals when the
parents have their PELT clock stopped.

Another side effect of this is the loss of integrity of the leaf cfs_rq
list. As debugged by Aaron, consider the following hierarchy:

    root(#)
   /    \
  A(#)   B(*)
         |
         C <--- new cgroup
         |
         D <--- new cgroup

  # - Already on leaf cfs_rq list
  * - Throttled with PELT frozen

The newly created cgroups don't have their "pelt_clock_throttled" signal
synced with cgroup B. Next, the following series of events occur:

1. online_fair_sched_group() for cgroup D will call
   propagate_entity_cfs_rq(). (Same can happen if a throttled task is
   moved to cgroup C and enqueue_task_fair() returns early.)

   propagate_entity_cfs_rq() adds the cfs_rq of cgroup C to
   "rq->tmp_alone_branch" since its PELT clock is not marked throttled
   and cfs_rq of cgroup B is not on the list.

   cfs_rq of cgroup B is skipped since its PELT is throttled.

   root cfs_rq already exists on cfs_rq leading to
   list_add_leaf_cfs_rq() returning early.

   The cfs_rq of cgroup C is left dangling on the
   "rq->tmp_alone_branch".

2. A new task wakes up on cgroup A. Since the whole hierarchy is already
   on the leaf cfs_rq list, list_add_leaf_cfs_rq() keeps returning early
   without any modifications to "rq->tmp_alone_branch".

   The final assert_list_leaf_cfs_rq() in enqueue_task_fair() sees the
   dangling reference to cgroup C's cfs_rq in "rq->tmp_alone_branch".

   !!! Splat !!!

Syncing the "pelt_clock_throttled" indicator with parent cfs_rq is not
enough since the new cfs_rq is not yet enqueued on the hierarchy. A
dequeue on other subtree on the throttled hierarchy can freeze the PELT
clock for the parent hierarchy without setting the indicators for this
newly added cfs_rq which was never enqueued.

Since there are no tasks on the new hierarchy, start a cfs_rq on a
throttled hierarchy with its PELT clock throttled. The first enqueue, or
the distribution (whichever happens first) will unfreeze the PELT clock
and queue the cfs_rq on the leaf cfs_rq list.

While at it, add an assert_list_leaf_cfs_rq() in
propagate_entity_cfs_rq() to catch such cases in the future.

Closes: https://lore.kernel.org/lkml/58a587d694f33c2ea487c700b0d046fa@codethink.co.uk/
Fixes: e1fad12dcb ("sched/fair: Switch to task based throttle model")
Reported-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
Suggested-by: Aaron Lu <ziqianlu@bytedance.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Tested-by: Matteo Martelli <matteo.martelli@codethink.co.uk>
Link: https://patch.msgid.link/20251021053522.37583-1-kprateek.nayak@amd.com
2025-10-22 15:21:52 +02:00
Linus Torvalds 6548d364a3 cgroup: Fixes for v6.18-rc2
- Fix seqcount lockdep assertion failure in cgroup freezer on PREEMPT_RT.
   Plain seqcount_t expects preemption disabled, but PREEMPT_RT spinlocks
   don't disable preemption. Switch to seqcount_spinlock_t to properly
   associate css_set_lock with the freeze timing seqcount.
 
 - Misc changes including kernel-doc warning fix for misc_res_type enum and
   improved selftest diagnostics.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaPZ7wg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGTiuAP4swnb+BmcqOu37rxIuKXS+6FwRb0Om2wBs4m5W
 bgASMAEAva+DWpWx0l8GFV4kf14wRV6rYlkMRcP3KSyqXNLmLg4=
 =8A4u
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.18-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - Fix seqcount lockdep assertion failure in cgroup freezer on
   PREEMPT_RT.

   Plain seqcount_t expects preemption disabled, but PREEMPT_RT
   spinlocks don't disable preemption. Switch to seqcount_spinlock_t to
   properly associate css_set_lock with the freeze timing seqcount.

 - Misc changes including kernel-doc warning fix for misc_res_type enum
   and improved selftest diagnostics.

* tag 'cgroup-for-6.18-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/misc: fix misc_res_type kernel-doc warning
  selftests: cgroup: Use values_close_report in test_cpu
  selftests: cgroup: add values_close_report helper
  cgroup: Fix seqcount lockdep assertion in cgroup freezer
2025-10-20 09:41:27 -10:00
Haofeng Li 39a9ed0fb6 timekeeping: Fix aux clocks sysfs initialization loop bound
The loop in tk_aux_sysfs_init() uses `i <= MAX_AUX_CLOCKS` as the
termination condition, which results in 9 iterations (i=0 to 8) when
MAX_AUX_CLOCKS is defined as 8. However, the kernel is designed to support
only up to 8 auxiliary clocks.

This off-by-one error causes the creation of a 9th sysfs entry that exceeds
the intended auxiliary clock range.

Fix the loop bound to use `i < MAX_AUX_CLOCKS` to ensure exactly 8
auxiliary clock entries are created, matching the design specification.

Fixes: 7b95663a3d ("timekeeping: Provide interface to control auxiliary clocks")
Signed-off-by: Haofeng Li <lihaofeng@kylinos.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/tencent_2376993D9FC06A3616A4F981B3DE1C599607@qq.com
2025-10-20 19:56:12 +02:00
Nam Cao 3d62f95bd8 rv: Make rtapp/pagefault monitor depends on CONFIG_MMU
There is no page fault without MMU. Compiling the rtapp/pagefault monitor
without CONFIG_MMU fails as page fault tracepoints' definitions are not
available.

Make rtapp/pagefault monitor depends on CONFIG_MMU.

Fixes: 9162620eb6 ("rv: Add rtapp_pagefault monitor")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509260455.6Z9Vkty4-lkp@intel.com/
Cc: stable@vger.kernel.org
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20251002082317.973839-1-namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-10-20 12:47:40 +02:00
Nam Cao 103541e6a5 rv: Fully convert enabled_monitors to use list_head as iterator
The callbacks in enabled_monitors_seq_ops are inconsistent. Some treat the
iterator as struct rv_monitor *, while others treat the iterator as struct
list_head *.

This causes a wrong type cast and crashes the system as reported by Nathan.

Convert everything to use struct list_head * as iterator. This also makes
enabled_monitors consistent with available_monitors.

Fixes: de090d1cca ("rv: Fix wrong type cast in enabled_monitors_next()")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/linux-trace-kernel/20250923002004.GA2836051@ax162/
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20251002082235.973099-1-namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-10-20 12:47:40 +02:00
Linus Torvalds d9043c79ba - Make sure the check for lost pelt idle time is done unconditionally to
have correct lost idle time accounting
 
 - Stop the deadline server task before a CPU goes offline
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj0yvsACgkQEsHwGGHe
 VUqiYRAAncYon7a++87nuCHIw2ktAcjn4PJTz0F1VGw9ZvcbWThUhNoA17jd4uOz
 XCzSH1rnHnlz359cJIzFwgVYjkBIaqT8GBN0al9ODra37laZCo89bKLmOeAlH81H
 1xJXrDwn7U8dYBjgf6E6OGCdAx40kspCBxmpxrFW1VrGDvfNjEAKezm5GWeSED0Z
 umA93dBr82i4IvfARUkK8s35ctHyx+o+7lCvCSsKSJgM02WWrKqAA/lv6jFjIgdE
 0UuYJv+5A2e1Iog2KNSbvSPn23VaMnsZtvXfJoRLFHEsNTiL9NliTnwrOY6xx0Z8
 9+GUeWsbobKwcKSk4dctOh0g/4afNbxWe2aAPmScHJNHtXHSeejps+zy4xFCLTZn
 2muHCdZ2zo6YSL+og4TQax+FnLYnGUtPFDOQYsNxv/Cp1H+cbgvG5Qp08XXt8Tfl
 Mt82g25GKklc28AN5Ui7FKTFmV2K363pV04YVZjXOwmxwiEYbwKw8gKfxi7CRW7S
 fl4nW6Kp8BFtJQxc/RCXDIiX3h0wRlTOmF5FzyFYxgdsmO5AdGqS9tqknLrV2NlH
 JVtj7alnrmCU34LwtTVfCvYQZiNd4IN+B6/htsL3AzrcLnqJz4O/T/Eyv9UL4yUs
 yvQuO+yStCyk0BFYaGM3/E0xp87NYjaLiHnpM2jia3DT3UT1t7Q=
 =uqJW
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Make sure the check for lost pelt idle time is done unconditionally
   to have correct lost idle time accounting

 - Stop the deadline server task before a CPU goes offline

* tag 'sched_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix pelt lost idle time detection
  sched/deadline: Stop dl_server before CPU goes offline
2025-10-19 04:59:43 -10:00
Linus Torvalds 343b4b44a1 - Make sure perf reporting works correctly in setups using overlayfs or FUSE
- Move the uprobe optimization to a better location logically
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj0x3YACgkQEsHwGGHe
 VUo6nQ/9E8LWC6PJG40QUXNZuj5qLe9VVaiWTW7w/zgeCf9nxkt6OhlOIu4fCMKz
 6n5marqnvOoG9EXetUz5+n0wJvc9vDACESC0m6ESddaI4PGXULNJIsN2C5dR3UZ3
 RULxaXvz9PVVkW3UIuM/U9az7fsG/ttH1rtrWQOsUYQZEO7vA9g+8KtASwnB7yBa
 29WzVDYQIuHigdFPkVOuKBEdhslOjNjMM/N/shFOyFS62MGgwwFG/f4xv0c2GanJ
 9gS2HPGhwOXLm8x/1Y6D8eKjiT5lvqZcDcRnui8bj7L7YGx+HU4PhRIIg7sBvGqA
 QQGolxA9Xo2BTufUTxEQK9v2fSvg0f9wuKbkDbRUdyUeWiZZjEeBM/m0AkzEEeKf
 FUrLCi3V/mN5J/sXSgIwjuCtYctwmsfaukL2bz6DB7feoTHceQmHunKCtBlDZtLE
 Md/4hzMNYM+T/3nx27quGz8Cepxn9PSObN7W+DddWr0TxOxg2Pq6iMbnd7MulueP
 K/AMvqDtbbVUB1XpsFvadRLcYUYYfXT9tiOCxa9O2w2NXDG8qeB6FZwScBaWuz1N
 9GpKBhVMgZT8m0d3N8NoBi0+h32UVZnsJJ3UhHnceE8UyYf4kSO5L2K3nPHJa301
 AavIPkH7+YOl5TAg6JlyYbRRdwfoUzxKUqY/hQ6Q8aLvwb2Jing=
 =huy7
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

 - Make sure perf reporting works correctly in setups using
   overlayfs or FUSE

 - Move the uprobe optimization to a better location logically

* tag 'perf_urgent_for_v6.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix MMAP2 event device with backing files
  perf/core: Fix MMAP event path names with backing files
  perf/core: Fix address filter match with backing files
  uprobe: Move arch_uprobe_optimize right after handlers execution
2025-10-19 04:54:08 -10:00
Shardul Bankar f6fddc6df3 bpf: Fix memory leak in __lookup_instance error path
When __lookup_instance() allocates a func_instance structure but fails
to allocate the must_write_set array, it returns an error without freeing
the previously allocated func_instance. This causes a memory leak of 192
bytes (sizeof(struct func_instance)) each time this error path is triggered.

Fix by freeing 'result' on must_write_set allocation failure.

Fixes: b3698c356a ("bpf: callchain sensitive stack liveness tracking using CFG")
Reported-by: BPF Runtime Fuzzer (BRF)
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20251016063330.4107547-1-shardulsb08@gmail.com
2025-10-16 10:45:17 -07:00
Marek Szyprowski 03521c892b dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC
Commit 370645f41e ("dma-mapping: force bouncing if the kmalloc() size is
not cache-line-aligned") introduced DMA_BOUNCE_UNALIGNED_KMALLOC feature
and permitted architecture specific code configure kmalloc slabs with
sizes smaller than the value of dma_get_cache_alignment().

When that feature is enabled, the physical address of some small
kmalloc()-ed buffers might be not aligned to the CPU cachelines, thus not
really suitable for typical DMA.  To properly handle that case a SWIOTLB
buffer bouncing is used, so no CPU cache corruption occurs.  When that
happens, there is no point reporting a false-positive DMA-API warning that
the buffer is not properly aligned, as this is not a client driver fault.

[m.szyprowski@samsung.com: replace is_swiotlb_allocated() with is_swiotlb_active(), per Catalin]
  Link: https://lkml.kernel.org/r/20251010173009.3916215-1-m.szyprowski@samsung.com
Link: https://lkml.kernel.org/r/20251009141508.2342138-1-m.szyprowski@samsung.com
Fixes: 370645f41e ("dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Inki Dae <m.szyprowski@samsung.com>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: "Isaac J. Manjarres" <isaacmanjarres@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-15 13:24:33 -07:00
Alexei Starovoitov 5fb750e8a9 bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
The following kmemleak splat:

[    8.105530] kmemleak: Trying to color unknown object at 0xff11000100e918c0 as Black
[    8.106521] Call Trace:
[    8.106521]  <TASK>
[    8.106521]  dump_stack_lvl+0x4b/0x70
[    8.106521]  kvfree_call_rcu+0xcb/0x3b0
[    8.106521]  ? hrtimer_cancel+0x21/0x40
[    8.106521]  bpf_obj_free_fields+0x193/0x200
[    8.106521]  htab_map_update_elem+0x29c/0x410
[    8.106521]  bpf_prog_cfc8cd0f42c04044_overwrite_cb+0x47/0x4b
[    8.106521]  bpf_prog_8c30cd7c4db2e963_overwrite_timer+0x65/0x86
[    8.106521]  bpf_prog_test_run_syscall+0xe1/0x2a0

happens due to the combination of features and fixes, but mainly due to
commit 6d78b4473c ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
It's using __GFP_HIGH, which instructs slub/kmemleak internals to skip
kmemleak_alloc_recursive() on allocation, so subsequent kfree_rcu()->
kvfree_call_rcu()->kmemleak_ignore() complains with the above splat.

To fix this imbalance, replace bpf_map_kmalloc_node() with
kmalloc_nolock() and kfree_rcu() with call_rcu() + kfree_nolock() to
make sure that the objects allocated with kmalloc_nolock() are freed
with kfree_nolock() rather than the implicit kfree() that kfree_rcu()
uses internally.

Note, the kmalloc_nolock() happens under bpf_spin_lock_irqsave(), so
it will always fail in PREEMPT_RT. This is not an issue at the moment,
since bpf_timers are disabled in PREEMPT_RT. In the future
bpf_spin_lock will be replaced with state machine similar to
bpf_task_work.

Fixes: 6d78b4473c ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/bpf/20251015000700.28988-1-alexei.starovoitov@gmail.com
2025-10-15 12:22:22 +02:00
Vincent Guittot 17e3e88ed0 sched/fair: Fix pelt lost idle time detection
The check for some lost idle pelt time should be always done when
pick_next_task_fair() fails to pick a task and not only when we call it
from the fair fast-path.

The case happens when the last running task on rq is a RT or DL task. When
the latter goes to sleep and the /Sum of util_sum of the rq is at the max
value, we don't account the lost of idle time whereas we should.

Fixes: 67692435c4 ("sched: Rework pick_next_task() slow-path")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-10-14 13:43:08 +02:00
Peter Zijlstra (Intel) ee6e44dfe6 sched/deadline: Stop dl_server before CPU goes offline
IBM CI tool reported kernel warning[1] when running a CPU removal
operation through drmgr[2]. i.e "drmgr -c cpu -r -q 1"

WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219 cpudl_set+0x58/0x170
NIP [c0000000002b6ed8] cpudl_set+0x58/0x170
LR [c0000000002b7cb8] dl_server_timer+0x168/0x2a0
Call Trace:
[c000000002c2f8c0] init_stack+0x78c0/0x8000 (unreliable)
[c0000000002b7cb8] dl_server_timer+0x168/0x2a0
[c00000000034df84] __hrtimer_run_queues+0x1a4/0x390
[c00000000034f624] hrtimer_interrupt+0x124/0x300
[c00000000002a230] timer_interrupt+0x140/0x320

Git bisects to: commit 4ae8d9aa9f ("sched/deadline: Fix dl_server getting stuck")

This happens since:
- dl_server hrtimer gets enqueued close to cpu offline, when
  kthread_park enqueues a fair task.
- CPU goes offline and drmgr removes it from cpu_present_mask.
- hrtimer fires and warning is hit.

Fix it by stopping the dl_server before CPU is marked dead.

[1]: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com/
[2]: https://github.com/ibm-power-utilities/powerpc-utils/tree/next/src/drmgr

[sshegde: wrote the changelog and tested it]
Fixes: 4ae8d9aa9f ("sched/deadline: Fix dl_server getting stuck")
Closes: https://lore.kernel.org/all/8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
2025-10-14 13:43:08 +02:00
Adrian Hunter fa4f4bae89 perf/core: Fix MMAP2 event device with backing files
Some file systems like FUSE-based ones or overlayfs may record the backing
file in struct vm_area_struct vm_file, instead of the user file that the
user mmapped.

That causes perf to misreport the device major/minor numbers of the file
system of the file, and the generation of the file, and potentially other
inode details.  There is an existing helper file_user_inode() for that
situation.

Use file_user_inode() instead of file_inode() to get the inode for MMAP2
events.

Example:

  Setup:

    # cd /root
    # mkdir test ; cd test ; mkdir lower upper work merged
    # cp `which cat` lower
    # mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
    # perf record -e cycles:u -- /root/test/merged/cat /proc/self/maps
    ...
    55b2c91d0000-55b2c926b000 r-xp 00018000 00:1a 3419                       /root/test/merged/cat
    ...
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.004 MB perf.data (5 samples) ]
    #
    # stat /root/test/merged/cat
      File: /root/test/merged/cat
      Size: 1127792         Blocks: 2208       IO Block: 4096   regular file
    Device: 0,26    Inode: 3419        Links: 1
    Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2025-09-08 12:23:59.453309624 +0000
    Modify: 2025-09-08 12:23:59.454309624 +0000
    Change: 2025-09-08 12:23:59.454309624 +0000
     Birth: 2025-09-08 12:23:59.453309624 +0000

  Before:

    Device reported 00:02 differs from stat output and /proc/self/maps

    # perf script --show-mmap-events | grep /root/test/merged/cat
             cat     377 [-01]   243.078558: PERF_RECORD_MMAP2 377/377: [0x55b2c91d0000(0x9b000) @ 0x18000 00:02 3419 2068525940]: r-xp /root/test/merged/cat

  After:

    Device reported 00:1a is the same as stat output and /proc/self/maps

    # perf script --show-mmap-events | grep /root/test/merged/cat
             cat     362 [-01]   127.755167: PERF_RECORD_MMAP2 362/362: [0x55ba6e781000(0x9b000) @ 0x18000 00:1a 3419 0]: r-xp /root/test/merged/cat

With respect to stable kernels, overlayfs mmap function ovl_mmap() was
added in v4.19 but file_user_inode() was not added until v6.8 and never
back-ported to stable kernels.  FMODE_BACKING that it depends on was added
in v6.5.  This issue has gone largely unnoticed, so back-porting before
v6.8 is probably not worth it, so put 6.8 as the stable kernel prerequisite
version, although in practice the next long term kernel is 6.12.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org # 6.8
2025-10-14 10:38:10 +02:00
Adrian Hunter 8818f507a9 perf/core: Fix MMAP event path names with backing files
Some file systems like FUSE-based ones or overlayfs may record the backing
file in struct vm_area_struct vm_file, instead of the user file that the
user mmapped.

Since commit def3ae83da ("fs: store real path instead of fake path in
backing file f_path"), file_path() no longer returns the user file path
when applied to a backing file.  There is an existing helper
file_user_path() for that situation.

Use file_user_path() instead of file_path() to get the path for MMAP
and MMAP2 events.

Example:

  Setup:

    # cd /root
    # mkdir test ; cd test ; mkdir lower upper work merged
    # cp `which cat` lower
    # mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
    # perf record -e intel_pt//u -- /root/test/merged/cat /proc/self/maps
    ...
    55b0ba399000-55b0ba434000 r-xp 00018000 00:1a 3419                       /root/test/merged/cat
    ...
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.060 MB perf.data ]
    #

  Before:

    File name is wrong (/cat), so decoding fails:

    # perf script --no-itrace --show-mmap-events
             cat     367 [016]   100.491492: PERF_RECORD_MMAP2 367/367: [0x55b0ba399000(0x9b000) @ 0x18000 00:02 3419 489959280]: r-xp /cat
    ...
    # perf script --itrace=e | wc -l
    Warning:
    19 instruction trace errors
    19
    #

  After:

    File name is correct (/root/test/merged/cat), so decoding is ok:

    # perf script --no-itrace --show-mmap-events
                 cat     364 [016]    72.153006: PERF_RECORD_MMAP2 364/364: [0x55ce4003d000(0x9b000) @ 0x18000 00:02 3419 3132534314]: r-xp /root/test/merged/cat
    # perf script --itrace=e
    # perf script --itrace=e | wc -l
    0
    #

Fixes: def3ae83da ("fs: store real path instead of fake path in backing file f_path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org
2025-10-14 10:38:09 +02:00
Adrian Hunter ebfc8542ad perf/core: Fix address filter match with backing files
It was reported that Intel PT address filters do not work in Docker
containers.  That relates to the use of overlayfs.

overlayfs records the backing file in struct vm_area_struct vm_file,
instead of the user file that the user mmapped.  In order for an address
filter to match, it must compare to the user file inode.  There is an
existing helper file_user_inode() for that situation.

Use file_user_inode() instead of file_inode() to get the inode for address
filter matching.

Example:

  Setup:

    # cd /root
    # mkdir test ; cd test ; mkdir lower upper work merged
    # cp `which cat` lower
    # mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
    # perf record --buildid-mmap -e intel_pt//u --filter 'filter * @ /root/test/merged/cat' -- /root/test/merged/cat /proc/self/maps
    ...
    55d61d246000-55d61d2e1000 r-xp 00018000 00:1a 3418                       /root/test/merged/cat
    ...
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.015 MB perf.data ]
    # perf buildid-cache --add /root/test/merged/cat

  Before:

    Address filter does not match so there are no control flow packets

    # perf script --itrace=e
    # perf script --itrace=b | wc -l
    0
    # perf script -D | grep 'TIP.PGE' | wc -l
    0
    #

  After:

    Address filter does match so there are control flow packets

    # perf script --itrace=e
    # perf script --itrace=b | wc -l
    235
    # perf script -D | grep 'TIP.PGE' | wc -l
    57
    #

With respect to stable kernels, overlayfs mmap function ovl_mmap() was
added in v4.19 but file_user_inode() was not added until v6.8 and never
back-ported to stable kernels.  FMODE_BACKING that it depends on was added
in v6.5.  This issue has gone largely unnoticed, so back-porting before
v6.8 is probably not worth it, so put 6.8 as the stable kernel prerequisite
version, although in practice the next long term kernel is 6.12.

Closes: https://lore.kernel.org/linux-perf-users/aBCwoq7w8ohBRQCh@fremen.lan
Reported-by: Edd Barrett <edd@theunixzoo.co.uk>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org # 6.8
2025-10-14 10:38:09 +02:00
Jiri Olsa 62685ab071 uprobe: Move arch_uprobe_optimize right after handlers execution
It's less confusing to optimize uprobe right after handlers execution
and before we do the check for changed ip register to avoid situations
where changed ip register would skip uprobe optimization.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
2025-10-14 10:38:09 +02:00
Linus Torvalds 67029a49db tracing fixes for v6.18:
The previous fix to trace_marker required updating trace_marker_raw
 as well. The difference between trace_marker_raw from trace_marker
 is that the raw version is for applications to write binary structures
 directly into the ring buffer instead of writing ASCII strings.
 This is for applications that will read the raw data from the ring
 buffer and get the data structures directly. It's a bit quicker than
 using the ASCII version.
 
 Unfortunately, it appears that our test suite has several tests that
 test writes to the trace_marker file, but lacks any tests to the
 trace_marker_raw file (this needs to be remedied). Two issues came
 about the update to the trace_marker_raw file that syzbot found:
 
 - Fix tracing_mark_raw_write() to use per CPU buffer
 
   The fix to use the per CPU buffer to copy from user space was needed for
   both the trace_maker and trace_maker_raw file.
 
   The fix for reading from user space into per CPU buffers properly fixed
   the trace_marker write function, but the trace_marker_raw file wasn't
   fixed properly. The user space data was correctly written into the per CPU
   buffer, but the code that wrote into the ring buffer still used the user
   space pointer and not the per CPU buffer that had the user space data
   already written.
 
 - Stop the fortify string warning from writing into trace_marker_raw
 
   After converting the copy_from_user_nofault() into a memcpy(), another
   issue appeared. As writes to the trace_marker_raw expects binary data, the
   first entry is a 4 byte identifier. The entry structure is defined as:
 
   struct {
 	struct trace_entry ent;
 	int id;
 	char buf[];
   };
 
   The size of this structure is reserved on the ring buffer with:
 
     size = sizeof(*entry) + cnt;
 
   Then it is copied from the buffer into the ring buffer with:
 
     memcpy(&entry->id, buf, cnt);
 
   This use to be a copy_from_user_nofault(), but now converting it to
   a memcpy() triggers the fortify-string code, and causes a warning.
 
   The allocated space is actually more than what is copied, as the cnt
   used also includes the entry->id portion. Allocating sizeof(*entry)
   plus cnt is actually allocating 4 bytes more than what is needed.
 
   Change the size function to:
 
     size = struct_size(entry, buf, cnt - sizeof(entry->id));
 
   And update the memcpy() to unsafe_memcpy().
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaOq0fhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4HAQCNbFEDjzNNEueCWLOptC5YfeJgdvPB
 399CzuFJl02ZOgD/flPJa1r+NaeYOBhe1BgpFF9FzB/SPXAXkXUGLM7WIgg=
 =goks
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:
 "The previous fix to trace_marker required updating trace_marker_raw as
  well. The difference between trace_marker_raw from trace_marker is
  that the raw version is for applications to write binary structures
  directly into the ring buffer instead of writing ASCII strings. This
  is for applications that will read the raw data from the ring buffer
  and get the data structures directly. It's a bit quicker than using
  the ASCII version.

  Unfortunately, it appears that our test suite has several tests that
  test writes to the trace_marker file, but lacks any tests to the
  trace_marker_raw file (this needs to be remedied). Two issues came
  about the update to the trace_marker_raw file that syzbot found:

   - Fix tracing_mark_raw_write() to use per CPU buffer

     The fix to use the per CPU buffer to copy from user space was
     needed for both the trace_maker and trace_maker_raw file.

     The fix for reading from user space into per CPU buffers properly
     fixed the trace_marker write function, but the trace_marker_raw
     file wasn't fixed properly. The user space data was correctly
     written into the per CPU buffer, but the code that wrote into the
     ring buffer still used the user space pointer and not the per CPU
     buffer that had the user space data already written.

   - Stop the fortify string warning from writing into trace_marker_raw

     After converting the copy_from_user_nofault() into a memcpy(),
     another issue appeared. As writes to the trace_marker_raw expects
     binary data, the first entry is a 4 byte identifier. The entry
     structure is defined as:

     struct {
   	struct trace_entry ent;
   	int id;
   	char buf[];
     };

     The size of this structure is reserved on the ring buffer with:

       size = sizeof(*entry) + cnt;

     Then it is copied from the buffer into the ring buffer with:

       memcpy(&entry->id, buf, cnt);

     This use to be a copy_from_user_nofault(), but now converting it to
     a memcpy() triggers the fortify-string code, and causes a warning.

     The allocated space is actually more than what is copied, as the
     cnt used also includes the entry->id portion. Allocating
     sizeof(*entry) plus cnt is actually allocating 4 bytes more than
     what is needed.

     Change the size function to:

       size = struct_size(entry, buf, cnt - sizeof(entry->id));

     And update the memcpy() to unsafe_memcpy()"

* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Stop fortify-string from warning in tracing_mark_raw_write()
  tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
2025-10-11 16:06:04 -07:00
Linus Torvalds fbde105f13 bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjpp+sACgkQ6rmadz2v
 bTo+9Q//bUzEc2C64NbG0DTCcxnkDEadzBLIB0BAwnAkuRjR8HJiPGoCdBJhUqzM
 /hNfIHTtDdyspU2qZbM+r4nVJ6zRAwIHrT2d/knERxXtRQozaWvUlRhVmf5tdWYm
 DkbThS9sAfAOs21YjV7OWPrf7bC7T9syQTAfN0CE8cujZY7OnqCyzNwfb8iIusyo
 +Ctm0/qUDVtd6SjdPAQjzp82fHIIwnMFZtWJiZml5LklL1Mx5cuVrT/sWgr5KATW
 vZ9rUfgaiJkAsSX0sSlLnAI76+kJRB+IkmK1TRdWFlwW6dTsa/7MkDeXXPN1dEDi
 o5ZqhcvaY0eAMbU4iX72Juf6gVFF6AgVwsrHmM79ICjg5umCLN/90QqYPc0ChRxl
 EYuSWVQ6/cgV3W6l+KU53cwmRjjdSzyJQFei03COZ0iKF6xic0cynS3BKMQL6HkU
 3BfTj19h+dxt7qywRaJFsrWK4t/uBX6N75XlVa9od/sk91tR/ibtJ6hcyuJGATr5
 nkfMkyN155upAffUnkhv37TXtMXyX8/kd7BddCet31JJXyJuJZ0vYuOcur6awGyN
 aB0T1ueG15sTfGf0zpxVNWhVqswHI/1Suk8EXwbDeHRcsmtrp8XWYawf5StIMwW0
 Uy8GjS5KVl5bcrfDbcvj79jajpVjwnvFR1Sir9C4aROm5OpH3Hs=
 =77JH
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Finish constification of 1st parameter of bpf_d_path() (Rong Tao)

 - Harden userspace-supplied xdp_desc validation (Alexander Lobakin)

 - Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
   Borkmann)

 - Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)

 - Use correct context to unpin bpf hash map with special types (KaFai
   Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add test for unpinning htab with internal timer struct
  bpf: Avoid RCU context warning when unpinning htab with internal structs
  xsk: Harden userspace-supplied xdp_desc validation
  bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
  libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
  bpf: Finish constification of 1st parameter of bpf_d_path()
2025-10-11 10:31:38 -07:00
Linus Torvalds ae13bd2310 Just one series here - Mike Rappoport has taught KEXEC handover to
preserve vmalloc allocations across handover.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaOmDWAAKCRDdBJ7gKXxA
 jh+MAQDUPBj3mFm238CXI5DC1gJ3ETe3NJjJvfzIjLs51c+dFgD+PUuvDA0GUtKH
 LCl6T+HJXh2FgGn1F2Kl/0hwPtEvuA4=
 =HYr7
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more updates from Andrew Morton:
 "Just one series here - Mike Rappoport has taught KEXEC handover to
  preserve vmalloc allocations across handover"

* tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt
  kho: add support for preserving vmalloc allocations
  kho: replace kho_preserve_phys() with kho_preserve_pages()
  kho: check if kho is finalized in __kho_preserve_order()
  MAINTAINERS, .mailmap: update Umang's email address
2025-10-11 10:27:52 -07:00
Steven Rostedt 54b91e54b1 tracing: Stop fortify-string from warning in tracing_mark_raw_write()
The way tracing_mark_raw_write() records its data is that it has the
following structure:

  struct {
	struct trace_entry;
	int id;
	char buf[];
  };

But memcpy(&entry->id, buf, size) triggers the following warning when the
size is greater than the id:

 ------------[ cut here ]------------
 memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4)
 WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
 Modules linked in:
 CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
 Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48
 RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292
 RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001
 RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40
 R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010
 R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000
 FS:  00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  tracing_mark_raw_write+0x1fe/0x290
  ? __pfx_tracing_mark_raw_write+0x10/0x10
  ? security_file_permission+0x50/0xf0
  ? rw_verify_area+0x6f/0x4b0
  vfs_write+0x1d8/0xdd0
  ? __pfx_vfs_write+0x10/0x10
  ? __pfx_css_rstat_updated+0x10/0x10
  ? count_memcg_events+0xd9/0x410
  ? fdget_pos+0x53/0x5e0
  ksys_write+0x182/0x200
  ? __pfx_ksys_write+0x10/0x10
  ? do_user_addr_fault+0x4af/0xa30
  do_syscall_64+0x63/0x350
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fa61d318687
 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
 RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687
 RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001
 RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
 R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000
  </TASK>
 ---[ end trace 0000000000000000 ]---

This is because fortify string sees that the size of entry->id is only 4
bytes, but it is writing more than that. But this is OK as the
dynamic_array is allocated to handle that copy.

The size allocated on the ring buffer was actually a bit too big:

  size = sizeof(*entry) + cnt;

But cnt includes the 'id' and the buffer data, so adding cnt to the size
of *entry actually allocates too much on the ring buffer.

Change the allocation to:

  size = struct_size(entry, buf, cnt - sizeof(entry->id));

and the memcpy() to unsafe_memcpy() with an added justification.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home
Fixes: 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-11 11:27:27 -04:00
Steven Rostedt bda745ee8f tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
The fix to use a per CPU buffer to read user space tested only the writes
to trace_marker. But it appears that the selftests are missing tests to
the trace_maker_raw file. The trace_maker_raw file is used by applications
that writes data structures and not strings into the file, and the tools
read the raw ring buffer to process the structures it writes.

The fix that reads the per CPU buffers passes the new per CPU buffer to
the trace_marker file writes, but the update to the trace_marker_raw write
read the data from user space into the per CPU buffer, but then still used
then passed the user space address to the function that records the data.

Pass in the per CPU buffer and not the user space address.

TODO: Add a test to better test trace_marker_raw.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011035243.386098147@kernel.org
Fixes: 64cf7d058a ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-10 23:58:44 -04:00
KaFai Wan 4f375ade6a bpf: Avoid RCU context warning when unpinning htab with internal structs
When unpinning a BPF hash table (htab or htab_lru) that contains internal
structures (timer, workqueue, or task_work) in its values, a BUG warning
is triggered:
 BUG: sleeping function called from invalid context at kernel/bpf/hashtab.c:244
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0
 ...

The issue arises from the interaction between BPF object unpinning and
RCU callback mechanisms:
1. BPF object unpinning uses ->free_inode() which schedules cleanup via
   call_rcu(), deferring the actual freeing to an RCU callback that
   executes within the RCU_SOFTIRQ context.
2. During cleanup of hash tables containing internal structures,
   htab_map_free_internal_structs() is invoked, which includes
   cond_resched() or cond_resched_rcu() calls to yield the CPU during
   potentially long operations.

However, cond_resched() or cond_resched_rcu() cannot be safely called from
atomic RCU softirq context, leading to the BUG warning when attempting
to reschedule.

Fix this by changing from ->free_inode() to ->destroy_inode() and rename
bpf_free_inode() to bpf_destroy_inode() for BPF objects (prog, map, link).
This allows direct inode freeing without RCU callback scheduling,
avoiding the invalid context warning.

Reported-by: Le Chen <tom2cat@sjtu.edu.cn>
Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/
Fixes: 68134668c1 ("bpf: Add map side support for bpf timers.")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251008102628.808045-2-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-10 10:10:08 -07:00
Linus Torvalds 5472d60c12 tracing clean up and fixes for v6.18:
- Have osnoise tracer use memdup_user_nul()
 
   The function osnoise_cpus_write() open codes a kmalloc() and then
   a copy_from_user() and then adds a nul byte at the end which is the
   same as simply using memdup_user_nul().
 
 - Fix wakeup and irq tracers when failing to acquire calltime
 
   When the wakeup and irq tracers use the function graph tracer for
   tracing function times, it saves a timestamp into the fgraph shadow
   stack. It is possible that this could fail to be stored. If that
   happens, it exits the routine early. These functions also disable
   nesting of the operations by incremeting the data "disable" counter.
   But if the calltime exits out early, it never increments the counter
   back to what it needs to be.
 
   Since there's only a couple of lines of code that does work after
   acquiring the calltime, instead of exiting out early, reverse the
   if statement to be true if calltime is acquired, and place the code
   that is to be done within that if block. The clean up will always
   be done after that.
 
 - Fix ring_buffer_map() return value on failure of __rb_map_vma()
 
   If __rb_map_vma() fails in ring_buffer_map(), it does not return
   an error. This means the caller will be working against a bad vma
   mapping. Have ring_buffer_map() return an error when __rb_map_vma()
   fails.
 
 - Fix regression of writing to the trace_marker file
 
   A bug fix was made to change __copy_from_user_inatomic() to
   copy_from_user_nofault() in the trace_marker write function.
   The trace_marker file is used by applications to write into
   it (usually with a file descriptor opened at the start of the
   program) to record into the tracing system. It's usually used
   in critical sections so the write to trace_marker is highly
   optimized.
 
   The reason for copying in an atomic section is that the write
   reserves space on the ring buffer and then writes directly into
   it. After it writes, it commits the event. The time between
   reserve and commit must have preemption disabled.
 
   The trace marker write does not have any locking nor can it
   allocate due to the nature of it being a critical path.
 
   Unfortunately, converting __copy_from_user_inatomic() to
   copy_from_user_nofault() caused a regression in Android.
   Now all the writes from its applications trigger the fault that
   is rejected by the _nofault() version that wasn't rejected by
   the _inatomic() version. Instead of getting data, it now just
   gets a trace buffer filled with:
 
     tracing_mark_write: <faulted>
 
   To fix this, on opening of the trace_marker file, allocate
   per CPU buffers that can be used by the write call. Then
   when entering the write call, do the following:
 
     preempt_disable();
     cpu = smp_processor_id();
     buffer = per_cpu_ptr(cpu_buffers, cpu);
     do {
 	cnt = nr_context_switches_cpu(cpu);
 	migrate_disable();
 	preempt_enable();
 	ret = copy_from_user(buffer, ptr, size);
 	preempt_disable();
 	migrate_enable();
     } while (!ret && cnt != nr_context_switches_cpu(cpu));
     if (!ret)
 	ring_buffer_write(buffer);
     preempt_enable();
 
   This works similarly to seqcount. As it must enabled preemption
   to do a copy_from_user() into a per CPU buffer, if it gets
   preempted, the buffer could be corrupted by another task.
   To handle this, read the number of context switches of the current
   CPU, disable migration, enable preemption, copy the data from
   user space, then immediately disable preemption again.
   If the number of context switches is the same, the buffer
   is still valid. Otherwise it must be assumed that the buffer may
   have been corrupted and it needs to try again.
 
   Now the trace_marker write can get the user data even if it has
   to fault it in, and still not grab any locks of its own.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaOfVOhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qq6oAP9y+zxuouWjtXIz9/z++aykFgKCkeau
 XHSSdJdn4R+AQgEA4SE0UWKH0F6Bg7qwyocahMMQ1tIJRrpihfNrKBUmmQ4=
 =wDGp
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing clean up and fixes from Steven Rostedt:

 - Have osnoise tracer use memdup_user_nul()

   The function osnoise_cpus_write() open codes a kmalloc() and then a
   copy_from_user() and then adds a nul byte at the end which is the
   same as simply using memdup_user_nul().

 - Fix wakeup and irq tracers when failing to acquire calltime

   When the wakeup and irq tracers use the function graph tracer for
   tracing function times, it saves a timestamp into the fgraph shadow
   stack. It is possible that this could fail to be stored. If that
   happens, it exits the routine early. These functions also disable
   nesting of the operations by incremeting the data "disable" counter.
   But if the calltime exits out early, it never increments the counter
   back to what it needs to be.

   Since there's only a couple of lines of code that does work after
   acquiring the calltime, instead of exiting out early, reverse the if
   statement to be true if calltime is acquired, and place the code that
   is to be done within that if block. The clean up will always be done
   after that.

 - Fix ring_buffer_map() return value on failure of __rb_map_vma()

   If __rb_map_vma() fails in ring_buffer_map(), it does not return an
   error. This means the caller will be working against a bad vma
   mapping. Have ring_buffer_map() return an error when __rb_map_vma()
   fails.

 - Fix regression of writing to the trace_marker file

   A bug fix was made to change __copy_from_user_inatomic() to
   copy_from_user_nofault() in the trace_marker write function. The
   trace_marker file is used by applications to write into it (usually
   with a file descriptor opened at the start of the program) to record
   into the tracing system. It's usually used in critical sections so
   the write to trace_marker is highly optimized.

   The reason for copying in an atomic section is that the write
   reserves space on the ring buffer and then writes directly into it.
   After it writes, it commits the event. The time between reserve and
   commit must have preemption disabled.

   The trace marker write does not have any locking nor can it allocate
   due to the nature of it being a critical path.

   Unfortunately, converting __copy_from_user_inatomic() to
   copy_from_user_nofault() caused a regression in Android. Now all the
   writes from its applications trigger the fault that is rejected by
   the _nofault() version that wasn't rejected by the _inatomic()
   version. Instead of getting data, it now just gets a trace buffer
   filled with:

     tracing_mark_write: <faulted>

   To fix this, on opening of the trace_marker file, allocate per CPU
   buffers that can be used by the write call. Then when entering the
   write call, do the following:

     preempt_disable();
     cpu = smp_processor_id();
     buffer = per_cpu_ptr(cpu_buffers, cpu);
     do {
 	cnt = nr_context_switches_cpu(cpu);
 	migrate_disable();
 	preempt_enable();
 	ret = copy_from_user(buffer, ptr, size);
 	preempt_disable();
 	migrate_enable();
     } while (!ret && cnt != nr_context_switches_cpu(cpu));
     if (!ret)
 	ring_buffer_write(buffer);
     preempt_enable();

   This works similarly to seqcount. As it must enabled preemption to do
   a copy_from_user() into a per CPU buffer, if it gets preempted, the
   buffer could be corrupted by another task.

   To handle this, read the number of context switches of the current
   CPU, disable migration, enable preemption, copy the data from user
   space, then immediately disable preemption again. If the number of
   context switches is the same, the buffer is still valid. Otherwise it
   must be assumed that the buffer may have been corrupted and it needs
   to try again.

   Now the trace_marker write can get the user data even if it has to
   fault it in, and still not grab any locks of its own.

* tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have trace_marker use per-cpu data to read user space
  ring buffer: Propagate __rb_map_vma return value to caller
  tracing: Fix irqoff tracers on failure of acquiring calltime
  tracing: Fix wakeup tracers on failure of acquiring calltime
  tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
2025-10-09 12:18:22 -07:00
Steven Rostedt 64cf7d058a tracing: Have trace_marker use per-cpu data to read user space
It was reported that using __copy_from_user_inatomic() can actually
schedule. Which is bad when preemption is disabled. Even though there's
logic to check in_atomic() is set, but this is a nop when the kernel is
configured with PREEMPT_NONE. This is due to page faulting and the code
could schedule with preemption disabled.

Link: https://lore.kernel.org/all/20250819105152.2766363-1-luogengkun@huaweicloud.com/

The solution was to change the __copy_from_user_inatomic() to
copy_from_user_nofault(). But then it was reported that this caused a
regression in Android. There's several applications writing into
trace_marker() in Android, but now instead of showing the expected data,
it is showing:

  tracing_mark_write: <faulted>

After reverting the conversion to copy_from_user_nofault(), Android was
able to get the data again.

Writes to the trace_marker is a way to efficiently and quickly enter data
into the Linux tracing buffer. It takes no locks and was designed to be as
non-intrusive as possible. This means it cannot allocate memory, and must
use pre-allocated data.

A method that is actively being worked on to have faultable system call
tracepoints read user space data is to allocate per CPU buffers, and use
them in the callback. The method uses a technique similar to seqcount.
That is something like this:

	preempt_disable();
	cpu = smp_processor_id();
	buffer = this_cpu_ptr(&pre_allocated_cpu_buffers, cpu);
	do {
		cnt = nr_context_switches_cpu(cpu);
		migrate_disable();
		preempt_enable();
		ret = copy_from_user(buffer, ptr, size);
		preempt_disable();
		migrate_enable();
	} while (!ret && cnt != nr_context_switches_cpu(cpu));

	if (!ret)
		ring_buffer_write(buffer);
	preempt_enable();

It's a little more involved than that, but the above is the basic logic.
The idea is to acquire the current CPU buffer, disable migration, and then
enable preemption. At this moment, it can safely use copy_from_user().
After reading the data from user space, it disables preemption again. It
then checks to see if there was any new scheduling on this CPU. If there
was, it must assume that the buffer was corrupted by another task. If
there wasn't, then the buffer is still valid as only tasks in preemptable
context can write to this buffer and only those that are running on the
CPU.

By using this method, where trace_marker open allocates the per CPU
buffers, trace_marker writes can access user space and even fault it in,
without having to allocate or take any locks of its own.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: Wattson CI <wattson-external@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20251008124510.6dba541a@gandalf.local.home
Fixes: 3d62ab32df ("tracing: Fix tracing_marker may trigger page fault during preempt_disable")
Reported-by: Runping Lai <runpinglai@google.com>
Tested-by: Runping Lai <runpinglai@google.com>
Closes: https://lore.kernel.org/linux-trace-kernel/20251007003417.3470979-2-runpinglai@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08 21:50:01 -04:00
Ankit Khushwaha de4cbd7047 ring buffer: Propagate __rb_map_vma return value to caller
The return value from `__rb_map_vma()`, which rejects writable or
executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being
ignored. As a result the caller of `__rb_map_vma` always returned 0
even when the mapping had actually failed, allowing it to proceed
with an invalid VMA.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.com
Fixes: 117c39200d ("ring-buffer: Introducing ring-buffer mapping functions")
Reported-by: syzbot+ddc001b92c083dbf2b97@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=194151be8eaebd826005329b2e123aecae714bdb
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08 21:48:58 -04:00
Steven Rostedt c834a97962 tracing: Fix irqoff tracers on failure of acquiring calltime
The functions irqsoff_graph_entry() and irqsoff_graph_return() both call
func_prolog_dec() that will test if the data->disable is already set and
if not, increment it and return. If it was set, it returns false and the
caller exits.

The caller of this function must decrement the disable counter, but misses
doing so if the calltime fails to be acquired.

Instead of exiting out when calltime is NULL, change the logic to do the
work if it is not NULL and still do the clean up at the end of the
function if it is NULL.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008114943.6f60f30f@gandalf.local.home
Fixes: a485ea9e3e ("tracing: Fix irqsoff and wakeup latency tracers when using function graph")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-2-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08 12:10:44 -04:00
Steven Rostedt 4f7bf54b07 tracing: Fix wakeup tracers on failure of acquiring calltime
The functions wakeup_graph_entry() and wakeup_graph_return() both call
func_prolog_preempt_disable() that will test if the data->disable is
already set and if not, increment it and disable preemption. If it was
set, it returns false and the caller exits.

The caller of this function must decrement the disable counter, but misses
doing so if the calltime fails to be acquired.

Instead of exiting out when calltime is NULL, change the logic to do the
work if it is not NULL and still do the clean up at the end of the
function if it is NULL.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008114835.027b878a@gandalf.local.home
Fixes: a485ea9e3e ("tracing: Fix irqsoff and wakeup latency tracers when using function graph")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-1-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08 12:10:26 -04:00
Thorsten Blum f0c029d2ff tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
Replace kmalloc() followed by copy_from_user() with memdup_user_nul() to
simplify and improve osnoise_cpus_write(). Remove the manual
NUL-termination.

No functional changes intended.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251001130907.364673-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-08 12:05:46 -04:00
Mike Rapoport (Microsoft) a667300bd5 kho: add support for preserving vmalloc allocations
A vmalloc allocation is preserved using binary structure similar to global
KHO memory tracker.  It's a linked list of pages where each page is an
array of physical address of pages in vmalloc area.

kho_preserve_vmalloc() hands out the physical address of the head page to
the caller.  This address is used as the argument to kho_vmalloc_restore()
to restore the mapping in the vmalloc address space and populate it with
the preserved pages.

[pasha.tatashin@soleen.com: free chunks using free_page() not kfree()]
  Link: https://lkml.kernel.org/r/mafs0a52idbeg.fsf@kernel.org
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/20250921054458.4043761-4-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 13:48:55 -07:00
Mike Rapoport (Microsoft) 8375b76517 kho: replace kho_preserve_phys() with kho_preserve_pages()
to make it clear that KHO operates on pages rather than on a random
physical address.

The kho_preserve_pages() will be also used in upcoming support for vmalloc
preservation.

Link: https://lkml.kernel.org/r/20250921054458.4043761-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 13:48:55 -07:00
Mike Rapoport (Microsoft) 469661d0d3 kho: check if kho is finalized in __kho_preserve_order()
Patch series "kho: add support for preserving vmalloc allocations", v5.

Following the discussion about preservation of memfd with LUO [1] these
patches add support for preserving vmalloc allocations.

Any KHO uses case presumes that there's a data structure that lists
physical addresses of preserved folios (and potentially some additional
metadata).  Allowing vmalloc preservations with KHO allows scalable
preservation of such data structures.

For instance, instead of allocating array describing preserved folios in
the fdt, memfd preservation can use vmalloc:

        preserved_folios = vmalloc_array(nr_folios, sizeof(*preserved_folios));
        memfd_luo_preserve_folios(preserved_folios, folios, nr_folios);
        kho_preserve_vmalloc(preserved_folios, &folios_info);


This patch (of 4):

Instead of checking if kho is finalized in each caller of
__kho_preserve_order(), do it in the core function itself.

Link: https://lkml.kernel.org/r/20250921054458.4043761-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250921054458.4043761-2-rppt@kernel.org
Link: https://lore.kernel.org/all/20250807014442.3829950-30-pasha.tatashin@soleen.com [1]
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-07 13:48:55 -07:00
Linus Torvalds 2215336295 hyperv-next for v6.18
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmjkpakTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXip5B/48MvTFJ1qwRGPVzevZQ8Z4SDogEREp
 69VS/xRf1YCIzyXyanwqf1dXLq8NAqicSp6ewpJAmNA55/9O0cwT2EtohjeGCu61
 krPIvS3KT7xI0uSEniBdhBtALYBscnQ0e3cAbLNzL7bwA6Q6OmvoIawpBADgE/cW
 aZNCK9jy+WUqtXc6lNtkJtST0HWGDn0h04o2hjqIkZ+7ewjuEEJBUUB/JZwJ41Od
 UxbID0PAcn9O4n/u/Y/GH65MX+ddrdCgPHEGCLAGAKT24lou3NzVv445OuCw0c4W
 ilALIRb9iea56ZLVBW5O82+7g9Ag41LGq+841MNlZjeRNONGykaUpTWZ
 =OR26
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv updates from Wei Liu:

 - Unify guest entry code for KVM and MSHV (Sean Christopherson)

 - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain()
   (Nam Cao)

 - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV
   (Mukesh Rathor)

 - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov)

 - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna
   Kumar T S M)

 - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari,
   Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley)

* tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hyperv: Remove the spurious null directive line
  MAINTAINERS: Mark hyperv_fb driver Obsolete
  fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver
  Drivers: hv: Make CONFIG_HYPERV bool
  Drivers: hv: Add CONFIG_HYPERV_VMBUS option
  Drivers: hv: vmbus: Fix typos in vmbus_drv.c
  Drivers: hv: vmbus: Fix sysfs output format for ring buffer index
  Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()
  x86/hyperv: Switch to msi_create_parent_irq_domain()
  mshv: Use common "entry virt" APIs to do work in root before running guest
  entry: Rename "kvm" entry code assets to "virt" to genericize APIs
  entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
  mshv: Handle NEED_RESCHED_LAZY before transferring to guest
  x86/hyperv: Add kexec/kdump support on Azure CVMs
  Drivers: hv: Simplify data structures for VMBus channel close message
  Drivers: hv: util: Cosmetic changes for hv_utils_transport.c
  mshv: Add support for a new parent partition configuration
  clocksource: hyper-v: Skip unnecessary checks for the root partition
  hyperv: Add missing field to hv_output_map_device_interrupt
2025-10-07 08:40:15 -07:00
Linus Torvalds 21fbefc588 tracing updates for v6.18:
- Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall tracepoints
 
   Individual system call trace events are pseudo events attached to the
   raw_syscall trace events that just trace the entry and exit of all system
   calls. When any of these individual system call trace events get enabled,
   an element in an array indexed by the system call number is assigned to
   the trace file that defines how to trace it. When the trace event
   triggers, it reads this array and if the array has an element, it uses that
   trace file to know what to write it (the trace file defines the output
   format of the corresponding system call).
 
   The issue is that it uses rcu_dereference_ptr() and marks the elements of
   the array as using RCU. This is incorrect. There is no RCU synchronization
   here. The event file that is pointed to has a completely different way to
   make sure its freed properly. The reading of the array during the system
   call trace event is only to know if there is a value or not. If not, it
   does nothing (it means this system call isn't being traced). If it does,
   it uses the information to store the system call data.
 
   The RCU usage here can simply be replaced by READ_ONCE() and WRITE_ONCE()
   macros.
 
 - Have the system call trace events use "0x" for hex values
 
   Some system call trace events display hex values but do not have "0x" in
   front of it. Seeing "count: 44" can be assumed that it is 44 decimal when
   in actuality it is 44 hex (68 decimal). Display "0x44" instead.
 
 - Use vmalloc_array() in tracing_map_sort_entries()
 
   The function tracing_map_sort_entries() used array_size() and vmalloc()
   when it could have simply used vmalloc_array().
 
 - Use for_each_online_cpu() in trace_osnoise.c()
 
   Instead of open coding for_each_cpu(cpu, cpu_online_mask), use
   for_each_online_cpu().
 
 - Move the buffer field in struct trace_seq to the end
 
   The buffer field in struct trace_seq is architecture dependent in size,
   and caused padding for the fields after it. By moving the buffer to the
   end of the structure, it compacts the trace_seq structure better.
 
 - Remove redundant zeroing of cmdline_idx field in saved_cmdlines_buffer()
 
   The structure that contains cmdline_idx is zeroed by memset(), no need to
   explicitly zero any of its fields after that.
 
 - Use  system_percpu_wq instead of system_wq in user_event_mm_remove()
 
   As system_wq is being deprecated, use the new wq.
 
 - Add cond_resched() is ftrace_module_enable()
 
   Some modules have a lot of functions (thousands of them), and the enabling
   of those functions can take some time. On non preemtable kernels, it was
   triggering a watchdog timeout. Add a cond_resched() to prevent that.
 
 - Add a BUILD_BUG_ON() to make sure PID_MAX_DEFAULT is always a power of 2
 
   There's code that depends on PID_MAX_DEFAULT being a power of 2 or it will
   break. If in the future that changes, make sure the build fails to ensure
   that the code is fixed that depends on this.
 
 - Grab mutex_lock() before ever exiting s_start()
 
   The s_start() function is a seq_file start routine. As s_stop() is always
   called even if s_start() fails, and s_stop() expects the event_mutex to be
   held as it will always release it. That mutex must always be taken in
   s_start() even if that function fails.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaN/4phQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qpToAP4sENpZkZHFOl2PuikmAgjhB6PqiUtL
 LNuMDx45WygcLwD6Awy8DUlEBN6RzTPA761MSjs0+NMg16QLrhPLxWqFEgw=
 =nxDn
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall
   tracepoints

   Individual system call trace events are pseudo events attached to the
   raw_syscall trace events that just trace the entry and exit of all
   system calls. When any of these individual system call trace events
   get enabled, an element in an array indexed by the system call number
   is assigned to the trace file that defines how to trace it. When the
   trace event triggers, it reads this array and if the array has an
   element, it uses that trace file to know what to write it (the trace
   file defines the output format of the corresponding system call).

   The issue is that it uses rcu_dereference_ptr() and marks the
   elements of the array as using RCU. This is incorrect. There is no
   RCU synchronization here. The event file that is pointed to has a
   completely different way to make sure its freed properly. The reading
   of the array during the system call trace event is only to know if
   there is a value or not. If not, it does nothing (it means this
   system call isn't being traced). If it does, it uses the information
   to store the system call data.

   The RCU usage here can simply be replaced by READ_ONCE() and
   WRITE_ONCE() macros.

 - Have the system call trace events use "0x" for hex values

   Some system call trace events display hex values but do not have "0x"
   in front of it. Seeing "count: 44" can be assumed that it is 44
   decimal when in actuality it is 44 hex (68 decimal). Display "0x44"
   instead.

 - Use vmalloc_array() in tracing_map_sort_entries()

   The function tracing_map_sort_entries() used array_size() and
   vmalloc() when it could have simply used vmalloc_array().

 - Use for_each_online_cpu() in trace_osnoise.c()

   Instead of open coding for_each_cpu(cpu, cpu_online_mask), use
   for_each_online_cpu().

 - Move the buffer field in struct trace_seq to the end

   The buffer field in struct trace_seq is architecture dependent in
   size, and caused padding for the fields after it. By moving the
   buffer to the end of the structure, it compacts the trace_seq
   structure better.

 - Remove redundant zeroing of cmdline_idx field in
   saved_cmdlines_buffer()

   The structure that contains cmdline_idx is zeroed by memset(), no
   need to explicitly zero any of its fields after that.

 - Use system_percpu_wq instead of system_wq in user_event_mm_remove()

   As system_wq is being deprecated, use the new wq.

 - Add cond_resched() is ftrace_module_enable()

   Some modules have a lot of functions (thousands of them), and the
   enabling of those functions can take some time. On non preemtable
   kernels, it was triggering a watchdog timeout. Add a cond_resched()
   to prevent that.

 - Add a BUILD_BUG_ON() to make sure PID_MAX_DEFAULT is always a power
   of 2

   There's code that depends on PID_MAX_DEFAULT being a power of 2 or it
   will break. If in the future that changes, make sure the build fails
   to ensure that the code is fixed that depends on this.

 - Grab mutex_lock() before ever exiting s_start()

   The s_start() function is a seq_file start routine. As s_stop() is
   always called even if s_start() fails, and s_stop() expects the
   event_mutex to be held as it will always release it. That mutex must
   always be taken in s_start() even if that function fails.

* tag 'trace-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix lock imbalance in s_start() memory allocation failure path
  tracing: Ensure optimized hashing works
  ftrace: Fix softlockup in ftrace_module_enable
  tracing: replace use of system_wq with system_percpu_wq
  tracing: Remove redundant 0 value initialization
  tracing: Move buffer in trace_seq to end of struct
  tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu()
  tracing: Use vmalloc_array() to improve code
  tracing: Have syscall trace events show "0x" for values greater than 10
  tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE()
2025-10-05 09:43:36 -07:00
Linus Torvalds 2cd14dff16 Probes fixes for v6.17
- tracing: Fix race condition in kprobe initialization causing NULL
   pointer dereference. This happens on weak memory model, which
   does not correctly manage the flags access with appropriate
   memory barriers. Use RELEASE-ACQUIRE to fix it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmjeikYbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bEOYIAKk4KwhSGAhx4Q+Mkqzy
 pbNPvYMdAwocYIqxg3C1/OVK22DcauTZxMzRq7c3bW776Na6eWD4y9aykhRhNlGY
 TeKNAtZ0OdRSlq9ISj7P9H4A+Wywrn1CfNkGsZWRrEy0s2xkwUJjEOciI/WA3o8V
 Nd+Qv+/FTRZ9w/678hb647WrTXIsxSyfBG1L+l25GgqKqqwAtKO8Ur+BhOglxZoC
 WVHsJT6Xmisg0QiujXX3BgDFAXmAFkBuIQd+D+LHWXgo7W8I6VB7ilQ+hibyXAzf
 6yYkZ49kyNLeaWf0/4cc/NaKkq8eK43PXvSvs2yhDt8yWthRLiK/DdeAl63ZNihp
 D2g=
 =Ad1o
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probe fix from Masami Hiramatsu:

 - Fix race condition in kprobe initialization causing NULL pointer
   dereference. This happens on weak memory model, which does not
   correctly manage the flags access with appropriate memory barriers.
   Use RELEASE-ACQUIRE to fix it.

* tag 'probes-fixes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
2025-10-05 08:16:20 -07:00
Linus Torvalds 908057d185 This update includes the following changes:
Drivers:
 
 - Add ciphertext hiding support to ccp.
 - Add hashjoin, gather and UDMA data move features to hisilicon.
 - Add lz4 and lz77_only to hisilicon.
 - Add xilinx hwrng driver.
 - Add ti driver with ecb/cbc aes support.
 - Add ring buffer idle and command queue telemetry for GEN6 in qat.
 
 Others:
 
 - Use rcu_dereference_all to stop false alarms in rhashtable.
 - Fix CPU number wraparound in padata.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmjbpJsACgkQxycdCkmx
 i6fuTRAAzv5o0MIw4Kc7EEU3zMgFSX0FdcTUPY+eiFrWZrSrvUVW+jYcH9ppO8J7
 offAYSZYatcyyU9+u8X22CQNKLdXnKQQ0YymWO35TOpvVxveUM1bqEEV1ZK0xaXD
 hlJTLoFIsPaVVhi8CW+ZNhDJBwJHNCv7Yi9TUB6sC7rilWWbJ5LzbEVw3Rtg81Lx
 0hcuGX2LrpsHOVVWYxGdJ534Kt2lrkt+8/gWOFg3ap3RVQ39tohEjS2Adm2p8eiX
 zIdru/aYd89EcYoxuFyylX2d/OLmMAQpFsADy/Fys26eeOWtqggH62V1LAiSyEqw
 vLRBCVKpLhlbNNfnUs0f5nqjjYEUrNk9SA4rgoxITwKoucbWBQMS4zWJTEDKz29n
 iBBqHsukGpwVOE6RY8BzR/QNJKhZCSsJpGkagS1v6VPa5P1QomuKftGXKB7JKXKz
 xoyk+DhJyA8rkb/E5J9Ni7+Tb08Y4zvJ1dpCQHZMlln3DKkK+kk3gkpoxXMZwBV2
 LbEMGTI+sfnAfqkGCJYAZR9gDJ5LQDR9jy/Ds5jvPuVvvjyY5LY/bjETqGPF2QVs
 Rz2Sg0RHl7PVZOP6QgbQzkV7SkJrZfyu5iYd0ZfUqZr7BaHLOHJG/E/HlUW3/mXu
 OjD+Q5gPhiOdc/qn+32+QERTDCFQdbByv0h7khGQA5vHE3XCu8E=
 =knnk
 -----END PGP SIGNATURE-----

Merge tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "Drivers:
   - Add ciphertext hiding support to ccp
   - Add hashjoin, gather and UDMA data move features to hisilicon
   - Add lz4 and lz77_only to hisilicon
   - Add xilinx hwrng driver
   - Add ti driver with ecb/cbc aes support
   - Add ring buffer idle and command queue telemetry for GEN6 in qat

  Others:
   - Use rcu_dereference_all to stop false alarms in rhashtable
   - Fix CPU number wraparound in padata"

* tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (78 commits)
  dt-bindings: rng: hisi-rng: convert to DT schema
  crypto: doc - Add explicit title heading to API docs
  hwrng: ks-sa - fix division by zero in ks_sa_rng_init
  KEYS: X.509: Fix Basic Constraints CA flag parsing
  crypto: anubis - simplify return statement in anubis_mod_init
  crypto: hisilicon/qm - set NULL to qm->debug.qm_diff_regs
  crypto: hisilicon/qm - clear all VF configurations in the hardware
  crypto: hisilicon - enable error reporting again
  crypto: hisilicon/qm - mask axi error before memory init
  crypto: hisilicon/qm - invalidate queues in use
  crypto: qat - Return pointer directly in adf_ctl_alloc_resources
  crypto: aspeed - Fix dma_unmap_sg() direction
  rhashtable: Use rcu_dereference_all and rcu_dereference_all_check
  crypto: comp - Use same definition of context alloc and free ops
  crypto: omap - convert from tasklet to BH workqueue
  crypto: qat - Replace kzalloc() + copy_from_user() with memdup_user()
  crypto: caam - double the entropy delay interval for retry
  padata: WQ_PERCPU added to alloc_workqueue users
  padata: replace use of system_unbound_wq with system_dfl_wq
  crypto: cryptd - WQ_PERCPU added to alloc_workqueue users
  ...
2025-10-04 14:59:29 -07:00
Linus Torvalds 67da125e30 RCU pull request for v6.18
This pull request contains the following branches, non-octopus merged:
 
 Documentation updates:
 
   - Update whatisRCU.rst and checklist.rst for recent RCU API additions.
 
   - Fix RCU documentation formatting and typos.
 
   - Replace dead Ottawa Linux Symposium links in RTFP.txt.
 
 Miscellaneous RCU updates:
 
   - Document that rcu_barrier() hurries RCU_LAZY callbacks.
 
   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler().
 
   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly.
 
   - Make initial set of changes to accommodate upcoming system_percpu_wq
     changes.
 
 SRCU updates:
 
   - Create an srcu_read_lock_fast_notrace() for eventual use in tracing,
     including adding guards.
 
   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast().
 
   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers.
 
   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed().
 
 Torture-test updates:
 
   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary.
 
   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels without
     doing non-debug kernels.
 
   - Fix a number of false-positive diagnostics that were being triggered
     by rcutorture starting before boot completed.  Running multiple
     near-CPU-bound rcutorture processes when there is only the boot CPU
     is after all a bit excessive.
 
   - Substitute kcalloc() for kzalloc().
 
   - Remove a redundant kfree() and NULL out kfree()ed objects.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmjWzFcTHHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jAUeD/4xp/3rvVlfX6UB1ax/lYbQopOm2Hns
 7DVO/lp8ih6jUFCRappw7do+jbU3EcP+76sGUd5qkKlRbJIierTHBrolULpKph/p
 hQ/LddPYIHg+bPtbq/vA6fFwFI/xwbBNQOMS1bWzzU5iWn/p/ETS1kWptz+g2wk5
 pOxx9PnH52Ls5BgCCnb8kGpUuG3cy63sFb52ORrN206EaUq59Q12CLA7aTge4QGV
 3fvBIv+U+chagELG0usoPPTC+fMUt8oj8vfVYyDqUPjoyxATDtvxAv7ORFI7ZmEm
 CVemKrHWEaVEERfXSSbMp0TpPrNnkB4SoYOGT6vakyjgpcQHLh0GMaUZfluvyROt
 nQNaQFbOLfh9GGBjZQpAu1Aa2aAvMcOxyrL1JPhvwFVT0G4hF6s4Zs0zLY7+MAPD
 XT0wSf79U4lTzlg4eNrwZazoFGIeUhwyH2X59yc04yXytM7QUBFw+7XIK/PA8Wn4
 LgJYrwn6poFinGT2HlwKPrIUKSfCpLS8ePutmMgMUqLhiVtKvoaE6S38h2T97d9i
 OuLFDVMm+j0awMadS3cJD5kqtGVbqnPsUuaC3OFlpyng8K+OHHTNCfcFMiMbhgdw
 w6cMioYdq/a9mLoN6T50ylvTOKeoKWxXI4X6q6x7vZYUztMpFby+b5mTTbR12dJs
 LSqHR8QGSIpNmQ==
 =FEf1
 -----END PGP SIGNATURE-----

Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux

Pull RCU updates from Paul McKenney:
 "Documentation updates:

   - Update whatisRCU.rst and checklist.rst for recent RCU API additions

   - Fix RCU documentation formatting and typos

   - Replace dead Ottawa Linux Symposium links in RTFP.txt

  Miscellaneous RCU updates:

   - Document that rcu_barrier() hurries RCU_LAZY callbacks

   - Remove redundant interrupt disabling from
     rcu_preempt_deferred_qs_handler()

   - Move list_for_each_rcu from list.h to rculist.h, and adjust the
     include directive in kernel/cgroup/dmem.c accordingly

   - Make initial set of changes to accommodate upcoming
     system_percpu_wq changes

  SRCU updates:

   - Create an srcu_read_lock_fast_notrace() for eventual use in
     tracing, including adding guards

   - Document the reliance on per-CPU operations as implicit RCU readers
     in __srcu_read_{,un}lock_fast()

   - Document the srcu_flip() function's memory-barrier D's relationship
     to SRCU-fast readers

   - Remove a redundant preempt_disable() and preempt_enable() pair from
     srcu_gp_start_if_needed()

  Torture-test updates:

   - Fix jitter.sh spin time so that it actually varies as advertised.
     It is still quite coarse-grained, but at least it does now vary

   - Update torture.sh help text to include the not-so-new --do-normal
     parameter, which permits (for example) testing KCSAN kernels
     without doing non-debug kernels

   - Fix a number of false-positive diagnostics that were being
     triggered by rcutorture starting before boot completed. Running
     multiple near-CPU-bound rcutorture processes when there is only the
     boot CPU is after all a bit excessive

   - Substitute kcalloc() for kzalloc()

   - Remove a redundant kfree() and NULL out kfree()ed objects"

* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
  rcu: WQ_UNBOUND added to sync_wq workqueue
  rcu: WQ_PERCPU added to alloc_workqueue users
  rcu: replace use of system_wq with system_percpu_wq
  refperf: Set reader_tasks to NULL after kfree()
  refperf: Remove redundant kfree() after torture_stop_kthread()
  srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
  srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
  srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
  rculist: move list_for_each_rcu() to where it belongs
  refscale: Use kcalloc() instead of kzalloc()
  rcutorture: Use kcalloc() instead of kzalloc()
  docs: rcu: Replace multiple dead OLS links in RTFP.txt
  doc: Fix typo in RCU's torture.rst documentation
  Documentation: RCU: Retitle toctree index
  Documentation: RCU: Reduce toctree depth
  Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
  rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
  doc: Add RCU guards to checklist.rst
  doc: Update whatisRCU.rst for recent RCU API additions
  rcutorture: Delay forward-progress testing until boot completes
  ...
2025-10-04 11:28:45 -07:00
Linus Torvalds 48e3694ae7 printk changes for 6.18
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM
 xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW
 e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy
 7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD
 CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo
 JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC
 4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4
 AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq
 iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde
 IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE
 m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ
 NUgzh2cOwSWWU4UHt/AiD9cd
 =uFn0
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Add KUnit test for the printk ring buffer

 - Fix the check of the maximal record size which is allowed to be
   stored into the printk ring buffer. It prevents corruptions of the
   ring buffer.

   Note that printk() is on the safe side. The messages are limited by
   1kB buffer and are always small enough for the minimal log buffer
   size 4kB, see CONFIG_LOG_BUF_SHIFT definition.

* tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: ringbuffer: Fix data block max size check
  printk: kunit: support offstack cpumask
  printk: kunit: Fix __counted_by() in struct prbtest_rbdata
  printk: ringbuffer: Explain why the KUnit test ignores failed writes
  printk: ringbuffer: Add KUnit test
2025-10-04 11:13:11 -07:00
Linus Torvalds 5b7ce93854 kgdb patches for 6.18
A collection of small cleanups this cycle. Thorsten Blum has replaced a
 number strcpy() calls with safer alternatives (fixing a pointer
 aliasing bug in the process). Colin Ian King has simplified things by
 removing some unreachable code.
 
 Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmjfvoQACgkQfOMlXTn3
 iKGqiA/8DKfwbWqnLuFld5U3lK5AeYSWTnbFbQbk8kEUzl/nAVuSGBGMKSnLHKaF
 S3DVWkQA1syhhoidm/XCco2cjkvDoOPQNzQwD35nxOX1QuoxOxbx2qRT4FRJE9CC
 B5WUonEo++z+tmwqVWh9mMXTrYJIMQItg7igMM9wuKWB9Gv08LAfAFU2g4p7O9hf
 A22mSV3EobcxJWELXZhuPyBoWwmC6W6QoxdJ2mTra2/TaKV7DL4PY8o74vRxGW9+
 RKd+fyxpDSvStbYfXIeRZYBjrvqfZ8UHfPV7gCIfppbsrOSrl2HwNlmnOwyNMlgo
 iy2kNOVoXbdRPZrgisLdvX3Q7/BBMRtX2+wpGqaEMRCHMbIcd8RA/uW6v56XGmhh
 zg0GBK+OPIudT4XsB+mwtk1QNbnpECsS1JrI5H88zYnvTsyD0XJfT0ER95uTa30c
 qvz3FvWlxWGpHrtJP17+N/t/lZDWPRXuQZeFOH4oPpsN5dbxBLmXbylWnOmuv86e
 ilICYU3czpflm2moQmKGg63SRRCtti+1Cs78VJnXu1eNfgJn0ko54HtQi6x2nK76
 z0ugll7hi8WKB43sh+OIl0aF5E6Jn9VHpWXgXwVoyTToNrf/cDhHjDsplypaNFDi
 ewXvQR3bP26g7J6jCcmmm7EhJoZOqHelkKxUemyMBzs0seEbPqg=
 =6bO0
 -----END PGP SIGNATURE-----

Merge tag 'kgdb-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "A collection of small cleanups this cycle.

  Thorsten Blum has replaced a number strcpy() calls with safer
  alternatives (fixing a pointer aliasing bug in the process).

  Colin Ian King has simplified things by removing some unreachable
  code"

* tag 'kgdb-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: remove redundant check for scancode 0xe0
  kdb: Replace deprecated strcpy() with helper function in kdb_defcmd()
  kdb: Replace deprecated strcpy() with memcpy() in parse_grep()
  kdb: Replace deprecated strcpy() with memmove() in vkdb_printf()
  kdb: Replace deprecated strcpy() with memcpy() in kdb_strdup()
  kernel: debug: gdbstub: Replace deprecated strcpy() with strscpy()
2025-10-04 09:59:05 -07:00
Linus Torvalds cbf33b8e0b bpf-fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjfBtMACgkQ6rmadz2v
 bToTuA/+JizieMIW6eqzCDrrhj45DmuKu6gUMrkmM0xte3QCoLly5FDGJ9HMmrqf
 L5xfz/TAExg/Nh9a0zpCGE7/fMr9P10Z5vVHMe/I06rcW+/fDrAERXDurlfpfliv
 /7KqF/lq1D6kp7bcBzWX4dzkT66Nh/Ax6ZJKIXF/K3iwFXBb1A/95Wwgqxuc3xNM
 n0kthEhnX6x5cojy6fla2zYNoebNLVxPvafTcAtnE0QM8vxdOl0Q7KWrdQpVE5V/
 z6WF2M822eTQeUIy6k/ZRckFlmEwa++I+w5EnygBFP8INUHARdYNBQpQVFkg31jz
 GYGlbTs7jaDbMNWkEhKyDbuhL5Xq9/5FGOaLI9CvDtpPjbfYTZMGtIn9L902XtqE
 VEfBcTA/g/qd8Urx5622sfPwau64LrSAKBRE/4CJ7/dsnPx4BdKavsA5DMVuyRIm
 Au6tOxlRFhPWhultnmmbFBjwn33xTleYNUF9wpWQnijNq+Ph29gKZk8Ac5eKKS6A
 dTSCxv7LGafezN/evr779vXohZyF9vWfqHdITVzTo40tsSbi5v4uyfJ2qzoaS7Gf
 UQ4F2nkjqddhG/O1W2TY8aFzmWXP2q92V6yqkcq5zjA3D1Ue6JOxRzG/3kNzgsLX
 Gf1DmPosUyeLNpwdDYRD83Dk7tJ7rAI+x3i6Iwh8qivIhC4CzrU=
 =Kzf9
 -----END PGP SIGNATURE-----

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Fix selftests/bpf (typo, conflicts) and unbreak BPF CI (Jiri Olsa)

 - Remove linux/unaligned.h dependency for libbpf_sha256 (Andrii
   Nakryiko) and add a test (Eric Biggers)

 - Reject negative offsets for ALU operations in the verifier (Yazhou
   Tang) and add a test (Eduard Zingerman)

 - Skip scalar adjustment for BPF_NEG operation if destination register
   is a pointer (Brahmajit Das) and add a test (KaFai Wan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  libbpf: Fix missing #pragma in libbpf_utils.c
  selftests/bpf: Add tests for rejection of ALU ops with negative offsets
  selftests/bpf: Add test for libbpf_sha256()
  bpf: Reject negative offsets for ALU ops
  libbpf: remove linux/unaligned.h dependency for libbpf_sha256()
  libbpf: move libbpf_sha256() implementation into libbpf_utils.c
  libbpf: move libbpf_errstr() into libbpf_utils.c
  libbpf: remove unused libbpf_strerror_r and STRERR_BUFSIZE
  libbpf: make libbpf_errno.c into more generic libbpf_utils.c
  selftests/bpf: Add test for BPF_NEG alu on CONST_PTR_TO_MAP
  bpf: Skip scalar adjustment for BPF_NEG if dst is a pointer
  selftests/bpf: Fix realloc size in bpf_get_addrs
  selftests/bpf: Fix typo in subtest_basic_usdt after merge conflict
  selftests/bpf: Fix open-coded gettid syscall in uprobe syscall tests
2025-10-03 19:38:19 -07:00
Linus Torvalds a498d59c46 dma-mapping updates for Linux 6.18:
- refactoring of DMA mapping API to physical addresses as the primary
 interface instead of page+offset parameters; this gets much closer to
 Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is
 supporting memdesc project which seeks to substantially transform how
 struct page works; an advantage of this approach is the possibility of
 introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow
 in the common paths, what in turn lets to use recently introduced
 dma_iova_link() API to map PCI P2P MMIO without creating struct page;
 developped by Leon Romanovsky and Jason Gunthorpe
 - minor clean-up by Petr Tesarik and Qianfeng Rong
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaNugqAAKCRCJp1EFxbsS
 RBvDAQCEd4P6pz6ROQHf5BfiF5J1db2H6bWsFLjajx3KfNWf8gD+P0eQ0hTzLrcd
 zuSKZTivviOiyjXlt/9GOaXXPnmTwA0=
 =b0nZ
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping updates from Marek Szyprowski:

 - Refactoring of DMA mapping API to physical addresses as the primary
   interface instead of page+offset parameters

   This gets much closer to Matthew Wilcox's long term wish for
   struct-pageless IO to cacheable DRAM and is supporting memdesc
   project which seeks to substantially transform how struct page works.

   An advantage of this approach is the possibility of introducing
   DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the
   common paths, what in turn lets to use recently introduced
   dma_iova_link() API to map PCI P2P MMIO without creating struct page

   Developped by Leon Romanovsky and Jason Gunthorpe

 - Minor clean-up by Petr Tesarik and Qianfeng Rong

* tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  kmsan: fix missed kmsan_handle_dma() signature conversion
  mm/hmm: properly take MMIO path
  mm/hmm: migrate to physical address-based DMA mapping API
  dma-mapping: export new dma_*map_phys() interface
  xen: swiotlb: Open code map_resource callback
  dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()
  kmsan: convert kmsan_handle_dma to use physical addresses
  dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
  iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys()
  iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
  dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
  dma-debug: refactor to use physical addresses for page mapping
  iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
  dma-mapping: introduce new DMA attribute to indicate MMIO memory
  swiotlb: Remove redundant __GFP_NOWARN
  dma-direct: clean up the logic in __dma_direct_alloc_pages()
2025-10-03 17:41:12 -07:00
Linus Torvalds 50647a1176 file->f_path constification
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaN3daAAKCRBZ7Krx/gZQ
 6zNWAP9kD6rOJRNqDgea4pibDPa47Tps/WM5tsDv3dsLliY29gEA6sveOWZ3guAj
 4oY3ts/NtHLWXvhI7Vd/1mr2aTKEZQk=
 =YNK+
 -----END PGP SIGNATURE-----

Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file->f_path constification from Al Viro:
 "Only one thing was modifying ->f_path of an opened file - acct(2).

  Massaging that away and constifying a bunch of struct path * arguments
  in functions that might be given &file->f_path ends up with the
  situation where we can turn ->f_path into an anon union of const
  struct path f_path and struct path __f_path, the latter modified only
  in a few places in fs/{file_table,open,namei}.c, all for struct file
  instances that are yet to be opened"

* tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
  Have cc(1) catch attempts to modify ->f_path
  kernel/acct.c: saner struct file treatment
  configfs:get_target() - release path as soon as we grab configfs_item reference
  apparmor/af_unix: constify struct path * arguments
  ovl_is_real_file: constify realpath argument
  ovl_sync_file(): constify path argument
  ovl_lower_dir(): constify path argument
  ovl_get_verity_digest(): constify path argument
  ovl_validate_verity(): constify {meta,data}path arguments
  ovl_ensure_verity_loaded(): constify datapath argument
  ksmbd_vfs_set_init_posix_acl(): constify path argument
  ksmbd_vfs_inherit_posix_acl(): constify path argument
  ksmbd_vfs_kern_path_unlock(): constify path argument
  ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path *
  check_export(): constify path argument
  export_operations->open(): constify path argument
  rqst_exp_get_by_name(): constify path argument
  nfs: constify path argument of __vfs_getattr()
  bpf...d_path(): constify path argument
  done_path_create(): constify path argument
  ...
2025-10-03 16:32:36 -07:00