mirror of https://github.com/torvalds/linux.git
49018 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
2be6a7503d |
Remove or hide unused tracepoints
Tracepoints take up memory (around 5K per tracepoint) even when they are
unused. Changes are being made to detect when a tracepoint is defined but
unused and a warning is shown at build. But those changes are not yet
ready for inclusion.
- Fix some of the unused tracepoints that it detected
Some tracepoints were removed and others were hidden by config settings
to match the config settings of where they are instantiated. Some
tracepoints were moved into architecture specific code as only one
architecture used them.
- Call the ftrace_test_filter tracepoint in an unreachable if statement
The ftrace_test_filter tracepoint which is defined when ftrace selftests
are configured and is used to test the filter logic, but the tracepoint is
not actually called. It is put into an if statement to not have it get
compiled out, but also not warn for not being used.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIlYqxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qisrAQD+pu2en9LAXLcgbFxQOwhbACpxOpmT
3LiE2+MvDR3ckQD/Vyi31XebdRmj3leJ7ENf28oa155y1pyK/onrPgDHyQ4=
=nFfn
-----END PGP SIGNATURE-----
Merge tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracepoint cleanup from Steven Rostedt:
"Remove or hide unused tracepoints
Tracepoints take up memory (around 5K per tracepoint) even when they
are unused. Changes are being made to detect when a tracepoint is
defined but unused and a warning is shown at build. But those changes
are not yet ready for inclusion.
- Fix some of the unused tracepoints that it detected
Some tracepoints were removed and others were hidden by config
settings to match the config settings of where they are
instantiated. Some tracepoints were moved into architecture
specific code as only one architecture used them.
- Call the ftrace_test_filter tracepoint in an unreachable if
statement
The ftrace_test_filter tracepoint which is defined when ftrace
selftests are configured and is used to test the filter logic, but
the tracepoint is not actually called. It is put into an if
statement to not have it get compiled out, but also not warn for
not being used"
* tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING
powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64
tracing: Call trace_ftrace_test_filter() for the event
tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
binder: Remove unused binder lock events
PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS
PM: tracing: Hide device_pm_callback events under PM_SLEEP
PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE
PM: cpufreq: powernv/tracing: Move powernv_throttle trace event
alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured
tracing, AER: Hide PCIe AER event when PCIEAER is not configured
|
|
|
|
4ff261e725 |
Runtime verification changes for 6.17
- Added Linear temporal logic monitors for RT application
Real-time applications may have design flaws causing them to have
unexpected latency. For example, the applications may raise page faults, or
may be blocked trying to take a mutex without priority inheritance.
However, while attempting to implement DA monitors for these real-time
rules, deterministic automaton is found to be inappropriate as the
specification language. The automaton is complicated, hard to understand,
and error-prone.
For these cases, linear temporal logic is found to be more suitable. The
LTL is more concise and intuitive.
- Make printk_deferred() public
The new monitors needed access to printk_deferred(). Make them visible for
the entire kernel.
- Add a vpanic() to allow for va_list to be passed to panic.
- Add rtapp container monitor.
A collection of monitors that check for common problems with real-time
applications that cause unexpected latency.
- Add page fault tracepoints to risc-v
These tracepoints are necessary to for the RV monitor to run on risc-v.
- Fix the behaviour of the rv tool with -s and idle tasks.
- Allow the rv tool to gracefully terminate with SIGTERM
- Adjusts dot2c not to create lines over 100 columns
- Properly order nested monitors in the RV Kconfig file
- Return the registration error in all DA monitor instead of 0
- Update and add new sched collection monitors
Replace tss and sncid monitors with more complete sts:
Not only prove that switches occur in scheduling context and scheduling
needs interrupt disabled but also that each call to the scheduler
disables interrupts to (optionally) switch.
New monitor: nrp
Preemption requires need resched which is cleared by any switch
(includes a non optimal workaround for /nested/ preemptions)
New monitor: sssw
suspension requires setting the task to sleepable and, after the
switch occurs, the task requires a wakeup to come back to runnable
New monitor: opid
waking and need-resched operations occur with interrupts and
preemption disabled or in IRQ without explicitly disabling preemption
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIk8cBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qi3DAQCFu6DM7uPSh94oggWlH2LukOYVGk2b
CvGrqMFuefae7QD/aK9nCMfzaBehixMOMQHLHELEh527Hd+RwQCrlnLALQU=
=r5HZ
-----END PGP SIGNATURE-----
Merge tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull runtime verification updates from Steven Rostedt:
- Added Linear temporal logic monitors for RT application
Real-time applications may have design flaws causing them to have
unexpected latency. For example, the applications may raise page
faults, or may be blocked trying to take a mutex without priority
inheritance.
However, while attempting to implement DA monitors for these
real-time rules, deterministic automaton is found to be inappropriate
as the specification language. The automaton is complicated, hard to
understand, and error-prone.
For these cases, linear temporal logic is found to be more suitable.
The LTL is more concise and intuitive.
- Make printk_deferred() public
The new monitors needed access to printk_deferred(). Make them
visible for the entire kernel.
- Add a vpanic() to allow for va_list to be passed to panic.
- Add rtapp container monitor.
A collection of monitors that check for common problems with
real-time applications that cause unexpected latency.
- Add page fault tracepoints to risc-v
These tracepoints are necessary to for the RV monitor to run on
risc-v.
- Fix the behaviour of the rv tool with -s and idle tasks.
- Allow the rv tool to gracefully terminate with SIGTERM
- Adjusts dot2c not to create lines over 100 columns
- Properly order nested monitors in the RV Kconfig file
- Return the registration error in all DA monitor instead of 0
- Update and add new sched collection monitors
Replace tss and sncid monitors with more complete sts:
Not only prove that switches occur in scheduling context and scheduling
needs interrupt disabled but also that each call to the scheduler
disables interrupts to (optionally) switch.
New monitor: nrp
Preemption requires need resched which is cleared by any switch
(includes a non optimal workaround for /nested/ preemptions)
New monitor: sssw
suspension requires setting the task to sleepable and, after the
switch occurs, the task requires a wakeup to come back to runnable
New monitor: opid
waking and need-resched operations occur with interrupts and
preemption disabled or in IRQ without explicitly disabling
preemption"
* tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits)
rv: Add opid per-cpu monitor
rv: Add nrp and sssw per-task monitors
rv: Replace tss and sncid monitors with more complete sts
sched: Adapt sched tracepoints for RV task model
rv: Retry when da monitor detects race conditions
rv: Adjust monitor dependencies
rv: Use strings in da monitors tracepoints
rv: Remove trailing whitespace from tracepoint string
rv: Add da_handle_start_run_event_ to per-task monitors
rv: Fix wrong type cast in reactors_show() and monitor_reactor_show()
rv: Fix wrong type cast in monitors_show()
rv: Remove struct rv_monitor::reacting
rv: Remove rv_reactor's reference counter
rv: Merge struct rv_reactor_def into struct rv_reactor
rv: Merge struct rv_monitor_def into struct rv_monitor
rv: Remove unused field in struct rv_monitor_def
rv: Return init error when registering monitors
verification/rvgen: Organise Kconfig entries for nested monitors
tools/dot2c: Fix generated files going over 100 column limit
tools/rv: Stop gracefully also on SIGTERM
...
|
|
|
|
d50b07d05c |
ring-buffer changes for v6.17:
- Rewind persistent ring buffer on boot When the persistent ring buffer is being used for live kernel tracing and the system crashes, the tool that is reading the trace may not have recorded the data when the system crashed. Although the persistent ring buffer still has that data, when reading it after a reboot, it will start where it left off. That is, what was read will not be accessible. Instead, on reboot, have the persistent ring buffer restart where the data starts and this will allow the tooling to recover what was lost when the crash occurred. - Remove the ring_buffer_read_prepare_sync() logic Reading the trace file required stopping writing to the ring buffer as the trace file is only an iterator and does not consume what it read. It was originally not safe to read the ring buffer in this mode and required disabling writing. The ring_buffer_read_prepare_sync() logic was used to stop each per_cpu ring buffer, call synchronize_rcu() and then start the iterator. This was used instead of calling synchronize_rcu() for each per_cpu buffer. Today, the iterator has been updated where it is safe to read the trace file while writing to the ring buffer is still occurring. There is no more need to do this synchronization and it is causing large delays on machines with many CPUs. Remove this unneeded synchronization. - Make static string array a constant in show_irq_str() Making the string array into a constant has shown to decrease code text/data size. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIkfURQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qnx4AQCNXOuKYJXDvXrkwf449agwrn0lCVyI vV0L65nyIrakpAD8COV/lw8DhlCpb/Lijlzzo5L0n9QpEElNpq5uEntNwgE= =1YIy -----END PGP SIGNATURE----- Merge tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Rewind persistent ring buffer on boot When the persistent ring buffer is being used for live kernel tracing and the system crashes, the tool that is reading the trace may not have recorded the data when the system crashed. Although the persistent ring buffer still has that data, when reading it after a reboot, it will start where it left off. That is, what was read will not be accessible. Instead, on reboot, have the persistent ring buffer restart where the data starts and this will allow the tooling to recover what was lost when the crash occurred. - Remove the ring_buffer_read_prepare_sync() logic Reading the trace file required stopping writing to the ring buffer as the trace file is only an iterator and does not consume what it read. It was originally not safe to read the ring buffer in this mode and required disabling writing. The ring_buffer_read_prepare_sync() logic was used to stop each per_cpu ring buffer, call synchronize_rcu() and then start the iterator. This was used instead of calling synchronize_rcu() for each per_cpu buffer. Today, the iterator has been updated where it is safe to read the trace file while writing to the ring buffer is still occurring. There is no more need to do this synchronization and it is causing large delays on machines with many CPUs. Remove this unneeded synchronization. - Make static string array a constant in show_irq_str() Making the string array into a constant has shown to decrease code text/data size. * tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Make the const read-only 'type' static ring-buffer: Remove ring_buffer_read_prepare_sync() tracing: ring_buffer: Rewind persistent ring buffer on reboot |
|
|
|
90a871f74b |
ftrace changes for v6.17:
- Keep track of when fgraph_ops are registered or not Keep accounting of when fgraph_ops are registered as if a fgraph_ops is registered twice it can mess up the accounting and it will not work as expected later. Trigger a warning if something registers it twice as to catch bugs before they are found by things just not working as expected. - Make DYNAMIC_FTRACE always enabled for architectures that support it As static ftrace (where all functions are always traced) is very expensive and only exists to help architectures support ftrace, do not make it an option. As soon as an architecture supports DYNAMIC_FTRACE make it use it. This simplifies the code. - Remove redundant config HAVE_FTRACE_MCOUNT_RECORD The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant with the HAVE_DYNAMIC_FTRACE. - Make pid_ptr string size match the comment In print_graph_proc() the pid_ptr string is of size 11, but the comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaIkVkRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qmdxAPsGcyT/gnyX/wf70cI63QoODrlRAd7M tg3R0J0H41U05QD/apttbA9GSdZ8bDLLSFAXTJgr8f4GvYvbUsmu2sMBBA8= =gd9V -----END PGP SIGNATURE----- Merge tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Keep track of when fgraph_ops are registered or not Keep accounting of when fgraph_ops are registered as if a fgraph_ops is registered twice it can mess up the accounting and it will not work as expected later. Trigger a warning if something registers it twice as to catch bugs before they are found by things just not working as expected. - Make DYNAMIC_FTRACE always enabled for architectures that support it As static ftrace (where all functions are always traced) is very expensive and only exists to help architectures support ftrace, do not make it an option. As soon as an architecture supports DYNAMIC_FTRACE make it use it. This simplifies the code. - Remove redundant config HAVE_FTRACE_MCOUNT_RECORD The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant with the HAVE_DYNAMIC_FTRACE. - Make pid_ptr string size match the comment In print_graph_proc() the pid_ptr string is of size 11, but the comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12. * tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it fgraph: Keep track of when fgraph_ops are registered or not fgraph: Make pid_str size match the comment |
|
|
|
b7dbc2e813 |
Probes updates for v6.17:
- Stack usage reduction for probe events:
- Allocate string buffers from the heap for uprobe, eprobe, kprobe,
and fprobe events to avoid stack overflow.
- Allocate traceprobe_parse_context from the heap to prevent
potential stack overflow.
- Fix a typo in the above commit.
- New features for eprobe and tprobe events:
- Add support for arrays in eprobes.
- Support multiple tprobes on the same tracepoint.
- Improve efficiency:
- Register fprobe-events only when it is enabled to reduce overhead.
- Register tracepoints for tprobe events only when enabled to
resolve a lock dependency.
- Code Cleanup:
- Add kerneldoc for traceprobe_parse_event_name() and __get_insn_slot().
- Sort #include alphabetically in the probes code.
- Remove the unused 'mod' field from the tprobe-event.
- Clean up the entry-arg storing code in probe-events.
- Selftest update
- Enable fprobe events before checking enable_functions in selftests.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmiJ2DQbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bSfkH/06Zn5I55rU85FKSBQll
FN4hipmef/9Nd13skDwpEuFyzLPNS4P1up/UBUuyDQUTlO74+t2zSFO2dpcNrWmu
sPTenQ+6h82H3K591WTIC23VzF54syIbFLXEj8iMBALT3wyU4Nn0bs4DCbnTo5HX
R3NVo77rk6wxNJoKYOtT6ALf/lHonuNlGF+KTUGWP8UbWsIY3fIp0RWWy572M0bt
+YBE8D8RIVrw+ZY+vNKn1LdZdWlR1ton518XDf1gV9isTCfKErcd/6HJKwuj5q2v
qMgwiaKK+Gne/ylAKmWLEg2oNDo7kpyfW+612oiECitgZkqxOXhyYYfWgRt1lFNp
Wb8=
=E+Z6
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
"Stack usage reduction for probe events:
- Allocate string buffers from the heap for uprobe, eprobe, kprobe,
and fprobe events to avoid stack overflow
- Allocate traceprobe_parse_context from the heap to prevent
potential stack overflow
- Fix a typo in the above commit
New features for eprobe and tprobe events:
- Add support for arrays in eprobes
- Support multiple tprobes on the same tracepoint
Improve efficiency:
- Register fprobe-events only when it is enabled to reduce overhead
- Register tracepoints for tprobe events only when enabled to resolve
a lock dependency
Code Cleanup:
- Add kerneldoc for traceprobe_parse_event_name() and
__get_insn_slot()
- Sort #include alphabetically in the probes code
- Remove the unused 'mod' field from the tprobe-event
- Clean up the entry-arg storing code in probe-events
Selftest update
- Enable fprobe events before checking enable_functions in selftests"
* tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: trace_fprobe: Fix typo of the semicolon
tracing: Have eprobes handle arrays
tracing: probes: Add a kerneldoc for traceprobe_parse_event_name()
tracing: uprobe-event: Allocate string buffers from heap
tracing: eprobe-event: Allocate string buffers from heap
tracing: kprobe-event: Allocate string buffers from heap
tracing: fprobe-event: Allocate string buffers from heap
tracing: probe: Allocate traceprobe_parse_context from heap
tracing: probes: Sort #include alphabetically
kprobes: Add missing kerneldoc for __get_insn_slot
tracing: tprobe-events: Register tracepoint when enable tprobe event
selftests: tracing: Enable fprobe events before checking enable_functions
tracing: fprobe-events: Register fprobe-events only when it is enabled
tracing: tprobe-events: Support multiple tprobes on the same tracepoint
tracing: tprobe-events: Remove mod field from tprobe-event
tracing: probe-events: Cleanup entry-arg storing code
|
|
|
|
a03eec7420 |
Probes fixes for v6.16:
- Fix a potential infinite recursion in fprobe by using preempt_*_notrace(). -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmiIdp4bHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8boY8IAMQGLspd1mATbCfnrQKY 2X86OVygOJx7Iq1RCmOV6fhroe5EoNVR/b1RXmZJf2gIoN176zdBdYrBIFC97lYO J1XaU/Ns1McBuKrOjc3TSYYioVPHJrKLiZ1vAoCicTkUsS34MQJXbbAlfdn424pb J1wUeIDJF0WrFH9yVJ4mEs1dH81oCQ3iSG0CYx5/qLggcoubUFrVl4QessJwAuI6 VM+cKDsqMCltBovXFw/fAgWfiQp79z/uq9umOFLdZGsesqutMYTMgJXBS6slKl3a qE2EQ57Op39A2zpk2hUoVoyv5Ey/XkfEjLU7WIMfqjLOL201IGQEKuyvR/mS54Kc HDw= =EeVm -----END PGP SIGNATURE----- Merge tag 'probes-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - Fix a potential infinite recursion in fprobe by using preempt_*_notrace() * tag 'probes-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe: Fix infinite recursion using preempt_*_notrace() |
|
|
|
2db4df0c09 |
RCU pull request for v6.17
This pull request contains the following branches:
rcu-exp.23.07.2025
- Protect against early RCU exp quiescent state reporting during exp
grace period initialization.
- Remove superfluous barrier in task unblock path.
- Remove the CPU online quiescent state report optimization, which is
error prone for certain scenarios.
- Add warning for unexpected pending requested expedited quiescent
state on dying CPU.
rcu.22.07.2025
- Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
indicators of the actual context tracking state of a CPU.
- Handle ->defer_qs_iw_pending field data race.
- Enable rcu_normal_wake_from_gp by default on systems with <= 16 CPUs.
- Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls.
- Refactor expedited handling condition in rcu_read_unlock_special().
- Documentation updates for hotplug and GP init scan ordering,
separation of rcu_state and rnp's gp_seq states, quiescent state
reporting for offline CPUs.
torture-scripts.16.07.2025
- Cleanup and improve scripts : remove superfluous warnings for disabled
tests; better handling of kvm.sh --kconfig arg; suppress some confusing
diagnostics; tolerate bad kvm.sh args; add new diagnostic for build
output; fail allmodconfig testing on warnings.
- Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels.
- Disable default RCU-tasks and clocksource-wdog testing on arm64.
- Add EXPERT Kconfig option for arm64 KCSAN runs.
- Remove SRCU-lite testing.
rcutorture.16.07.2025
- Start torture writer threads creation after reader threads to handle
race in SRCU-P scenario.
- Add SRCU down_read()/up_read() test.
- Add diagnostics for delayed SRCU up_read(), unmatched up_read(), print
number of up/down readers and the number of such readers which
migrated to other CPU.
- Ignore certain unsupported configurations for trivial RCU test.
- Fix splats in RT kernels due to inaccurate checks for BH-disabled
context.
- Enable checks and logs to capture intentionally exercised unexpected
scenarios (too short readers) for BUSTED test.
- Remove SRCU-lite testing.
srcu.19.07.2025
- Expedite SRCU-fast grace periods.
- Remove SRCU-lite implementation.
- Add guards for SRCU-fast readers.
rcu.nocb.18.07.2025
- Dump NOCB group leader state on stall detection.
- Robustify nocb_cb_kthread pointer accesses.
- Fix delayed execution of hurry callbacks when LAZY_RCU is enabled.
refscale.07.07.2025
- Fix multiplication overflow in "loops" and "nreaders" calculations.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCaINnRwAKCRAAHS7/6Z0w
pRYJAQC97ZDW2wBegDbQPsg5ECLX9Lyd6+IC65sdi38IENl+TQEA4/oMzUUceIH+
CDCnxv3fAMhPncJfvIukOLzMJpKw0go=
=8t4O
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Expedited grace period updates:
- Protect against early RCU exp quiescent state reporting during exp
grace period initialization
- Remove superfluous barrier in task unblock path
- Remove the CPU online quiescent state report optimization, which is
error prone for certain scenarios
- Add warning for unexpected pending requested expedited quiescent
state on dying CPU
Core:
- Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate
indicators of the actual context tracking state of a CPU
- Handle ->defer_qs_iw_pending field data race
- Enable rcu_normal_wake_from_gp by default on systems with <= 16
CPUs
- Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls
- Refactor expedited handling condition in rcu_read_unlock_special()
- Documentation updates for hotplug and GP init scan ordering,
separation of rcu_state and rnp's gp_seq states, quiescent state
reporting for offline CPUs
torture-scripts:
- Cleanup and improve scripts : remove superfluous warnings for
disabled tests; better handling of kvm.sh --kconfig arg; suppress
some confusing diagnostics; tolerate bad kvm.sh args; add new
diagnostic for build output; fail allmodconfig testing on warnings
- Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels
- Disable default RCU-tasks and clocksource-wdog testing on arm64
- Add EXPERT Kconfig option for arm64 KCSAN runs
- Remove SRCU-lite testing
rcutorture:
- Start torture writer threads creation after reader threads to
handle race in SRCU-P scenario
- Add SRCU down_read()/up_read() test
- Add diagnostics for delayed SRCU up_read(), unmatched up_read(),
print number of up/down readers and the number of such readers
which migrated to other CPU
- Ignore certain unsupported configurations for trivial RCU test
- Fix splats in RT kernels due to inaccurate checks for BH-disabled
context
- Enable checks and logs to capture intentionally exercised
unexpected scenarios (too short readers) for BUSTED test
- Remove SRCU-lite testing
srcu:
- Expedite SRCU-fast grace periods
- Remove SRCU-lite implementation
- Add guards for SRCU-fast readers
rcu nocb:
- Dump NOCB group leader state on stall detection
- Robustify nocb_cb_kthread pointer accesses
- Fix delayed execution of hurry callbacks when LAZY_RCU is enabled
refscale:
- Fix multiplication overflow in "loops" and "nreaders" calculations"
* tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits)
rcu: Document concurrent quiescent state reporting for offline CPUs
rcu: Document separation of rcu_state and rnp's gp_seq
rcu: Document GP init vs hotplug-scan ordering requirements
srcu: Add guards for SRCU-fast readers
rcu: Fix delayed execution of hurry callbacks
rcu: Refactor expedited handling check in rcu_read_unlock_special()
checkpatch: Remove SRCU-lite deprecation
srcu: Remove SRCU-lite implementation
srcu: Expedite SRCU-fast grace periods
rcutorture: Remove support for SRCU-lite
rcutorture: Remove SRCU-lite scenarios
torture: Remove support for SRCU-lite
torture: Make torture.sh --allmodconfig testing fail on warnings
torture: Add "ERROR" diagnostic for testing kernel-build output
torture: Make torture.sh tolerate runs having bad kvm.sh arguments
torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs
torture: Extract testid.txt generation to separate script
torture: Suppress "find" diagnostics from torture.sh --do-none run
torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs
rcu: Fix rcu_read_unlock() deadloop due to IRQ work
...
|
|
|
|
7dff275c66 |
Kernel Concurrency Sanitizer (KCSAN) updates for v6.17
- A single fix to silence an uninitialized variable warning This change has had a few days of linux-next exposure. -----BEGIN PGP SIGNATURE----- iIcEABYKAC8WIQR7t4b/75lzOR3l5rcxsLN3bbyLnwUCaIdstREcZWx2ZXJAZ29v Z2xlLmNvbQAKCRAxsLN3bbyLnxfiAQCHSdHCyTOTP6YghSkd2ZIqfgQ8O9Y8iKGf EBfa6nvDVQD/bvUioqMpn/IgD6sbp76wbSOjmaJN19AGH8sfQIB13gI= =yGgW -----END PGP SIGNATURE----- Merge tag 'kcsan-20250728-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux Pull Kernel Concurrency Sanitizer (KCSAN) update from Marco Elver: - A single fix to silence an uninitialized variable warning * tag 'kcsan-20250728-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux: kcsan: test: Initialize dummy variable |
|
|
|
196d9e72c4 |
RCU wakeup fix for KVM s390 guest entry
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEwGNS88vfc9+v45Yq41TmuOI4ufgFAmiJzvQACgkQ41TmuOI4 ufhSCw/7BTDYOZD24NkBEa8449tJRl4NV4nBrNQdj4E7jhvbs7+Q/zR5opZKwU/g vksnH9vW+YPuA01rplBWjdDk863q1oqnLvJ9lgh5KJVIvDHWf0EPyMqOnm5Y9KdP oALh/prtjek6B6rWA4PsC/OKXtx/w0zn4HulWr9LliUvJsmmsLkOTDvXB1bJeld9 6Yi1AZ4MtqsxzLnKZVKFDfJKPWyJArzcIU0xyV5Rr62FtIIU/0WVGyTdwMj+DtIw +XaI4KSgyymyxChNn5dtV4JlNA9gi5oTggDSSglMWKw8oHeSgdvFtFD/05+txKLr 4Veo1LAtug6iwmRBNPuPiPn1z5LBXpNMp0prwnpFsUuOaib+C1J3lkD+aCsmGPAG 3f0Q9B6dM+m1MgabgzQHjeJVeVL6xMLxMHfVjXEVt2xq9lR4rskjvCr78aaL7UvF 6l+wrpcdiayXkSkQKawdJUcBYS2TorQrc0Kn5XL/pD5qv5gu0BU26kvXRhe+xyyJ 7WP8R/4uZZTLcIEZJWO9QQU3KWvnT+bKOOaPRX34SNXCxtJIhIHYw1ON6IEk/gfs P5WG34TmM4RtQuSLSpQpvG/K+hNU6gFl+5s+YUL4OhxDRCaDdLY/UlYwx6tXuGJ6 D5R4F8HxLb5d32p/EOSSuJbVtYt6/hHT1m8wWACGb5SCe/gcNs4= =+mwu -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD RCU wakeup fix for KVM s390 guest entry |
|
|
|
d9104cec3e |
bpf-next-6.17
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmiINnEACgkQ6rmadz2v bToBnA/9F+A3R6rTwGk4HK3xpfc/nm2Tanl3oRN7S2ub/mskDOtWSIyG6cVFZ0UG 1fK6IkByyRIpAF/5qhdlw8drRXHkQtGLA0lP2L9llm4X1mHLofB18y9OeLrDE1WN KwNP06+IGX9W802lCGSIXOY+VmRscVfXSMokyQt2ilHplKjOnDqJcYkWupi3T2rC mz79FY9aEl2YrIcpj9RXz+8nwP49pZBuW2P0IM5PAIj4BJBXShrUp8T1nz94okNe NFsnAyRxjWpUT0McEgtA9WvpD9lZqujYD8Qp0KlGZWmI3vNpV5d9S1+dBcEb1n7q dyNMkTF3oRrJhhg4VqoHc6fVpzSEoZ9ZxV5Hx4cs+ganH75D4YbdGqx/7mR3DUgH MZh6rHF1pGnK7TAm7h5gl3ZRAOkZOaahbe1i01NKo9CEe5fSh3AqMyzJYoyGHRKi xDN39eQdWBNA+hm1VkbK2Bv93Rbjrka2Kj+D3sSSO9Bo/u3ntcknr7LW39idKz62 Q8dkKHcCEtun7gjk0YXPF013y81nEohj1C+52BmJ2l5JitM57xfr6YOaQpu7DPDE AJbHx6ASxKdyEETecd0b+cXUPQ349zmRXy0+CDMAGKpBicC0H0mHhL14cwOY1Hfu EIpIjmIJGI3JNF6T5kybcQGSBOYebdV0FFgwSllzPvuYt7YsHCs= =/O3j -----END PGP SIGNATURE----- Merge tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Remove usermode driver (UMD) framework (Thomas Weißschuh) - Introduce Strongly Connected Component (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman) - Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman) - Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan) - Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai) - Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi) - Teach the verifier to insert runtime speculation barrier (lfence on x86) to mitigate speculative execution instead of rejecting the programs (Luis Gerhorst) - Various improvements for 'veristat' (Mykyta Yatsenko) - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon) - Support BPF private stack on arm64 (Puranjay Mohan) - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu) - Introduce kfuncs for read-only string opreations (Viktor Malik) - Implement show_fdinfo() for bpf_links (Tao Chen) - Reduce verifier's stack consumption (Yonghong Song) - Implement mprog API for cgroup-bpf programs (Yonghong Song) * tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits) selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite selftests/bpf: Add selftest for attaching tracing programs to functions in deny list bpf: Add log for attaching tracing programs to functions in deny list bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions bpf: Fix various typos in verifier.c comments bpf: Add third round of bounds deduction selftests/bpf: Test invariants on JSLT crossing sign selftests/bpf: Test cross-sign 64bits range refinement selftests/bpf: Update reg_bound range refinement logic bpf: Improve bounds when s64 crosses sign boundary bpf: Simplify bounds refinement from s32 selftests/bpf: Enable private stack tests for arm64 bpf, arm64: JIT support for private stack bpf: Move bpf_jit_get_prog_name() to core.c bpf, arm64: Fix fp initialization for exception boundary umd: Remove usermode driver framework bpf/preload: Don't select USERMODE_DRIVER selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure selftests/bpf: Increase xdp data size for arm64 64K page size ... |
|
|
|
8be4d31cb8 |
Networking changes for 6.17.
Core & protocols
----------------
- Wrap datapath globals into net_aligned_data, to avoid false sharing.
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container).
- Add SO_INQ and SCM_INQ support to AF_UNIX.
- Add SIOCINQ support to AF_VSOCK.
- Add TCP_MAXSEG sockopt to MPTCP.
- Add IPv6 force_forwarding sysctl to enable forwarding per interface.
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB.
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users.
- Convert TCP send queue handling from tasklet to BH workque.
- Improve BPF iteration over TCP sockets to see each socket exactly once.
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code.
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel NAPI
thread when threaded NAPI gets disabled. Previously thread would stick
around until ifdown due to tricky synchronization.
- Allow multicast routing to take effect on locally-generated packets.
- Add output interface argument for End.X in segment routing.
- MCTP: add support for gateway routing, improve bind() handling.
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink.
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed.
- Support NUD_PERMANENT for proxy neighbor entries.
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM.
- Add sequence numbers to netconsole messages. Unregister netconsole's
console when all net targets are removed. Code refactoring.
Add a number of selftests.
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup.
- Support inspecting ref_tracker state via DebugFS.
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links.
- Allow providing upcall pid for the 'execute' command in openvswitch.
- Remove DCCP support from Netfilter's conntrack.
- Disallow multiple packet duplications in the queuing layer.
- Prevent use of deprecated iptables code on PREEMPT_RT.
Driver API
----------
- Support RSS and hashing configuration over ethtool Netlink.
- Add dedicated ethtool callbacks for getting and setting hashing fields.
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc.
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs.
- Support traffic classes in devlink rate API for bandwidth management.
- Remove rtnl_lock dependency from UDP tunnel port configuration.
Device drivers
--------------
- Add a new Broadcom driver for 800G Ethernet (bnge).
- Add a standalone driver for Microchip ZL3073x DPLL.
- Remove IBM's NETIUCV device driver.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is used
by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmiFgLgACgkQMUZtbf5S
IrvafxAAnQRwYBoIG+piCILx6z5pRvBGHkmEQ4AQgSCFuq2eO3ubwMFIqEybfma1
5+QFjUZAV3OgGgKRBS2KGWxtSzdiF+/JGV1VOIN67sX3Mm0a2QgjA4n5CgKL0FPr
o6BEzjX5XwG1zvGcBNQ5BZ19xUUKjoZQgTtnea8sZ57Fsp5RtRgmYRqoewNvNk/n
uImh0NFsDVb0UeOpSzC34VD9l1dJvLGdui4zJAjno/vpvmT1DkXjoK419J/r52SS
X+5WgsfJ6DkjHqVN1tIhhK34yWqBOcwGFZJgEnWHMkFIl2FqRfFKMHyqtfLlVnLA
mnIpSyz8Sq2AHtx0TlgZ3At/Ri8p5+yYJgHOXcDKyABa8y8Zf4wrycmr6cV9JLuL
z54nLEVnJuvfDVDVJjsLYdJXyhMpZFq6+uAItdxKaw8Ugp/QqG4QtoRj+XIHz4ZW
z6OohkCiCzTwEISFK+pSTxPS30eOxq43kCspcvuLiwCCStJBRkRb5GdZA4dm7LA+
1Od4ADAkHjyrFtBqTyyC2scX8UJ33DlAIpAYyIeS6w9Cj9EXxtp1z33IAAAZ03MW
jJwIaJuc8bK2fWKMmiG7ucIXjPo4t//KiWlpkwwqLhPbjZgfDAcxq1AC2TLoqHBL
y4EOgKpHDCMAghSyiFIAn2JprGcEt8dp+11B0JRXIn4Pm/eYDH8=
=lqbe
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Wrap datapath globals into net_aligned_data, to avoid false sharing
- Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container)
- Add SO_INQ and SCM_INQ support to AF_UNIX
- Add SIOCINQ support to AF_VSOCK
- Add TCP_MAXSEG sockopt to MPTCP
- Add IPv6 force_forwarding sysctl to enable forwarding per interface
- Make TCP validation of whether packet fully fits in the receive
window and the rcv_buf more strict. With increased use of HW
aggregation a single "packet" can be multiple 100s of kB
- Add MSG_MORE flag to optimize large TCP transmissions via sockmap,
improves latency up to 33% for sockmap users
- Convert TCP send queue handling from tasklet to BH workque
- Improve BPF iteration over TCP sockets to see each socket exactly
once
- Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code
- Support enabling kernel threads for NAPI processing on per-NAPI
instance basis rather than a whole device. Fully stop the kernel
NAPI thread when threaded NAPI gets disabled. Previously thread
would stick around until ifdown due to tricky synchronization
- Allow multicast routing to take effect on locally-generated packets
- Add output interface argument for End.X in segment routing
- MCTP: add support for gateway routing, improve bind() handling
- Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink
- Add a new neighbor flag ("extern_valid"), which cedes refresh
responsibilities to userspace. This is needed for EVPN multi-homing
where a neighbor entry for a multi-homed host needs to be synced
across all the VTEPs among which the host is multi-homed
- Support NUD_PERMANENT for proxy neighbor entries
- Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM
- Add sequence numbers to netconsole messages. Unregister
netconsole's console when all net targets are removed. Code
refactoring. Add a number of selftests
- Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup
- Support inspecting ref_tracker state via DebugFS
- Don't force bonding advertisement frames tx to ~333 ms boundaries.
Add broadcast_neighbor option to send ARP/ND on all bonded links
- Allow providing upcall pid for the 'execute' command in openvswitch
- Remove DCCP support from Netfilter's conntrack
- Disallow multiple packet duplications in the queuing layer
- Prevent use of deprecated iptables code on PREEMPT_RT
Driver API:
- Support RSS and hashing configuration over ethtool Netlink
- Add dedicated ethtool callbacks for getting and setting hashing
fields
- Add support for power budget evaluation strategy in PSE /
Power-over-Ethernet. Generate Netlink events for overcurrent etc
- Support DPLL phase offset monitoring across all device inputs.
Support providing clock reference and SYNC over separate DPLL
inputs
- Support traffic classes in devlink rate API for bandwidth
management
- Remove rtnl_lock dependency from UDP tunnel port configuration
Device drivers:
- Add a new Broadcom driver for 800G Ethernet (bnge)
- Add a standalone driver for Microchip ZL3073x DPLL
- Remove IBM's NETIUCV device driver
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support zero-copy Tx of DMABUF memory
- take page size into account for page pool recycling rings
- Intel (100G, ice, idpf):
- idpf: XDP and AF_XDP support preparations
- idpf: add flow steering
- add link_down_events statistic
- clean up the TSPLL code
- preparations for live VM migration
- nVidia/Mellanox:
- support zero-copy Rx/Tx interfaces (DMABUF and io_uring)
- optimize context memory usage for matchers
- expose serial numbers in devlink info
- support PCIe congestion metrics
- Meta (fbnic):
- add 25G, 50G, and 100G link modes to phylink
- support dumping FW logs
- Marvell/Cavium:
- support for CN20K generation of the Octeon chips
- Amazon:
- add HW clock (without timestamping, just hypervisor time access)
- Ethernet virtual:
- VirtIO net:
- support segmentation of UDP-tunnel-encapsulated packets
- Google (gve):
- support packet timestamping and clock synchronization
- Microsoft vNIC:
- add handler for device-originated servicing events
- allow dynamic MSI-X vector allocation
- support Tx bandwidth clamping
- Ethernet NICs consumer, and embedded:
- AMD:
- amd-xgbe: hardware timestamping and PTP clock support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- use napi_complete_done() return value to support NAPI polling
- add support for re-starting auto-negotiation
- Broadcom switches (b53):
- support BCM5325 switches
- add bcm63xx EPHY power control
- Synopsys (stmmac):
- lots of code refactoring and cleanups
- TI:
- icssg-prueth: read firmware-names from device tree
- icssg: PRP offload support
- Microchip:
- lan78xx: convert to PHYLINK for improved PHY and MAC management
- ksz: add KSZ8463 switch support
- Intel:
- support similar queue priority scheme in multi-queue and
time-sensitive networking (taprio)
- support packet pre-emption in both
- RealTek (r8169):
- enable EEE at 5Gbps on RTL8126
- Airoha:
- add PPPoE offload support
- MDIO bus controller for Airoha AN7583
- Ethernet PHYs:
- support for the IPQ5018 internal GE PHY
- micrel KSZ9477 switch-integrated PHYs:
- add MDI/MDI-X control support
- add RX error counters
- add cable test support
- add Signal Quality Indicator (SQI) reporting
- dp83tg720: improve reset handling and reduce link recovery time
- support bcm54811 (and its MII-Lite interface type)
- air_en8811h: support resume/suspend
- support PHY counters for QCA807x and QCA808x
- support WoL for QCA807x
- CAN drivers:
- rcar_canfd: support for Transceiver Delay Compensation
- kvaser: report FW versions via devlink dev info
- WiFi:
- extended regulatory info support (6 GHz)
- add statistics and beacon monitor for Multi-Link Operation (MLO)
- support S1G aggregation, improve S1G support
- add Radio Measurement action fields
- support per-radio RTS threshold
- some work around how FIPS affects wifi, which was wrong (RC4 is
used by TKIP, not only WEP)
- improvements for unsolicited probe response handling
- WiFi drivers:
- RealTek (rtw88):
- IBSS mode for SDIO devices
- RealTek (rtw89):
- BT coexistence for MLO/WiFi7
- concurrent station + P2P support
- support for USB devices RTL8851BU/RTL8852BU
- Intel (iwlwifi):
- use embedded PNVM in (to be released) FW images to fix
compatibility issues
- many cleanups (unused FW APIs, PCIe code, WoWLAN)
- some FIPS interoperability
- MediaTek (mt76):
- firmware recovery improvements
- more MLO work
- Qualcomm/Atheros (ath12k):
- fix scan on multi-radio devices
- more EHT/Wi-Fi 7 features
- encapsulation/decapsulation offload
- Broadcom (brcm80211):
- support SDIO 43751 device
- Bluetooth:
- hci_event: add support for handling LE BIG Sync Lost event
- ISO: add socket option to report packet seqnum via CMSG
- ISO: support SCM_TIMESTAMPING for ISO TS
- Bluetooth drivers:
- intel_pcie: support Function Level Reset
- nxpuart: add support for 4M baudrate
- nxpuart: implement powerup sequence, reset, FW dump, and FW loading"
* tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits)
dpll: zl3073x: Fix build failure
selftests: bpf: fix legacy netfilter options
ipv6: annotate data-races around rt->fib6_nsiblings
ipv6: fix possible infinite loop in fib6_info_uses_dev()
ipv6: prevent infinite loop in rt6_nlmsg_size()
ipv6: add a retry logic in net6_rt_notify()
vrf: Drop existing dst reference in vrf_ip6_input_dst
net/sched: taprio: align entry index attr validation with mqprio
net: fsl_pq_mdio: use dev_err_probe
selftests: rtnetlink.sh: remove esp4_offload after test
vsock: remove unnecessary null check in vsock_getname()
igb: xsk: solve negative overflow of nb_pkts in zerocopy mode
stmmac: xsk: fix negative overflow of budget in zerocopy mode
dt-bindings: ieee802154: Convert at86rf230.txt yaml format
net: dsa: microchip: Disable PTP function of KSZ8463
net: dsa: microchip: Setup fiber ports for KSZ8463
net: dsa: microchip: Write switch MAC address differently for KSZ8463
net: dsa: microchip: Use different registers for KSZ8463
net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver
dt-bindings: net: dsa: microchip: Add KSZ8463 switch support
...
|
|
|
|
0dd1274a05 |
tracing: Have eprobes have their own config option
Eprobes were added in 5.15 and were selected whenever any of the other probe events were selected. If kprobe events were enabled (which it is by default if kprobes are enabled) it would enable eprobe events as well. The same for uprobes and fprobes. Have eprobes have its own config and it gets enabled by default if tracing is enabled. Link: https://lore.kernel.org/all/20250729102636.b7cce553e7cc263722b12365@kernel.org/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/20250730140945.360286733@kernel.org Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
4b290aae78 |
Summary
* Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective subsystems. Only 5
(out of the original 50) will remain in kernel/sysctl.c file; these handle
either sysctl or common arch variables.
By decentralizing sysctl registrations, subsystem maintainers regain control
over their sysctl interfaces, improving maintainability and reducing the
likelihood of merge conflicts.
* docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented in the
check-sysctl-docs script. This script can now be used to automatically
identify if documentation is missing.
* Testing
All these have been in linux-next since rc3, giving them a solid 3 to 4 weeks
worth of testing. Additionally, sysctl selftests and kunit were also run
locally on my x86_64
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmiAvd8ACgkQupfNUreW
QU+9nAv/dtxaKoL4BXJSzsA2+49bbo9QfiK5Vjz1wSRYRQTb+jhGr9QdS5hG+NeX
uN2ilvcNQqW7ENdiblU10lvcbPjIn2hw4lbMcpv/+QXnrudtGYlBFXlkWqW5nv7X
AVvHU8y3uzfs6JbRIpROUA7Cn2cDOlfP2mMtwxCXR3iP+orS1ziuVEi1JRoirIyG
iq5I/1rJMJBU3FjqqDTq6yljspLx8AlXO1yc5xUxAM67IcY4ew3ZTxqiZr6M9AhV
DUbR2lu/88wcFNERt8DJmuQ50dSGGqOEpK3FURTmkwtMFxzNLmenFDQeBKKahz3Q
2ntXSDfp2y+ppZNmcOP8tZZkra03Xpy1DQyoOgQ2r9uGekPxyr+wmKXwYPOeJIPO
YWTNBm8omX9qr49zVzaZ1f2foRGfgStHL6aa6xLIf34zzScSDEPtO3og2+5Hw/30
gnp+7v9E19uKpoE6oiGE0PtiFzAi/I6nFxzG2RRqrlMLFXyKVccTKygzY6tCnI3P
6144s/Bt
=R369
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move sysctls out of the kern_table array
This is the final move of ctl_tables into their respective
subsystems. Only 5 (out of the original 50) will remain in
kernel/sysctl.c file; these handle either sysctl or common arch
variables.
By decentralizing sysctl registrations, subsystem maintainers regain
control over their sysctl interfaces, improving maintainability and
reducing the likelihood of merge conflicts.
- docs: Remove false positives from check-sysctl-docs
Stopped falsely identifying sysctls as undocumented or unimplemented
in the check-sysctl-docs script. This script can now be used to
automatically identify if documentation is missing.
* tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits)
docs: Downgrade arm64 & riscv from titles to comment
docs: Replace spaces with tabs in check-sysctl-docs
docs: Remove colon from ctltable title in vm.rst
docs: Add awk section for ucount sysctl entries
docs: Use skiplist when checking sysctl admin-guide
docs: nixify check-sysctl-docs
sysctl: rename kern_table -> sysctl_subsys_table
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
uevent: mv uevent_helper into kobject_uevent.c
sysctl: Removed unused variable
sysctl: Nixify sysctl.sh
sysctl: Remove superfluous includes from kernel/sysctl.c
sysctl: Remove (very) old file changelog
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
sysctl: move cad_pid into kernel/pid.c
sysctl: Move tainted ctl_table into kernel/panic.c
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
fork: mv threads-max into kernel/fork.c
parisc/power: Move soft-power into power.c
mm: move randomize_va_space into memory.c
...
|
|
|
|
6fb44438a5 |
arm64 updates for 6.17:
Perf and PMU updates:
- Add support for new (v3) Hisilicon SLLC and DDRC PMUs
- Add support for Arm-NI PMU integrations that share interrupts between
clock domains within a given instance
- Allow SPE to be configured with a lower sample period than the
minimum recommendation advertised by PMSIDR_EL1.Interval
- Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
- Adjust the perf watchdog period according to cpu frequency changes
- Minor driver fixes and cleanups
Hardware features:
- Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
- Support for reporting the non-address bits during a synchronous MTE
tag check fault (FEAT_MTE_TAGGED_FAR)
- Optimise the TLBI when folding/unfolding contiguous PTEs on hardware
with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts
Software features:
- Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
and using the text-poke API for late module relocations
- Force VMAP_STACK always on and change arm64_efi_rt_init() to use
arch_alloc_vmap_stack() in order to avoid KASAN false positives
ACPI:
- Improve SPCR handling and messaging on systems lacking an SPCR table
Debug:
- Simplify the debug exception entry path
- Drop redundant DBG_MDSCR_* macros
Kselftests:
- Cleanups and improvements for SME, SVE and FPSIMD tests
Miscellaneous:
- Optimise loop to reduce redundant operations in contpte_ptep_get()
- Remove ISB when resetting POR_EL0 during signal handling
- Mark the kernel as tainted on SEA and SError panic
- Remove redundant gcs_free() call
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmiDkgoACgkQa9axLQDI
XvFucQ//bYugRP5/Sdlrq5eDKWBGi1HufYzwfDEBLc4S75Eu8mGL/tuThfu9yFn+
qCowtt4U84HdWsZDTSVo6lym6v2vJUpGOMgXzepvJaFBRnqGv9X9NxH6RQO1LTnu
Pm7rO+7I9tNpfuc7Zu9pHDggsJEw+WzVfmEF6WPSFlT9mUNv6NbSx4rbLQKU86Dm
ouTqXaePEQZ5oiRXVasxyT0otGtiACD20WpgOtNjYGzsfUVwCf/C83V/2DLwwbhr
9cW9lCtFxA/yFdQcA9ThRzWZ9Eo5LAHqjGIq00+zOjuzgDbBtcTT79gpChkhovIR
FBIsWHd9j9i3nYxzf4V4eRKQnyqS3NQWv7g7uKFwNgARif1Zk0VJ77QIlAYk5xLI
ENTRjLKz5WNGGnhdkeCvDlVyxX+OktgcVTp3vqRxAKCRahMMUqBrwxiM8RzVF37e
yzkEQayL8F7uZqy9H7Sjn48UpHZux6frJ1bBQw1oEvR9QmAoAdqavPMSAYIOT3Zr
ze4WIljq/cFr3kBPIFP5pK1e0qYMHXZpSKIm8MAv6y/7KmQuVbMjZthpuPbLSIw0
Q7C0KalB8lToPIbO7qMni/he0dCN4K2+E1YHFTR+pzfcoLuW4rjSg7i8tqMLKMJ8
H+SeGLyPtM5A6bdAPTTpqefcgUUe7064ENUqrGUpDEynGXA7boE=
=5h1C
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"A quick summary: perf support for Branch Record Buffer Extensions
(BRBE), typical PMU hardware updates, small additions to MTE for
store-only tag checking and exposing non-address bits to signal
handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on.
There is also a TLBI optimisation on hardware that does not require
break-before-make when changing the user PTEs between contiguous and
non-contiguous.
More details:
Perf and PMU updates:
- Add support for new (v3) Hisilicon SLLC and DDRC PMUs
- Add support for Arm-NI PMU integrations that share interrupts
between clock domains within a given instance
- Allow SPE to be configured with a lower sample period than the
minimum recommendation advertised by PMSIDR_EL1.Interval
- Add suppport for Arm's "Branch Record Buffer Extension" (BRBE)
- Adjust the perf watchdog period according to cpu frequency changes
- Minor driver fixes and cleanups
Hardware features:
- Support for MTE store-only checking (FEAT_MTE_STORE_ONLY)
- Support for reporting the non-address bits during a synchronous MTE
tag check fault (FEAT_MTE_TAGGED_FAR)
- Optimise the TLBI when folding/unfolding contiguous PTEs on
hardware with FEAT_BBM (break-before-make) level 2 and no TLB
conflict aborts
Software features:
- Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable()
and using the text-poke API for late module relocations
- Force VMAP_STACK always on and change arm64_efi_rt_init() to use
arch_alloc_vmap_stack() in order to avoid KASAN false positives
ACPI:
- Improve SPCR handling and messaging on systems lacking an SPCR
table
Debug:
- Simplify the debug exception entry path
- Drop redundant DBG_MDSCR_* macros
Kselftests:
- Cleanups and improvements for SME, SVE and FPSIMD tests
Miscellaneous:
- Optimise loop to reduce redundant operations in contpte_ptep_get()
- Remove ISB when resetting POR_EL0 during signal handling
- Mark the kernel as tainted on SEA and SError panic
- Remove redundant gcs_free() call"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64/gcs: task_gcs_el0_enable() should use passed task
arm64: Kconfig: Keep selects somewhat alphabetically ordered
arm64: signal: Remove ISB when resetting POR_EL0
kselftest/arm64: Handle attempts to disable SM on SME only systems
kselftest/arm64: Fix SVE write data generation for SME only systems
kselftest/arm64: Test SME on SME only systems in fp-ptrace
kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
kselftest/arm64: Allow sve-ptrace to run on SME only systems
arm64/mm: Drop redundant addr increment in set_huge_pte_at()
kselftest/arm4: Provide local defines for AT_HWCAP3
arm64: Mark kernel as tainted on SAE and SError panic
arm64/gcs: Don't call gcs_free() when releasing task_struct
drivers/perf: hisi: Support PMUs with no interrupt
drivers/perf: hisi: Relax the event number check of v2 PMUs
drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
drivers/perf: hisi: Simplify the probe process for each DDRC version
perf/arm-ni: Support sharing IRQs within an NI instance
perf/arm-ni: Consolidate CPU affinity handling
...
|
|
|
|
72b8944f14 |
Locking updates for v6.16:
Locking primitives:
- Mark devm_mutex_init() as __must_check and fix drivers
that didn't check the return code. (Thomas Weißschuh)
- Reorganize <linux/local_lock.h> to better expose the
internal APIs to local variables. (Sebastian Andrzej Siewior)
- Remove OWNER_SPINNABLE in rwsem (Jinliang Zheng)
- Remove redundant #ifdefs in the mutex code (Ran Xiaokai)
Lockdep:
- Avoid returning struct in lock_stats() (Arnd Bergmann)
- Change `static const` into enum for LOCKF_*_IRQ_*
(Arnd Bergmann)
- Temporarily use synchronize_rcu_expedited() in
lockdep_unregister_key() to speed things up.
(Breno Leitao)
Rust runtime:
- Add #[must_use] to Lock::try_lock() (Jason Devers)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiIbzURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gxlRAAsnrbMN1yUbGbOh2fr7eQh69nn4VLZhvQ
n/Q2+ZpvgBQiPhUnYub4n0B03pO6lQO+taiAQ9WTK6VHi7kzIKIx0MPWP5KV9FCY
NQKQCmRccese0mmWVYccLPjyk6GW8l5gIhRK1vuEYANtLf/XLBYB/ygvE6a8ywNz
dmt7IzYOIknCuEtapDzcJLBZFHG9mVTT8Kk2A5aqn+XCrxNnKrYyVOH0qw395uBw
ulVKPJT7FGQ4qLkxfYguNWH5V1ZneN53tJouwqcM7Xpc+ookQFAZel0xlfWpVu+A
Q2WF3W8GOrS7ER9RzjG0SQF4qYBq60yKPZr3przmjCJFRgFdvEkMEIDvbirl0Gfv
Y04hMIcovsnh8x0iLTYxkrRxlZB/7jm5uLVJ1B6E19iYBXq1HCPkM51XugDQFxwz
fDSLblpRZLf9OoWT9NPiiQXpoSLigwOiFdiGimIMQHRbPKCujF2T9w4XpKLLECN4
UbYGMx/yAGdkTXelSStyru0ZLYhvxP2XMAaUJoMBrjI1ReL2e58Vmp2MqQcuhiuU
PV5NEt0qhBAjilUrP+vuM/27UihPxcBrVgvriT+wDVrrPiy1t5iJVOKxFcrkbMto
B+XHFA7z1EglkwGD7HCdoOFU8V3PM6+GNDMqvs5Ey3tifqampEssmYcP3YA6QYBt
eO7imScWtII=
=RExf
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Locking primitives:
- Mark devm_mutex_init() as __must_check and fix drivers that didn't
check the return code (Thomas Weißschuh)
- Reorganize <linux/local_lock.h> to better expose the internal APIs
to local variables (Sebastian Andrzej Siewior)
- Remove OWNER_SPINNABLE in rwsem (Jinliang Zheng)
- Remove redundant #ifdefs in the mutex code (Ran Xiaokai)
Lockdep:
- Avoid returning struct in lock_stats() (Arnd Bergmann)
- Change `static const` into enum for LOCKF_*_IRQ_* (Arnd Bergmann)
- Temporarily use synchronize_rcu_expedited() in
lockdep_unregister_key() to speed things up. (Breno Leitao)
Rust runtime:
- Add #[must_use] to Lock::try_lock() (Jason Devers)"
* tag 'locking-core-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization
locking/mutex: Remove redundant #ifdefs
locking/lockdep: Change 'static const' variables to enum values
locking/lockdep: Avoid struct return in lock_stats()
locking/rwsem: Use OWNER_NONSPINNABLE directly instead of OWNER_SPINNABLE
rust: sync: Add #[must_use] to Lock::try_lock()
locking/mutex: Mark devm_mutex_init() as __must_check
leds: lp8860: Check return value of devm_mutex_init()
spi: spi-nxp-fspi: Check return value of devm_mutex_init()
local_lock: Move this_cpu_ptr() notation from internal to main header
|
|
|
|
bf76f23aa1 |
Scheduler updates for v6.17:
Core scheduler changes:
- Better tracking of maximum lag of tasks in presence of different
slices duration, for better handling of lag in the fair
scheduler. (Vincent Guittot)
- Clean up and standardize #if/#else/#endif markers throughout
the entire scheduler code base (Ingo Molnar)
- Make SMP unconditional: build the SMP scheduler's
data structures and logic on UP kernel too, even though
they are not used, to simplify the scheduler and remove
around 200 #ifdef/[#else]/#endif blocks from the
scheduler. (Ingo Molnar)
- Reorganize cgroup bandwidth control interface handling
for better interfacing with sched_ext (Tejun Heo)
Balancing:
- Bump sd->max_newidle_lb_cost when newidle balance fails (Chris Mason)
- Remove sched_domain_topology_level::flags to simplify the code (Prateek Nayak)
- Simplify and clean up build_sched_topology() (Li Chen)
- Optimize build_sched_topology() on large machines (Li Chen)
Real-time scheduling:
- Add initial version of proxy execution: a mechanism for mutex-owning
tasks to inherit the scheduling context of higher priority waiters.
Currently limited to a single runqueue and conditional on CONFIG_EXPERT,
and other limitations. (John Stultz, Peter Zijlstra, Valentin Schneider)
- Deadline scheduler (Juri Lelli):
- Fix dl_servers initialization order (Juri Lelli)
- Fix DL scheduler's root domain reinitialization logic (Juri Lelli)
- Fix accounting bugs after global limits change (Juri Lelli)
- Fix scalability regression by implementing less agressive dl_server handling
(Peter Zijlstra)
PSI:
- Improve scalability by optimizing psi_group_change() cpu_clock() usage
(Peter Zijlstra)
Rust changes:
- Make Task, CondVar and PollCondVar methods inline to avoid unnecessary
function calls (Kunwu Chan, Panagiotis Foliadis)
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro (Fujita Tomonori)
- Introduce file_from_location() (Boqun Feng)
Debugging & instrumentation:
- Make clangd usable with scheduler source code files again (Peter Zijlstra)
- tools: Add root_domains_dump.py which dumps root domains info (Juri Lelli)
- tools: Add dl_bw_dump.py for printing bandwidth accounting info (Juri Lelli)
Misc cleanups & fixes:
- Remove play_idle() (Feng Lee)
- Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
- Do not call __put_task_struct() on RT if pi_blocked_on is set
(Luis Claudio R. Goncalves)
- Correct the comment in place_entity() (wang wei)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiHHNIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g7DhAAg9aMW33PuC24A4hCS1XQay6j3rgmR5qC
AOqDofj/CY4Q374HQtOl4m5CYZB/G5csRv6TZliWQKhAy9vr6VWddoyOMJYOAlAx
XRurl1Z3MriOMD6DPgNvtHd5PrR5Un8ygALgT+32d0PRz27KNXORW5TyvEf2Bv4r
BX4/GazlOlK0PdGUdZl0q/3dtkU4Wr5IifQzT8KbarOSBbNwZwVcg+83hLW5gJMx
LgMGLaAATmiN7VuvJWNDATDfEOmOvQOu8veoS8TuP1AOVeJPfPT2JVh9Jen5V1/5
3w1RUOkUI2mQX+cujWDW3koniSxjsA1OegXfHnFkF5BXp4q5e54k6D5sSh1xPFDX
iDhkU5jsbKkkJS2ulD6Vi4bIAct3apMl4IrbJn/OYOLcUVI8WuunHs4UPPEuESAS
TuQExKSdj4Ntrzo3pWEy8kX3/Z9VGa+WDzwsPUuBSvllB5Ir/jjKgvkxPA6zGsiY
rbkmZT8qyI01IZ/GXqfI2AQYCGvgp+SOvFPi755ZlELTQS6sUkGZH2/2M5XnKA9t
Z1wB2iwttoS1VQInx0HgiiAGrXrFkr7IzSIN2T+CfWIqilnL7+nTxzwlJtC206P4
DB97bF6azDtJ6yh1LetRZ1ZMX/Gr56Cy0Z6USNoOu+a12PLqlPk9+fPBBpkuGcdy
BRk8KgysEuk=
=8T0v
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Better tracking of maximum lag of tasks in presence of different
slices duration, for better handling of lag in the fair scheduler
(Vincent Guittot)
- Clean up and standardize #if/#else/#endif markers throughout the
entire scheduler code base (Ingo Molnar)
- Make SMP unconditional: build the SMP scheduler's data structures
and logic on UP kernel too, even though they are not used, to
simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
blocks from the scheduler (Ingo Molnar)
- Reorganize cgroup bandwidth control interface handling for better
interfacing with sched_ext (Tejun Heo)
Balancing:
- Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
Mason)
- Remove sched_domain_topology_level::flags to simplify the code
(Prateek Nayak)
- Simplify and clean up build_sched_topology() (Li Chen)
- Optimize build_sched_topology() on large machines (Li Chen)
Real-time scheduling:
- Add initial version of proxy execution: a mechanism for
mutex-owning tasks to inherit the scheduling context of higher
priority waiters.
Currently limited to a single runqueue and conditional on
CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
Valentin Schneider)
- Deadline scheduler (Juri Lelli):
- Fix dl_servers initialization order (Juri Lelli)
- Fix DL scheduler's root domain reinitialization logic (Juri
Lelli)
- Fix accounting bugs after global limits change (Juri Lelli)
- Fix scalability regression by implementing less agressive
dl_server handling (Peter Zijlstra)
PSI:
- Improve scalability by optimizing psi_group_change() cpu_clock()
usage (Peter Zijlstra)
Rust changes:
- Make Task, CondVar and PollCondVar methods inline to avoid
unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro (Fujita Tomonori)
- Introduce file_from_location() (Boqun Feng)
Debugging & instrumentation:
- Make clangd usable with scheduler source code files again (Peter
Zijlstra)
- tools: Add root_domains_dump.py which dumps root domains info (Juri
Lelli)
- tools: Add dl_bw_dump.py for printing bandwidth accounting info
(Juri Lelli)
Misc cleanups & fixes:
- Remove play_idle() (Feng Lee)
- Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
- Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
Claudio R. Goncalves)
- Correct the comment in place_entity() (wang wei)"
* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
sched/idle: Remove play_idle()
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
sched: Start blocked_on chain processing in find_proxy_task()
sched: Fix proxy/current (push,pull)ability
sched: Add an initial sketch of the find_proxy_task() function
sched: Fix runtime accounting w/ split exec & sched contexts
sched: Move update_curr_task logic into update_curr_se
locking/mutex: Add p->blocked_on wrappers for correctness checks
locking/mutex: Rework task_struct::blocked_on
sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
sched/topology: Remove sched_domain_topology_level::flags
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
x86/smpboot: moves x86_topology to static initialize and truncate
x86/smpboot: remove redundant CONFIG_SCHED_SMT
smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
tools/sched: Add root_domains_dump.py which dumps root domains info
sched/deadline: Fix accounting after global limits change
sched/deadline: Reset extra_bw to max_bw when clearing root domains
sched/deadline: Initialize dl_servers after SMP
...
|
|
|
|
04d29e3609 |
- Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS
to enable stuffing independently from Retbleed, do some cleanups to simplify and streamline the code - Simplify SRSO and make mitigation types selection more versatile depending on the Retbleed mitigation selection. Simplify code some - Add the second part of the attack vector controls which provide a lot friendlier user interface to the speculation mitigations than selecting each one by one as it is now. Instead, the selection of whole attack vectors which are relevant to the system in use can be done and protection against only those vectors is enabled, thus giving back some performance to the users -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmiHh6cACgkQEsHwGGHe VUqprw//QMpqtWGVbo4bJ176sLtwn8cdKxOJwx9rWyFH/f3Zcn5hK1x+Zifm22hj NNo7YMLTvEg6BicxIDKp89tfXM5cwLS3pcUabWy7IS7Xzs7yLyRajNQ3hOFhQd9g UAUg8xx33xspCatlXzl4HcbOR0xyxb/qR4vd5H89Gir9GIuiO5+uz+3SdqEzzl8w 2UfPDY5B9cXO8VoGsvJMtLTO1ULUHHZPgRdPaH8rSr9QkGlVFefpgUaw6Budic84 kjNpE4tyJEvVLceZr8UtZWmVBwBS4z9oNRdqHbCFnrpPdYXnzYXA6pKMm1vP3zCz atRuWxmn0U6o9wZfxcBF7ZI2o3k049U8zxLWlz9mX4pXbMuqSX6MsR4kw82ta/Hp IzM9LckPO2STYHvJJlcEOivYbKTKttwYZd0rjfaFtJ0z+vVar4EyPyTbfGAdiH50 T2UUmC9SpffVVhnOcaTUGtT/4SFCVA8ZNsoPm27auGVzZRnLOFSV63iv5fl41o3X pELyVfLzR3XtXFNXrzXY09lEKh5HIiy33Qe+syCNEoF56zTN+IREu37M7dKiWBmx xRJE9U9ZgxZjbEuMV0jKEMPOMzMf1ONQw5HSpfIgoT5OLwKXhP5HptHkKS3rwppG 5Glo2kfvxKzFl/THHv7EPoIvVVL/tezcvO3H7z4owRl/jgw0CvA= =zO6b -----END PGP SIGNATURE----- Merge tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU mitigation updates from Borislav Petkov: - Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS to enable stuffing independently from Retbleed, do some cleanups to simplify and streamline the code - Simplify SRSO and make mitigation types selection more versatile depending on the Retbleed mitigation selection. Simplify code some - Add the second part of the attack vector controls which provide a lot friendlier user interface to the speculation mitigations than selecting each one by one as it is now. Instead, the selection of whole attack vectors which are relevant to the system in use can be done and protection against only those vectors is enabled, thus giving back some performance to the users * tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/bugs: Print enabled attack vectors x86/bugs: Add attack vector controls for TSA x86/pti: Add attack vector controls for PTI x86/bugs: Add attack vector controls for ITS x86/bugs: Add attack vector controls for SRSO x86/bugs: Add attack vector controls for L1TF x86/bugs: Add attack vector controls for spectre_v2 x86/bugs: Add attack vector controls for BHI x86/bugs: Add attack vector controls for spectre_v2_user x86/bugs: Add attack vector controls for retbleed x86/bugs: Add attack vector controls for spectre_v1 x86/bugs: Add attack vector controls for GDS x86/bugs: Add attack vector controls for SRBDS x86/bugs: Add attack vector controls for RFDS x86/bugs: Add attack vector controls for MMIO x86/bugs: Add attack vector controls for TAA x86/bugs: Add attack vector controls for MDS x86/bugs: Define attack vectors relevant for each bug x86/Kconfig: Add arch attack vector support cpu: Define attack vectors ... |
|
|
|
909d2bb07d |
stop-machine: Improve documentation
Changes ------- * Improve kernel-doc function-header comments * Document preemption and stop_machine() mutual exclusion (Joel Fernandes) -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmiBbxgTHHBhdWxtY2tA a2VybmVsLm9yZwAKCRCevxLzctn7jCgqD/4zu5NL6k5JmzPnYoLKPmjyAhQCZHDf saXTuH7mmZPLj0KB/LoiwHTSLO+ldFzqwlggw2YuyqfdHMQzc91NVBU2ZZMAFncD tEdcIQjDvtBHQmi7PU0b/knNJuVEHSCq6OtpmcnvSjQsZXfPeswrfd1ofuFrgc37 jpmyYhplFMAbM4V0IbkjzX8afLYKQRG0Eb51S+yAmbz0ZdiEr4OuSRs3zJvQfCRE /PYfMM8oczUW6MaVJRRIi67KIAiGBfhN+X4Uc39yir/gAu/HQMEv7/+3ImemUu5d x/ZIca/LtJh7Ga4EHNdc7I9AOtDifzXti07izFiILAHrsMlpALMqIRbftHQAknwf CQQUpoB/C0qH1PxZuYyeOByo+MXYejKAsum8990AY6z9mBjTGtdE4cr/ikeYVFft DZLuDPKoy48C5xu0ycj5Ir9TF8LkTBAUdRsUmXiF+SnQWbOwnmcr79XoPp+7JH2A /70EClttaaTvCjpzDZRTNaHjPZ7bdiARUYwCvR/Vgd8KbqYaUoXnbSZjcLGbnSNC W/GCPiDJCEhxbKnli6Vdc2cEI7aA9mUUZqSMo6XryjixeXoPUEfmv91vZ3GkRrGv QMM+MxBuYLWG//G2rBbyIiGE81ewAn6TmDVU54UfqKbB3AmOioyDN0jI/kJCAYfx UqqLyz7+LMbx+A== =tjJh -----END PGP SIGNATURE----- Merge tag 'stop-machine.2025.07.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull stop-machine documentation updates from Paul McKenney: - Improve kernel-doc function-header comments - Document preemption and stop_machine() mutual exclusion (Joel Fernandes) * tag 'stop-machine.2025.07.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: smp: Document preemption and stop_machine() mutual exclusion stop_machine: Improve kernel-doc function-header comments |
|
|
|
78bb43e51b |
Updates for the generic entry code:
- Split the code into syscall and exception/interrupt parts to ease the
conversion of ARM[64] to the generic entry infrastructure
- Extend syscall user dispatching to support a single intercepted range
instead of the default single non-intercepted range. That allows
monitoring/analysis of a specific executable range, e.g. a library, and
also provides flexibility for sandboxing scenarios.
- Cleanup and extend the user dispatch selftest
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiIg6UTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWn7EACTvQpu7tGd1rN9hCjiB1W5po7nvlCd
gKghjS9Kp0KttTDQPLVcmnH06BhDHWNNn1HXZ1ORea4bpLywiKHtVgqUAsJDsBsv
ETeTHYNphk0sktvAqp3XusA6HF4T0s1KXJQj3W1ACrYZWRkK/VystCLYwBRGpc3r
cj7jAFmJyNpU236R5XYJ7ooHfPYpzZ8VAHBO8ykK7muHDfyBRXEIlmkGep++ctSv
v0uZXAy6LONljKg87YJTien0UA7ze9lFgPTuV1y/qfaLbYNekUaJSDjfuhOpZZUw
TzSh9OYoIvKpd0ylHwB1qMLd5CaXNicaeLfTW3xbX06KaXa7WNAonS35sK0EjhtZ
0bBA9g6bRhphyh0tzR4saF9bczNvJydNCn7/QFo9dKbQUEL/FRXtJiIeusVx/0fJ
+ZqWRTcEdDw2Rsyv52hKgyEJi7F3nL9ovabUN9P1/0aPcTdM3WekMpSOJm1U6wVF
e6oSyeoeNdjcdxgWbQrgRNbmq5CPEV3ig5J+G418r5DTF3ifqZX+WscijUtKTu5K
V5GpLc0PL9eoigQ37LmGkwK/4xoB9SAPTQuzUs9qgh9NidwT0cCfoNxpeGh6GeHX
GLHPGU61vZaefxpwuAuv+SQSgxXSKk2/H/ijPzSjrX/PkUp7MoX9XoOQAh4FxZjO
ok5YEUGXzSJfXQ==
=yaCQ
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull generic entry code updates from Thomas Gleixner:
- Split the code into syscall and exception/interrupt parts to ease the
conversion of ARM[64] to the generic entry infrastructure
- Extend syscall user dispatching to support a single intercepted range
instead of the default single non-intercepted range. That allows
monitoring/analysis of a specific executable range, e.g. a library,
and also provides flexibility for sandboxing scenarios
- Cleanup and extend the user dispatch selftest
* tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Split generic entry into generic exception and syscall entry
selftests: Add tests for PR_SYS_DISPATCH_INCLUSIVE_ON
syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON
selftests: Fix errno checking in syscall_user_dispatch test
|
|
|
|
f38b1f243e |
Update for the futex subsystem:
- Switch the reference counting to a RCU based per-CPU reference to
address a performance bottleneck vs. the single instance rcuref
variant.
- Make the futex selftest build on 32-bit architectures which only
support 64-bit time_t, e.g. RISCV-32.
- Cleanups and improvements in selftests and futex bench
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiIiDITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoblTD/0eV9w21tFVmn6ICrhgQgsrejJ0BANs
mm5mE/0d29MZHEhnJO2CSccGXBDfykuk/gxHXHsUZ9tiVSOgjz9dDl1bcrZ8Je9V
YNWMXiHASQrLctmrKLPSdjlcxQnPIxCm+K4lajoa+CyvReHE24sUDgCN8GC3P9pH
VxTmQ7UjGrzvIRlfd4AL9GJBF1IGKNnpPHCeSwjn/cmlDxu4RxEdjRWTbW8Tbz9N
1ay/T8vEE1SykI2qZOXIP16sYZw2dP9FOgARO90Ahb6hwAwbI72MvC69GpZe3lh5
1B1ZgpEiUMa4IT5jJ43Wkm3k8BF6meW+rIUjUBt+y8yjNgaR4degvgnDx44YPZ94
5Ek3cJgpTpVnWbfRxn2b2vRL8rZkRBIq9ezswp0/8KLgC7Gd+zPuQKPvoo2m+n3S
UMufGGT2h5oJbx0qGry5rxZz03eGE6oWAm3H/WRl2wIw5D/kvU5ol6AYKJ5eGTyj
JdPJVzzPBH319iCMZ1olqo/h5er148aYL16ga7w6w9pqhPuxGud30BFf8SHQ8F1R
NIZiu6O3L2ge0RLb/8wxukFkDz3R1gZBWeTLxLEymTJG3TaA3uIByOI6UO03zgW/
QBbNLr7ndkIcm8E31hAWamGQy+EAXj1/e5GYREvhhHOwUV+y/E1FTrrdwtT4GA0S
tBYACfeCbOojsA==
=WqFq
-----END PGP SIGNATURE-----
Merge tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:
- Switch the reference counting to a RCU based per-CPU reference to
address a performance bottleneck vs the single instance rcuref
variant
- Make the futex selftest build on 32-bit architectures which only
support 64-bit time_t, e.g. RISCV-32
- Cleanups and improvements in selftests and futex bench
* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
perf bench futex: Remove support for IMMUTABLE
selftests/futex: Remove support for IMMUTABLE
futex: Remove support for IMMUTABLE
futex: Make futex_private_hash_get() static
futex: Use RCU-based per-CPU reference counting instead of rcuref_t
selftests/futex: Adapt the private hash test to RCU related changes
|
|
|
|
02dc9d15d7 |
Updates for the timekeeping and VDSO code:
- Introduce support for auxiliary timekeepers
PTP clocks can be disconnected from the universal CLOCK_TAI reality
for various reasons including regularatory requirements for
functional safety redundancy.
The kernel so far only supports a single notion of time, which means
that all clocks are correlated in frequency and only differ by
offset to each other.
Access to non-correlated PTP clocks has been available so far only
through the file descriptor based "POSIX clock IDs", which are
subject to locking and have to go all the way out to the hardware.
The access is not only horribly slow, as it has to go all the way out
to the NIC/PTP hardware, but that also prevents the kernel to read
the time of such clocks e.g. from the network stack, where it is
required for TSN networking both on the transmit and receive side
unless the hardware provides offloading.
The auxiliary clocks provide a mechanism to support arbitrary clocks
which are not correlated to the system clock. This is not restricted
to the PTP use case on purpose as there is no kernel side association
of these clocks to a particular PTP device because that's a pure user
space configuration decision. Having them independent allows to
utilize them for other purposes and also enables them to be tested
without hardware dependencies.
To avoid pointless overhead these clocks have to be enabled
individualy via a new sysfs interface to reduce the overhead to a
single compare in the hotpath if they are enabled at the Kconfig
level at all.
These clocks utilize the existing timekeeping/NTP infrastructures,
which has been made possible over the recent releases by incrementaly
converting these infrastructures over from a single static instance
to a multi-instance pointer based implementation without any
performance regression reported.
The auxiliary clocks provide the same "emulation" of a "correct"
clock as the existing CLOCK_* variants do with an independent
instance of data and provide the same steering mechanism through the
existing sys_clock_adjtime() interface, which has been confirmed to
work by the chronyd(8) maintainer.
That allows to provide lockless kernel internal and VDSO support so
that applications and kernel internal functionalities can access
these clocks without restrictions and at the same performance as the
existing system clocks.
- Avoid double notifications in the adjtimex() syscall. Not a big issue,
but a trivial to avoid latency source.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGo/MTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWTIEACy/HC2OD7IbAzECgwQUvo59xvmw6ak
p1wRNYpTjOLUgWZjcl/jQfh7jYe60/0PrIzYgAeltXSGwOVtqQDrUIWrTKrAHOUa
wqKUCEfCucTUJRKLQ1Ktnjy/2Pp0Ojpf32Av0v/wgLUMxQk9Av39UdQwMOGyoHOa
07//lrVzfYfqe5Ne7cmuZSbVcHlKyWpXtSvPiVhyk+tHZea4646Pz17sBeVsefps
41mxZBRk7VNiE8yWtRWYKcaXxE/0nYkptjhXOqgmNRTGB/WfyKavDYVLWe31XPrI
G3/QcAAJHBEYZgoGMHRn76L+NWNqnuxeFPtSOaGceBks7HwCKdfvrAwaJzJ1Mr22
IxeoPm0Fzdrtjy2L2PM1txGdEI2iErprCNtQolM4BCQzUslsukP/Fts+SOujgpnC
SJ9rbIIeKJzt2dW6kQai+xGw6WIpQyus7Lbt0sEcyBdWi5Bqvh9g1ZXUn8SHY2xx
/aaEBe2J1RGGhNhHD6bSLYMAXoKDPcpZIrwO+2N96Z18uYee0FU5g3JVKGfuuTdo
wYfpK79xsFmhaBDj8pYAJoU3y/v+WycE2pP2oFgBhN49Xcxo1yYIm5ECb9dYesfD
4nW/Av9bI/NR7J4MilXvw2hSmfapgNgcUGb6sgEuSD7M8/UXkfFbcbqp+RcUBiQj
0XGiNzqLnwgzBQ==
=TTjt
-----END PGP SIGNATURE-----
Merge tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping and VDSO updates from Thomas Gleixner:
- Introduce support for auxiliary timekeepers
PTP clocks can be disconnected from the universal CLOCK_TAI reality
for various reasons including regularatory requirements for
functional safety redundancy.
The kernel so far only supports a single notion of time, which means
that all clocks are correlated in frequency and only differ by offset
to each other.
Access to non-correlated PTP clocks has been available so far only
through the file descriptor based "POSIX clock IDs", which are
subject to locking and have to go all the way out to the hardware.
The access is not only horribly slow, as it has to go all the way out
to the NIC/PTP hardware, but that also prevents the kernel to read
the time of such clocks e.g. from the network stack, where it is
required for TSN networking both on the transmit and receive side
unless the hardware provides offloading.
The auxiliary clocks provide a mechanism to support arbitrary clocks
which are not correlated to the system clock. This is not restricted
to the PTP use case on purpose as there is no kernel side association
of these clocks to a particular PTP device because that's a pure user
space configuration decision. Having them independent allows to
utilize them for other purposes and also enables them to be tested
without hardware dependencies.
To avoid pointless overhead these clocks have to be enabled
individualy via a new sysfs interface to reduce the overhead to a
single compare in the hotpath if they are enabled at the Kconfig
level at all.
These clocks utilize the existing timekeeping/NTP infrastructures,
which has been made possible over the recent releases by incrementaly
converting these infrastructures over from a single static instance
to a multi-instance pointer based implementation without any
performance regression reported.
The auxiliary clocks provide the same "emulation" of a "correct"
clock as the existing CLOCK_* variants do with an independent
instance of data and provide the same steering mechanism through the
existing sys_clock_adjtime() interface, which has been confirmed to
work by the chronyd(8) maintainer.
That allows to provide lockless kernel internal and VDSO support so
that applications and kernel internal functionalities can access
these clocks without restrictions and at the same performance as the
existing system clocks.
- Avoid double notifications in the adjtimex() syscall. Not a big
issue, but a trivial to avoid latency source.
* tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
vdso/gettimeofday: Add support for auxiliary clocks
vdso/vsyscall: Update auxiliary clock data in the datapage
vdso: Introduce aux_clock_resolution_ns()
vdso/gettimeofday: Introduce vdso_get_timestamp()
vdso/gettimeofday: Introduce vdso_set_timespec()
vdso/gettimeofday: Introduce vdso_clockid_valid()
vdso/gettimeofday: Return bool from clock_gettime() helpers
vdso/gettimeofday: Return bool from clock_getres() helpers
vdso/helpers: Add helpers for seqlocks of single vdso_clock
vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock()
vdso/vsyscall: Introduce a helper to fill clock configurations
timekeeping: Remove the temporary CLOCK_AUX workaround
timekeeping: Provide ktime_get_clock_ts64()
timekeeping: Provide interface to control auxiliary clocks
timekeeping: Provide update for auxiliary timekeepers
timekeeping: Provide adjtimex() for auxiliary clocks
timekeeping: Prepare do_adtimex() for auxiliary clocks
timekeeping: Make do_adjtimex() reusable
timekeeping: Add auxiliary clock support to __timekeeping_inject_offset()
timekeeping: Make timekeeping_inject_offset() reusable
...
|
|
|
|
d614399b28 |
Updates for the timer core:
- Simplify the logic in the timer migration code
- Simplify the clocksource code by utilizing the more modern cpumask+*()
interfaces
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGlSYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoajDEACHGVGjbnOumTHEAp1yrUZe0qQh8FUw
wB9E6DeYS3H5Ajh5zP8BzsTi7USnJYDTv5OWz2hTl8ji3ORcsXjRzMuRa6VD8rsp
drtxROiXA8i8H+pjKTehCZWy96vWybxIW+s0gkgK91dQKP6CAEehk4u7hELO6gU0
ZPIcwZ3Kx5za4SJagqnYY8PGzRl52Oy1mt/jLgiQUlVcR1bel8xEd8Q+tbWI0yC+
NA51niSsPZrx+n/AKOdKiaARtgZnzZJYfmv4bJfbBfV4QEiM13hGCPLq64FJZDPD
ZkqMzdj9X4g77t4agMoRyGEki9Sfk9+qMJrLSpstBd3GtQU2pSTr9l24sRPcT5oa
A0o2M+tPINgwZlnS1OLqzq8e4RWFhh6WpvDQTlMJ87nB46mYto8rTNsgjuT9e0hY
bvnOdmce/p4BCSS5JDJj2Ix0+knr2VKkKD470U2SvoZa3s0KMvwCU2iPfhLQKFoP
xx8LJABK0FrSbsdXV8SjeLUbVGN/HX4A1Zz8xx88fCUADgUr3b6y8Bd4btatQOK9
hTGmkLQfgkQ6Dm5Qg1pmFhKiQqkWRCnW3uX5+9JmhCeu55ZU+EMUlzcklCh9lqGa
njimFS//6/ikp/hSeYS19Z59gYHNdTavRkdJZJFPN4R67YNvEliKPYDEqTNRPzEx
K7hjrpnMe6Ti6A==
=A19q
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Simplify the logic in the timer migration code
- Simplify the clocksource code by utilizing the more modern
cpumask+*() interfaces
* tag 'timers-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Use cpumask_next_wrap() in clocksource_watchdog()
clocksource: Use cpumask_any_but() in clocksource_verify_choose_cpus()
timers/migration: Clean up the loop in tmigr_quick_check()
|
|
|
|
99e731bcb8 |
A treewide cleanup of struct cycle_counter const annotations:
The initial idea of making them const was correct as they were seperate instances. When they got embedded into larger data structures, which are even modified by the callback this got moot. The only reason why this went unnoticed is that the required container_of() casts the const attribute forcefully away. Stop pretending that it is const. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGkt0THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYodb/D/47tAOcL6qyTxh+E5fDj4h5M2aVQYp+ oLiBY7ejroObCAiMo6v/ajaGNEXvLXEtratFBQv1o7n12rxsKnv/XNdIbAzvMwKm 3vNx9anQ0THFnWiUKCAf/mZBc/Z9uXnjBXY0TeaFLuj4DuUMqGGs/ga32UIvEwW+ VXS592NIHYoK87VA/AIOK8AGX9VesYoxUqZmfmYn5NuJwsJDSgGaQpoH2fBVhU6p B7NRnVLYtFYpeAvL0ihFuOP03HZes5hM2wmqNtoCJJF9e39Eg9DXhsaUw8AIa1fu UCxm5sfNpMR+QN2AizbI5deHZJbFwYPy/cFrZ0AiwCfZt6NATjO7bQjvu/23yPFn klby8GizV0Q7qmiLi45EdPdid5oiBm/lNy6ICP9fN/XlHZFxAsqE/OGx626P+LPs xQK1MpHNjbkCK//X49hxQX6a0BCqSEAG4GOuEEqRxTeD9SZVzIQbcQ6CKOuBqPqc hc5o5N3suzdmq5sujQD0IiL9R960WfzsSgbmdQm+njhRz+LjAx6T419MlyePiu/5 hf8pZjl/SHn1O6RpPfJ501U+tbo1auEnUjXMGcg12dx7KE4w9s7fboFQLXruuVRr UNeVXeMRxZqXSAB1HyKQMI8lZ7nHF6FZUA+xLMusCwIYi7YU0h7I3BxDzxpZV62l zJgslDp7ZqSopw== =C0ua -----END PGP SIGNATURE----- Merge tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide cleanup of struct cycle_counter const annotations. The initial idea of making them const was correct as they were seperate instances. When they got embedded into larger data structures, which are even modified by the callback this got moot. The only reason why this went unnoticed is that the required container_of() casts the const attribute forcefully away. Stop pretending that it is const" * tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/timecounter: Fix the lie that struct cyclecounter is const |
|
|
|
0b29600a30 |
Updates for interrupt chip drivers:
- Add support of forced affinity setting to yet offline CPUs for the
MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
during the early bringup phase of a secondary CPU in the hotplug code
before the CPU is set online and interrupts are enabled.\
- Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
interrupt chip
- Make the interrupt routing to RISV-V harts specification compliant so it
supports arbitrary hart indices
- Add a command line parameter and related handling to disable the generic
RISCV IMSIC mechanism on platforms which use a trap-emulated IMSIC.
Unfortunatly this is required because there is no mechanism available to
discover this programatically.
- Enable wakeup sources on the Renesas RZV2H driver
- Convert interrupt chip drivers, which use a open coded variant of
msi_create_parent_irq_domain() to use the new functionality
- Convert interrupt chip drivers, which use the old style two level
implementation of MSI support over to the MSI parent mechanism to
prepare for removing at least one of the three PCI/MSI backend variants.
- The usual cleanups and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGj60THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoSh6D/9wY0G2dGz+EJeiDsldzB1n5jmf5I0k
3XsI3o5j0Ma/Yy+nu9Re3fZq0+qzPFZZErxkBp5igCJbSoaIGheqOyXQDuQu/8tm
s2t8Wx9k6er7Cywg9rU9pWKzJ6AFXFvcKOEvGG2q2+lFbJbIoGdAM93qPrOJhqeo
a3NyhQv6kNl7xAjOVyEZmOlCZgCYotFwC0+K1TVQgGDGbwHWH3wad64gLTyyQQlK
RvtbUBKfCBqqwLJ7Mww7Xclezjk/Hpgm/OppxBAglv5WyRd0e15u2dpdTdM9r4BC
4wX5Old3ZgbqBQjdHGNlljthu4lO2S0PXU6j0EC8W2NiQjN+hPaKK/EPeerlAJCz
UmxY0/E3HFNUk8ZHkiif3PGiOSvAbn0JwWi3+D6HuK1rlVTXNs07NIgUBk727Ty5
S7r5JEuUA5s9dGta4pszxHGn/0Dqg/WvnMZGcbPNaV6POH47wNnPlO2mj14I1HLk
SfG+deohJM34pVVq7fiqgGukLVPm6PfiJkXx90MK6l+BfE58uo7Oue9mm9pqT2dy
b6K1gdNPRsZzG7AoAqkx3UrjQuD7maWIpDGb4VZeUW/34bthLygIDUY4OZhpdrUZ
m33T8zv0PrmNuvnMdFt0RyoDTu8PC9rYS0XVvsIMqsMxJDE/URVGH2tCi5CVMiEg
PbRWL56yGyT1NA==
=5LGj
-----END PGP SIGNATURE-----
Merge tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt chip driver updates from Thomas Gleixner:
- Add support of forced affinity setting to yet offline CPUs for the
MIPS-GIC to ensure that the affinity of per CPU interrupts can be set
during the early bringup phase of a secondary CPU in the hotplug code
before the CPU is set online and interrupts are enabled
- Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI
interrupt chip
- Make the interrupt routing to RISV-V harts specification compliant so
it supports arbitrary hart indices
- Add a command line parameter and related handling to disable the
generic RISCV IMSIC mechanism on platforms which use a trap-emulated
IMSIC. Unfortunatly this is required because there is no mechanism
available to discover this programatically.
- Enable wakeup sources on the Renesas RZV2H driver
- Convert interrupt chip drivers, which use a open coded variant of
msi_create_parent_irq_domain() to use the new functionality
- Convert interrupt chip drivers, which use the old style two level
implementation of MSI support over to the MSI parent mechanism to
prepare for removing at least one of the three PCI/MSI backend
variants.
- The usual cleanups and improvements all over the place
* tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
irqchip/riscv-imsic: Add kernel parameter to disable IPIs
irqchip/gic-v3: Fix GICD_CTLR register naming
irqchip/ls-scfg-msi: Fix NULL dereference in error handling
irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain()
irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Switch to msi_create_parent_irq_domain()
irqchip/alpine-msi: Convert to __free
irqchip/alpine-msi: Convert to lock guards
irqchip/alpine-msi: Clean up whitespace style
irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain()
irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain()
irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper
irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper
irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain()
irqdomain: Add device pointer to irq_domain_info and msi_domain_info
irqchip/renesas-rzv2h: Remove unneeded includes
irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
irqchip/aslint-sswi: Resolve hart index
...
|
|
|
|
b34111a89f |
A set of updates for SMP function calls:
- Improve localitu of smp_call_function_any() by utilizing
sched_numa_find_nth_cpu() instead of picking a random CPU
- Wait for work completion in smp_call_function_many_cond() only when
there was actually work enqueued
- Simplify functions by unutlizing the appropriate cpumask_*()
interfaces
- Trivial cleanups
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGkfQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVFFD/9OyKVhAlk3fP4PJG3VBZs/8IDp52Wo
vXHZPAyjRm0mtgonmRKQfNh9Xow6/ISiSxoE6yy98aEXRnzPgygHpwZfVwpEP5Q+
Ys0Y6DpaDW2Uw+a9qfBvpnEawmWK+b5N58ApLSMbabv6MdZhElI2SEjZKtqTda0j
161nRGADXPYm6uIw2kbAGseHpTslKCqTLdMHvvCnSx2Qa6Otw3VMWlYBpsOoqf7n
9+OA7rwpSArjgjGHJJKgwtdRfvobIYReEWUXOP6QF7Vgm4H5i9kgvD7NuFCa9Ykv
2kZnknuIplp9V+AvSsFjMu+RdxpktlL348Pnl6tZdjYrHQrgCWjhb11aD8gi8pb5
sdqAupJ2+N7woqfwuKFuzcEBjnjSbV0Jeks8GDQzuWOiniMn4BCj3qWPtIszZ80z
YddgGXf4RNJjytWjMyohh472YBQ+O3rlvVDmR011GnNdIphl8ovrtI9r+Ra6FwVg
eHmjr8yGjzmntay6KjbP+iQVjzqCFz6Lz7kTQBXGP3MPcd7du9R7KBGY6rm1+FJ5
3D4yIxIgK9sWg5GEr//1fdoi9wIrxsAfvgIsqpliwpHZ7wScyG98Iq74QsPGoimP
LgTHkHsxcMnsaHM8lLTo4mArbunQJTFtx/lYRk++lj1jfqxlLNUXmH6mQmKC+fla
Jz6duXcmFOoI3A==
=dFmz
-----END PGP SIGNATURE-----
Merge tag 'smp-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp updates from Thomas Gleixner:
"A set of updates for SMP function calls:
- Improve locality of smp_call_function_any() by utilizing
sched_numa_find_nth_cpu() instead of picking a random CPU
- Wait for work completion in smp_call_function_many_cond() only when
there was actually work enqueued
- Simplify functions by unutlizing the appropriate cpumask_*()
interfaces
- Trivial cleanups"
* tag 'smp-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Wait only if work was enqueued
smp: Defer check for local execution in smp_call_function_many_cond()
smp: Use cpumask_any_but() in smp_call_function_many_cond()
smp: Improve locality in smp_call_function_any()
smp: Fix typo in comment for raw_smp_processor_id()
|
|
|
|
dba3ec9f2a |
Updates for the interrupt core subsystem:
- Prevent a interrupt migration related live lock in handle_edge_irq()
If the interrupt affinity is moved to a new target CPU and the
interrupt is currently handled on the previous target CPU for edge type
interrupts the handler might get stuck on the previous target for a
long time, which causes both involved CPUs to waste cycles and
eventually run into a soft-lockup situation.
Solve this by checking whether the interrupt is redirected to a new
target CPU and if the interrupt is handled on that new target CPU, busy
wait for completion instead of masking it and sending the pending but
which would cause the old CPU to re-run the handler and in the worst
case repeating this excercise for a long time.
This only works on architectures which use single CPU interrupt
targets, but that's so far the only ones where this behaviour has been
observed.
- Add a kunit test for interrupt disable depth counts
The nested interrupt disable depth has been an issue in the past
especially vs. free_irq(), interrupt shutdown and CPU hotplug and their
interactions. The test exercises the combinations of these scenarios
and checks for correctness.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGjEUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWemD/0XUR2xjN5pdlJSDqSehKfUbI2psqij
TPWyNmttyw3vLAkzQZjlsrTANfJq8XnEvAYZSK/Qu+4ogQvTYlw3FPoesWLDScfD
yUN0vKnKM8xJ04YRSKBOWcbLve/MjwbAp77PTPalOTF8wpcj2InoawwjD7pTcWwL
8PG0cW26Rx6iuIFVfFhv+QMUZC+oS/66CfQTQ/Uf+02mAOhprs7I5pgINb5F+drQ
XR4TyMnaePmZ1i8ULa4SPnw7AzfPRhJM/m+jhA2NZJwbpTGnKGueE7SLJ//wX08p
woT3cf7DQCHeMPIZQAOLJ181LWbHz7EJFxJ+7Ba/FWP/X/Ep0EG9BwuK5Gg6/Auj
73xEnria3+Nw1hM6hlfRVX0+osizO35X91q63QUIejYDtM9Y9ByOjmi1o334sVjl
iFLq2jAw8H0phYia67kDtoPmc94GsvqLJwRt5JtxN6MRF5L1CspqEveQOJuoO/Tg
ZlToL+U16bkf7PI25e6d1t9OjqeK5GhOL3/Ygt8HwO9m4jIE7JCIIlMnQXpxl54I
s5H467lJwVqeiOqiU4UWhPM++hvxCE0/FuOZHAgbC4/fYPR/+WkZdEfyA/Rhz88g
B4yhq5+SBE8cMr0kh2e7c6mC505KICCEeyAU3iadfKlu3bhd7sVfFrgjRLoiCAym
XC+Vi+Q7lu0J7w==
=IgyG
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
- Prevent a interrupt migration related live lock in handle_edge_irq()
If the interrupt affinity is moved to a new target CPU and the
interrupt is currently handled on the previous target CPU for edge
type interrupts the handler might get stuck on the previous target
for a long time, which causes both involved CPUs to waste cycles and
eventually run into a soft-lockup situation.
Solve this by checking whether the interrupt is redirected to a new
target CPU and if the interrupt is handled on that new target CPU,
busy wait for completion instead of masking it and sending the
pending but which would cause the old CPU to re-run the handler and
in the worst case repeating this excercise for a long time.
This only works on architectures which use single CPU interrupt
targets, but that's so far the only ones where this behaviour has
been observed.
- Add a kunit test for interrupt disable depth counts
The nested interrupt disable depth has been an issue in the past
especially vs. free_irq(), interrupt shutdown and CPU hotplug and
their interactions. The test exercises the combinations of these
scenarios and checks for correctness.
* tag 'irq-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent migration live lock in handle_edge_irq()
genirq: Split up irq_pm_check_wakeup()
genirq: Move irq_wait_for_poll() to call site
genirq: Remove pointless local variable
genirq: Add kunit tests for depth counts
|
|
|
|
22c5696e3f |
Driver core changes for 6.17-rc1
- DEBUGFS
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
- SYSFS
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
- Support cache-ids for device-tree systems
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
- Rust
- Device
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
- Misc
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()
-----BEGIN PGP SIGNATURE-----
iHQEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaIjkhwAKCRBFlHeO1qrK
LpXuAP9RWwfD9ZGgQZ9OsMk/0pZ2mDclaK97jcmI9TAeSxeZMgD1FHnOMTY7oSIi
iG7Muq0yLD+A5gk9HUnMUnFNrngWCg==
=jgRj
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Remove unneeded debugfs_file_{get,put}() instances
- Remove last remnants of debugfs_real_fops()
- Allow storing non-const void * in struct debugfs_inode_info::aux
sysfs:
- Switch back to attribute_group::bin_attrs (treewide)
- Switch back to bin_attribute::read()/write() (treewide)
- Constify internal references to 'struct bin_attribute'
Support cache-ids for device-tree systems:
- Add arch hook arch_compact_of_hwid()
- Use arch_compact_of_hwid() to compact MPIDR values on arm64
Rust:
- Device:
- Introduce CoreInternal device context (for bus internal methods)
- Provide generic drvdata accessors for bus devices
- Provide Driver::unbind() callbacks
- Use the infrastructure above for auxiliary, PCI and platform
- Implement Device::as_bound()
- Rename Device::as_ref() to Device::from_raw() (treewide)
- Implement fwnode and device property abstractions
- Implement example usage in the Rust platform sample driver
- Devres:
- Remove the inner reference count (Arc) and use pin-init instead
- Replace Devres::new_foreign_owned() with devres::register()
- Require T to be Send in Devres<T>
- Initialize the data kept inside a Devres last
- Provide an accessor for the Devres associated Device
- Device ID:
- Add support for ACPI device IDs and driver match tables
- Split up generic device ID infrastructure
- Use generic device ID infrastructure in net::phy
- DMA:
- Implement the dma::Device trait
- Add DMA mask accessors to dma::Device
- Implement dma::Device for PCI and platform devices
- Use DMA masks from the DMA sample module
- I/O:
- Implement abstraction for resource regions (struct resource)
- Implement resource-based ioremap() abstractions
- Provide platform device accessors for I/O (remap) requests
- Misc:
- Support fallible PinInit types in Revocable
- Implement Wrapper<T> for Opaque<T>
- Merge pin-init blanket dependencies (for Devres)
Misc:
- Fix OF node leak in auxiliary_device_create()
- Use util macros in device property iterators
- Improve kobject sample code
- Add device_link_test() for testing device link flags
- Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits
- Hint to prefer container_of_const() over container_of()"
* tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits)
rust: io: fix broken intra-doc links to `platform::Device`
rust: io: fix broken intra-doc link to missing `flags` module
rust: io: mem: enable IoRequest doc-tests
rust: platform: add resource accessors
rust: io: mem: add a generic iomem abstraction
rust: io: add resource abstraction
rust: samples: dma: set DMA mask
rust: platform: implement the `dma::Device` trait
rust: pci: implement the `dma::Device` trait
rust: dma: add DMA addressing capabilities
rust: dma: implement `dma::Device` trait
rust: net::phy Change module_phy_driver macro to use module_device_table macro
rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id
rust: device_id: split out index support into a separate trait
device: rust: rename Device::as_ref() to Device::from_raw()
arm64: cacheinfo: Provide helper to compress MPIDR value into u32
cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
cacheinfo: Set cache 'id' based on DT data
container_of: Document container_of() is not to be used in new code
driver core: auxiliary bus: fix OF node leak
...
|
|
|
|
6443cdf567 |
ring-buffer: Make the const read-only 'type' static
Don't populate the read-only 'type' on the stack at run time, instead make it static. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250714160858.1234719-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
5e32d0f15c |
unwind_user/deferred: Add unwind_user_faultable()
Add a new API to retrieve a user space callstack called unwind_user_faultable(). The difference between this user space stack tracer from the current user space stack tracer is that this must be called from faultable context as it may use routines to access user space data that needs to be faulted in. It can be safely called from entering or exiting a system call as the code can still be faulted in there. This code is based on work by Josh Poimboeuf's deferred unwinding code: Link: https://lore.kernel.org/all/6052e8487746603bdb29b65f4033e739092d9925.1737511963.git.jpoimboe@kernel.org/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182405.147896868@kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
71753c6ed2 |
unwind_user: Add user space unwinding API with frame pointer support
Introduce a generic API for unwinding user stacks. In order to expand user space unwinding to be able to handle more complex scenarios, such as deferred unwinding and reading user space information, create a generic interface that all architectures can use that support the various unwinding methods. This is an alternative method for handling user space stack traces from the simple stack_trace_save_user() API. This does not replace that interface, but this interface will be used to expand the functionality of user space stack walking. None of the structures introduced will be exposed to user space tooling. Support for frame pointer unwinding is added. For an architecture to support frame pointer unwinding it needs to enable CONFIG_HAVE_UNWIND_USER_FP and define ARCH_INIT_USER_FP_FRAME. By encoding the frame offsets in struct unwind_user_frame, much of this code can also be reused for future unwinder implementations like sframe. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Indu Bhagat <indu.bhagat@oracle.com> Cc: "Jose E. Marchesi" <jemarch@gnu.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/20250729182404.975790139@kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Co-developed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/all/20250710164301.3094-2-mathieu.desnoyers@efficios.com/ Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Co-developed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
1a967e92bf |
tracing: Remove "__attribute__()" from the type field of event format
With CONFIG_DEBUG_INFO_BTF=y and PAHOLE_HAS_BTF_TAG=y, `__user` is
converted to `__attribute__((btf_type_tag("user")))`. In this case,
some syscall events have it for __user data, like below;
/sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 720
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:int dfd; offset:16; size:8; signed:0;
field:const char __attribute__((btf_type_tag("user"))) * filename; offset:24; size:8; signed:0;
field:int flags; offset:32; size:8; signed:0;
field:umode_t mode; offset:40; size:8; signed:0;
Then the trace event filter fails to set the string acceptable flag
(FILTER_PTR_STRING) to the field and rejects setting string filter;
# echo 'filename.ustring ~ "*ftracetest-dir.wbx24v*"' \
>> events/syscalls/sys_enter_openat/filter
sh: write error: Invalid argument
# cat error_log
[ 723.743637] event filter parse error: error: Expecting numeric field
Command: filename.ustring ~ "*ftracetest-dir.wbx24v*"
Since this __attribute__ makes format parsing complicated and not
needed, remove the __attribute__(.*) from the type string.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/175376583493.1688759.12333973498014733551.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
f02b1bcc73 |
Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD
KVM IRQ changes for 6.17 - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand. - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c. - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Dedup x86's device posted IRQ code, as the vast majority of functionality can be shared verbatime between SVM and VMX. - Harden the device posted IRQ code against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique. |
|
|
|
a3e892ab0f |
tracing: fprobe: Fix infinite recursion using preempt_*_notrace()
Since preempt_count_add/del() are tracable functions, it is not allowed
to use preempt_disable/enable() in ftrace handlers. Without this fix,
probing on `preempt_count_add%return` will cause an infinite recursion
of fprobes.
To fix this problem, use preempt_disable/enable_notrace() in
fprobe_return().
Link: https://lore.kernel.org/all/175374642359.1471729.1054175011228386560.stgit@mhiramat.tok.corp.google.com/
Fixes:
|
|
|
|
53edfecef6 |
Power management updates for 6.17-rc1
- Fix two initialization ordering issues in the cpufreq core and a
governor initialization error path in it, and clean it up (Lifeng
Zheng)
- Add Granite Rapids support in no-HWP mode to the intel_pstate cpufreq
driver (Li RongQing)
- Make intel_pstate always use HWP_DESIRED_PERF when operating in the
passive mode (Rafael Wysocki)
- Allow building the tegra124 cpufreq driver as a module (Aaron Kling)
- Do minor cleanups for Rust cpufreq and cpumask APIs and fix MAINTAINERS
entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas Bulwahn)
- Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)
- Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver (Prashant
Malani)
- Fix minimum performance state label error in the amd-pstate driver
documentation (Shouye Liu)
- Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
governor and explain HW coordination influence on it in the
documentation (Shashank Balaji)
- Fix opencoded for_each_cpu() in idle_state_valid() in the DT cpuidle
driver (Yury Norov)
- Remove info about non-existing QoS interfaces from the PM QoS
documentation (Ulf Hansson)
- Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)
- Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)
- Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)
- Simplify the sun8i-a33-mbus devfreq driver by using more devm
functions (Uwe Kleine-König)
- Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)
- Check devfreq governor before using governor->name (Lifeng Zheng)
- Remove a redundant devfreq_get_freq_range() call from
devfreq_add_device() (Lifeng Zheng)
- Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)
- Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)
- Extend the asynchronous suspend and resume of devices to handle
suppliers like parents and consumers like children (Rafael Wysocki)
- Make pm_runtime_force_resume() work for drivers that set the
DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
collaborate with the general ACPI PM domain to set it (Rafael
Wysocki)
- Add kernel parameter to disable asynchronous suspend/resume of
devices (Tudor Ambarus)
- Drop redundant might_sleep() calls from some functions in the device
suspend/resume core code (Zhongqiu Han)
- Fix the handling of monitors connected right before waking up the
system from sleep (tuhaowen)
- Clean up MAINTAINERS entries for suspend and hibernation (Rafael
Wysocki)
- Fix error code path in the KEXEC_JUMP flow and drop a redundant
pm_restore_gfp_mask() call from it (Rafael Wysocki)
- Rearrange suspend/resume error handling in the core device suspend
and resume code (Rafael Wysocki)
- Fix up white space that does not follow coding style in the
hibernation core code (Darshan Rathod)
- Document return values of suspend-related API functions in the
runtime PM framework (Sakari Ailus)
- Mark last busy stamp in multiple autosuspend-related functions in the
runtime PM framework and update its documentation (Sakari Ailus)
- Take active children into account in pm_runtime_get_if_in_use() for
consistency (Rafael Wysocki)
- Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
power capping driver (Sivan Zohar-Kotzer)
- Add support for the Bartlett Lake platform to the Intel RAPL power
capping driver (Qiao Wei)
- Add PL4 support for Panther Lake to the intel_rapl_msr power capping
driver (Zhang Rui)
- Update contact information in the PM ABI docs and maintainer
information in the power domains DT binding (Rafael Wysocki)
- Update PM header inclusions to follow the IWYU (Include What You Use)
principle (Andy Shevchenko)
- Add flags to specify power on attach/detach for PM domains, make the
driver core detach PM domains in device_unbind_cleanup(), and drop
the dev_pm_domain_detach() call from the platform bus type (Claudiu
Beznea)
- Improve Python binding's Makefile for cpupower (John B. Wyatt IV)
- Fix printing of CORE, CPU fields in cpupower-monitor (Gautham Shenoy)
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmh/wC4SHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO1O6MIAJtfclAleksv+PzbEyC+yk72zKinJg35
WJUk4Kz1yMOqAPazbpXRXt1tuxqyB3HWeixnTFyZbz+bbhZjYJ0lvpWGkdsFaS0i
NSbILSpHNGtOrP6s6hVKTBmLAdAzdWYWMQizlWgGrkhOiN5BnQzL7pAi2aGqu9KS
tGqnIg/3QwBAvnxijgpkm7qozOUMPJ9dzSvxMaFeB6JH7SNbTOODVFtsoD+mbJlH
YVMMWxih8b4MRJgAo4N2bL1Glp/Qnwg4ACawnQokt8Rknbtwku57QF9YwTbubr36
Ok7qbNnUSx0h9KtMQQNogLLkFreTJkbGknVWEwaWWhXNeW9l4cr6MWo=
=xVF9
-----END PGP SIGNATURE-----
Merge tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"As is tradition, cpufreq is the part with the largest number of
updates that include core fixes and cleanups as well as updates of
several assorted drivers, but there are also quite a few updates
related to system sleep, mostly focused on asynchronous suspend and
resume of devices and on making the integration of system suspend
and resume with runtime PM easier.
Runtime PM is also updated to allow some code duplication in drivers
to be eliminated going forward and to work more consistently overall
in some cases.
Apart from that, there are some driver core updates related to PM
domains that should help to address ordering issues with devm_ cleanup
routines relying on PM domains, some assorted devfreq updates
including core fixes and cleanups, tooling updates, and documentation
and MAINTAINERS updates.
Specifics:
- Fix two initialization ordering issues in the cpufreq core and a
governor initialization error path in it, and clean it up (Lifeng
Zheng)
- Add Granite Rapids support in no-HWP mode to the intel_pstate
cpufreq driver (Li RongQing)
- Make intel_pstate always use HWP_DESIRED_PERF when operating in the
passive mode (Rafael Wysocki)
- Allow building the tegra124 cpufreq driver as a module (Aaron
Kling)
- Do minor cleanups for Rust cpufreq and cpumask APIs and fix
MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas
Bulwahn)
- Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter,
Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng)
- Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver
(Prashant Malani)
- Fix minimum performance state label error in the amd-pstate driver
documentation (Shouye Liu)
- Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq
governor and explain HW coordination influence on it in the
documentation (Shashank Balaji)
- Fix opencoded for_each_cpu() in idle_state_valid() in the DT
cpuidle driver (Yury Norov)
- Remove info about non-existing QoS interfaces from the PM QoS
documentation (Ulf Hansson)
- Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu)
- Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan)
- Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan)
- Simplify the sun8i-a33-mbus devfreq driver by using more devm
functions (Uwe Kleine-König)
- Fix an index typo in trans_stat() in devfreq (Chanwoo Choi)
- Check devfreq governor before using governor->name (Lifeng Zheng)
- Remove a redundant devfreq_get_freq_range() call from
devfreq_add_device() (Lifeng Zheng)
- Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng)
- Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng)
- Extend the asynchronous suspend and resume of devices to handle
suppliers like parents and consumers like children (Rafael Wysocki)
- Make pm_runtime_force_resume() work for drivers that set the
DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that
collaborate with the general ACPI PM domain to set it (Rafael
Wysocki)
- Add kernel parameter to disable asynchronous suspend/resume of
devices (Tudor Ambarus)
- Drop redundant might_sleep() calls from some functions in the
device suspend/resume core code (Zhongqiu Han)
- Fix the handling of monitors connected right before waking up the
system from sleep (tuhaowen)
- Clean up MAINTAINERS entries for suspend and hibernation (Rafael
Wysocki)
- Fix error code path in the KEXEC_JUMP flow and drop a redundant
pm_restore_gfp_mask() call from it (Rafael Wysocki)
- Rearrange suspend/resume error handling in the core device suspend
and resume code (Rafael Wysocki)
- Fix up white space that does not follow coding style in the
hibernation core code (Darshan Rathod)
- Document return values of suspend-related API functions in the
runtime PM framework (Sakari Ailus)
- Mark last busy stamp in multiple autosuspend-related functions in
the runtime PM framework and update its documentation (Sakari
Ailus)
- Take active children into account in pm_runtime_get_if_in_use() for
consistency (Rafael Wysocki)
- Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu
power capping driver (Sivan Zohar-Kotzer)
- Add support for the Bartlett Lake platform to the Intel RAPL power
capping driver (Qiao Wei)
- Add PL4 support for Panther Lake to the intel_rapl_msr power
capping driver (Zhang Rui)
- Update contact information in the PM ABI docs and maintainer
information in the power domains DT binding (Rafael Wysocki)
- Update PM header inclusions to follow the IWYU (Include What You
Use) principle (Andy Shevchenko)
- Add flags to specify power on attach/detach for PM domains, make
the driver core detach PM domains in device_unbind_cleanup(), and
drop the dev_pm_domain_detach() call from the platform bus type
(Claudiu Beznea)
- Improve Python binding's Makefile for cpupower (John B. Wyatt IV)
- Fix printing of CORE, CPU fields in cpupower-monitor (Gautham
Shenoy)"
* tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag
PM: docs: Use my kernel.org address in ABI docs and DT bindings
PM: hibernate: Fix up white space that does not follow coding style
PM: sleep: Rearrange suspend/resume error handling in the core
Documentation: amd-pstate:fix minimum performance state label error
PM: runtime: Take active children into account in pm_runtime_get_if_in_use()
kexec_core: Drop redundant pm_restore_gfp_mask() call
kexec_core: Fix error code path in the KEXEC_JUMP flow
PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation
drivers: cpufreq: add Tegra114 support
rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs
cpufreq: Exit governor when failed to start old governor
cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq()
cpufreq: Init policy->rwsem before it may be possibly used
cpufreq: Initialize cpufreq-based frequency-invariance later
cpufreq: Remove duplicate check in __cpufreq_offline()
cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs
cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
PM / devfreq: Add HiSilicon uncore frequency scaling driver
...
|
|
|
|
863aab3d4d |
bpf: Add log for attaching tracing programs to functions in deny list
Show the rejected function name when attaching tracing programs to functions in deny list. With this change, we know why tracing programs can't attach to functions like __rcu_read_lock() from log. $ ./fentry libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG -- Attaching tracing programs to function '__rcu_read_lock' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
|
|
|
a5a6b29a70 |
bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions
With this change, we know the precise rejected function name when attaching fexit/fmod_ret to __noreturn functions from log. $ ./fexit libbpf: prog 'fexit': BPF program load failed: -EINVAL libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG -- Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
|
|
|
e833f7dfe3 |
audit/stable-6.17 PR 20250725
-----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmiD0Y4UHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNc3RAAnLjNithviO7pal68D+4oZDvAhh7S B1zH3hS9mcRZtDa0ZwdzxlzZm0qacpZZgqha3vAPCnke/nYlnCJa8Iz6XzRYw1Dy dHVa9W/F53KZgUIaI3FIuMN9aaO8bjZFJLHMtjZyL+DjmIoOfDBE1b1wdgqDFOI6 bx2czQaRP/3dP9xFRlTxDviMJXBajonvUZaOOTd/2kHaHRNSSjEja1zE0E4CVKTd 6eMT3Oj1wjVJWIVgGd5BkuXTa1aSWNu2qYrigL8572VeIllv9yZQvJEaypmmicTD XL7aAdYYitTPOL9KGxANmCOfKfan2JYDPqWBhm+UjlKXq4PYbw0zgEYniJ6wmYkq LDKOc+Is3QbkPCxbIxu11pugcIUlI8Kzy43bd2SwrpVqXAB378eFqmUkLEohSxaj TVnW/r+GMh3dx8Rsig0XenhlKvJLKJrPHHRgJ2Kob4aPf2917VX/HRngqwJ6V8C4 p4jhcky8f5/XwcBmZBt8xYuHDY7QCe1U9dwYjjrm6szzDnPrMv1lupxn55C/Bton wdlNHk/2199aguWkSAchJlPXE64eam+G88KGjxxCW5jQmbQU8myOrqipf1lxavyp sDlve5f1mhgeV6ZoZzJkiRUKSz5VvMKoiook8hfyj3rRtzh1Kz5J0ojFaPH46keA ixmmkpn6Ydw7yV8= =fLqm -----END PGP SIGNATURE----- Merge tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit update from Paul Moore: "A single audit patch that restores logging of an audit event in the module load failure case" * tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit,module: restore audit logging in load failure case |
|
|
|
13150742b0 |
Crypto library updates for 6.17
This is the main crypto library pull request for 6.17. The main focus
this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
most of the SHA-512 code) into lib/crypto/. This includes both the
generic and architecture-optimized code. Greatly simplify how the
architecture-optimized code is integrated. Add an easy-to-use
library API for each SHA variant, including HMAC support. Finally,
reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224
which shares most of the SHA-256 code). This is a somewhat smaller
change, due to my earlier work on SHA-256. But this brings in all
the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler,
the diffstat of this pull request is negative, despite the addition of
new fully-documented library APIs for multiple SHA and HMAC-SHA
variants. These APIs will allow further simplifications across the
kernel as users start using them instead of the old-school crypto API.
(I've already written a lot of such conversion patches, removing over
1000 more lines of code. But most of those will target 6.18 or later.)
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaIZ93BQcZWJpZ2dlcnNA
a2VybmVsLm9yZwAKCRDzXCl4vpKOK8HCAQD3O9P0qd6wscne5XuRwaybzKHQ2AqU
OlhlDZWQQEvYAgD/aa6KP/DS+8RKGj0TBn6bACAJyXyDygFXq5a5s9pGzAs=
=UmMM
-----END PGP SIGNATURE-----
Merge tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library updates from Eric Biggers:
"This is the main crypto library pull request for 6.17. The main focus
this cycle is on reorganizing the SHA-1 and SHA-2 code, providing
high-quality library APIs for SHA-1 and SHA-2 including HMAC support,
and establishing conventions for lib/crypto/ going forward:
- Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares
most of the SHA-512 code) into lib/crypto/. This includes both the
generic and architecture-optimized code. Greatly simplify how the
architecture-optimized code is integrated. Add an easy-to-use
library API for each SHA variant, including HMAC support. Finally,
reimplement the crypto_shash support on top of the library API.
- Apply the same reorganization to the SHA-256 code (and also SHA-224
which shares most of the SHA-256 code). This is a somewhat smaller
change, due to my earlier work on SHA-256. But this brings in all
the same additional improvements that I made for SHA-1 and SHA-512.
There are also some smaller changes:
- Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code
from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For
these algorithms it's just a move, not a full reorganization yet.
- Fix the MIPS chacha-core.S to build with the clang assembler.
- Fix the Poly1305 functions to work in all contexts.
- Fix a performance regression in the x86_64 Poly1305 code.
- Clean up the x86_64 SHA-NI optimized SHA-1 assembly code.
Note that since the new organization of the SHA code is much simpler,
the diffstat of this pull request is negative, despite the addition of
new fully-documented library APIs for multiple SHA and HMAC-SHA
variants.
These APIs will allow further simplifications across the kernel as
users start using them instead of the old-school crypto API. (I've
already written a lot of such conversion patches, removing over 1000
more lines of code. But most of those will target 6.18 or later)"
* tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits)
lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils
lib/crypto: x86/sha1-ni: Convert to use rounds macros
lib/crypto: x86/sha1-ni: Minor optimizations and cleanup
crypto: sha1 - Remove sha1_base.h
lib/crypto: x86/sha1: Migrate optimized code into library
lib/crypto: sparc/sha1: Migrate optimized code into library
lib/crypto: s390/sha1: Migrate optimized code into library
lib/crypto: powerpc/sha1: Migrate optimized code into library
lib/crypto: mips/sha1: Migrate optimized code into library
lib/crypto: arm64/sha1: Migrate optimized code into library
lib/crypto: arm/sha1: Migrate optimized code into library
crypto: sha1 - Use same state format as legacy drivers
crypto: sha1 - Wrap library and add HMAC support
lib/crypto: sha1: Add HMAC support
lib/crypto: sha1: Add SHA-1 library functions
lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()
crypto: x86/sha1 - Rename conflicting symbol
lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()
lib/crypto: arm/poly1305: Remove unneeded empty weak function
lib/crypto: x86/poly1305: Fix performance regression on short messages
...
|
|
|
|
8e736a2eea |
hardening updates for v6.17-rc1
- Introduce and start using TRAILING_OVERLAP() helper for fixing embedded flex array instances (Gustavo A. R. Silva) - mux: Convert mux_control_ops to a flex array member in mux_chip (Thorsten Blum) - string: Group str_has_prefix() and strstarts() (Andy Shevchenko) - Remove KCOV instrumentation from __init and __head (Ritesh Harjani, Kees Cook) - Refactor and rename stackleak feature to support Clang - Add KUnit test for seq_buf API - Fix KUnit fortify test under LTO -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIfUkgAKCRA2KwveOeQk uypLAP92r6f47sWcOw/5B9aVffX6Bypsb7dqBJQpCNxI5U1xcAEAiCrZ98UJyOeQ JQgnXd4N67K4EsS2JDc+FutRn3Yi+A8= =+5Bq -----END PGP SIGNATURE----- Merge tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce and start using TRAILING_OVERLAP() helper for fixing embedded flex array instances (Gustavo A. R. Silva) - mux: Convert mux_control_ops to a flex array member in mux_chip (Thorsten Blum) - string: Group str_has_prefix() and strstarts() (Andy Shevchenko) - Remove KCOV instrumentation from __init and __head (Ritesh Harjani, Kees Cook) - Refactor and rename stackleak feature to support Clang - Add KUnit test for seq_buf API - Fix KUnit fortify test under LTO * tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits) sched/task_stack: Add missing const qualifier to end_of_stack() kstack_erase: Support Clang stack depth tracking kstack_erase: Add -mgeneral-regs-only to silence Clang warnings init.h: Disable sanitizer coverage for __init and __head kstack_erase: Disable kstack_erase for all of arm compressed boot code x86: Handle KCOV __init vs inline mismatches arm64: Handle KCOV __init vs inline mismatches s390: Handle KCOV __init vs inline mismatches arm: Handle KCOV __init vs inline mismatches mips: Handle KCOV __init vs inline mismatch powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON configs/hardening: Enable CONFIG_KSTACK_ERASE stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth stackleak: Rename STACKLEAK to KSTACK_ERASE seq_buf: Introduce KUnit tests string: Group str_has_prefix() and strstarts() kunit/fortify: Add back "volatile" for sizeof() constants acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings ... |
|
|
|
d900c4ce63 |
execve updates for v6.17
- Introduce regular REGSET note macros arch-wide (Dave Martin) - Remove arbitrary 4K limitation of program header size (Yin Fengwei) - Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi) -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaIVKiAAKCRA2KwveOeQk u4zBAP4zUNj2+XyixVPXCzv+Hkle6zWs7yrzdA2yLxe8Qtwj5AD+N2I6MUGcCFGW W+uWxlWTtGLDqh1CplIUqTlxMi39Og4= =vYnE -----END PGP SIGNATURE----- Merge tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Introduce regular REGSET note macros arch-wide (Dave Martin) - Remove arbitrary 4K limitation of program header size (Yin Fengwei) - Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi) * tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits) fork: reorder function qualifiers for copy_clone_args_from_user binfmt_elf: remove the 4k limitation of program header size binfmt_elf: Warn on missing or suspicious regset note names xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names ... |
|
|
|
6e11664f14 |
for-6.17/block-20250728
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmiHdZ8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptRED/9o3dQ1QHL5yNM/AyCCGox0V4zra8qGS/Vc
cBWpAVrmPGRw0IYlLZENtN9PdwKcbMzJq3l6cxeC7dBnAZP0AxTzP4YYJYUNVsqo
WtJ3d/k5+cVp0OyOp4uabaqNeMeLoPk9/JXe1Ml2KxtDmHtj5yee0JRh7zlPZmZj
tsrpIUTeHgAPn6yR1EI+0ybx/mjCb05Mv2Y8gF5hkUPA2PuON+MTFixJmqoy2ySh
n+22mz/prqlyOSYh/VVv1+9jcQ94wMjcW0JIpg9lM3Kg8BCPU4IetvO1UiX6X33v
154zEh2aJJDBx+yORS4BM4JMXjRZI7lYea2dkHM8Cajctu1Wpja9bNwnK9ibXvEc
WtyBwztleLbAZef25fA/W87JE23fGa/r3nwIb2cF4QqkAFslCvhjA93WkOzNJCgQ
qsWOrlCh3IK2NUu4b1Ncs3ZHOPvc51+zzjMzC6SUr54xhrxDK+gngDPhRy7XDqWJ
DTMpIlr366o8GdJqnib0/e/CPBrThS6Vl6u0tgLnNbwdpK1svgo/uHW5ksKvDqHX
kGEIhyRRJJC+4wyl4dsYKXa2twcyFrlWdAE+pZguEC2nZRYqYl9uXftOtvfp1x0y
/skDX0FIDjvyjRqCLcqF03FSGqwCGS8WuWXZjPhVhcfz47NvbHeFDh1G/jMzsbpj
S9zrPve/DQ==
=e86T
-----END PGP SIGNATURE-----
Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- MD pull request via Yu:
- call del_gendisk synchronously (Xiao)
- cleanup unused variable (John)
- cleanup workqueue flags (Ryo)
- fix faulty rdev can't be removed during resync (Qixing)
- NVMe pull request via Christoph:
- try PCIe function level reset on init failure (Keith Busch)
- log TLS handshake failures at error level (Maurizio Lombardi)
- pci-epf: do not complete commands twice if nvmet_req_init()
fails (Rick Wertenbroek)
- misc cleanups (Alok Tiwari)
- Removal of the pktcdvd driver
This has been more than a decade coming at this point, and some
recently revealed breakages that had it causing issues even for cases
where it isn't required made me re-pull the trigger on this one. It's
known broken and nobody has stepped up to maintain the code
- Series for ublk supporting batch commands, enabling the use of
multishot where appropriate
- Speed up ublk exit handling
- Fix for the two-stage elevator fixing which could leak data
- Convert NVMe to use the new IOVA based API
- Increase default max transfer size to something more reasonable
- Series fixing write operations on zoned DM devices
- Add tracepoints for zoned block device operations
- Prep series working towards improving blk-mq queue management in the
presence of isolated CPUs
- Don't allow updating of the block size of a loop device that is
currently under exclusively ownership/open
- Set chunk sectors from stacked device stripe size and use it for the
atomic write size limit
- Switch to folios in bcache read_super()
- Fix for CD-ROM MRW exit flush handling
- Various tweaks, fixes, and cleanups
* tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits)
block: restore two stage elevator switch while running nr_hw_queue update
cdrom: Call cdrom_mrw_exit from cdrom_release function
sunvdc: Balance device refcount in vdc_port_mpgroup_check
nvme-pci: try function level reset on init failure
dm: split write BIOs on zone boundaries when zone append is not emulated
block: use chunk_sectors when evaluating stacked atomic write limits
dm-stripe: limit chunk_sectors to the stripe size
md/raid10: set chunk_sectors limit
md/raid0: set chunk_sectors limit
block: sanitize chunk_sectors for atomic write limits
ilog2: add max_pow_of_two_factor()
nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails
nvme-tcp: log TLS handshake failures at error level
docs: nvme: fix grammar in nvme-pci-endpoint-target.rst
nvme: fix typo in status code constant for self-test in progress
nvmet: remove redundant assignment of error code in nvmet_ns_enable()
nvme: fix incorrect variable in io cqes error message
nvme: fix multiple spelling and grammar issues in host drivers
block: fix blk_zone_append_update_request_bio() kernel-doc
md/raid10: fix set but not used variable in sync_request_write()
...
|
|
|
|
133c302a0c |
tracing: trace_fprobe: Fix typo of the semicolon
Fix a typo that uses ',' instead of ';' for line delimiter. Link: https://lore.kernel.org/linux-trace-kernel/175366879192.487099.5714468217360139639.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> |
|
|
|
7e7bc8335b |
vfs-6.17-rc1.bpf
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCjwAKCRCRxhvAZXjc osnVAQCv4rM7sF4yJvGlm1myIJcJy5Sabk2q31qMdI1VHmkcOwD+Mxs7d1aByTS8 /6djhVleq6lcT2LpP9j8YI3Rb+x30QY= =PF3o -----END PGP SIGNATURE----- Merge tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs bpf updates from Christian Brauner: "These changes allow bpf to read extended attributes from cgroupfs. This is useful in redirecting AF_UNIX socket connections based on cgroup membership of the socket. One use-case is the ability to implement log namespaces in systemd so services and containers are redirected to different journals" * tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/kernfs: test xattr retrieval selftests/bpf: Add tests for bpf_cgroup_read_xattr bpf: Mark cgroup_subsys_state->cgroup RCU safe bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node kernfs: remove iattr_mutex |
|
|
|
672dcda246 |
vfs-6.17-rc1.pidfs
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINCiQAKCRCRxhvAZXjc
orltAQDq3y1anYETz5/FD6P2gXY1W5hXdSm3EHHeacQ1JjTXvgEA2g1lWO7J4anf
oOVE8aSvMow/FOjivLZBYmI65pkYJAE=
=oDKB
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
- persistent info
Persist exit and coredump information independent of whether anyone
currently holds a pidfd for the struct pid.
The current scheme allocated pidfs dentries on-demand repeatedly.
This scheme is reaching it's limits as it makes it impossible to pin
information that needs to be available after the task has exited or
coredumped and that should not be lost simply because the pidfd got
closed temporarily. The next opener should still see the stashed
information.
This is also a prerequisite for supporting extended attributes on
pidfds to allow attaching meta information to them.
If someone opens a pidfd for a struct pid a pidfs dentry is allocated
and stashed in pid->stashed. Once the last pidfd for the struct pid
is closed the pidfs dentry is released and removed from pid->stashed.
So if 10 callers create a pidfs dentry for the same struct pid
sequentially, i.e., each closing the pidfd before the other creates a
new one then a new pidfs dentry is allocated every time.
Because multiple tasks acquiring and releasing a pidfd for the same
struct pid can race with each another a task may still find a valid
pidfs entry from the previous task in pid->stashed and reuse it. Or
it might find a dead dentry in there and fail to reuse it and so
stashes a new pidfs dentry. Multiple tasks may race to stash a new
pidfs dentry but only one will succeed, the other ones will put their
dentry.
The current scheme aims to ensure that a pidfs dentry for a struct
pid can only be created if the task is still alive or if a pidfs
dentry already existed before the task was reaped and so exit
information has been was stashed in the pidfs inode.
That's great except that it's buggy. If a pidfs dentry is stashed in
pid->stashed after pidfs_exit() but before __unhash_process() is
called we will return a pidfd for a reaped task without exit
information being available.
The pidfds_pid_valid() check does not guard against this race as it
doens't sync at all with pidfs_exit(). The pid_has_task() check might
be successful simply because we're before __unhash_process() but
after pidfs_exit().
Introduce a new scheme where the lifetime of information associated
with a pidfs entry (coredump and exit information) isn't bound to the
lifetime of the pidfs inode but the struct pid itself.
The first time a pidfs dentry is allocated for a struct pid a struct
pidfs_attr will be allocated which will be used to store exit and
coredump information.
If all pidfs for the pidfs dentry are closed the dentry and inode can
be cleaned up but the struct pidfs_attr will stick until the struct
pid itself is freed. This will ensure minimal memory usage while
persisting relevant information.
The new scheme has various advantages. First, it allows to close the
race where we end up handing out a pidfd for a reaped task for which
no exit information is available. Second, it minimizes memory usage.
Third, it allows to remove complex lifetime tracking via dentries
when registering a struct pid with pidfs. There's no need to get or
put a reference. Instead, the lifetime of exit and coredump
information associated with a struct pid is bound to the lifetime of
struct pid itself.
- extended attributes
Now that we have a way to persist information for pidfs dentries we
can start supporting extended attributes on pidfds. This will allow
userspace to attach meta information to tasks.
One natural extension would be to introduce a custom pidfs.* extended
attribute space and allow for the inheritance of extended attributes
across fork() and exec().
The first simple scheme will allow privileged userspace to set
trusted extended attributes on pidfs inodes.
- Allow autonomous pidfs file handles
Various filesystems such as pidfs and drm support opening file
handles without having to require a file descriptor to identify the
filesystem. The filesystem are global single instances and can be
trivially identified solely on the information encoded in the file
handle.
This makes it possible to not have to keep or acquire a sentinal file
descriptor just to pass it to open_by_handle_at() to identify the
filesystem. That's especially useful when such sentinel file
descriptor cannot or should not be acquired.
For pidfs this means a file handle can function as full replacement
for storing a pid in a file. Instead a file handle can be stored and
reopened purely based on the file handle.
Such autonomous file handles can be opened with or without specifying
a a file descriptor. If no proper file descriptor is used the
FD_PIDFS_ROOT sentinel must be passed. This allows us to define
further special negative fd sentinels in the future.
Userspace can trivially test for support by trying to open the file
handle with an invalid file descriptor.
- Allow pidfds for reaped tasks with SCM_PIDFD messages
This is a logical continuation of the earlier work to create pidfds
for reaped tasks through the SO_PEERPIDFD socket option merged in
|
|
|
|
614384533d |
rv: Add opid per-cpu monitor
Add a per-cpu monitor as part of the sched model:
* opid: operations with preemption and irq disabled
Monitor to ensure wakeup and need_resched occur with irq and
preemption disabled or in irq handlers.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-10-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
e8440a88e5 |
rv: Add nrp and sssw per-task monitors
Add 2 per-task monitors as part of the sched model:
* nrp: need-resched preempts
Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
Monitor to ensure sched_set_state to sleepable leads to sleeping and
sleeping tasks require wakeup.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/20250728135022.255578-9-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
d0096c2f9c |
rv: Replace tss and sncid monitors with more complete sts
The tss monitor currently guarantees task switches can happen only while scheduling, whereas the sncid monitor enforces scheduling occurs with interrupt disabled. Replace the monitors with a more comprehensive specification which implies both but also ensures that: * each scheduler call disable interrupts to switch * each task switch happens with interrupts disabled Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nam Cao <namcao@linutronix.de> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250728135022.255578-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
adcc3bfa88 |
sched: Adapt sched tracepoints for RV task model
Add the following tracepoint:
* sched_set_need_resched(tsk, cpu, tif)
Called when a task is set the need resched [lazy] flag
Remove the unused ip parameter from sched_entry and sched_exit and alter
sched_entry to have a value of preempt consistent with the one used in
sched_switch.
Also adapt all monitors using sched_{entry,exit} to avoid breaking build.
These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
9d475d80c9 |
rv: Retry when da monitor detects race conditions
DA monitor can be accessed from multiple cores simultaneously, this is likely, for instance when dealing with per-task monitors reacting on events that do not always occur on the CPU where the task is running. This can cause race conditions where two events change the next state and we see inconsistent values. E.g.: [62] event_srs: 27: sleepable x sched_wakeup -> running (final) [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable [63] error_srs: 27: event sched_switch_suspend not expected in the state running In this case the monitor fails because the event on CPU 62 wins against the one on CPU 63, although the correct state should have been sleepable, since the task get suspended. Detect if the current state was modified by using try_cmpxchg while storing the next value. If it was, try again reading the current state. After a maximum number of failed retries, react by calling a special tracepoint, print on the console and reset the monitor. Remove the functions da_monitor_curr_state() and da_monitor_set_state() as they only hide the underlying implementation in this case. Monitors where this type of condition can occur must be able to account for racing events in any possible order, as we cannot know the winner. Cc: Ingo Molnar <mingo@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250728135022.255578-6-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
79de661707 |
rv: Adjust monitor dependencies
RV monitors relying on the preemptirqs tracepoints are set as dependent
on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do
enable the tracepoints but are not the minimal configurations enabling
them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable
manually).
Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
monitors.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-5-gmonaco@redhat.com
Fixes:
|
|
|
|
7f904ff6e5 |
rv: Use strings in da monitors tracepoints
Using DA monitors tracepoints with KASAN enabled triggers the following
warning:
BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
Read of size 32 at addr ffffffffaada8980 by task ...
Call Trace:
<TASK>
[...]
do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10
? trace_event_sncid+0x83/0x200
trace_event_sncid+0x163/0x200
[...]
The buggy address belongs to the variable:
automaton_snep+0x4e0/0x5e0
This is caused by the tracepoints reading 32 bytes __array instead of
__string from the automata definition. Such strings are literals and
reading 32 bytes ends up in out of bound memory accesses (e.g. the next
automaton's data in this case).
The error is harmless as, while printing the string, we stop at the null
terminator, but it should still be fixed.
Use the __string facilities while defining the tracepoints to avoid
reading out of bound memory.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-4-gmonaco@redhat.com
Fixes:
|
|
|
|
7b70ac4cad |
rv: Remove trailing whitespace from tracepoint string
RV event tracepoints print a line with the format:
"event_xyz: S0 x event -> S1 "
"event_xyz: S1 x event -> S0 (final)"
While printing an event leading to a non-final state, the line
has a trailing white space (visible above before the closing ").
Adapt the format string not to print the trailing whitespace if we are
not printing "(final)".
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250728135022.255578-3-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
117eab5c6e |
vfs-6.17-rc1.coredump
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaINAYAAKCRCRxhvAZXjc
opJiAQDXGs+gQcxJ+4BpV4QszT2OJC19oI/f5AQ4PWMJdHgr4AEA7fc6NbBrpmW7
L/tbdAwIiWp8bL1Q8Wy7Q2qldHtcggM=
=KbD9
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull coredump updates from Christian Brauner:
"This contains an extension to the coredump socket and a proper rework
of the coredump code.
- This extends the coredump socket to allow the coredump server to
tell the kernel how to process individual coredumps. This allows
for fine-grained coredump management. Userspace can decide to just
let the kernel write out the coredump, or generate the coredump
itself, or just reject it.
* COREDUMP_KERNEL
The kernel will write the coredump data to the socket.
* COREDUMP_USERSPACE
The kernel will not write coredump data but will indicate to the
parent that a coredump has been generated. This is used when
userspace generates its own coredumps.
* COREDUMP_REJECT
The kernel will skip generating a coredump for this task.
* COREDUMP_WAIT
The kernel will prevent the task from exiting until the coredump
server has shutdown the socket connection.
The flexible coredump socket can be enabled by using the "@@"
prefix instead of the single "@" prefix for the regular coredump
socket:
@@/run/systemd/coredump.socket
- Cleanup the coredump code properly while we have to touch it
anyway.
Split out each coredump mode in a separate helper so it's easy to
grasp what is going on and make the code easier to follow. The core
coredump function should now be very trivial to follow"
* tag 'vfs-6.17-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cleanup: add a scoped version of CLASS()
coredump: add coredump_skip() helper
coredump: avoid pointless variable
coredump: order auto cleanup variables at the top
coredump: add coredump_cleanup()
coredump: auto cleanup prepare_creds()
cred: add auto cleanup method
coredump: directly return
coredump: auto cleanup argv
coredump: add coredump_write()
coredump: use a single helper for the socket
coredump: move pipe specific file check into coredump_pipe()
coredump: split pipe coredumping into coredump_pipe()
coredump: move core_pipe_count to global variable
coredump: prepare to simplify exit paths
coredump: split file coredumping into coredump_file()
coredump: rename do_coredump() to vfs_coredump()
selftests/coredump: make sure invalid paths are rejected
coredump: validate socket path in coredump_parse()
coredump: don't allow ".." in coredump socket path
...
|
|
|
|
5b4c54ac49 |
bpf: Fix various typos in verifier.c comments
This patch fixes several minor typos in comments within the BPF verifier. No changes in functionality. Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Link: https://lore.kernel.org/r/20250727081754.15986-1-suchitkarunakaran@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
|
|
|
5dbb19b16a |
bpf: Add third round of bounds deduction
Commit
|
|
|
|
00bf8d0c6c |
bpf: Improve bounds when s64 crosses sign boundary
__reg64_deduce_bounds currently improves the s64 range using the u64
range and vice versa, but only if it doesn't cross the sign boundary.
This patch improves __reg64_deduce_bounds to cover the case where the
s64 range crosses the sign boundary but overlaps with the u64 range on
only one end. In that case, we can improve both ranges. Consider the
following example, with the s64 range crossing the sign boundary:
0 U64_MAX
| [xxxxxxxxxxxxxx u64 range xxxxxxxxxxxxxx] |
|----------------------------|----------------------------|
|xxxxx s64 range xxxxxxxxx] [xxxxxxx|
0 S64_MAX S64_MIN -1
The u64 range overlaps only with positive portion of the s64 range. We
can thus derive the following new s64 and u64 ranges.
0 U64_MAX
| [xxxxxx u64 range xxxxx] |
|----------------------------|----------------------------|
| [xxxxxx s64 range xxxxx] |
0 S64_MAX S64_MIN -1
The same logic can probably apply to the s32/u32 ranges, but this patch
doesn't implement that change.
In addition to the selftests, the __reg64_deduce_bounds change was
also tested with Agni, the formal verification tool for the range
analysis [1].
Link: https://github.com/bpfverif/agni [1]
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/933bd9ce1f36ded5559f92fdc09e5dbc823fa245.1753695655.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
3cfb9c1a7d |
rv: Fix wrong type cast in reactors_show() and monitor_reactor_show()
Argument 'p' of reactors_show() and monitor_reactor_show() is not a pointer to struct rv_reactor, it is actually a pointer to the list_head inside struct rv_reactor. Therefore it's wrong to cast 'p' to struct rv_reactor *. This wrong type cast has been there since the beginning. But it still worked because the list_head was the first field in struct rv_reactor_def. This is no longer true since commit |
|
|
|
e82aea50fe |
rv: Fix wrong type cast in monitors_show()
Argument 'p' of monitors_show() is not a pointer to struct rv_monitor, it is actually a pointer to the list_head inside struct rv_monitor. Therefore it is wrong to cast 'p' to struct rv_monitor *. This wrong type cast has been there since the beginning. But it still worked because the list_head was the first field in struct rv_monitor_def. This is no longer true since commit |
|
|
|
5345e64760 |
bpf: Simplify bounds refinement from s32
During the bounds refinement, we improve the precision of various ranges
by looking at other ranges. Among others, we improve the following in
this order (other things happen between 1 and 2):
1. Improve u32 from s32 in __reg32_deduce_bounds.
2. Improve s/u64 from u32 in __reg_deduce_mixed_bounds.
3. Improve s/u64 from s32 in __reg_deduce_mixed_bounds.
In particular, if the s32 range forms a valid u32 range, we will use it
to improve the u32 range in __reg32_deduce_bounds. In
__reg_deduce_mixed_bounds, under the same condition, we will use the s32
range to improve the s/u64 ranges.
If at (1) we were able to learn from s32 to improve u32, we'll then be
able to use that in (2) to improve s/u64. Hence, as (3) happens under
the same precondition as (1), it won't improve s/u64 ranges further than
(1)+(2) did. Thus, we can get rid of (3).
In addition to the extensive suite of selftests for bounds refinement,
this patch was also tested with the Agni formal verification tool [1].
Additionally, Eduard mentioned:
The argument appears to be as follows:
Under precondition `(u32)reg->s32_min <= (u32)reg->s32_max`
__reg32_deduce_bounds produces:
reg->u32_min = max_t(u32, reg->s32_min, reg->u32_min);
reg->u32_max = min_t(u32, reg->s32_max, reg->u32_max);
And then first part of __reg_deduce_mixed_bounds assigns:
a. reg->umin umax= (reg->umin & ~0xffffffffULL) | max_t(u32, reg->s32_min, reg->u32_min);
b. reg->umax umin= (reg->umax & ~0xffffffffULL) | min_t(u32, reg->s32_max, reg->u32_max);
And then second part of __reg_deduce_mixed_bounds assigns:
c. reg->umin umax= (reg->umin & ~0xffffffffULL) | (u32)reg->s32_min;
d. reg->umax umin= (reg->umax & ~0xffffffffULL) | (u32)reg->s32_max;
But assignment (c) is a noop because:
max_t(u32, reg->s32_min, reg->u32_min) >= (u32)reg->s32_min
Hence RHS(a) >= RHS(c) and umin= does nothing.
Also assignment (d) is a noop because:
min_t(u32, reg->s32_max, reg->u32_max) <= (u32)reg->s32_max
Hence RHS(b) <= RHS(d) and umin= does nothing.
Plus the same reasoning for the part dealing with reg->s{min,max}_value:
e. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | max_t(u32, reg->s32_min_value, reg->u32_min_value);
f. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | min_t(u32, reg->s32_max_value, reg->u32_max_value);
vs
g. reg->smin_value smax= (reg->smin_value & ~0xffffffffULL) | (u32)reg->s32_min_value;
h. reg->smax_value smin= (reg->smax_value & ~0xffffffffULL) | (u32)reg->s32_max_value;
RHS(e) >= RHS(g) and RHS(f) <= RHS(h), hence smax=,smin= do nothing.
This appears to be correct.
Also, Shung-Hsi:
Beside going through the reasoning, I also played with CBMC a bit to
double check that as far as a single run of __reg_deduce_bounds() is
concerned (and that the register state matches certain handwavy
expectations), the change indeed still preserve the original behavior.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://github.com/bpfverif/agni [1]
Link: https://lore.kernel.org/bpf/aIJwnFnFyUjNsCNa@mail.gmail.com
|
|
|
|
b711733e89 |
A single fix for the PTP systemcounter mechanism:
The rework of this mechanism added a 'use_nsec' member to struct system_counterval. get_device_system_crosststamp() instantiates that struct on the stack and hands a pointer to the driver callback. Only the drivers which set use_nsec to true, initialize that field, but all others ignore it. As get_device_system_crosststamp() does not initialize the struct, the use_nsec field contains random stack content in those cases. That causes a miscalulation usually resulting in a failing range check in the best case. Initialize the structure before handing it to the drivers to cure that. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmiGFA4THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoRcsEACvQI0LmKTOigzSZvBT1CZnGcwpeqYi Ez0v/w+tpyfbwQgf9kxR+ZbjNdwqCYFnR8PZPFKFuvsanWRTIcYaTkIQWvDhcEX/ U4AFI3VkdZUFckCEY/fv7j3/jkp7pbLVHMq001Z9xaMMcE+ox1AlHpEW0Khd3gqL VFLXU5S7Q9H6J6ujjFAXAMuhgjk6WOz8q+ew3hnc3dxwyuEBAz83jOScH/be3dTl 10ydzoxFEa+ZlacAHX+SqZ7nhS7ExxNlwlUuTYj/EkBCQ8UIoS93YLA5bYMcWCao W5rs6vFJmMO6NR6lkqwfKmKyjovx79jHMVNKoxydZGvkqcNMtfc/eUfByxAkyCDP gmTCFwgKVGdjGsYwkGqafejmJt5OFrD1hMyWfBhGWQ/Z8CXuuJNEa/8trSyUK/CS DFD1InOLltbYuw7rY5gRxb+xmgBTxUMj8gF/hXYs7wNzJqNJXXNae/2Sue+Xi+mV iieEF8UonmpMe9k9w3+fFGGDWYa4lYnT5O3VMQ0nEjj6dt5RVQqRvjTa+GtQJzUs h4fUs+BIKyCkh6DgRKyIsDruzryOSnZ+vqMcGMm0gvPttc3cGYksLiQVlWYjQhxs pTFrHNGOSXMT5WBQ7KWKzGypHlf3WYhVWk1+dmPJedrdyr23AfgKAGM6zva1Oqjc 81w9DvppBL0sOA== =j6Po -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for the PTP systemcounter mechanism: The rework of this mechanism added a 'use_nsec' member to struct system_counterval. get_device_system_crosststamp() instantiates that struct on the stack and hands a pointer to the driver callback. Only the drivers which set use_nsec to true, initialize that field, but all others ignore it. As get_device_system_crosststamp() does not initialize the struct, the use_nsec field contains random stack content in those cases. That causes a miscalulation usually resulting in a failing range check in the best case. Initialize the structure before handing it to the drivers to cure that" * tag 'timers-urgent-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Zero initialize system_counterval when querying time from phc drivers |
|
|
|
6676fd3c99 |
kstack_erase: Add -mgeneral-regs-only to silence Clang warnings
Once CONFIG_KSTACK_ERASE is enabled with Clang on i386, the build warns: kernel/kstack_erase.c:168:2: warning: function with attribute 'no_caller_saved_registers' should only call a function with attribute 'no_caller_saved_registers' or be compiled with '-mgeneral-regs-only' [-Wexcessive-regsave] Add -mgeneral-regs-only for the kstack_erase handler, to make Clang feel better (it is effectively a no-op flag for the kernel). No binary changes encountered. Build & boot tested with Clang 21 on x86_64, and i386. Build tested with GCC 14.2.0 on x86_64, i386, arm64, and arm. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/all/20250726004313.GA3650901@ax162 Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
3ba58312e6 |
bpf: Move bpf_jit_get_prog_name() to core.c
bpf_jit_get_prog_name() will be used by all JITs when enabling support for private stack. This function is currently implemented in the x86 JIT. Move the function to core.c so that other JITs can easily use it in their implementation of private stack. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250724120257.7299-2-puranjay@kernel.org |
|
|
|
b7b3500bd4 |
umd: Remove usermode driver framework
The code is unused since
|
|
|
|
2b03164eee |
bpf/preload: Don't select USERMODE_DRIVER
The usermode driver framework is not used anymore by the BPF
preload code.
Fixes:
|
|
|
|
b8a7fba39c |
rv: Remove struct rv_monitor::reacting
The field 'reacting' in struct rv_monitor is set but never used. Delete it. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/a6c16f845d2f1a09c4d0934ab83f3cb14478a71d.1753378331.git.namcao@linutronix.de Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
3d3800b4f7 |
rv: Remove rv_reactor's reference counter
rv_reactor has a reference counter to ensure it is not removed while monitors are still using it. However, this is futile, as __exit functions are not expected to fail and will proceed normally despite rv_unregister_reactor() returning an error. At the moment, reactors do not support being built as modules, therefore they are never removed and the reference counters are not necessary. If we support building RV reactors as modules in the future, kernel module's centralized facilities such as try_module_get(), module_put() or MODULE_SOFTDEP should be used instead of this custom implementation. Remove this reference counter. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/bb946398436a5e17fb0f5b842ef3313c02291852.1753378331.git.namcao@linutronix.de Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
3d3c376118 |
rv: Merge struct rv_reactor_def into struct rv_reactor
Each struct rv_reactor has a unique struct rv_reactor_def associated with
it. struct rv_reactor is statically allocated, while struct rv_reactor_def
is dynamically allocated.
This makes the code more complicated than it should be:
- Lookup is required to get the associated rv_reactor_def from rv_reactor
- Dynamic memory allocation is required for rv_reactor_def. This is
harder to get right compared to static memory. For instance, there is
an existing mistake: rv_unregister_reactor() does not free the memory
allocated by rv_register_reactor(). This is fortunately not a real
memory leak problem as rv_unregister_reactor() is never called.
Simplify and merge rv_reactor_def into rv_reactor.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/71cb91c86cd40df5b8c492b788787f2a73c3eaa3.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
24cbfe18d5 |
rv: Merge struct rv_monitor_def into struct rv_monitor
Each struct rv_monitor has a unique struct rv_monitor_def associated with
it. struct rv_monitor is statically allocated, while struct rv_monitor_def
is dynamically allocated.
This makes the code more complicated than it should be:
- Lookup is required to get the associated rv_monitor_def from rv_monitor
- Dynamic memory allocation is required for rv_monitor_def. This is
harder to get right compared to static memory. For instance, there is
an existing mistake: rv_unregister_monitor() does not free the memory
allocated by rv_register_monitor(). This is fortunately not a real
memory leak problem, as rv_unregister_monitor() is never called.
Simplify and merge rv_monitor_def into rv_monitor.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/194449c00f87945c207aab4c96920c75796a4f53.1753378331.git.namcao@linutronix.de
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
b0c08dd534 |
rv: Remove unused field in struct rv_monitor_def
rv_monitor_def::task_monitor is not used. Delete it. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/502d94f2696435690a2b1fdbe80a9e56c96fcabf.1753378331.git.namcao@linutronix.de Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
2942242dde |
11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues
or aren't considered necessary for -stable kernels. 7 are for MM. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaILYBgAKCRDdBJ7gKXxA jo0uAQDvTlAjH6TcgRW/cbqHRIeiRoZ9Bwh/RUlJXM9neDR2LgEA41B+ohTsxUmZ OhM3Ce94tiGrHnVlW3SsmVaO+1TjGAU= =KUR9 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 7 are for MM" * tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: sprintf.h requires stdarg.h resource: fix false warning in __request_region() mm/damon/core: commit damos_quota_goal->nid kasan: use vmalloc_dump_obj() for vmalloc error reports mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show() mm: update MAINTAINERS entry for HMM nilfs2: reject invalid file types when reading inodes selftests/mm: fix split_huge_page_test for folio_split() tests mailmap: add entry for Senozhatsky mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list |
|
|
|
71c52411c5 |
net: Create separate gro_flush_normal function
Move multiple copies of same code snippet doing `gro_flush` and `gro_normal_list` into separate helper function. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250723013031.2911384-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
|
|
8c4e53a1a0 |
tracing: Call trace_ftrace_test_filter() for the event
The trace event filter bootup self test tests a bunch of filter logic against the ftrace_test_filter event, but does not actually call the event. Work is being done to cause a warning if an event is defined but not used. To quiet the warning call the trace event under an if statement where it is disabled so it doesn't get optimized out. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas.schier@linux.dev> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250723194212.274458858@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
a4f5759b6f |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQ6NaUOruQGUkvPdG4raS+Z+3y5EwUCaIJYlAAKCRAraS+Z+3y5 E5MFAQDW29BJyjRbB75oy6RxmFZX+xFmGgmy1XO3w822gIwgzQD/WzhsmFPDYv/F 7iOpLvez6zTySUdTJXJGCTvYJG5EHwU= =U8S4 -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-07-24 We've added 3 non-merge commits during the last 3 day(s) which contain a total of 4 files changed, 40 insertions(+), 15 deletions(-). The main changes are: 1) Improved verifier error message for incorrect narrower load from pointer field in ctx, from Paul Chaignon. 2) Disabled migration in nf_hook_run_bpf to address a syzbot report, from Kuniyuki Iwashima. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Test invalid narrower ctx load bpf: Reject narrower access to pointer ctx fields bpf: Disable migration in nf_hook_run_bpf(). ==================== Link: https://patch.msgid.link/20250724173306.3578483-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
|
|
91a229bb7b |
resource: fix false warning in __request_region()
A warning is raised when __request_region() detects a conflict with a
resource whose resource.desc is IORES_DESC_DEVICE_PRIVATE_MEMORY.
But this warning is only valid for iomem_resources.
The hmem device resource uses resource.desc as the numa node id, which can
cause spurious warnings.
This warning appeared on a machine with multiple cxl memory expanders.
One of the NUMA node id is 6, which is the same as the value of
IORES_DESC_DEVICE_PRIVATE_MEMORY.
In this environment it was just a spurious warning, but when I saw the
warning I suspected a real problem so it's better to fix it.
This change fixes this by restricting the warning to only iomem_resource.
This also adds a missing new line to the warning message.
Link: https://lkml.kernel.org/r/20250719112604.25500-1-akinobu.mita@gmail.com
Fixes:
|
|
|
|
8245d47cfa |
x86: Handle KCOV __init vs inline mismatches
GCC appears to have kind of fragile inlining heuristics, in the sense that it can change whether or not it inlines something based on optimizations. It looks like the kcov instrumentation being added (or in this case, removed) from a function changes the optimization results, and some functions marked "inline" are _not_ inlined. In that case, we end up with __init code calling a function not marked __init, and we get the build warnings I'm trying to eliminate in the coming patch that adds __no_sanitize_coverage to __init functions: WARNING: modpost: vmlinux: section mismatch in reference: xbc_exit+0x8 (section: .text.unlikely) -> _xbc_exit (section: .init.text) WARNING: modpost: vmlinux: section mismatch in reference: real_mode_size_needed+0x15 (section: .text.unlikely) -> real_mode_blob_end (section: .init.data) WARNING: modpost: vmlinux: section mismatch in reference: __set_percpu_decrypted+0x16 (section: .text.unlikely) -> early_set_memory_decrypted (section: .init.text) WARNING: modpost: vmlinux: section mismatch in reference: memblock_alloc_from+0x26 (section: .text.unlikely) -> memblock_alloc_try_nid (section: .init.text) WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_set_root_pointer+0xc (section: .text.unlikely) -> x86_init (section: .init.data) WARNING: modpost: vmlinux: section mismatch in reference: acpi_arch_get_root_pointer+0x8 (section: .text.unlikely) -> x86_init (section: .init.data) WARNING: modpost: vmlinux: section mismatch in reference: efi_config_table_is_usable+0x16 (section: .text.unlikely) -> xen_efi_config_table_is_usable (section: .init.text) This problem is somewhat fragile (though using either __always_inline or __init will deterministically solve it), but we've tripped over this before with GCC and the solution has usually been to just use __always_inline and move on. For x86 this means forcing several functions to be inline with __always_inline. Link: https://lore.kernel.org/r/20250724055029.3623499-2-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
8b5a19b4ff |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc8). Conflicts: drivers/net/ethernet/microsoft/mana/gdma_main.c |
|
|
|
58d5f0d437 |
rv: Return init error when registering monitors
Monitors generated with dot2k have their registration function (the one called during monitor initialisation) return always 0, even if the registration failed on RV side. This can hide potential errors. Return the value returned by the RV register function. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-6-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
560473f2e2 |
verification/rvgen: Organise Kconfig entries for nested monitors
The current behaviour of rvgen when running with the -a option is to append the necessary lines at the end of the configuration for Kconfig, Makefile and tracepoints. This is not always the desired behaviour in case of nested monitors: while tracepoints are not affected by nesting and the Makefile's only requirement is that the parent monitor is built before its children, in the Kconfig it is better to have children defined right after their parent, otherwise the result has wrong indentation: [*] foo_parent monitor [*] foo_child1 monitor [*] foo_child2 monitor [*] bar_parent monitor [*] bar_child1 monitor [*] bar_child2 monitor [*] foo_child3 monitor [*] foo_child4 monitor Adapt rvgen to look for a different marker for nested monitors in the Kconfig file and append the line right after the last sibling, instead of the last monitor. Also add the marker when creating a new parent monitor. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-5-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
9efcf59082 |
tools/dot2c: Fix generated files going over 100 column limit
The dot2c.py script generates all states in a single line. This breaks the 100 column limit when the state machines are non-trivial. Change dot2c.py to generate the states in separate lines in case the generated line is going to be too long. Also adapt existing monitors with line length over the limit. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-4-gmonaco@redhat.com Suggested-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
dabd3e7dcc |
tracing: Have eprobes handle arrays
eprobes are dynamic events that can read other events using their fields
to create new events. Currently it doesn't work with arrays. When the new
event field is attached to the old event field, it looks at the size of
the field to determine what type of field the new field should be. For 1
byte fields it's a char, for 2 bytes, it's a short and for 4 bytes it's an
integer. For all other sizes it just defaults to "long". This also reads
the contents of the field for such cases.
For arrays that are bigger than the size of long, return the value of the
address of the content itself. This will allow eprobes to read other
values in the array of the old event.
This is useful when raw_syscalls is enabled but the syscall events are
not. The syscall events are created from the raw_syscalls as they have an
array of "args" that holds the 6 long words passed to the syscall entry
point. To read the value of "filename" from sys_openat, the eprobe could
attach to the raw_syscall and read the second value.
It can then even be passed to a synthetic event and converted back to
another eprobe to get the value of "filename" after it has been read by
the kernel during the system call:
[
Create an eprobe called "sys" and attach it to sys_enter.
Read the id of the system call and the second argument
]
# echo 'e:sys raw_syscalls.sys_enter nr=$id:u32 arg2=+8($args):u64' >> /sys/kernel/tracing/dynamic_events
[
Create a synthetic event "path" that will hold the address of the
sys_openat filename. This is on a 64bit machine, so make it 64 bits
]
# echo 's:path u64 file;' >> /sys/kernel/tracing/dynamic_events
[
Add a histogram to the eprobe/sys which tiggers if the "nr" field is
257 (sys_openat), and save the filename in the "file" variable.
]
# echo 'hist:keys=common_pid:file=arg2 if nr == 257' > /sys/kernel/tracing/events/eprobes/sys/trigger
[
Attach a histogram to sys_exit event that triggers the "path" synthetic
event and records the "filename" that was passed from the sys eprobe.
]
# echo 'hist:keys=common_pid:f=$file:onmatch(eprobes.sys).trace(path,$f)' >> /sys/kernel/tracing/events/raw_syscalls/sys_exit/trigger
[
Create another eprobe that dereferences the "file" field as a user
space string and displays it.
]
# echo 'e:open synthetic.path file=+0($file):ustring' >> /sys/kernel/tracing/dynamic_events
# echo 1 > /sys/kernel/tracing/events/eprobes/open/enable
# cat trace_pipe
cat-1142 [003] ...5. 799.521912: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.521934: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.522065: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.522080: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.522296: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
cat-1142 [003] ...5. 799.522319: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
less-1143 [005] ...5. 799.522327: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.522333: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
cat-1142 [003] ...5. 799.522348: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
less-1143 [005] ...5. 799.522349: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.522363: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
less-1143 [005] ...5. 799.522477: open: (synthetic.path) file="/etc/ld.so.cache"
cat-1142 [003] ...5. 799.522489: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
less-1143 [005] ...5. 799.522492: open: (synthetic.path) file="/etc/ld.so.cache"
less-1143 [005] ...5. 799.522720: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
less-1143 [005] ...5. 799.522744: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
less-1143 [005] ...5. 799.522759: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libtinfo.so.6"
cat-1142 [003] ...5. 799.522850: open: (synthetic.path) file="/lib/x86_64-linux-gnu/libc.so.6"
Link: https://lore.kernel.org/all/20250723124202.4f7475be@batman.local.home/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
|
|
e09299225d |
bpf: Reject narrower access to pointer ctx fields
The following BPF program, simplified from a syzkaller repro, causes a
kernel warning:
r0 = *(u8 *)(r1 + 169);
exit;
With pointer field sk being at offset 168 in __sk_buff. This access is
detected as a narrower read in bpf_skb_is_valid_access because it
doesn't match offsetof(struct __sk_buff, sk). It is therefore allowed
and later proceeds to bpf_convert_ctx_access. Note that for the
"is_narrower_load" case in the convert_ctx_accesses(), the insn->off
is aligned, so the cnt may not be 0 because it matches the
offsetof(struct __sk_buff, sk) in the bpf_convert_ctx_access. However,
the target_size stays 0 and the verifier errors with a kernel warning:
verifier bug: error during ctx access conversion(1)
This patch fixes that to return a proper "invalid bpf_context access
off=X size=Y" error on the load instruction.
The same issue affects multiple other fields in context structures that
allow narrow access. Some other non-affected fields (for sk_msg,
sk_lookup, and sockopt) were also changed to use bpf_ctx_range_ptr for
consistency.
Note this syzkaller crash was reported in the "Closes" link below, which
used to be about a different bug, fixed in
commit
|
|
|
|
9f0cb91767 |
tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
The ipi tracepoints are mostly generic, but the tracepoints ipi_raise, ipi_entry and ipi_exit are only used by arm and arm64. This means these trace events are wasting memory in all the other architectures that do not use them. Add CONFIG_HAVE_EXTRA_IPI_TRACEPOINTS and have arm and arm64 select it to enable these trace events. The config makes it easy if other architectures decide to trace these as well. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Will Deacon <will@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250722103714.64eba013@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> |
|
|
|
cc1d1365f0 | Merge branches 'rcu-exp.23.07.2025', 'rcu.22.07.2025', 'torture-scripts.16.07.2025', 'srcu.19.07.2025', 'rcu.nocb.18.07.2025' and 'refscale.07.07.2025' into rcu.merge.23.07.2025 | |
|
|
558d5f3cd2 |
tracing: probes: Add a kerneldoc for traceprobe_parse_event_name()
Since traceprobe_parse_event_name() is a bit complicated, add a kerneldoc for explaining the behavior. Link: https://lore.kernel.org/all/175323430565.57270.2602609519355112748.stgit@devnote2/ Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
97e8230f89 |
tracing: uprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for parsing uprobe-events from heap instead of stack. Link: https://lore.kernel.org/all/175323429593.57270.12369235525923902341.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
4c6edb43ea |
tracing: eprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for parsing eprobe-events from heap instead of stack. Link: https://lore.kernel.org/all/175323428599.57270.988038042425748956.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
33b4e38baa |
tracing: kprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for parsing kprobe-events from heap instead of stack. Link: https://lore.kernel.org/all/175323427627.57270.5105357260879695051.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
d643eaa708 |
tracing: fprobe-event: Allocate string buffers from heap
Allocate temporary string buffers for fprobe-event from heap instead of stack. This fixes the stack frame exceed limit error. Link: https://lore.kernel.org/all/175323426643.57270.6657152008331160704.stgit@devnote2/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506240416.nZIhDXoO-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
43beb5e89b |
tracing: probe: Allocate traceprobe_parse_context from heap
Instead of allocating traceprobe_parse_context on stack, allocate it dynamically from heap (slab). This change is likely intended to prevent potential stack overflow issues, which can be a concern in the kernel environment where stack space is limited. Link: https://lore.kernel.org/all/175323425650.57270.280750740753792504.stgit@devnote2/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506240416.nZIhDXoO-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
2f02a61d84 |
tracing: probes: Sort #include alphabetically
Sort the #include directives in trace_probe* files alphabetically for easier maintenance and avoid double includes. This also groups headers as linux-generic, asm-generic, and local headers. Link: https://lore.kernel.org/all/175323424678.57270.11975372127870059007.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> |
|
|
|
9ba817fb7c |
tracing: Deprecate auto-mounting tracefs in debugfs
In January 2015, tracefs was created to allow access to the tracing infrastructure without needing to compile in debugfs. When tracefs is configured, the directory /sys/kernel/tracing will exist and tooling is expected to use that path to access the tracing infrastructure. To allow backward compatibility, when debugfs is mounted, it would automount tracefs in its "tracing" directory so that tooling that had hard coded /sys/kernel/debug/tracing would still work. It has been over 10 years since the new interface was introduced, and all tooling should now be using it. Start the process of deprecating the old path so that it doesn't need to be maintained anymore. A new config is added to allow distributions to disable automounting of tracefs on debugfs. If /sys/kernel/debug/tracing is accessed, a pr_warn() will trigger stating: "NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030" Expect to remove this feature in 5 years (2030). Cc: <linux-trace-users@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/20250722170806.40c068c6@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
73184c8e4f |
sysctl: rename kern_table -> sysctl_subsys_table
Renamed sysctl table from kern_table to sysctl_subsys_table and grouped the two arch specific ctls to the end of the array. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
25ebbce1f1 |
kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c
Moved ctl_tables elements for overflowuid and overflowgid into in kernel/sys.c. Create a register function that keeps them under "kernel" and run it after core with postcore_initcall. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
88eddb0502 |
uevent: mv uevent_helper into kobject_uevent.c
Move both uevent_helper table into lib/kobject_uevent.c. Place the registration early in the initcall order with postcore_initcall. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
6519dba9af |
sysctl: Remove superfluous includes from kernel/sysctl.c
Remove the following headers from the include list in sysctl.c.
* These are removed as the related variables are no longer there.
=================== ====================
Include Related Var
=================== ====================
linux/kmod.h usermodehelper
asm/nmi.h nmi_watchdoc_enabled
asm/io.h io_delay_type
linux/pid.h pid_max_{,min,max}
linux/sched/sysctl.h sysctl_{sched_*,numa_*,timer_*}
linux/mount.h sysctl_mount_max
linux/reboot.h poweroff_cmd
linux/ratelimit.h {,printk_}ratelimit_state
linux/printk.h kptr_restrict
linux/security.h CONFIG_SECURITY_CAPABILITIES
linux/net.h net_table
linux/key.h key_sysctls
linux/nvs_fs.h acpi_video_flags
linux/acpi.h acpi_video_flags
linux/fs.h proc_nr_files
* These are no longer needed as intermediate includes
==============
Include
==============
linux/filter.h
linux/binfmts.h
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
|
|
ad0800b1d4 |
sysctl: Remove (very) old file changelog
These comments are older than 2003 and therefore do not bare any relevance on the current state of the sysctl.c file. Remove them as they confuse more than clarify. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
5a477e9341 |
sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c
This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
e054bcbe7e |
sysctl: move cad_pid into kernel/pid.c
Move cad_pid as well as supporting function proc_do_cad_pid into kernel/pic.c. Replaced call to __do_proc_dointvec with proc_dointvec inside proc_do_cad_pid which requires the copy of the ctl_table to handle the temp value. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
942b296a6c |
sysctl: Move tainted ctl_table into kernel/panic.c
Move the ctl_table with the "tainted" proc_name into kernel/panic.c. With it moves the proc_tainted helper function. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
79ac8df974 |
Input: sysrq: mv sysrq into drivers/tty/sysrq.c
Move both sysrq ctl_table and supported sysrq_sysctl_handler helper function into drivers/tty/sysrq.c. Replaced the __do_proc_dointvec in helper function with do_proc_dointvec_minmax as the former is local to kernel/sysctl.c. Here we use the minmax version of do_proc_dointvec because do_proc_dointvec is static and calling do_proc_dointvec_minmax with a NULL min and max is the same as calling do_proc_dointvec. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
8e5f04b0d5 |
fork: mv threads-max into kernel/fork.c
make sysctl_max_threads static as it no longer needs to be exported into sysctl.c. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
9e2f403dd8 |
parisc/power: Move soft-power into power.c
Move the soft-power ctl table into parisc/power.c. As a consequence the pwrsw_enabled var is made static. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
851911aa72 |
mm: move randomize_va_space into memory.c
Move the randomize_va_space variable together with all its sysctl table elements into memory.c. Register it to the "kernel" directory by adding it to the subsys initialization calls This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
fff6703fc8 |
rcu: Move rcu_stall related sysctls into rcu/tree_stall.h
Move sysctl_panic_on_rcu_stall and sysctl_max_rcu_stall_to_panic into the kernel/rcu subdirectory. Make these static in tree_stall.h and removed them as extern from panic.h as their scope is now confined into one file. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
f1b4f23a52 |
locking/rtmutex: Move max_lock_depth into rtmutex.c
Move the max_lock_depth sysctl table element into rtmutex_api.c. Removed the rtmutex.h include from sysctl.c. Chose to move into rtmutex_api.c to avoid multiple registrations every time rtmutex.c is included in other files. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
d0d05f602c |
module: Move modprobe_path and modules_disabled ctl_tables into the module subsys
Move module sysctl (modprobe_path and modules_disabled) out of sysctl.c and into the modules subsystem. Make modules_disabled static as it no longer needs to be exported. Remove module.h from the includes in sysctl as it no longer uses any module exported variables. This is part of a greater effort to move ctl tables into their respective subsystems which will reduce the merge conflicts in kernel/sysctl.c. Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Joel Granados <joel.granados@kernel.org> |
|
|
|
9872916ad1 |
kcsan: test: Initialize dummy variable
Newer compiler versions rightfully point out:
kernel/kcsan/kcsan_test.c:591:41: error: variable 'dummy' is
uninitialized when passed as a const pointer argument here
[-Werror,-Wuninitialized-const-pointer]
591 | KCSAN_EXPECT_READ_BARRIER(atomic_read(&dummy), false);
| ^~~~~
1 error generated.
Although this particular test does not care about the value stored in
the dummy atomic variable, let's silence the warning.
Link: https://lkml.kernel.org/r/CA+G9fYu8JY=k-r0hnBRSkQQrFJ1Bz+ShdXNwC1TNeMt0eXaxeA@mail.gmail.com
Fixes:
|
|
|
|
502ffa4399 |
tracing: Fix comment in trace_module_remove_events()
Fix typo "allocade" -> "allocated". Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250710095628.42ed6b06@batman.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
4d6d0a6263 |
tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
Ftrace is tightly coupled with architecture specific code because it requires the use of trampolines written in assembly. This means that when a new feature or optimization is made, it must be done for all architectures. To simplify the approach, CONFIG_HAVE_FTRACE_* configs are added to denote which architecture has the new enhancement so that other architectures can still function until they too have been updated. The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant with the HAVE_DYNAMIC_FTRACE. Remove the HAVE_FTRACE_MCOUNT config and use DYNAMIC_FTRACE directly where applicable. Link: https://lore.kernel.org/all/20250703154916.48e3ada7@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250704104838.27a18690@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
07c3f391bc |
tracing: Remove EVENT_FILE_FL_SOFT_MODE flag
When soft disabling of trace events was first created, it needed to have a
way to know if a file had a user that was using it with soft disabled (for
triggers that need to enable or disable events from a context that can not
really enable or disable the event, it would set SOFT_DISABLED to state it
is disabled). The flag SOFT_MODE was used to denote that an event had a
user that would enable or disable it via the SOFT_DISABLED flag.
Commit
|
|
|
|
c897c1e5b1 |
tracing: Remove pointless memory barriers
Memory barriers are useful to ensure memory accesses from one CPU appear in the original order as seen by other CPUs. Some smp_rmb() and smp_wmb() are used, but they are not ordering multiple memory accesses. Remove them. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626151940.1756398-1-namcao@linutronix.de Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
9b4d5d330f |
ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
ftrace has two flavors:
1) static: Where every function always calls the ftrace trampoline
2) dynamic: Where each function has nops that can be changed on demand to
jump to the ftrace trampoline when needed.
The static flavor has very high performance overhead and was only created
to make it easier for architectures to implement the dynamic flavor. An
architecture developer can first implement the static ftrace to make sure
the trampolines work before working on the more complicated dynamic aspect
of ftrace. Once the architecture can support dynamic ftrace, there's no
reason to continue to support the static flavor. In fact, the static
flavor tends to bitrot and bugs start to appear in them.
Remove the prompt to pick DYNAMIC_FTRACE and simply enable it if the
architecture supports it.
Link: https://lore.kernel.org/all/f7e12c6d-892e-4ca3-9ef0-fbb524d04a48@ghiti.fr/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: ChenMiao <chenmiao.ku@gmail.com>
Link: https://lore.kernel.org/20250703115222.2d7c8cd5@batman.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
|
|
218d372ce8 |
fgraph: Keep track of when fgraph_ops are registered or not
Add a warning if unregister_ftrace_graph() is called without ever registering it, or if register_ftrace_graph() is called twice. This can detect errors when they happen and not later when there's a side effect: Link: https://lore.kernel.org/all/20250617120830.24fbdd62@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/20250701194451.22e34724@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
119a5d5736 |
ring-buffer: Remove ring_buffer_read_prepare_sync()
When the ring buffer was first introduced, reading the non-consuming "trace" file required disabling the writing of the ring buffer. To make sure the writing was fully disabled before iterating the buffer with a non-consuming read, it would set the disable flag of the buffer and then call an RCU synchronization to make sure all the buffers were synchronized. The function ring_buffer_read_start() originally would initialize the iterator and call an RCU synchronization, but this was for each individual per CPU buffer where this would get called many times on a machine with many CPUs before the trace file could be read. The commit |
|
|
|
2c9e7f8574 |
genirq: Teach handle_simple_irq() to resend an in-progress interrupt
It appears that the defect outlined in
|
|
|
|
8d39d6ec4d |
genirq: Prevent migration live lock in handle_edge_irq()
Yicon reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.
If the interrupt affinity is moved to a new target CPU and the interrupt is
currently handled on the previous target CPU for edge type interrupts the
handler might get stuck on the previous target:
CPU 0 (previous target) CPU 1 (new target)
handle_edge_irq()
repeat:
handle_event() handle_edge_irq()
if (INPROGESS) {
set(PENDING);
mask();
return;
}
if (PENDING) {
clear(PENDING);
unmask();
goto repeat;
}
The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts with
a high rate and always before handle_event() completes and before the CPU0
handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
over. This has been observed in virtual machines.
Prevent this by checking whether the CPU which observes the INPROGRESS flag
is the new affinity target. If that's the case, do not set the PENDING flag
and wait for the INPROGRESS flag to be cleared instead, so that the new
interrupt is handled on the new target CPU and the previous CPU is released
from the action.
This is restricted to the edge type handler and only utilized on systems,
which use single CPU targets for interrupt affinity.
Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
Reported-by: Liangyan <liangyan.peng@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com
Link: https://lore.kernel.org/all/20250718185312.076515034@linutronix.de
|
|
|
|
c609045abc |
genirq: Split up irq_pm_check_wakeup()
Let the calling code check for the IRQD_WAKEUP_ARMED flag to prepare for a live lock mitigation in the edge type handler. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Link: https://lore.kernel.org/all/20250718185312.012392426@linutronix.de |
|
|
|
4e879dedd5 |
genirq: Move irq_wait_for_poll() to call site
Move it to the call site so that the waiting for the INPROGRESS flag can be reused by an upcoming mitigation for a potential live lock in the edge type handler. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/all/20250718185311.948555026@linutronix.de |
|
|
|
46958a7bac |
genirq: Remove pointless local variable
The variable is only used at one place, which can simply take the constant as function argument. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Liangyan <liangyan.peng@bytedance.com> Link: https://lore.kernel.org/all/20250718185311.884314473@linutronix.de |
|
|
|
67c632b4a7 |
timekeeping: Zero initialize system_counterval when querying time from phc drivers
Most drivers only populate the fields cycles and cs_id of system_counterval
in their get_time_fn() callback for get_device_system_crosststamp(), unless
they explicitly provide nanosecond values.
When the use_nsecs field was added to struct system_counterval, most
drivers did not care. Clock sources other than CSID_GENERIC could then get
converted in convert_base_to_cs() based on an uninitialized use_nsecs field,
which usually results in -EINVAL during the following range check.
Pass in a fully zero initialized system_counterval_t to cure that.
Fixes:
|
|
|
|
5d71c2b53f |
rcu: Document concurrent quiescent state reporting for offline CPUs
The synchronization of CPU offlining with GP initialization is confusing to put it mildly (rightfully so as the issue it deals with is complex). Recent discussions brought up a question -- what prevents the rcu_implicit_dyntick_qs() from warning about QS reports for offline CPUs (missing QS reports for offline CPUs causing indefinite hangs). QS reporting for now-offline CPUs should only happen from: - gp_init() - rcutree_cpu_report_dead() Add some documentation on this and refer to it from comments in the code explaining how QS reporting is not missed when these functions are concurrently running. I referred heavily to this post [1] about the need for the ofl_lock. [1] https://lore.kernel.org/all/20180924164443.GF4222@linux.ibm.com/ [ Applied paulmck feedback on moving documentation to Requirements.rst ] Link: https://lore.kernel.org/all/01b4d228-9416-43f8-a62e-124b92e8741a@paulmck-laptop/ Co-developed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> |
|
|
|
186779c036 |
rcu: Document separation of rcu_state and rnp's gp_seq
The details of this are subtle and was discussed recently. Add a quick-quiz about this and refer to it from the code, for more clarity. Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> |
|
|
|
30a7806ada |
rcu: Document GP init vs hotplug-scan ordering requirements
Add detailed comments explaining the critical ordering constraints during RCU grace period initialization, based on discussions with Frederic. Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Co-developed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> |
|
|
|
437641a72d |
configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON
To reduce stale data lifetimes, enable CONFIG_INIT_ON_FREE_DEFAULT_ON as well. This matches the addition of CONFIG_STACKLEAK=y, which is doing similar for stack memory. Link: https://lore.kernel.org/r/20250717232519.2984886-13-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
4c56d9f7e7 |
configs/hardening: Enable CONFIG_KSTACK_ERASE
Since we can wipe the stack with both Clang and GCC plugins, enable this for the "hardening.config" for wider testing. Link: https://lore.kernel.org/r/20250717232519.2984886-12-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
9ea1e8d28a |
stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth
The Clang stack depth tracking implementation has a fixed name for the stack depth tracking callback, "__sanitizer_cov_stack_depth", so rename the GCC plugin function to match since the plugin has no external dependencies on naming. Link: https://lore.kernel.org/r/20250717232519.2984886-2-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
57fbad15c2 |
stackleak: Rename STACKLEAK to KSTACK_ERASE
In preparation for adding Clang sanitizer coverage stack depth tracking that can support stack depth callbacks: - Add the new top-level CONFIG_KSTACK_ERASE option which will be implemented either with the stackleak GCC plugin, or with the Clang stack depth callback support. - Rename CONFIG_GCC_PLUGIN_STACKLEAK as needed to CONFIG_KSTACK_ERASE, but keep it for anything specific to the GCC plugin itself. - Rename all exposed "STACKLEAK" names and files to "KSTACK_ERASE" (named for what it does rather than what it protects against), but leave as many of the internals alone as possible to avoid even more churn. While here, also split "prev_lowest_stack" into CONFIG_KSTACK_ERASE_METRICS, since that's the only place it is referenced from. Suggested-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250717232519.2984886-1-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
95993dc303 |
bpf: Use ERR_CAST instead of ERR_PTR(PTR_ERR(...))
Intel linux test robot reported a warning that ERR_CAST can be used for error pointer casting instead of more-complicated/rarely-used ERR_PTR(PTR_ERR(...)) style. There is no functionality change, but still let us replace two such instances as it improves consistency and readability. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507201048.bceHy8zX-lkp@intel.com/ Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250720164754.3999140-1-yonghong.song@linux.dev |
|
|
|
647fe16b46 |
PM: cpufreq: powernv/tracing: Move powernv_throttle trace event
As the trace event powernv_throttle is only used by the powernv code, move
it to a separate include file and have that code directly enable it.
Trace events can take up around 5K of memory when they are defined
regardless if they are used or not. It wastes memory to have them defined
in configurations where the tracepoint is not used.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/20250612145407.906308844@goodmis.org
Fixes:
|
|
|
|
ee4a2e08c1 |
entry: Add arch_in_rcu_eqs()
All architectures have an interruptible RCU extended quiescent state (EQS) as part of their idle sequences, where interrupts can occur without RCU watching. Entry code must account for this and wake RCU as necessary; the common entry code deals with this in irqentry_enter() by treating any interrupt from an idle thread as potentially having occurred within an EQS and waking RCU for the duration of the interrupt via rcu_irq_enter() .. rcu_irq_exit(). Some architectures may have other interruptible EQSs which require similar treatment. For example, on s390 it is necessary to enable interrupts around guest entry in the middle of a period where core KVM code has entered an EQS. So that architectures can wake RCU in these cases, this patch adds a new arch_in_rcu_eqs() hook to the common entry code which is checked in addition to the existing is_idle_thread() check, with RCU woken if either returns true. A default implementation is provided which always returns false, which suffices for most architectures. As no architectures currently implement arch_in_rcu_eqs(), there should be no functional change as a result of this patch alone. A subsequent patch will add an s390 implementation to fix a latent bug with missing RCU wakeups. [ajd@linux.ibm.com: rebase, fix commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20250708092742.104309-2-ajd@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20250708092742.104309-2-ajd@linux.ibm.com> |
|
|
|
2013e8c2e6 |
Tracing fixes for 6.16:
- Fix timerlat with use of FORTIFY_SOURCE
FORTIFY_SOURCE was added to the stack tracer where it compares the
entry->caller array to having entry->size elements.
timerlat has the following:
memcpy(&entry->caller, fstack->calls, size);
entry->size = size;
Which triggers FORTIFY_SOURCE as the caller is populated before the
entry->size is initialized.
Swap the order to satisfy FORTIFY_SOURCE logic.
- Add down_write(trace_event_sem) when adding trace events in modules
Trace events being added to the ftrace_events array are protected by
the trace_event_sem semaphore. But when loading modules that have
trace events, the addition of the events are not protected by the
semaphore and loading two modules that have events at the same time
can corrupt the list.
Also add a lockdep_assert_held(trace_event_sem) to
_trace_add_event_dirs() to confirm its held when iterating the list.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaH06gBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qoJsAP0a+/E0f+5g7O/OtYPVEDSCREv1vj9c
3dr0iWopqaOC7gEAw8Vc5iWIHKcB/JuJ+GqALoutL+lihruG26MWkFFsOgU=
=zH5J
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix timerlat with use of FORTIFY_SOURCE
FORTIFY_SOURCE was added to the stack tracer where it compares the
entry->caller array to having entry->size elements.
timerlat has the following:
memcpy(&entry->caller, fstack->calls, size);
entry->size = size;
Which triggers FORTIFY_SOURCE as the caller is populated before the
entry->size is initialized.
Swap the order to satisfy FORTIFY_SOURCE logic.
- Add down_write(trace_event_sem) when adding trace events in modules
Trace events being added to the ftrace_events array are protected by
the trace_event_sem semaphore. But when loading modules that have
trace events, the addition of the events are not protected by the
semaphore and loading two modules that have events at the same time
can corrupt the list.
Also add a lockdep_assert_held(trace_event_sem) to
_trace_add_event_dirs() to confirm it is held when iterating the
list.
* tag 'trace-v6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Add down_write(trace_event_sem) when adding trace event
tracing/osnoise: Fix crash in timerlat_dump_stack()
|
|
|
|
62347e2790 |
A single fix for the scheduler. A recent commit changed the runqueue
counter nr_uninterruptible to an unsigned int. Due to the fact that the counters are not updated on migration of a uninterruptble task to a different CPU, these counters can exceed INT_MAX. The counter is cast to long in the load average calculation, which means that the cast expands into negative space resulting in bogus load average values. Convert it back to unsigned long to fix this. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmh82RYTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXNlEACyQxl/OCXSmKpcxrIvMjnalXh3Ibs2 KqbZ3MA/hY1ZWhSKXBDMv8hkxE00PY8WXsPDtLeRz0n6GegbUx4zVsUQpzn24gRe uEl+qIuANaz5uMu2qlmQwSuxVQ0SDqLVFObrNgKQ554jJckjXKcgvrBjwczTw9lJ u6yTVLrknf0IsbQ79yxToaNf6jD3HcSIGX06Hs4EYucPzYd54VScr6LGGwux3rmI CN0RSA9RduUzcf/aKhe9/tmM6oss4tdByNuVSdx/yZ5QvVw5oOrWdI8sU0BlZsrC IjvqLvvLFKRPZre24UeFnIVxhtv+cwPXxrA6tR/SM2eUbHKzU8jHz2TlPu8xwOwb sZMMBjPx1Y+fpVQoGxEC1cEWbCUPSyQ7NGN2uS3Quk4XoQLRz6t//B+63rT65bfV wMAZpsgZ9/c23yDuPG11XxBMQYVbS0sFyLDymBC93Co8vh8PowDIf2xmgPFLxM78 HdOCCsHsBPqZutQGQ7/qw//e7T/9fv0MDP8D4fuyLrhUdvBAvGHZp85T4YmOUcjp bTTE28Rw1sBgdAnTNjGWfwRN7Hwc1DzaLh40Um+nnjMA+yG9uq7gjTYDx4w6W/EA V70tx9tfJOM11g/qfIURkJDpf8PezJ3KHg1ZOiPn9e+I08LoPVMeIqbFEnJ9VAc7 VStWpQd+SG99AQ== =Aqcr -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "A single fix for the scheduler. A recent commit changed the runqueue counter nr_uninterruptible to an unsigned int. Due to the fact that the counters are not updated on migration of a uninterruptble task to a different CPU, these counters can exceed INT_MAX. The counter is cast to long in the load average calculation, which means that the cast expands into negative space resulting in bogus load average values. Convert it back to unsigned long to fix this. * tag 'sched-urgent-2025-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Change nr_uninterruptible type to unsigned long |
|
|
|
77da18de55 |
hung_task: extend hung task blocker tracking to rwsems
Inspired by mutex blocker tracking[1], and having already extended it to semaphores, let's now add support for reader-writer semaphores (rwsems). The approach is simple: when a task enters TASK_UNINTERRUPTIBLE while waiting for an rwsem, we just call hung_task_set_blocker(). The hung task detector can then query the rwsem's owner to identify the lock holder. Tracking works reliably for writers, as there can only be a single writer holding the lock, and its task struct is stored in the owner field. The main challenge lies with readers. The owner field points to only one of many concurrent readers, so we might lose track of the blocker if that specific reader unlocks, even while others remain. This is not a significant issue, however. In practice, long-lasting lock contention is almost always caused by a writer. Therefore, reliably tracking the writer is the primary goal of this patch series ;) With this change, the hung task detector can now show blocker task's info like below: [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 blocked for more than 122 seconds. [Fri Jun 27 15:21:34 2025] Tainted: G S 6.16.0-rc3 #8 [Fri Jun 27 15:21:34 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Fri Jun 27 15:21:34 2025] task:cat state:D stack:0 pid:28631 tgid:28631 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? policy_nodemask+0x215/0x340 [Fri Jun 27 15:21:34 2025] ? _raw_spin_lock_irq+0x8a/0xe0 [Fri Jun 27 15:21:34 2025] ? __pfx__raw_spin_lock_irq+0x10/0x10 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_preempt_disabled+0x15/0x30 [Fri Jun 27 15:21:34 2025] rwsem_down_read_slowpath+0x55e/0xe10 [Fri Jun 27 15:21:34 2025] ? __pfx_rwsem_down_read_slowpath+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx___might_resched+0x10/0x10 [Fri Jun 27 15:21:34 2025] down_read+0xc9/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_down_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __debugfs_file_get+0x14d/0x700 [Fri Jun 27 15:21:34 2025] ? __pfx___debugfs_file_get+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? handle_pte_fault+0x52a/0x710 [Fri Jun 27 15:21:34 2025] ? selinux_file_permission+0x3a9/0x590 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_read+0x4a/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f3f8faefb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffdeda5ab98 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f3f8faefb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 00000000010fa000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 00000000010fa000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffdeda59fe0 R11: 0000000000000246 R12: 00000000010fa000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 <reader> blocked on an rw-semaphore likely owned by task cat:28630 <writer> [Fri Jun 27 15:21:34 2025] task:cat state:S stack:0 pid:28630 tgid:28630 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __mod_timer+0x304/0xa80 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_timeout+0xfb/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_schedule_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx_process_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? down_write+0xc4/0x140 [Fri Jun 27 15:21:34 2025] msleep_interruptible+0xbe/0x150 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_write+0x54/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f8f288efb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffffb631038 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f8f288efb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 000000002a4b5000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 000000002a4b5000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffffb630460 R11: 0000000000000246 R12: 000000002a4b5000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/ Link: https://lkml.kernel.org/r/20250627072924.36567-3-lance.yang@linux.dev Signed-off-by: Lance Yang <lance.yang@linux.dev> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Li <zi.li@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
ae2da51def |
locking/rwsem: make owner helpers globally available
Patch series "extend hung task blocker tracking to rwsems". Inspired by mutex blocker tracking[1], and having already extended it to semaphores, let's now add support for reader-writer semaphores (rwsems). The approach is simple: when a task enters TASK_UNINTERRUPTIBLE while waiting for an rwsem, we just call hung_task_set_blocker(). The hung task detector can then query the rwsem's owner to identify the lock holder. Tracking works reliably for writers, as there can only be a single writer holding the lock, and its task struct is stored in the owner field. The main challenge lies with readers. The owner field points to only one of many concurrent readers, so we might lose track of the blocker if that specific reader unlocks, even while others remain. This is not a significant issue, however. In practice, long-lasting lock contention is almost always caused by a writer. Therefore, reliably tracking the writer is the primary goal of this patch series ;) With this change, the hung task detector can now show blocker task's info like below: [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 blocked for more than 122 seconds. [Fri Jun 27 15:21:34 2025] Tainted: G S 6.16.0-rc3 #8 [Fri Jun 27 15:21:34 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Fri Jun 27 15:21:34 2025] task:cat state:D stack:0 pid:28631 tgid:28631 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? policy_nodemask+0x215/0x340 [Fri Jun 27 15:21:34 2025] ? _raw_spin_lock_irq+0x8a/0xe0 [Fri Jun 27 15:21:34 2025] ? __pfx__raw_spin_lock_irq+0x10/0x10 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_preempt_disabled+0x15/0x30 [Fri Jun 27 15:21:34 2025] rwsem_down_read_slowpath+0x55e/0xe10 [Fri Jun 27 15:21:34 2025] ? __pfx_rwsem_down_read_slowpath+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx___might_resched+0x10/0x10 [Fri Jun 27 15:21:34 2025] down_read+0xc9/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_down_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __debugfs_file_get+0x14d/0x700 [Fri Jun 27 15:21:34 2025] ? __pfx___debugfs_file_get+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? handle_pte_fault+0x52a/0x710 [Fri Jun 27 15:21:34 2025] ? selinux_file_permission+0x3a9/0x590 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_read+0x4a/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f3f8faefb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffdeda5ab98 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f3f8faefb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 00000000010fa000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 00000000010fa000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffdeda59fe0 R11: 0000000000000246 R12: 00000000010fa000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 <reader> blocked on an rw-semaphore likely owned by task cat:28630 <writer> [Fri Jun 27 15:21:34 2025] task:cat state:S stack:0 pid:28630 tgid:28630 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __mod_timer+0x304/0xa80 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_timeout+0xfb/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_schedule_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx_process_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? down_write+0xc4/0x140 [Fri Jun 27 15:21:34 2025] msleep_interruptible+0xbe/0x150 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_write+0x54/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f8f288efb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffffb631038 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f8f288efb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 000000002a4b5000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 000000002a4b5000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffffb630460 R11: 0000000000000246 R12: 000000002a4b5000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> This patch (of 3): In preparation for extending blocker tracking to support rwsems, make the rwsem_owner() and is_rwsem_reader_owned() helpers globally available for determining if the blocker is a writer or one of the readers. Additionally, a stale owner pointer in a reader-owned rwsem can lead to false positives in blocker tracking when CONFIG_DETECT_HUNG_TASK_BLOCKER is enabled. To mitigate this, clear the owner field on the reader unlock path, similar to what CONFIG_DEBUG_RWSEMS does. A NULL owner is better than a stale one for diagnostics. Link: https://lkml.kernel.org/r/20250627072924.36567-1-lance.yang@linux.dev Link: https://lkml.kernel.org/r/20250627072924.36567-2-lance.yang@linux.dev Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/ [1] Signed-off-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Li <zi.li@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
ee13240cd7 |
panic: add note that panic_print sysctl interface is deprecated
Add a dedicated core parameter 'panic_console_replay' for controlling console replay, and add note that 'panic_print' sysctl interface will be obsoleted by 'panic_sys_info' and 'panic_console_replay'. When it happens, the SYS_INFO_PANIC_CONSOLE_REPLAY can be removed as well. Link: https://lkml.kernel.org/r/20250703021004.42328-6-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
9743d12d0c |
panic: add 'panic_sys_info=' setup option for kernel cmdline
'panic_sys_info=' sysctl interface is already added for runtime setting. Add counterpart kernel cmdline option for boottime setting. Link: https://lkml.kernel.org/r/20250703021004.42328-5-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
d747755917 |
panic: add 'panic_sys_info' sysctl to take human readable string parameter
Bitmap definition for 'panic_print' is hard to remember and decode. Add 'panic_sys_info='sysctl to take human readable string like "tasks,mem,timers,locks,ftrace,..." and translate it into bitmap. The detailed mapping is: SYS_INFO_TASKS "tasks" SYS_INFO_MEM "mem" SYS_INFO_TIMERS "timers" SYS_INFO_LOCKS "locks" SYS_INFO_FTRACE "ftrace" SYS_INFO_ALL_CPU_BT "all_bt" SYS_INFO_BLOCKED_TASKS "blocked_tasks" [nathan@kernel.org: add __maybe_unused to sys_info_avail] Link: https://lkml.kernel.org/r/20250708-fix-clang-sys_info_avail-warning-v1-1-60d239eacd64@kernel.org Link: https://lkml.kernel.org/r/20250703021004.42328-4-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
b76e89e50f |
panic: generalize panic_print's function to show sys info
'panic_print' was introduced to help debugging kernel panic by dumping different kinds of system information like tasks' call stack, memory, ftrace buffer, etc. Actually this function could also be used to help debugging other cases like task-hung, soft/hard lockup, etc. where user may need the snapshot of system info at that time. Extract system info dump function related code from panic.c to separate file sys_info.[ch], for wider usage by other kernel parts for debugging. Also modify the macro names about singulars/plurals. Link: https://lkml.kernel.org/r/20250703021004.42328-3-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
261743b013 |
panic: clean up code for console replay
Patch series "generalize panic_print's dump function to be used by other kernel parts", v3. When working on kernel stability issues, panic, task-hung and software/hardware lockup are frequently met. And to debug them, user may need lots of system information at that time, like task call stacks, lock info, memory info etc. panic case already has panic_print_sys_info() for this purpose, and has a 'panic_print' bitmask to control what kinds of information is needed, which is also helpful to debug other task-hung and lockup cases. So this patchset extracts the function out to a new file 'lib/sys_info.c', and makes it available for other cases which also need to dump system info for debugging. Also as suggested by Petr Mladek, add 'panic_sys_info=' interface to take human readable string like "tasks,mem,locks,timers,ftrace,....", and eventually obsolete the current 'panic_print' bitmap interface. In RFC and V1 version, hung_task and SW/HW watchdog modules are enabled with the new sys_info dump interface. In v2, they are kept out for better review of current change, and will be posted later. Locally these have been used in our bug chasing for stability issues and was proven helpful. Many thanks to Petr Mladek for great suggestions on both the code and architectures! This patch (of 5): Currently the panic_print_sys_info() was called twice with different parameters to handle console replay case, which is kind of confusing. Add panic_console_replay() explicitly and rename 'PANIC_PRINT_ALL_PRINTK_MSG' to 'PANIC_CONSOLE_REPLAY', to make the code straightforward. The related kernel document is also updated. Link: https://lkml.kernel.org/r/20250703021004.42328-1-feng.tang@linux.alibaba.com Link: https://lkml.kernel.org/r/20250703021004.42328-2-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
e1280f3071 |
kdump: wait for DMA to finish when using CMA
When re-using the CMA area for kdump there is a risk of pending DMA into pinned user pages in the CMA area. Pages residing in CMA areas can usually not get long-term pinned and are instead migrated away from the CMA area, so long-term pinning is typically not a concern. (BUGs in the kernel might still lead to long-term pinning of such pages if everything goes wrong.) Pages pinned without FOLL_LONGTERM remain in the CMA and may possibly be the source or destination of a pending DMA transfer. Although there is no clear specification how long a page may be pinned without FOLL_LONGTERM, pinning without the flag shows an intent of the caller to only use the memory for short-lived DMA transfers, not a transfer initiated by a device asynchronously at a random time in the future. Add a delay of CMA_DMA_TIMEOUT_SEC seconds before starting the kdump kernel, giving such short-lived DMA transfers time to finish before the CMA memory is re-used by the kdump kernel. Set CMA_DMA_TIMEOUT_SEC to 10 seconds - chosen arbitrarily as both a huge margin for a DMA transfer, yet not increasing the kdump time too significantly. Link: https://lkml.kernel.org/r/aEqpgDIBndZ5LXSo@dwarf.suse.cz Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Donald Dutile <ddutile@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Philipp Rudo <prudo@redhat.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Tao Liu <ltao@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
ab475510e0 |
kdump: implement reserve_crashkernel_cma
reserve_crashkernel_cma() reserves CMA ranges for the crash kernel. If allocating the requested size fails, try to reserve in smaller blocks. Store the reserved ranges in the crashk_cma_ranges array and the number of ranges in crashk_cma_cnt. Link: https://lkml.kernel.org/r/aEqpBwOy_ekm0gw9@dwarf.suse.cz Signed-off-by: Jiri Bohac <jbohac@suse.cz> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Donald Dutile <ddutile@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Philipp Rudo <prudo@redhat.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Tao Liu <ltao@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
35c18f2933 |
Add a new optional ",cma" suffix to the crashkernel= command line option
Patch series "kdump: crashkernel reservation from CMA", v5. This series implements a way to reserve additional crash kernel memory using CMA. Currently, all the memory for the crash kernel is not usable by the 1st (production) kernel. It is also unmapped so that it can't be corrupted by the fault that will eventually trigger the crash. This makes sense for the memory actually used by the kexec-loaded crash kernel image and initrd and the data prepared during the load (vmcoreinfo, ...). However, the reserved space needs to be much larger than that to provide enough run-time memory for the crash kernel and the kdump userspace. Estimating the amount of memory to reserve is difficult. Being too careful makes kdump likely to end in OOM, being too generous takes even more memory from the production system. Also, the reservation only allows reserving a single contiguous block (or two with the "low" suffix). I've seen systems where this fails because the physical memory is fragmented. By reserving additional crashkernel memory from CMA, the main crashkernel reservation can be just large enough to fit the kernel and initrd image, minimizing the memory taken away from the production system. Most of the run-time memory for the crash kernel will be memory previously available to userspace in the production system. As this memory is no longer wasted, the reservation can be done with a generous margin, making kdump more reliable. Kernel memory that we need to preserve for dumping is normally not allocated from CMA, unless it is explicitly allocated as movable. Currently this is only the case for memory ballooning and zswap. Such movable memory will be missing from the vmcore. User data is typically not dumped by makedumpfile. When dumping of user data is intended this new CMA reservation cannot be used. There are five patches in this series: The first adds a new ",cma" suffix to the recenly introduced generic crashkernel parsing code. parse_crashkernel() takes one more argument to store the cma reservation size. The second patch implements reserve_crashkernel_cma() which performs the reservation. If the requested size is not available in a single range, multiple smaller ranges will be reserved. The third patch updates Documentation/, explicitly mentioning the potential DMA corruption of the CMA-reserved memory. The fourth patch adds a short delay before booting the kdump kernel, allowing pending DMA transfers to finish. The fifth patch enables the functionality for x86 as a proof of concept. There are just three things every arch needs to do: - call reserve_crashkernel_cma() - include the CMA-reserved ranges in the physical memory map - exclude the CMA-reserved ranges from the memory available through /proc/vmcore by excluding them from the vmcoreinfo PT_LOAD ranges. Adding other architectures is easy and I can do that as soon as this series is merged. With this series applied, specifying crashkernel=100M craskhernel=1G,cma on the command line will make a standard crashkernel reservation of 100M, where kexec will load the kernel and initrd. An additional 1G will be reserved from CMA, still usable by the production system. The crash kernel will have 1.1G memory available. The 100M can be reliably predicted based on the size of the kernel and initrd. The new cma suffix is completely optional. When no crashkernel=size,cma is specified, everything works as before. This patch (of 5): Add a new cma_size parameter to parse_crashkernel(). When not NULL, call __parse_crashkernel to parse the CMA reservation size from "crashkernel=size,cma" and store it in cma_size. Set cma_size to NULL in all calls to parse_crashkernel(). Link: https://lkml.kernel.org/r/aEqnxxfLZMllMC8I@dwarf.suse.cz Link: https://lkml.kernel.org/r/aEqoQckgoTQNULnh@dwarf.suse.cz Signed-off-by: Jiri Bohac <jbohac@suse.cz> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Donald Dutile <ddutile@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Philipp Rudo <prudo@redhat.com> Cc: Pingfan Liu <piliu@redhat.com> Cc: Tao Liu <ltao@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
b5e8acc14d |
tracing: Add down_write(trace_event_sem) when adding trace event
When a module is loaded, it adds trace events defined by the module. It
may also need to modify the modules trace printk formats to replace enum
names with their values.
If two modules are loaded at the same time, the adding of the event to the
ftrace_events list can corrupt the walking of the list in the code that is
modifying the printk format strings and crash the kernel.
The addition of the event should take the trace_event_sem for write while
it adds the new event.
Also add a lockdep_assert_held() on that semaphore in
__trace_add_event_dirs() as it iterates the list.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250718223158.799bfc0c@batman.local.home
Reported-by: Fusheng Huang(黄富生) <Fusheng.Huang@luxshare-ict.com>
Closes: https://lore.kernel.org/all/20250717105007.46ccd18f@batman.local.home/
Fixes:
|
|
|
|
bf61759db4 |
sched_ext: Fixes for v6.16-rc6
- Fix handling of migration disabled tasks in default idle selection. - update_locked_rq() called __this_cpu_write() spuriously with NULL when @rq was not locked. As the writes were spurious, it didn't break anything directly. However, the function could be called in a preemptible leading to a context warning in __this_cpu_write(). Skip the spurious NULL writes. - Selftest fix on UP. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaHvPZw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGabMAP4jSAr4gYWEBOUaD9btwnPxZwlSiAEQtqBDBVRb /UunFAD/WBwUPk/u7BchLHjuH3sYW5gQb40kbtUnmNvB+RNUUgc= =3WAD -----END PGP SIGNATURE----- Merge tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix handling of migration disabled tasks in default idle selection - update_locked_rq() called __this_cpu_write() spuriously with NULL when @rq was not locked. As the writes were spurious, it didn't break anything directly. However, the function could be called in a preemptible leading to a context warning in __this_cpu_write(). Skip the spurious NULL writes. - Selftest fix on UP * tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: idle: Handle migration-disabled tasks in idle selection sched/ext: Prevent update_locked_rq() calls with NULL rq selftests/sched_ext: Fix exit selftest hang on UP |
|
|
|
c64004df89 |
cgroup: Fixes for v6.16-rc6
An earlier commit to suppress a warning introduced a race condition where tasks can escape cgroup1 freezer. Revert the commit and simply remove the warning which was spurious to begin with. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaHvMvw4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGadfAP0cT4QXwtw0VXyiNr5PMqxQ74rYsngJ+NevRbod fK6hIwD/T+owQc/ivYp5/N/XUgpT+Ixp7YRj2RIzQbL6SPjzOwE= =IlrN -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "An earlier commit to suppress a warning introduced a race condition where tasks can escape cgroup1 freezer. Revert the commit and simply remove the warning which was spurious to begin with" * tag 'cgroup-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Revert "cgroup_freezer: cgroup_freezing: Check if not frozen" sched,freezer: Remove unnecessary warning in __thaw_task |
|
|
|
646faf36d7 |
cgroup: Add compatibility option for content of /proc/cgroups
/proc/cgroups lists only v1 controllers by default, however, this is only enforced since the commit |
|
|
|
85a3bce695 |
tracing/osnoise: Fix crash in timerlat_dump_stack()
We have observed kernel panics when using timerlat with stack saving, with the following dmesg output: memcpy: detected buffer overflow: 88 byte write of buffer size 0 WARNING: CPU: 2 PID: 8153 at lib/string_helpers.c:1032 __fortify_report+0x55/0xa0 CPU: 2 UID: 0 PID: 8153 Comm: timerlatu/2 Kdump: loaded Not tainted 6.15.3-200.fc42.x86_64 #1 PREEMPT(lazy) Call Trace: <TASK> ? trace_buffer_lock_reserve+0x2a/0x60 __fortify_panic+0xd/0xf __timerlat_dump_stack.cold+0xd/0xd timerlat_dump_stack.part.0+0x47/0x80 timerlat_fd_read+0x36d/0x390 vfs_read+0xe2/0x390 ? syscall_exit_to_user_mode+0x1d5/0x210 ksys_read+0x73/0xe0 do_syscall_64+0x7b/0x160 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e __timerlat_dump_stack() constructs the ftrace stack entry like this: struct stack_entry *entry; ... memcpy(&entry->caller, fstack->calls, size); entry->size = fstack->nr_entries; Since commit |
|
|
|
beb1097ec8 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc6
Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
|
|
|
d786aba320 |
bpf-fixes
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmh5r+QACgkQ6rmadz2v bTqrFA/+OW+z+vh8k43C9TVZttqC5poGcFqF5zRlYTArQT3AB+QuhG/CRjVQFbGL 2YfVbq+5pxNo0I/FtCoWVui2OMf1UsRKKvM0pSn50yn3ytRfotZjQ/AWACm/9t5y fyRLBIS3ArjashQ9/S71tAIfG6l/B+FGX81wOVa1uL50ab15+4NrplhZHY421o9a lH2E2wnpy/BnrB9F/FO4iQbelixvBfMwj8epruhCVbipfx6BOKPMzKVtcm61FVT1 hDsQZ0bIpVKgpRNBlTUHjVyzYo8oeXzqVhhY7hsmpHxJSiol7KLWyHEJD5ExS9Qg XVPK34b9IPgAfS8f/DgGAkWsAht7BMLsR0GUWyVIiacHHqTinRPVfWbzqWa5yjdD +8Vp4RVrcUONx69upx+IDrb4uMfQYktdpcvQtSl0SSinsG/INXurT1Vyz8aBPfkv WbiBeXhW/dCD9NuL5D9gnyZWaPXIAmbK7+pXJOSIpfKC24WRXTONDXhGP1b6ef31 zHQu3r98ekYnHr3hbsvdHOWB7LKkJ1bcg2+OsmtYUUmnCiQTM1H8ILTwbSQ4EfXJ 6iRxYeFp+VJOPScRzmNU/A3ibQWfV+foiO4S6hmazJOy3mmHX6hgPZoj2fjV8Ejf xZeOpQbCaZQCzbxxOdtjykwfe+zPWGnyRPnQpVIVdi7Abk1EZaE= =eUL7 -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix handling of BPF arena relocations (Andrii Nakryiko) - Fix race in bpf_arch_text_poke() on s390 (Ilya Leoshkevich) - Fix use of virt_to_phys() on arm64 when mmapping BTF (Lorenz Bauer) - Reject %p% format string in bprintf-like BPF helpers (Paul Chaignon) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: libbpf: Fix handling of BPF arena relocations btf: Fix virt_to_phys() on arm64 when mmapping BTF selftests/bpf: Stress test attaching a BPF prog to another BPF prog s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL again selftests/bpf: Add negative test cases for snprintf bpf: Reject %p% format string in bprintf-like helpers |
|
|
|
380b84e168 |
vdso/vsyscall: Update auxiliary clock data in the datapage
Expose the auxiliary clock data so it can be read from the vDSO. Architectures not using the generic vDSO time framework, namely SPARC64, are not supported. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250701-vdso-auxclock-v1-11-df7d9f87b9b8@linutronix.de |