mirror of https://github.com/torvalds/linux.git
49532 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
67029a49db |
tracing fixes for v6.18:
The previous fix to trace_marker required updating trace_marker_raw
as well. The difference between trace_marker_raw from trace_marker
is that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings.
This is for applications that will read the raw data from the ring
buffer and get the data structures directly. It's a bit quicker than
using the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was needed for
both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly fixed
the trace_marker write function, but the trace_marker_raw file wasn't
fixed properly. The user space data was correctly written into the per CPU
buffer, but the code that wrote into the ring buffer still used the user
space pointer and not the per CPU buffer that had the user space data
already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(), another
issue appeared. As writes to the trace_marker_raw expects binary data, the
first entry is a 4 byte identifier. The entry structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the cnt
used also includes the entry->id portion. Allocating sizeof(*entry)
plus cnt is actually allocating 4 bytes more than what is needed.
Change the size function to:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
And update the memcpy() to unsafe_memcpy().
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaOq0fhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4HAQCNbFEDjzNNEueCWLOptC5YfeJgdvPB
399CzuFJl02ZOgD/flPJa1r+NaeYOBhe1BgpFF9FzB/SPXAXkXUGLM7WIgg=
=goks
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"The previous fix to trace_marker required updating trace_marker_raw as
well. The difference between trace_marker_raw from trace_marker is
that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings. This
is for applications that will read the raw data from the ring buffer
and get the data structures directly. It's a bit quicker than using
the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was
needed for both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly
fixed the trace_marker write function, but the trace_marker_raw
file wasn't fixed properly. The user space data was correctly
written into the per CPU buffer, but the code that wrote into the
ring buffer still used the user space pointer and not the per CPU
buffer that had the user space data already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(),
another issue appeared. As writes to the trace_marker_raw expects
binary data, the first entry is a 4 byte identifier. The entry
structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the
cnt used also includes the entry->id portion. Allocating
sizeof(*entry) plus cnt is actually allocating 4 bytes more than
what is needed.
Change the size function to:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
And update the memcpy() to unsafe_memcpy()"
* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
|
|
|
|
fbde105f13 |
bpf-fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjpp+sACgkQ6rmadz2v
bTo+9Q//bUzEc2C64NbG0DTCcxnkDEadzBLIB0BAwnAkuRjR8HJiPGoCdBJhUqzM
/hNfIHTtDdyspU2qZbM+r4nVJ6zRAwIHrT2d/knERxXtRQozaWvUlRhVmf5tdWYm
DkbThS9sAfAOs21YjV7OWPrf7bC7T9syQTAfN0CE8cujZY7OnqCyzNwfb8iIusyo
+Ctm0/qUDVtd6SjdPAQjzp82fHIIwnMFZtWJiZml5LklL1Mx5cuVrT/sWgr5KATW
vZ9rUfgaiJkAsSX0sSlLnAI76+kJRB+IkmK1TRdWFlwW6dTsa/7MkDeXXPN1dEDi
o5ZqhcvaY0eAMbU4iX72Juf6gVFF6AgVwsrHmM79ICjg5umCLN/90QqYPc0ChRxl
EYuSWVQ6/cgV3W6l+KU53cwmRjjdSzyJQFei03COZ0iKF6xic0cynS3BKMQL6HkU
3BfTj19h+dxt7qywRaJFsrWK4t/uBX6N75XlVa9od/sk91tR/ibtJ6hcyuJGATr5
nkfMkyN155upAffUnkhv37TXtMXyX8/kd7BddCet31JJXyJuJZ0vYuOcur6awGyN
aB0T1ueG15sTfGf0zpxVNWhVqswHI/1Suk8EXwbDeHRcsmtrp8XWYawf5StIMwW0
Uy8GjS5KVl5bcrfDbcvj79jajpVjwnvFR1Sir9C4aROm5OpH3Hs=
=77JH
-----END PGP SIGNATURE-----
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Finish constification of 1st parameter of bpf_d_path() (Rong Tao)
- Harden userspace-supplied xdp_desc validation (Alexander Lobakin)
- Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel
Borkmann)
- Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers)
- Use correct context to unpin bpf hash map with special types (KaFai
Wan)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add test for unpinning htab with internal timer struct
bpf: Avoid RCU context warning when unpinning htab with internal structs
xsk: Harden userspace-supplied xdp_desc validation
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
libbpf: Fix undefined behavior in {get,put}_unaligned_be32()
bpf: Finish constification of 1st parameter of bpf_d_path()
|
|
|
|
ae13bd2310 |
Just one series here - Mike Rappoport has taught KEXEC handover to
preserve vmalloc allocations across handover. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaOmDWAAKCRDdBJ7gKXxA jh+MAQDUPBj3mFm238CXI5DC1gJ3ETe3NJjJvfzIjLs51c+dFgD+PUuvDA0GUtKH LCl6T+HJXh2FgGn1F2Kl/0hwPtEvuA4= =HYr7 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more updates from Andrew Morton: "Just one series here - Mike Rappoport has taught KEXEC handover to preserve vmalloc allocations across handover" * tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt kho: add support for preserving vmalloc allocations kho: replace kho_preserve_phys() with kho_preserve_pages() kho: check if kho is finalized in __kho_preserve_order() MAINTAINERS, .mailmap: update Umang's email address |
|
|
|
54b91e54b1 |
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
The way tracing_mark_raw_write() records its data is that it has the
following structure:
struct {
struct trace_entry;
int id;
char buf[];
};
But memcpy(&entry->id, buf, size) triggers the following warning when the
size is greater than the id:
------------[ cut here ]------------
memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4)
WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
Modules linked in:
CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48
RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001
RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40
R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010
R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000
FS: 00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0
Call Trace:
<TASK>
tracing_mark_raw_write+0x1fe/0x290
? __pfx_tracing_mark_raw_write+0x10/0x10
? security_file_permission+0x50/0xf0
? rw_verify_area+0x6f/0x4b0
vfs_write+0x1d8/0xdd0
? __pfx_vfs_write+0x10/0x10
? __pfx_css_rstat_updated+0x10/0x10
? count_memcg_events+0xd9/0x410
? fdget_pos+0x53/0x5e0
ksys_write+0x182/0x200
? __pfx_ksys_write+0x10/0x10
? do_user_addr_fault+0x4af/0xa30
do_syscall_64+0x63/0x350
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa61d318687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687
RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001
RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000
</TASK>
---[ end trace 0000000000000000 ]---
This is because fortify string sees that the size of entry->id is only 4
bytes, but it is writing more than that. But this is OK as the
dynamic_array is allocated to handle that copy.
The size allocated on the ring buffer was actually a bit too big:
size = sizeof(*entry) + cnt;
But cnt includes the 'id' and the buffer data, so adding cnt to the size
of *entry actually allocates too much on the ring buffer.
Change the allocation to:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
and the memcpy() to unsafe_memcpy() with an added justification.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home
Fixes:
|
|
|
|
bda745ee8f |
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
The fix to use a per CPU buffer to read user space tested only the writes
to trace_marker. But it appears that the selftests are missing tests to
the trace_maker_raw file. The trace_maker_raw file is used by applications
that writes data structures and not strings into the file, and the tools
read the raw ring buffer to process the structures it writes.
The fix that reads the per CPU buffers passes the new per CPU buffer to
the trace_marker file writes, but the update to the trace_marker_raw write
read the data from user space into the per CPU buffer, but then still used
then passed the user space address to the function that records the data.
Pass in the per CPU buffer and not the user space address.
TODO: Add a test to better test trace_marker_raw.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011035243.386098147@kernel.org
Fixes:
|
|
|
|
4f375ade6a |
bpf: Avoid RCU context warning when unpinning htab with internal structs
When unpinning a BPF hash table (htab or htab_lru) that contains internal
structures (timer, workqueue, or task_work) in its values, a BUG warning
is triggered:
BUG: sleeping function called from invalid context at kernel/bpf/hashtab.c:244
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0
...
The issue arises from the interaction between BPF object unpinning and
RCU callback mechanisms:
1. BPF object unpinning uses ->free_inode() which schedules cleanup via
call_rcu(), deferring the actual freeing to an RCU callback that
executes within the RCU_SOFTIRQ context.
2. During cleanup of hash tables containing internal structures,
htab_map_free_internal_structs() is invoked, which includes
cond_resched() or cond_resched_rcu() calls to yield the CPU during
potentially long operations.
However, cond_resched() or cond_resched_rcu() cannot be safely called from
atomic RCU softirq context, leading to the BUG warning when attempting
to reschedule.
Fix this by changing from ->free_inode() to ->destroy_inode() and rename
bpf_free_inode() to bpf_destroy_inode() for BPF objects (prog, map, link).
This allows direct inode freeing without RCU callback scheduling,
avoiding the invalid context warning.
Reported-by: Le Chen <tom2cat@sjtu.edu.cn>
Closes: https://lore.kernel.org/all/1444123482.1827743.1750996347470.JavaMail.zimbra@sjtu.edu.cn/
Fixes:
|
|
|
|
5472d60c12 |
tracing clean up and fixes for v6.18:
- Have osnoise tracer use memdup_user_nul()
The function osnoise_cpus_write() open codes a kmalloc() and then
a copy_from_user() and then adds a nul byte at the end which is the
same as simply using memdup_user_nul().
- Fix wakeup and irq tracers when failing to acquire calltime
When the wakeup and irq tracers use the function graph tracer for
tracing function times, it saves a timestamp into the fgraph shadow
stack. It is possible that this could fail to be stored. If that
happens, it exits the routine early. These functions also disable
nesting of the operations by incremeting the data "disable" counter.
But if the calltime exits out early, it never increments the counter
back to what it needs to be.
Since there's only a couple of lines of code that does work after
acquiring the calltime, instead of exiting out early, reverse the
if statement to be true if calltime is acquired, and place the code
that is to be done within that if block. The clean up will always
be done after that.
- Fix ring_buffer_map() return value on failure of __rb_map_vma()
If __rb_map_vma() fails in ring_buffer_map(), it does not return
an error. This means the caller will be working against a bad vma
mapping. Have ring_buffer_map() return an error when __rb_map_vma()
fails.
- Fix regression of writing to the trace_marker file
A bug fix was made to change __copy_from_user_inatomic() to
copy_from_user_nofault() in the trace_marker write function.
The trace_marker file is used by applications to write into
it (usually with a file descriptor opened at the start of the
program) to record into the tracing system. It's usually used
in critical sections so the write to trace_marker is highly
optimized.
The reason for copying in an atomic section is that the write
reserves space on the ring buffer and then writes directly into
it. After it writes, it commits the event. The time between
reserve and commit must have preemption disabled.
The trace marker write does not have any locking nor can it
allocate due to the nature of it being a critical path.
Unfortunately, converting __copy_from_user_inatomic() to
copy_from_user_nofault() caused a regression in Android.
Now all the writes from its applications trigger the fault that
is rejected by the _nofault() version that wasn't rejected by
the _inatomic() version. Instead of getting data, it now just
gets a trace buffer filled with:
tracing_mark_write: <faulted>
To fix this, on opening of the trace_marker file, allocate
per CPU buffers that can be used by the write call. Then
when entering the write call, do the following:
preempt_disable();
cpu = smp_processor_id();
buffer = per_cpu_ptr(cpu_buffers, cpu);
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
This works similarly to seqcount. As it must enabled preemption
to do a copy_from_user() into a per CPU buffer, if it gets
preempted, the buffer could be corrupted by another task.
To handle this, read the number of context switches of the current
CPU, disable migration, enable preemption, copy the data from
user space, then immediately disable preemption again.
If the number of context switches is the same, the buffer
is still valid. Otherwise it must be assumed that the buffer may
have been corrupted and it needs to try again.
Now the trace_marker write can get the user data even if it has
to fault it in, and still not grab any locks of its own.
-----BEGIN PGP SIGNATURE-----
iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaOfVOhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qq6oAP9y+zxuouWjtXIz9/z++aykFgKCkeau
XHSSdJdn4R+AQgEA4SE0UWKH0F6Bg7qwyocahMMQ1tIJRrpihfNrKBUmmQ4=
=wDGp
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing clean up and fixes from Steven Rostedt:
- Have osnoise tracer use memdup_user_nul()
The function osnoise_cpus_write() open codes a kmalloc() and then a
copy_from_user() and then adds a nul byte at the end which is the
same as simply using memdup_user_nul().
- Fix wakeup and irq tracers when failing to acquire calltime
When the wakeup and irq tracers use the function graph tracer for
tracing function times, it saves a timestamp into the fgraph shadow
stack. It is possible that this could fail to be stored. If that
happens, it exits the routine early. These functions also disable
nesting of the operations by incremeting the data "disable" counter.
But if the calltime exits out early, it never increments the counter
back to what it needs to be.
Since there's only a couple of lines of code that does work after
acquiring the calltime, instead of exiting out early, reverse the if
statement to be true if calltime is acquired, and place the code that
is to be done within that if block. The clean up will always be done
after that.
- Fix ring_buffer_map() return value on failure of __rb_map_vma()
If __rb_map_vma() fails in ring_buffer_map(), it does not return an
error. This means the caller will be working against a bad vma
mapping. Have ring_buffer_map() return an error when __rb_map_vma()
fails.
- Fix regression of writing to the trace_marker file
A bug fix was made to change __copy_from_user_inatomic() to
copy_from_user_nofault() in the trace_marker write function. The
trace_marker file is used by applications to write into it (usually
with a file descriptor opened at the start of the program) to record
into the tracing system. It's usually used in critical sections so
the write to trace_marker is highly optimized.
The reason for copying in an atomic section is that the write
reserves space on the ring buffer and then writes directly into it.
After it writes, it commits the event. The time between reserve and
commit must have preemption disabled.
The trace marker write does not have any locking nor can it allocate
due to the nature of it being a critical path.
Unfortunately, converting __copy_from_user_inatomic() to
copy_from_user_nofault() caused a regression in Android. Now all the
writes from its applications trigger the fault that is rejected by
the _nofault() version that wasn't rejected by the _inatomic()
version. Instead of getting data, it now just gets a trace buffer
filled with:
tracing_mark_write: <faulted>
To fix this, on opening of the trace_marker file, allocate per CPU
buffers that can be used by the write call. Then when entering the
write call, do the following:
preempt_disable();
cpu = smp_processor_id();
buffer = per_cpu_ptr(cpu_buffers, cpu);
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
This works similarly to seqcount. As it must enabled preemption to do
a copy_from_user() into a per CPU buffer, if it gets preempted, the
buffer could be corrupted by another task.
To handle this, read the number of context switches of the current
CPU, disable migration, enable preemption, copy the data from user
space, then immediately disable preemption again. If the number of
context switches is the same, the buffer is still valid. Otherwise it
must be assumed that the buffer may have been corrupted and it needs
to try again.
Now the trace_marker write can get the user data even if it has to
fault it in, and still not grab any locks of its own.
* tag 'trace-v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have trace_marker use per-cpu data to read user space
ring buffer: Propagate __rb_map_vma return value to caller
tracing: Fix irqoff tracers on failure of acquiring calltime
tracing: Fix wakeup tracers on failure of acquiring calltime
tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
|
|
|
|
64cf7d058a |
tracing: Have trace_marker use per-cpu data to read user space
It was reported that using __copy_from_user_inatomic() can actually
schedule. Which is bad when preemption is disabled. Even though there's
logic to check in_atomic() is set, but this is a nop when the kernel is
configured with PREEMPT_NONE. This is due to page faulting and the code
could schedule with preemption disabled.
Link: https://lore.kernel.org/all/20250819105152.2766363-1-luogengkun@huaweicloud.com/
The solution was to change the __copy_from_user_inatomic() to
copy_from_user_nofault(). But then it was reported that this caused a
regression in Android. There's several applications writing into
trace_marker() in Android, but now instead of showing the expected data,
it is showing:
tracing_mark_write: <faulted>
After reverting the conversion to copy_from_user_nofault(), Android was
able to get the data again.
Writes to the trace_marker is a way to efficiently and quickly enter data
into the Linux tracing buffer. It takes no locks and was designed to be as
non-intrusive as possible. This means it cannot allocate memory, and must
use pre-allocated data.
A method that is actively being worked on to have faultable system call
tracepoints read user space data is to allocate per CPU buffers, and use
them in the callback. The method uses a technique similar to seqcount.
That is something like this:
preempt_disable();
cpu = smp_processor_id();
buffer = this_cpu_ptr(&pre_allocated_cpu_buffers, cpu);
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
It's a little more involved than that, but the above is the basic logic.
The idea is to acquire the current CPU buffer, disable migration, and then
enable preemption. At this moment, it can safely use copy_from_user().
After reading the data from user space, it disables preemption again. It
then checks to see if there was any new scheduling on this CPU. If there
was, it must assume that the buffer was corrupted by another task. If
there wasn't, then the buffer is still valid as only tasks in preemptable
context can write to this buffer and only those that are running on the
CPU.
By using this method, where trace_marker open allocates the per CPU
buffers, trace_marker writes can access user space and even fault it in,
without having to allocate or take any locks of its own.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: Wattson CI <wattson-external@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20251008124510.6dba541a@gandalf.local.home
Fixes:
|
|
|
|
de4cbd7047 |
ring buffer: Propagate __rb_map_vma return value to caller
The return value from `__rb_map_vma()`, which rejects writable or
executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being
ignored. As a result the caller of `__rb_map_vma` always returned 0
even when the mapping had actually failed, allowing it to proceed
with an invalid VMA.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.com
Fixes:
|
|
|
|
c834a97962 |
tracing: Fix irqoff tracers on failure of acquiring calltime
The functions irqsoff_graph_entry() and irqsoff_graph_return() both call
func_prolog_dec() that will test if the data->disable is already set and
if not, increment it and return. If it was set, it returns false and the
caller exits.
The caller of this function must decrement the disable counter, but misses
doing so if the calltime fails to be acquired.
Instead of exiting out when calltime is NULL, change the logic to do the
work if it is not NULL and still do the clean up at the end of the
function if it is NULL.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008114943.6f60f30f@gandalf.local.home
Fixes:
|
|
|
|
4f7bf54b07 |
tracing: Fix wakeup tracers on failure of acquiring calltime
The functions wakeup_graph_entry() and wakeup_graph_return() both call
func_prolog_preempt_disable() that will test if the data->disable is
already set and if not, increment it and disable preemption. If it was
set, it returns false and the caller exits.
The caller of this function must decrement the disable counter, but misses
doing so if the calltime fails to be acquired.
Instead of exiting out when calltime is NULL, change the logic to do the
work if it is not NULL and still do the clean up at the end of the
function if it is NULL.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008114835.027b878a@gandalf.local.home
Fixes:
|
|
|
|
f0c029d2ff |
tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
Replace kmalloc() followed by copy_from_user() with memdup_user_nul() to simplify and improve osnoise_cpus_write(). Remove the manual NUL-termination. No functional changes intended. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251001130907.364673-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
a667300bd5 |
kho: add support for preserving vmalloc allocations
A vmalloc allocation is preserved using binary structure similar to global KHO memory tracker. It's a linked list of pages where each page is an array of physical address of pages in vmalloc area. kho_preserve_vmalloc() hands out the physical address of the head page to the caller. This address is used as the argument to kho_vmalloc_restore() to restore the mapping in the vmalloc address space and populate it with the preserved pages. [pasha.tatashin@soleen.com: free chunks using free_page() not kfree()] Link: https://lkml.kernel.org/r/mafs0a52idbeg.fsf@kernel.org [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20250921054458.4043761-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
8375b76517 |
kho: replace kho_preserve_phys() with kho_preserve_pages()
to make it clear that KHO operates on pages rather than on a random physical address. The kho_preserve_pages() will be also used in upcoming support for vmalloc preservation. Link: https://lkml.kernel.org/r/20250921054458.4043761-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Chris Li <chrisl@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
469661d0d3 |
kho: check if kho is finalized in __kho_preserve_order()
Patch series "kho: add support for preserving vmalloc allocations", v5.
Following the discussion about preservation of memfd with LUO [1] these
patches add support for preserving vmalloc allocations.
Any KHO uses case presumes that there's a data structure that lists
physical addresses of preserved folios (and potentially some additional
metadata). Allowing vmalloc preservations with KHO allows scalable
preservation of such data structures.
For instance, instead of allocating array describing preserved folios in
the fdt, memfd preservation can use vmalloc:
preserved_folios = vmalloc_array(nr_folios, sizeof(*preserved_folios));
memfd_luo_preserve_folios(preserved_folios, folios, nr_folios);
kho_preserve_vmalloc(preserved_folios, &folios_info);
This patch (of 4):
Instead of checking if kho is finalized in each caller of
__kho_preserve_order(), do it in the core function itself.
Link: https://lkml.kernel.org/r/20250921054458.4043761-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250921054458.4043761-2-rppt@kernel.org
Link: https://lore.kernel.org/all/20250807014442.3829950-30-pasha.tatashin@soleen.com [1]
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
2215336295 |
hyperv-next for v6.18
-----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmjkpakTHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXip5B/48MvTFJ1qwRGPVzevZQ8Z4SDogEREp 69VS/xRf1YCIzyXyanwqf1dXLq8NAqicSp6ewpJAmNA55/9O0cwT2EtohjeGCu61 krPIvS3KT7xI0uSEniBdhBtALYBscnQ0e3cAbLNzL7bwA6Q6OmvoIawpBADgE/cW aZNCK9jy+WUqtXc6lNtkJtST0HWGDn0h04o2hjqIkZ+7ewjuEEJBUUB/JZwJ41Od UxbID0PAcn9O4n/u/Y/GH65MX+ddrdCgPHEGCLAGAKT24lou3NzVv445OuCw0c4W ilALIRb9iea56ZLVBW5O82+7g9Ag41LGq+841MNlZjeRNONGykaUpTWZ =OR26 -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Unify guest entry code for KVM and MSHV (Sean Christopherson) - Switch Hyper-V MSI domain to use msi_create_parent_irq_domain() (Nam Cao) - Add CONFIG_HYPERV_VMBUS and limit the semantics of CONFIG_HYPERV (Mukesh Rathor) - Add kexec/kdump support on Azure CVMs (Vitaly Kuznetsov) - Deprecate hyperv_fb in favor of Hyper-V DRM driver (Prasanna Kumar T S M) - Miscellaneous enhancements, fixes and cleanups (Abhishek Tiwari, Alok Tiwari, Nuno Das Neves, Wei Liu, Roman Kisel, Michael Kelley) * tag 'hyperv-next-signed-20251006' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv: Remove the spurious null directive line MAINTAINERS: Mark hyperv_fb driver Obsolete fbdev/hyperv_fb: deprecate this in favor of Hyper-V DRM driver Drivers: hv: Make CONFIG_HYPERV bool Drivers: hv: Add CONFIG_HYPERV_VMBUS option Drivers: hv: vmbus: Fix typos in vmbus_drv.c Drivers: hv: vmbus: Fix sysfs output format for ring buffer index Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store() x86/hyperv: Switch to msi_create_parent_irq_domain() mshv: Use common "entry virt" APIs to do work in root before running guest entry: Rename "kvm" entry code assets to "virt" to genericize APIs entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper mshv: Handle NEED_RESCHED_LAZY before transferring to guest x86/hyperv: Add kexec/kdump support on Azure CVMs Drivers: hv: Simplify data structures for VMBus channel close message Drivers: hv: util: Cosmetic changes for hv_utils_transport.c mshv: Add support for a new parent partition configuration clocksource: hyper-v: Skip unnecessary checks for the root partition hyperv: Add missing field to hv_output_map_device_interrupt |
|
|
|
21fbefc588 |
tracing updates for v6.18:
- Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall tracepoints Individual system call trace events are pseudo events attached to the raw_syscall trace events that just trace the entry and exit of all system calls. When any of these individual system call trace events get enabled, an element in an array indexed by the system call number is assigned to the trace file that defines how to trace it. When the trace event triggers, it reads this array and if the array has an element, it uses that trace file to know what to write it (the trace file defines the output format of the corresponding system call). The issue is that it uses rcu_dereference_ptr() and marks the elements of the array as using RCU. This is incorrect. There is no RCU synchronization here. The event file that is pointed to has a completely different way to make sure its freed properly. The reading of the array during the system call trace event is only to know if there is a value or not. If not, it does nothing (it means this system call isn't being traced). If it does, it uses the information to store the system call data. The RCU usage here can simply be replaced by READ_ONCE() and WRITE_ONCE() macros. - Have the system call trace events use "0x" for hex values Some system call trace events display hex values but do not have "0x" in front of it. Seeing "count: 44" can be assumed that it is 44 decimal when in actuality it is 44 hex (68 decimal). Display "0x44" instead. - Use vmalloc_array() in tracing_map_sort_entries() The function tracing_map_sort_entries() used array_size() and vmalloc() when it could have simply used vmalloc_array(). - Use for_each_online_cpu() in trace_osnoise.c() Instead of open coding for_each_cpu(cpu, cpu_online_mask), use for_each_online_cpu(). - Move the buffer field in struct trace_seq to the end The buffer field in struct trace_seq is architecture dependent in size, and caused padding for the fields after it. By moving the buffer to the end of the structure, it compacts the trace_seq structure better. - Remove redundant zeroing of cmdline_idx field in saved_cmdlines_buffer() The structure that contains cmdline_idx is zeroed by memset(), no need to explicitly zero any of its fields after that. - Use system_percpu_wq instead of system_wq in user_event_mm_remove() As system_wq is being deprecated, use the new wq. - Add cond_resched() is ftrace_module_enable() Some modules have a lot of functions (thousands of them), and the enabling of those functions can take some time. On non preemtable kernels, it was triggering a watchdog timeout. Add a cond_resched() to prevent that. - Add a BUILD_BUG_ON() to make sure PID_MAX_DEFAULT is always a power of 2 There's code that depends on PID_MAX_DEFAULT being a power of 2 or it will break. If in the future that changes, make sure the build fails to ensure that the code is fixed that depends on this. - Grab mutex_lock() before ever exiting s_start() The s_start() function is a seq_file start routine. As s_stop() is always called even if s_start() fails, and s_stop() expects the event_mutex to be held as it will always release it. That mutex must always be taken in s_start() even if that function fails. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaN/4phQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qpToAP4sENpZkZHFOl2PuikmAgjhB6PqiUtL LNuMDx45WygcLwD6Awy8DUlEBN6RzTPA761MSjs0+NMg16QLrhPLxWqFEgw= =nxDn -----END PGP SIGNATURE----- Merge tag 'trace-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Use READ_ONCE() and WRITE_ONCE() instead of RCU for syscall tracepoints Individual system call trace events are pseudo events attached to the raw_syscall trace events that just trace the entry and exit of all system calls. When any of these individual system call trace events get enabled, an element in an array indexed by the system call number is assigned to the trace file that defines how to trace it. When the trace event triggers, it reads this array and if the array has an element, it uses that trace file to know what to write it (the trace file defines the output format of the corresponding system call). The issue is that it uses rcu_dereference_ptr() and marks the elements of the array as using RCU. This is incorrect. There is no RCU synchronization here. The event file that is pointed to has a completely different way to make sure its freed properly. The reading of the array during the system call trace event is only to know if there is a value or not. If not, it does nothing (it means this system call isn't being traced). If it does, it uses the information to store the system call data. The RCU usage here can simply be replaced by READ_ONCE() and WRITE_ONCE() macros. - Have the system call trace events use "0x" for hex values Some system call trace events display hex values but do not have "0x" in front of it. Seeing "count: 44" can be assumed that it is 44 decimal when in actuality it is 44 hex (68 decimal). Display "0x44" instead. - Use vmalloc_array() in tracing_map_sort_entries() The function tracing_map_sort_entries() used array_size() and vmalloc() when it could have simply used vmalloc_array(). - Use for_each_online_cpu() in trace_osnoise.c() Instead of open coding for_each_cpu(cpu, cpu_online_mask), use for_each_online_cpu(). - Move the buffer field in struct trace_seq to the end The buffer field in struct trace_seq is architecture dependent in size, and caused padding for the fields after it. By moving the buffer to the end of the structure, it compacts the trace_seq structure better. - Remove redundant zeroing of cmdline_idx field in saved_cmdlines_buffer() The structure that contains cmdline_idx is zeroed by memset(), no need to explicitly zero any of its fields after that. - Use system_percpu_wq instead of system_wq in user_event_mm_remove() As system_wq is being deprecated, use the new wq. - Add cond_resched() is ftrace_module_enable() Some modules have a lot of functions (thousands of them), and the enabling of those functions can take some time. On non preemtable kernels, it was triggering a watchdog timeout. Add a cond_resched() to prevent that. - Add a BUILD_BUG_ON() to make sure PID_MAX_DEFAULT is always a power of 2 There's code that depends on PID_MAX_DEFAULT being a power of 2 or it will break. If in the future that changes, make sure the build fails to ensure that the code is fixed that depends on this. - Grab mutex_lock() before ever exiting s_start() The s_start() function is a seq_file start routine. As s_stop() is always called even if s_start() fails, and s_stop() expects the event_mutex to be held as it will always release it. That mutex must always be taken in s_start() even if that function fails. * tag 'trace-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix lock imbalance in s_start() memory allocation failure path tracing: Ensure optimized hashing works ftrace: Fix softlockup in ftrace_module_enable tracing: replace use of system_wq with system_percpu_wq tracing: Remove redundant 0 value initialization tracing: Move buffer in trace_seq to end of struct tracing/osnoise: Use for_each_online_cpu() instead of for_each_cpu() tracing: Use vmalloc_array() to improve code tracing: Have syscall trace events show "0x" for values greater than 10 tracing: Replace syscall RCU pointer assignment with READ/WRITE_ONCE() |
|
|
|
2cd14dff16 |
Probes fixes for v6.17
- tracing: Fix race condition in kprobe initialization causing NULL pointer dereference. This happens on weak memory model, which does not correctly manage the flags access with appropriate memory barriers. Use RELEASE-ACQUIRE to fix it. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmjeikYbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bEOYIAKk4KwhSGAhx4Q+Mkqzy pbNPvYMdAwocYIqxg3C1/OVK22DcauTZxMzRq7c3bW776Na6eWD4y9aykhRhNlGY TeKNAtZ0OdRSlq9ISj7P9H4A+Wywrn1CfNkGsZWRrEy0s2xkwUJjEOciI/WA3o8V Nd+Qv+/FTRZ9w/678hb647WrTXIsxSyfBG1L+l25GgqKqqwAtKO8Ur+BhOglxZoC WVHsJT6Xmisg0QiujXX3BgDFAXmAFkBuIQd+D+LHWXgo7W8I6VB7ilQ+hibyXAzf 6yYkZ49kyNLeaWf0/4cc/NaKkq8eK43PXvSvs2yhDt8yWthRLiK/DdeAl63ZNihp D2g= =Ad1o -----END PGP SIGNATURE----- Merge tag 'probes-fixes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe fix from Masami Hiramatsu: - Fix race condition in kprobe initialization causing NULL pointer dereference. This happens on weak memory model, which does not correctly manage the flags access with appropriate memory barriers. Use RELEASE-ACQUIRE to fix it. * tag 'probes-fixes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix race condition in kprobe initialization causing NULL pointer dereference |
|
|
|
908057d185 |
This update includes the following changes:
Drivers: - Add ciphertext hiding support to ccp. - Add hashjoin, gather and UDMA data move features to hisilicon. - Add lz4 and lz77_only to hisilicon. - Add xilinx hwrng driver. - Add ti driver with ecb/cbc aes support. - Add ring buffer idle and command queue telemetry for GEN6 in qat. Others: - Use rcu_dereference_all to stop false alarms in rhashtable. - Fix CPU number wraparound in padata. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmjbpJsACgkQxycdCkmx i6fuTRAAzv5o0MIw4Kc7EEU3zMgFSX0FdcTUPY+eiFrWZrSrvUVW+jYcH9ppO8J7 offAYSZYatcyyU9+u8X22CQNKLdXnKQQ0YymWO35TOpvVxveUM1bqEEV1ZK0xaXD hlJTLoFIsPaVVhi8CW+ZNhDJBwJHNCv7Yi9TUB6sC7rilWWbJ5LzbEVw3Rtg81Lx 0hcuGX2LrpsHOVVWYxGdJ534Kt2lrkt+8/gWOFg3ap3RVQ39tohEjS2Adm2p8eiX zIdru/aYd89EcYoxuFyylX2d/OLmMAQpFsADy/Fys26eeOWtqggH62V1LAiSyEqw vLRBCVKpLhlbNNfnUs0f5nqjjYEUrNk9SA4rgoxITwKoucbWBQMS4zWJTEDKz29n iBBqHsukGpwVOE6RY8BzR/QNJKhZCSsJpGkagS1v6VPa5P1QomuKftGXKB7JKXKz xoyk+DhJyA8rkb/E5J9Ni7+Tb08Y4zvJ1dpCQHZMlln3DKkK+kk3gkpoxXMZwBV2 LbEMGTI+sfnAfqkGCJYAZR9gDJ5LQDR9jy/Ds5jvPuVvvjyY5LY/bjETqGPF2QVs Rz2Sg0RHl7PVZOP6QgbQzkV7SkJrZfyu5iYd0ZfUqZr7BaHLOHJG/E/HlUW3/mXu OjD+Q5gPhiOdc/qn+32+QERTDCFQdbByv0h7khGQA5vHE3XCu8E= =knnk -----END PGP SIGNATURE----- Merge tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Drivers: - Add ciphertext hiding support to ccp - Add hashjoin, gather and UDMA data move features to hisilicon - Add lz4 and lz77_only to hisilicon - Add xilinx hwrng driver - Add ti driver with ecb/cbc aes support - Add ring buffer idle and command queue telemetry for GEN6 in qat Others: - Use rcu_dereference_all to stop false alarms in rhashtable - Fix CPU number wraparound in padata" * tag 'v6.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (78 commits) dt-bindings: rng: hisi-rng: convert to DT schema crypto: doc - Add explicit title heading to API docs hwrng: ks-sa - fix division by zero in ks_sa_rng_init KEYS: X.509: Fix Basic Constraints CA flag parsing crypto: anubis - simplify return statement in anubis_mod_init crypto: hisilicon/qm - set NULL to qm->debug.qm_diff_regs crypto: hisilicon/qm - clear all VF configurations in the hardware crypto: hisilicon - enable error reporting again crypto: hisilicon/qm - mask axi error before memory init crypto: hisilicon/qm - invalidate queues in use crypto: qat - Return pointer directly in adf_ctl_alloc_resources crypto: aspeed - Fix dma_unmap_sg() direction rhashtable: Use rcu_dereference_all and rcu_dereference_all_check crypto: comp - Use same definition of context alloc and free ops crypto: omap - convert from tasklet to BH workqueue crypto: qat - Replace kzalloc() + copy_from_user() with memdup_user() crypto: caam - double the entropy delay interval for retry padata: WQ_PERCPU added to alloc_workqueue users padata: replace use of system_unbound_wq with system_dfl_wq crypto: cryptd - WQ_PERCPU added to alloc_workqueue users ... |
|
|
|
67da125e30 |
RCU pull request for v6.18
This pull request contains the following branches, non-octopus merged:
Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions.
- Fix RCU documentation formatting and typos.
- Replace dead Ottawa Linux Symposium links in RTFP.txt.
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks.
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler().
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly.
- Make initial set of changes to accommodate upcoming system_percpu_wq
changes.
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in tracing,
including adding guards.
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast().
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers.
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed().
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary.
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels without
doing non-debug kernels.
- Fix a number of false-positive diagnostics that were being triggered
by rcutorture starting before boot completed. Running multiple
near-CPU-bound rcutorture processes when there is only the boot CPU
is after all a bit excessive.
- Substitute kcalloc() for kzalloc().
- Remove a redundant kfree() and NULL out kfree()ed objects.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmjWzFcTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jAUeD/4xp/3rvVlfX6UB1ax/lYbQopOm2Hns
7DVO/lp8ih6jUFCRappw7do+jbU3EcP+76sGUd5qkKlRbJIierTHBrolULpKph/p
hQ/LddPYIHg+bPtbq/vA6fFwFI/xwbBNQOMS1bWzzU5iWn/p/ETS1kWptz+g2wk5
pOxx9PnH52Ls5BgCCnb8kGpUuG3cy63sFb52ORrN206EaUq59Q12CLA7aTge4QGV
3fvBIv+U+chagELG0usoPPTC+fMUt8oj8vfVYyDqUPjoyxATDtvxAv7ORFI7ZmEm
CVemKrHWEaVEERfXSSbMp0TpPrNnkB4SoYOGT6vakyjgpcQHLh0GMaUZfluvyROt
nQNaQFbOLfh9GGBjZQpAu1Aa2aAvMcOxyrL1JPhvwFVT0G4hF6s4Zs0zLY7+MAPD
XT0wSf79U4lTzlg4eNrwZazoFGIeUhwyH2X59yc04yXytM7QUBFw+7XIK/PA8Wn4
LgJYrwn6poFinGT2HlwKPrIUKSfCpLS8ePutmMgMUqLhiVtKvoaE6S38h2T97d9i
OuLFDVMm+j0awMadS3cJD5kqtGVbqnPsUuaC3OFlpyng8K+OHHTNCfcFMiMbhgdw
w6cMioYdq/a9mLoN6T50ylvTOKeoKWxXI4X6q6x7vZYUztMpFby+b5mTTbR12dJs
LSqHR8QGSIpNmQ==
=FEf1
-----END PGP SIGNATURE-----
Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Paul McKenney:
"Documentation updates:
- Update whatisRCU.rst and checklist.rst for recent RCU API additions
- Fix RCU documentation formatting and typos
- Replace dead Ottawa Linux Symposium links in RTFP.txt
Miscellaneous RCU updates:
- Document that rcu_barrier() hurries RCU_LAZY callbacks
- Remove redundant interrupt disabling from
rcu_preempt_deferred_qs_handler()
- Move list_for_each_rcu from list.h to rculist.h, and adjust the
include directive in kernel/cgroup/dmem.c accordingly
- Make initial set of changes to accommodate upcoming
system_percpu_wq changes
SRCU updates:
- Create an srcu_read_lock_fast_notrace() for eventual use in
tracing, including adding guards
- Document the reliance on per-CPU operations as implicit RCU readers
in __srcu_read_{,un}lock_fast()
- Document the srcu_flip() function's memory-barrier D's relationship
to SRCU-fast readers
- Remove a redundant preempt_disable() and preempt_enable() pair from
srcu_gp_start_if_needed()
Torture-test updates:
- Fix jitter.sh spin time so that it actually varies as advertised.
It is still quite coarse-grained, but at least it does now vary
- Update torture.sh help text to include the not-so-new --do-normal
parameter, which permits (for example) testing KCSAN kernels
without doing non-debug kernels
- Fix a number of false-positive diagnostics that were being
triggered by rcutorture starting before boot completed. Running
multiple near-CPU-bound rcutorture processes when there is only the
boot CPU is after all a bit excessive
- Substitute kcalloc() for kzalloc()
- Remove a redundant kfree() and NULL out kfree()ed objects"
* tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
rcu: WQ_UNBOUND added to sync_wq workqueue
rcu: WQ_PERCPU added to alloc_workqueue users
rcu: replace use of system_wq with system_percpu_wq
refperf: Set reader_tasks to NULL after kfree()
refperf: Remove redundant kfree() after torture_stop_kthread()
srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed()
srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast
srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers
rculist: move list_for_each_rcu() to where it belongs
refscale: Use kcalloc() instead of kzalloc()
rcutorture: Use kcalloc() instead of kzalloc()
docs: rcu: Replace multiple dead OLS links in RTFP.txt
doc: Fix typo in RCU's torture.rst documentation
Documentation: RCU: Retitle toctree index
Documentation: RCU: Reduce toctree depth
Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block
rcu: docs: Requirements.rst: Abide by conventions of kernel documentation
doc: Add RCU guards to checklist.rst
doc: Update whatisRCU.rst for recent RCU API additions
rcutorture: Delay forward-progress testing until boot completes
...
|
|
|
|
48e3694ae7 |
printk changes for 6.18
-----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy 7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC 4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4 AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ NUgzh2cOwSWWU4UHt/AiD9cd =uFn0 -----END PGP SIGNATURE----- Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add KUnit test for the printk ring buffer - Fix the check of the maximal record size which is allowed to be stored into the printk ring buffer. It prevents corruptions of the ring buffer. Note that printk() is on the safe side. The messages are limited by 1kB buffer and are always small enough for the minimal log buffer size 4kB, see CONFIG_LOG_BUF_SHIFT definition. * tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: ringbuffer: Fix data block max size check printk: kunit: support offstack cpumask printk: kunit: Fix __counted_by() in struct prbtest_rbdata printk: ringbuffer: Explain why the KUnit test ignores failed writes printk: ringbuffer: Add KUnit test |
|
|
|
5b7ce93854 |
kgdb patches for 6.18
A collection of small cleanups this cycle. Thorsten Blum has replaced a number strcpy() calls with safer alternatives (fixing a pointer aliasing bug in the process). Colin Ian King has simplified things by removing some unreachable code. Signed-off-by: Daniel Thompson (RISCstar) <danielt@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmjfvoQACgkQfOMlXTn3 iKGqiA/8DKfwbWqnLuFld5U3lK5AeYSWTnbFbQbk8kEUzl/nAVuSGBGMKSnLHKaF S3DVWkQA1syhhoidm/XCco2cjkvDoOPQNzQwD35nxOX1QuoxOxbx2qRT4FRJE9CC B5WUonEo++z+tmwqVWh9mMXTrYJIMQItg7igMM9wuKWB9Gv08LAfAFU2g4p7O9hf A22mSV3EobcxJWELXZhuPyBoWwmC6W6QoxdJ2mTra2/TaKV7DL4PY8o74vRxGW9+ RKd+fyxpDSvStbYfXIeRZYBjrvqfZ8UHfPV7gCIfppbsrOSrl2HwNlmnOwyNMlgo iy2kNOVoXbdRPZrgisLdvX3Q7/BBMRtX2+wpGqaEMRCHMbIcd8RA/uW6v56XGmhh zg0GBK+OPIudT4XsB+mwtk1QNbnpECsS1JrI5H88zYnvTsyD0XJfT0ER95uTa30c qvz3FvWlxWGpHrtJP17+N/t/lZDWPRXuQZeFOH4oPpsN5dbxBLmXbylWnOmuv86e ilICYU3czpflm2moQmKGg63SRRCtti+1Cs78VJnXu1eNfgJn0ko54HtQi6x2nK76 z0ugll7hi8WKB43sh+OIl0aF5E6Jn9VHpWXgXwVoyTToNrf/cDhHjDsplypaNFDi ewXvQR3bP26g7J6jCcmmm7EhJoZOqHelkKxUemyMBzs0seEbPqg= =6bO0 -----END PGP SIGNATURE----- Merge tag 'kgdb-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "A collection of small cleanups this cycle. Thorsten Blum has replaced a number strcpy() calls with safer alternatives (fixing a pointer aliasing bug in the process). Colin Ian King has simplified things by removing some unreachable code" * tag 'kgdb-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: remove redundant check for scancode 0xe0 kdb: Replace deprecated strcpy() with helper function in kdb_defcmd() kdb: Replace deprecated strcpy() with memcpy() in parse_grep() kdb: Replace deprecated strcpy() with memmove() in vkdb_printf() kdb: Replace deprecated strcpy() with memcpy() in kdb_strdup() kernel: debug: gdbstub: Replace deprecated strcpy() with strscpy() |
|
|
|
cbf33b8e0b |
bpf-fixes
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjfBtMACgkQ6rmadz2v bToTuA/+JizieMIW6eqzCDrrhj45DmuKu6gUMrkmM0xte3QCoLly5FDGJ9HMmrqf L5xfz/TAExg/Nh9a0zpCGE7/fMr9P10Z5vVHMe/I06rcW+/fDrAERXDurlfpfliv /7KqF/lq1D6kp7bcBzWX4dzkT66Nh/Ax6ZJKIXF/K3iwFXBb1A/95Wwgqxuc3xNM n0kthEhnX6x5cojy6fla2zYNoebNLVxPvafTcAtnE0QM8vxdOl0Q7KWrdQpVE5V/ z6WF2M822eTQeUIy6k/ZRckFlmEwa++I+w5EnygBFP8INUHARdYNBQpQVFkg31jz GYGlbTs7jaDbMNWkEhKyDbuhL5Xq9/5FGOaLI9CvDtpPjbfYTZMGtIn9L902XtqE VEfBcTA/g/qd8Urx5622sfPwau64LrSAKBRE/4CJ7/dsnPx4BdKavsA5DMVuyRIm Au6tOxlRFhPWhultnmmbFBjwn33xTleYNUF9wpWQnijNq+Ph29gKZk8Ac5eKKS6A dTSCxv7LGafezN/evr779vXohZyF9vWfqHdITVzTo40tsSbi5v4uyfJ2qzoaS7Gf UQ4F2nkjqddhG/O1W2TY8aFzmWXP2q92V6yqkcq5zjA3D1Ue6JOxRzG/3kNzgsLX Gf1DmPosUyeLNpwdDYRD83Dk7tJ7rAI+x3i6Iwh8qivIhC4CzrU= =Kzf9 -----END PGP SIGNATURE----- Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix selftests/bpf (typo, conflicts) and unbreak BPF CI (Jiri Olsa) - Remove linux/unaligned.h dependency for libbpf_sha256 (Andrii Nakryiko) and add a test (Eric Biggers) - Reject negative offsets for ALU operations in the verifier (Yazhou Tang) and add a test (Eduard Zingerman) - Skip scalar adjustment for BPF_NEG operation if destination register is a pointer (Brahmajit Das) and add a test (KaFai Wan) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: libbpf: Fix missing #pragma in libbpf_utils.c selftests/bpf: Add tests for rejection of ALU ops with negative offsets selftests/bpf: Add test for libbpf_sha256() bpf: Reject negative offsets for ALU ops libbpf: remove linux/unaligned.h dependency for libbpf_sha256() libbpf: move libbpf_sha256() implementation into libbpf_utils.c libbpf: move libbpf_errstr() into libbpf_utils.c libbpf: remove unused libbpf_strerror_r and STRERR_BUFSIZE libbpf: make libbpf_errno.c into more generic libbpf_utils.c selftests/bpf: Add test for BPF_NEG alu on CONST_PTR_TO_MAP bpf: Skip scalar adjustment for BPF_NEG if dst is a pointer selftests/bpf: Fix realloc size in bpf_get_addrs selftests/bpf: Fix typo in subtest_basic_usdt after merge conflict selftests/bpf: Fix open-coded gettid syscall in uprobe syscall tests |
|
|
|
a498d59c46 |
dma-mapping updates for Linux 6.18:
- refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters; this gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works; an advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page; developped by Leon Romanovsky and Jason Gunthorpe - minor clean-up by Petr Tesarik and Qianfeng Rong -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSrngzkoBtlA8uaaJ+Jp1EFxbsSRAUCaNugqAAKCRCJp1EFxbsS RBvDAQCEd4P6pz6ROQHf5BfiF5J1db2H6bWsFLjajx3KfNWf8gD+P0eQ0hTzLrcd zuSKZTivviOiyjXlt/9GOaXXPnmTwA0= =b0nZ -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - Refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters This gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works. An advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page Developped by Leon Romanovsky and Jason Gunthorpe - Minor clean-up by Petr Tesarik and Qianfeng Rong * tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: kmsan: fix missed kmsan_handle_dma() signature conversion mm/hmm: properly take MMIO path mm/hmm: migrate to physical address-based DMA mapping API dma-mapping: export new dma_*map_phys() interface xen: swiotlb: Open code map_resource callback dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() kmsan: convert kmsan_handle_dma to use physical addresses dma-mapping: convert dma_direct_*map_page to be phys_addr_t based iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys dma-debug: refactor to use physical addresses for page mapping iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). dma-mapping: introduce new DMA attribute to indicate MMIO memory swiotlb: Remove redundant __GFP_NOWARN dma-direct: clean up the logic in __dma_direct_alloc_pages() |
|
|
|
50647a1176 |
file->f_path constification
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaN3daAAKCRBZ7Krx/gZQ
6zNWAP9kD6rOJRNqDgea4pibDPa47Tps/WM5tsDv3dsLliY29gEA6sveOWZ3guAj
4oY3ts/NtHLWXvhI7Vd/1mr2aTKEZQk=
=YNK+
-----END PGP SIGNATURE-----
Merge tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull file->f_path constification from Al Viro:
"Only one thing was modifying ->f_path of an opened file - acct(2).
Massaging that away and constifying a bunch of struct path * arguments
in functions that might be given &file->f_path ends up with the
situation where we can turn ->f_path into an anon union of const
struct path f_path and struct path __f_path, the latter modified only
in a few places in fs/{file_table,open,namei}.c, all for struct file
instances that are yet to be opened"
* tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
Have cc(1) catch attempts to modify ->f_path
kernel/acct.c: saner struct file treatment
configfs:get_target() - release path as soon as we grab configfs_item reference
apparmor/af_unix: constify struct path * arguments
ovl_is_real_file: constify realpath argument
ovl_sync_file(): constify path argument
ovl_lower_dir(): constify path argument
ovl_get_verity_digest(): constify path argument
ovl_validate_verity(): constify {meta,data}path arguments
ovl_ensure_verity_loaded(): constify datapath argument
ksmbd_vfs_set_init_posix_acl(): constify path argument
ksmbd_vfs_inherit_posix_acl(): constify path argument
ksmbd_vfs_kern_path_unlock(): constify path argument
ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path *
check_export(): constify path argument
export_operations->open(): constify path argument
rqst_exp_get_by_name(): constify path argument
nfs: constify path argument of __vfs_getattr()
bpf...d_path(): constify path argument
done_path_create(): constify path argument
...
|
|
|
|
51e9889ab1 |
vfs_parse_fs_string() stuff
change on vfs_parse_fs_string() calling conventions - get rid of the length argument (almost all callers pass strlen() of the string argument there), add vfs_parse_fs_qstr() for the cases that do want separate length Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaNhllQAKCRBZ7Krx/gZQ 6wyiAP9TmyFLBWKC/sDNtRAiGRybEtcwvVZj1agpx0kZIWshUwD7BVg4AfDs+vN3 RoYYg9DR1SP5kZF7h2Ve1T39XDq6ZQU= =YZFG -----END PGP SIGNATURE----- Merge tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fs_context updates from Al Viro: "Change vfs_parse_fs_string() calling conventions Get rid of the length argument (almost all callers pass strlen() of the string argument there), add vfs_parse_fs_qstr() for the cases that do want separate length" * tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_nfs4_mount(): switch to vfs_parse_fs_string() change the calling conventions for vfs_parse_fs_string() |
|
|
|
e64aeecbbb |
mount-related stuff for this cycle
* saner handling of guards in fs/namespace.c, getting
rid of needlessly strong locking in some of the users.
* lock_mount() calling conventions change - have it set
the environment for attaching to given location, storing the
results in caller-supplied object, without altering the passed
struct path. Make unlock_mount() called as __cleanup for those
objects. It's not exactly guard(), but similar to it.
* MNT_WRITE_HOLD done right - mnt_hold_writers() does *not*
mess with ->mnt_flags anymore, so insertion of a new mount into
->s_mounts of underlying superblock does not, in itself, expose
->mnt_flags of that mount to concurrent modifications.
* getting rid of pathological cases when umount() spends
quadratic time removing the victims from propagation graph -
part of that had been dealt with last cycle, this should finish
it.
* a bunch of stuff constified.
* assorted cleanups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaNhzLAAKCRBZ7Krx/gZQ
63/IAP4yxJ6e3Pt66Uw0MeuSNmeLsQwb7mYo72lsYHpxjYANZAEAspMaLDU9NHxM
Dy6WDVoJnf7+aDlD6E443YMfPX8XRQM=
=5T+t
-----END PGP SIGNATURE-----
Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
"Several piles this cycle, this mount-related one being the largest and
trickiest:
- saner handling of guards in fs/namespace.c, getting rid of
needlessly strong locking in some of the users
- lock_mount() calling conventions change - have it set the
environment for attaching to given location, storing the results in
caller-supplied object, without altering the passed struct path.
Make unlock_mount() called as __cleanup for those objects. It's not
exactly guard(), but similar to it
- MNT_WRITE_HOLD done right.
mnt_hold_writers() does *not* mess with ->mnt_flags anymore, so
insertion of a new mount into ->s_mounts of underlying superblock
does not, in itself, expose ->mnt_flags of that mount to concurrent
modifications
- getting rid of pathological cases when umount() spends quadratic
time removing the victims from propagation graph - part of that had
been dealt with last cycle, this should finish it
- a bunch of stuff constified
- assorted cleanups
* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
constify {__,}mnt_is_readonly()
WRITE_HOLD machinery: no need for to bump mount_lock seqcount
struct mount: relocate MNT_WRITE_HOLD bit
preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
setup_mnt(): primitive for connecting a mount to filesystem
simplify the callers of mnt_unhold_writers()
copy_mnt_ns(): use guards
copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
open_detached_copy(): separate creation of namespace into helper
open_detached_copy(): don't bother with mount_lock_hash()
path_has_submounts(): use guard(mount_locked_reader)
fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
ecryptfs: get rid of pointless mount references in ecryptfs dentries
umount_tree(): take all victims out of propagation graph at once
do_mount(): use __free(path_put)
do_move_mount_old(): use __free(path_put)
constify can_move_mount_beneath() arguments
path_umount(): constify struct path argument
may_copy_tree(), __do_loopback(): constify struct path argument
path_mount(): constify struct path argument
...
|
|
|
|
61e19cd2e5 |
tracing: Fix lock imbalance in s_start() memory allocation failure path
When s_start() fails to allocate memory for set_event_iter, it returns NULL before acquiring event_mutex. However, the corresponding s_stop() function always tries to unlock the mutex, causing a lock imbalance warning: WARNING: bad unlock balance detected! 6.17.0-rc7-00175-g2b2e0c04f78c #7 Not tainted ------------------------------------- syz.0.85611/376514 is trying to release lock (event_mutex) at: [<ffffffff8dafc7a4>] traverse.part.0.constprop.0+0x2c4/0x650 fs/seq_file.c:131 but there are no more locks to release! The issue was introduced by commit |
|
|
|
e406d57be7 |
Patch series in this pull request:
- The 3 patch series "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet completes the removal of this legacy IDR API. - The 9 patch series "panic: introduce panic status function family" from Jinchao Wang provides a number of cleanups to the panic code and its various helpers, which were rather ad-hoc and scattered all over the place. - The 5 patch series "tools/delaytop: implement real-time keyboard interaction support" from Fan Yu adds a few nice user-facing usability changes to the delaytop monitoring tool. - The 3 patch series "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos Petrongonas fixes a panic which was happening with the combination of EFI and KHO. - The 2 patch series "Squashfs: performance improvement and a sanity check" from Phillip Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere 150x speedup was measured for a well-chosen microbenchmark. - Plus another 50-odd singleton patches all over the place. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN78zwAKCRDdBJ7gKXxA jhLeAQCddTv0XtSUTrvBvmrJVUBrQQeJc+LtNopMIjfAF/WAWAEAogSVKxg+HHEB GaVixx4zDriNzEqrqiCx9rm4l+YooQA= =XRe0 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet completes the removal of this legacy IDR API - "panic: introduce panic status function family" from Jinchao Wang provides a number of cleanups to the panic code and its various helpers, which were rather ad-hoc and scattered all over the place - "tools/delaytop: implement real-time keyboard interaction support" from Fan Yu adds a few nice user-facing usability changes to the delaytop monitoring tool - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos Petrongonas fixes a panic which was happening with the combination of EFI and KHO - "Squashfs: performance improvement and a sanity check" from Phillip Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere 150x speedup was measured for a well-chosen microbenchmark - plus another 50-odd singleton patches all over the place * tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits) Squashfs: reject negative file sizes in squashfs_read_inode() kallsyms: use kmalloc_array() instead of kmalloc() MAINTAINERS: update Sibi Sankar's email address Squashfs: add SEEK_DATA/SEEK_HOLE support Squashfs: add additional inode sanity checking lib/genalloc: fix device leak in of_gen_pool_get() panic: remove CONFIG_PANIC_ON_OOPS_VALUE ocfs2: fix double free in user_cluster_connect() checkpatch: suppress strscpy warnings for userspace tools cramfs: fix incorrect physical page address calculation kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit Squashfs: fix uninit-value in squashfs_get_parent kho: only fill kimage if KHO is finalized ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name() kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock coccinelle: platform_no_drv_owner: handle also built-in drivers coccinelle: of_table: handle SPI device ID tables lib/decompress: use designated initializers for struct compress_format efi: support booting with kexec handover (KHO) ... |
|
|
|
8804d970fa |
Summary of significant series in this pull request:
- The 3 patch series "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation. - The 4 patch series "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs. - The 2 patch series "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters. - The 3 patch series "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps. - The 2 patch series "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code. - The 11 patch series "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code. - The 5 patch series "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero. - The 3 patch series "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature. - The 10 patch series "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs. - The 2 patch series "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code. - The 7 patch series "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code. - The 7 patch series "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations. - The 11 patch series "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc. - The 3 patch series "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path. - The 5 patch series "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code. - The 2 patch series "test that rmap behaves as expected" from Wei Yang adds some rmap selftests. - The 3 patch series "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers. - The 2 patch series "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues. - The 3 patch series "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks. - The 2 patch series "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code. - The 11 patch series "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem. - The 4 patch series "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/. - The 2 patch series "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c. - The 2 patch series "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation. - The 3 patch series "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc). - The 2 patch series "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code. - The 37 patch series "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function. - The 2 patch series "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only. - The 3 patch series "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code. - The 12 patch series "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy. - The 7 patch series "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages(). - The 3 patch series "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver. - The 2 patch series "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code. - The 14 patch series "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations. - The 3 patch series "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little. - The 3 patch series "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code. - The 2 patch series "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature. - The 3 patch series "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work. - The 2 patch series "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem. - The 2 patch series "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code. - The 10 patch series "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources. - The 5 patch series "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON. - The 7 patch series "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix. - The 2 patch series "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information. - The 2 patch series "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma. - The 2 patch series "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems. - The 6 patch series "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate. - The 2 patch series "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters. - The 2 patch series "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN3cywAKCRDdBJ7gKXxA jtaPAQDmIuIu7+XnVUK5V11hsQ/5QtsUeLHV3OsAn4yW5/3dEQD/UddRU08ePN+1 2VRB0EwkLAdfMWW7TfiNZ+yhuoiL/AA= =4mhY -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ... |
|
|
|
24d9e8b3c9 |
slab updates for 6.18
-----BEGIN PGP SIGNATURE----- iQFPBAABCAA5FiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmja74IbFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyAAoJELvgsHXSRYiacR4H/04aBsr7LZnTJVeZLQwK HKoOwXBqiQyqPdjKXGKnp7Mh9gRp2W3V11VsYTuDJNUS+Vz5YXW0z8cRnUfZ3SYs l+GZC3vZeAy2EVJE1U6Mb673hU8vziI80IO2q/tGzaj9a+wC3L0lemc+YFQTwG+u pMtt8zU2vHRjgkx8TNNqJBBOLLDV+RzIl8pqXVnh4eju6x6ZdreGnjXaePYMdjG0 fXLf9XwIeWREqbfeOCEOB50Ts71kkdiOeskwnJyfCTDT8WTu3zC/dICqfh66e3Gg 8hQKvMsuKpm/FwbtgdB0WvaDjENH6PmY+ubLYVxwvNpcsTSqfe0IYGm+HpUP+TPf m+Y= =w+JL -----END PGP SIGNATURE----- Merge tag 'slab-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - A new layer for caching objects for allocation and free via percpu arrays called sheaves. The aim is to combine the good parts of SLAB (lower-overhead and simpler percpu caching, compared to SLUB) without the past issues with arrays for freeing remote NUMA node objects and their flushing. It also allows more efficient kfree_rcu(), and cheaper object preallocations for cases where the exact number of objects is unknown, but an upper bound is. Currently VMAs and maple nodes are using this new caching, with a plan to enable it for all caches and remove the complex SLUB fastpath based on cpu (partial) slabs and this_cpu_cmpxchg_double(). (Vlastimil Babka, with Liam Howlett and Pedro Falcato for the maple tree changes) - Re-entrant kmalloc_nolock(), which allows opportunistic allocations from NMI and tracing/kprobe contexts. Building on prior page allocator and memcg changes, it will result in removing BPF-specific caches on top of slab (Alexei Starovoitov) - Various fixes and cleanups. (Kuan-Wei Chiu, Matthew Wilcox, Suren Baghdasaryan, Ye Liu) * tag 'slab-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (40 commits) slab: Introduce kmalloc_nolock() and kfree_nolock(). slab: Reuse first bit for OBJEXTS_ALLOC_FAIL slab: Make slub local_(try)lock more precise for LOCKDEP mm: Introduce alloc_frozen_pages_nolock() mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock(). locking/local_lock: Introduce local_lock_is_locked(). maple_tree: Convert forking to use the sheaf interface maple_tree: Add single node allocation support to maple state maple_tree: Prefilled sheaf conversion and testing tools/testing: Add support for prefilled slab sheafs maple_tree: Replace mt_free_one() with kfree() maple_tree: Use kfree_rcu in ma_free_rcu testing/radix-tree/maple: Hack around kfree_rcu not existing tools/testing: include maple-shim.c in maple.c maple_tree: use percpu sheaves for maple_node_cache mm, vma: use percpu sheaves for vm_area_struct cache tools/testing: Add support for changes to slab for sheaves slab: allow NUMA restricted allocations to use percpu sheaves tools/testing/vma: Implement vm_refcnt reset slab: skip percpu sheaves for remote object freeing ... |
|
|
|
07fdad3a93 |
Networking changes for 6.18.
Core & protocols
----------------
- Improve drop account scalability on NUMA hosts for RAW and UDP sockets
and the backlog, almost doubling the Pps capacity under DoS.
- Optimize the UDP RX performance under stress, reducing contention,
revisiting the binary layout of the involved data structs and
implementing NUMA-aware locking. This improves UDP RX performance by
an additional 50%, even more under extreme conditions.
- Add support for PSP encryption of TCP connections; this mechanism has
some similarities with IPsec and TLS, but offers superior HW offloads
capabilities.
- Ongoing work to support Accurate ECN for TCP. AccECN allows more than
one congestion notification signal per RTT and is a building block for
Low Latency, Low Loss, and Scalable Throughput (L4S).
- Reorganize the TCP socket binary layout for data locality, reducing
the number of touched cachelines in the fastpath.
- Refactor skb deferral free to better scale on large multi-NUMA hosts,
this improves TCP and UDP RX performances significantly on such HW.
- Increase the default socket memory buffer limits from 256K to 4M to
better fit modern link speeds.
- Improve handling of setups with a large number of nexthop, making dump
operating scaling linearly and avoiding unneeded synchronize_rcu() on
delete.
- Improve bridge handling of VLAN FDB, storing a single entry per bridge
instead of one entry per port; this makes the dump order of magnitude
faster on large switches.
- Restore IP ID correctly for encapsulated packets at GSO segmentation
time, allowing GRO to merge packets in more scenarios.
- Improve netfilter matching performance on large sets.
- Improve MPTCP receive path performance by leveraging recently
introduced core infrastructure (skb deferral free) and adopting recent
TCP autotuning changes.
- Allow bridges to redirect to a backup port when the bridge port is
administratively down.
- Introduce MPTCP 'laminar' endpoint that con be used only once per
connection and simplify common MPTCP setups.
- Add RCU safety to dst->dev, closing a lot of possible races.
- A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing
code duplication.
- Supports pulling data from an skb frag into the linear area of an XDP
buffer.
Things we sprinkled into general kernel code
--------------------------------------------
- Generate netlink documentation from YAML using an integrated
YAML parser.
Driver API
----------
- Support using IPv6 Flow Label in Rx hash computation and RSS queue
selection.
- Introduce API for fetching the DMA device for a given queue, allowing
TCP zerocopy RX on more H/W setups.
- Make XDP helpers compatible with unreadable memory, allowing more
easily building DevMem-enabled drivers with a unified XDP/skbs
datapath.
- Add a new dedicated ethtool callback enabling drivers to provide the
number of RX rings directly, improving efficiency and clarity in RX
ring queries and RSS configuration.
- Introduce a burst period for the health reporter, allowing better
handling of multiple errors due to the same root cause.
- Support for DPLL phase offset exponential moving average, controlling
the average smoothing factor.
Device drivers
--------------
- Add a new Huawei driver for 3rd gen NIC (hinic3).
- Add a new SpacemiT driver for K1 ethernet MAC.
- Add a generic abstraction for shared memory communication devices
(dibps)
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- Use multiple per-queue doorbell, to avoid MMIO contention issues
- support adjacent functions, allowing them to delegate their
SR-IOV VFs to sibling PFs
- support RSS for IPSec offload
- support exposing raw cycle counters in PTP and mlx5
- support for disabling host PFs.
- Intel (100G, ice, idpf):
- ice: support for SRIOV VFs over an Active-Active link aggregate
- ice: support for firmware logging via debugfs
- ice: support for Earliest TxTime First (ETF) hardware offload
- idpf: support basic XDP functionalities and XSk
- Broadcom (bnxt):
- support Hyper-V VF ID
- dynamic SRIOV resource allocations for RoCE
- Meta (fbnic):
- support queue API, zero-copy Rx and Tx
- support basic XDP functionalities
- devlink health support for FW crashes and OTP mem corruptions
- expand hardware stats coverage to FEC, PHY, and Pause
- Wangxun:
- support ethtool coalesce options
- support for multiple RSS contexts
- Ethernet virtual:
- Macsec:
- replace custom netlink attribute checks with policy-level checks
- Bonding:
- support aggregator selection based on port priority
- Microsoft vNIC:
- use page pool fragments for RX buffers instead of full pages to
improve memory efficiency
- Ethernet NICs consumer, and embedded:
- Qualcomm: support Ethernet function for IPQ9574 SoC
- Airoha: implement wlan offloading via NPU
- Freescale
- enetc: add NETC timer PTP driver and add PTP support
- fec: enable the Jumbo frame support for i.MX8QM
- Renesas (R-Car S4): support HW offloading for layer 2 switching
- support for RZ/{T2H, N2H} SoCs
- Cadence (macb): support TAPRIO traffic scheduling
- TI:
- support for Gigabit ICSS ethernet SoC (icssm-prueth)
- Synopsys (stmmac): a lot of cleanups
- Ethernet PHYs:
- Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
driver
- Support bcm63268 GPHY power control
- Support for Micrel lan8842 PHY and PTP
- Support for Aquantia AQR412 and AQR115
- CAN:
- a large CAN-XL preparation work
- reorganize raw_sock and uniqframe struct to minimize memory usage
- rcar_canfd: update the CAN-FD handling
- WiFi:
- extended Neighbor Awareness Networking (NAN) support
- S1G channel representation cleanup
- improve S1G support
- WiFi drivers:
- Intel (iwlwifi):
- major refactor and cleanup
- Broadcom (brcm80211):
- support for AP isolation
- RealTek (rtw88/89) rtw88/89:
- preparation work for RTL8922DE support
- MediaTek (mt76):
- HW restart improvements
- MLO support
- Qualcomm/Atheros (ath10k_
- GTK rekey fixes
- Bluetooth drivers:
- btusb: support for several new IDs for MT7925
- btintel: support for BlazarIW core
- btintel_pcie: support for _suspend() / _resume()
- btintel_pcie: support for Scorpious, Panther Lake-H484 IDs
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmjdBdsSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkWnAP/2Jz0ExnYMDxLxkkKqMte+3JNeF/KNjB
zVIslYN/5ekkd1TMXyDLFlWHqUOUMSF9+cK5ZRMHG6/lzOueSzFuuuDFsWs6Kf2f
q7Q9MzlXhR9++YCsdES1uS5x3PPjMInzo2ZivMFa6fUDLLFSzeAOKL9k+RS0EggU
VlXv2Wkm73R0O6KAssgDsHke9cnNz+F0DzhQ0S3qkyZF9tS5NrDeUl7fZ47XZgwb
ZuXdEzKmTTepo2XvZGxJoF2D7nekWFlHhLjEPpDJtET19nwhukCry41/FplJwlKR
dWsYkqLkrOEQKFQ+q++/5c3BgFOG+IrQLbP5bGQF1tZRMl4Dqp9PDxK5fKJfccbS
0VY3Y2qWOSYejT2Ji2kEHR5E5rPyFm3Y5A4Rnpz0yEHj14vL2v4zf7CZRkCyyDfD
doIZXFGkM0+N7QeXLEN833cje9zjaXuP9GAE+2bb+wAWAZAqof4KX8JgHh+y5Rwm
pvUtvFxmEtntlMwNBap8aT3FquGtfZncU8pzAE5kvWjuMvyF0NVWiE5rB2kSQM/X
NLmUdvDyjwwJqthXAJG0fl+a0mNJ/kOAqSOKJDJrfKDGWa+ovwY0iY06SpK0wIbO
Wz7tpMk5MSlYXW8xWMlbyhvvU6T9xuoQ2KV4QTdMxc6Ir3sNX6YkQr+gjQjxB0gx
ST2QF6lZeWFh
=w2Kz
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core & protocols:
- Improve drop account scalability on NUMA hosts for RAW and UDP
sockets and the backlog, almost doubling the Pps capacity under DoS
- Optimize the UDP RX performance under stress, reducing contention,
revisiting the binary layout of the involved data structs and
implementing NUMA-aware locking. This improves UDP RX performance
by an additional 50%, even more under extreme conditions
- Add support for PSP encryption of TCP connections; this mechanism
has some similarities with IPsec and TLS, but offers superior HW
offloads capabilities
- Ongoing work to support Accurate ECN for TCP. AccECN allows more
than one congestion notification signal per RTT and is a building
block for Low Latency, Low Loss, and Scalable Throughput (L4S)
- Reorganize the TCP socket binary layout for data locality, reducing
the number of touched cachelines in the fastpath
- Refactor skb deferral free to better scale on large multi-NUMA
hosts, this improves TCP and UDP RX performances significantly on
such HW
- Increase the default socket memory buffer limits from 256K to 4M to
better fit modern link speeds
- Improve handling of setups with a large number of nexthop, making
dump operating scaling linearly and avoiding unneeded
synchronize_rcu() on delete
- Improve bridge handling of VLAN FDB, storing a single entry per
bridge instead of one entry per port; this makes the dump order of
magnitude faster on large switches
- Restore IP ID correctly for encapsulated packets at GSO
segmentation time, allowing GRO to merge packets in more scenarios
- Improve netfilter matching performance on large sets
- Improve MPTCP receive path performance by leveraging recently
introduced core infrastructure (skb deferral free) and adopting
recent TCP autotuning changes
- Allow bridges to redirect to a backup port when the bridge port is
administratively down
- Introduce MPTCP 'laminar' endpoint that con be used only once per
connection and simplify common MPTCP setups
- Add RCU safety to dst->dev, closing a lot of possible races
- A significant crypto library API for SCTP, MPTCP and IPv6 SR,
reducing code duplication
- Supports pulling data from an skb frag into the linear area of an
XDP buffer
Things we sprinkled into general kernel code:
- Generate netlink documentation from YAML using an integrated YAML
parser
Driver API:
- Support using IPv6 Flow Label in Rx hash computation and RSS queue
selection
- Introduce API for fetching the DMA device for a given queue,
allowing TCP zerocopy RX on more H/W setups
- Make XDP helpers compatible with unreadable memory, allowing more
easily building DevMem-enabled drivers with a unified XDP/skbs
datapath
- Add a new dedicated ethtool callback enabling drivers to provide
the number of RX rings directly, improving efficiency and clarity
in RX ring queries and RSS configuration
- Introduce a burst period for the health reporter, allowing better
handling of multiple errors due to the same root cause
- Support for DPLL phase offset exponential moving average,
controlling the average smoothing factor
Device drivers:
- Add a new Huawei driver for 3rd gen NIC (hinic3)
- Add a new SpacemiT driver for K1 ethernet MAC
- Add a generic abstraction for shared memory communication
devices (dibps)
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- Use multiple per-queue doorbell, to avoid MMIO contention
issues
- support adjacent functions, allowing them to delegate their
SR-IOV VFs to sibling PFs
- support RSS for IPSec offload
- support exposing raw cycle counters in PTP and mlx5
- support for disabling host PFs.
- Intel (100G, ice, idpf):
- ice: support for SRIOV VFs over an Active-Active link
aggregate
- ice: support for firmware logging via debugfs
- ice: support for Earliest TxTime First (ETF) hardware offload
- idpf: support basic XDP functionalities and XSk
- Broadcom (bnxt):
- support Hyper-V VF ID
- dynamic SRIOV resource allocations for RoCE
- Meta (fbnic):
- support queue API, zero-copy Rx and Tx
- support basic XDP functionalities
- devlink health support for FW crashes and OTP mem corruptions
- expand hardware stats coverage to FEC, PHY, and Pause
- Wangxun:
- support ethtool coalesce options
- support for multiple RSS contexts
- Ethernet virtual:
- Macsec:
- replace custom netlink attribute checks with policy-level
checks
- Bonding:
- support aggregator selection based on port priority
- Microsoft vNIC:
- use page pool fragments for RX buffers instead of full pages
to improve memory efficiency
- Ethernet NICs consumer, and embedded:
- Qualcomm: support Ethernet function for IPQ9574 SoC
- Airoha: implement wlan offloading via NPU
- Freescale
- enetc: add NETC timer PTP driver and add PTP support
- fec: enable the Jumbo frame support for i.MX8QM
- Renesas (R-Car S4):
- support HW offloading for layer 2 switching
- support for RZ/{T2H, N2H} SoCs
- Cadence (macb): support TAPRIO traffic scheduling
- TI:
- support for Gigabit ICSS ethernet SoC (icssm-prueth)
- Synopsys (stmmac): a lot of cleanups
- Ethernet PHYs:
- Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
driver
- Support bcm63268 GPHY power control
- Support for Micrel lan8842 PHY and PTP
- Support for Aquantia AQR412 and AQR115
- CAN:
- a large CAN-XL preparation work
- reorganize raw_sock and uniqframe struct to minimize memory
usage
- rcar_canfd: update the CAN-FD handling
- WiFi:
- extended Neighbor Awareness Networking (NAN) support
- S1G channel representation cleanup
- improve S1G support
- WiFi drivers:
- Intel (iwlwifi):
- major refactor and cleanup
- Broadcom (brcm80211):
- support for AP isolation
- RealTek (rtw88/89) rtw88/89:
- preparation work for RTL8922DE support
- MediaTek (mt76):
- HW restart improvements
- MLO support
- Qualcomm/Atheros (ath10k):
- GTK rekey fixes
- Bluetooth drivers:
- btusb: support for several new IDs for MT7925
- btintel: support for BlazarIW core
- btintel_pcie: support for _suspend() / _resume()
- btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"
* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
net: stmmac: Add support for Allwinner A523 GMAC200
dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
Revert "Documentation: net: add flow control guide and document ethtool API"
octeontx2-pf: fix bitmap leak
octeontx2-vf: fix bitmap leak
net/mlx5e: Use extack in set rxfh callback
net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
net/mlx5e: Introduce mlx5e_rss_init_params
net/mlx5e: Remove unused mdev param from RSS indir init
net/mlx5: Improve QoS error messages with actual depth values
net/mlx5e: Prevent entering switchdev mode with inconsistent netns
net/mlx5: HWS, Generalize complex matchers
net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
selftests/net: add tcp_port_share to .gitignore
Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
net: add NUMA awareness to skb_attempt_defer_free()
net: use llist for sd->defer_list
net: make softnet_data.defer_count an atomic
selftests: drv-net: psp: add tests for destroying devices
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
...
|
|
|
|
d7a018eb76 |
Kernel Concurrency Sanitizer (KCSAN) updates for v6.18
- Replace deprecated strcpy() with strscpy() This change has had 6 weeks of linux-next exposure. -----BEGIN PGP SIGNATURE----- iIcEABYKAC8WIQR7t4b/75lzOR3l5rcxsLN3bbyLnwUCaNpmMxEcZWx2ZXJAZ29v Z2xlLmNvbQAKCRAxsLN3bbyLnxmiAP0cwJKy4B307v7TJUCrvXVqXJtyuGmdBdKv j2MOuvt4ygD9HIVd/pFeFkjH0MVNuSbYNl1ICb/UK+vx/5MA/doXPAM= =jiYN -----END PGP SIGNATURE----- Merge tag 'kcsan-20250929-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux Pull Kernel Concurrency Sanitizer (KCSAN) update from Marco Elver: - Replace deprecated strcpy() with strscpy() * tag 'kcsan-20250929-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux: kcsan: test: Replace deprecated strcpy() with strscpy() |
|
|
|
7a75a5da79 | Merge branch 'rework/ringbuffer-kunit-test' into for-linus | |
|
|
30bbcb4470 |
linux_kselftest-kunit-6.18-rc1
- A seven patch series adds a new parameterized test features
KUnit parameterized tests currently support two primary methods for
getting parameters:
1. Defining custom logic within a generate_params() function.
2. Using the KUNIT_ARRAY_PARAM() and KUNIT_ARRAY_PARAM_DESC()
macros with a pre-defined static array and passing
the created *_gen_params() to KUNIT_CASE_PARAM().
These methods present limitations when dealing with dynamically
generated parameter arrays, or in scenarios where populating parameters
sequentially via generate_params() is inefficient or overly complex.
These limitations are fixed with a parameterized test method.
- Fixes issues in kunit build artifacts cleanup,
- Fixes parsing skipped test problem in kselftest framework,
- Enables PCI on UML without triggering WARN()
- a few other fixes and adds support for new configs such as MIPS
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmjbB0QACgkQCwJExA0N
QxxpUQ/8CslEjTv2+/LA122eGDtg8np61W1MTNEQslYyiobhZ9CXeFw4yJlTTb0w
hShFK1pGAWgkVCVKIJOaeItY0hF3BIxyVtlQxDUKvukpMnLvZRI0KhG7p8aYnCq+
jfQRy4gqwjaHyLQekQ1v6vRdHdTfh5mB6OUOCYa7QPQuKhOkBOaq2ZJ+9eFnkPXl
KUGTcC3pL0jQQmaSgo7zFUbdGSq0JZkNbpMj0fAYB0zs+MSpfkAnQYEw3qaxU1/Z
lu+CQdom+77Kt0r7UyEb6xUBeRveqYAWv6oFKq3CBzhlEGCkqVv6zgNg48qMC0QO
cVW0E61u8bVAalhb7hOT3QsEp1k01Lr2N9/VBLP4LRb2HMlD2qlXDo6ftBjNgx/y
m5Kukh1wQoOTaTJekdDMlaRXwApdn3MegheXE83n4dr44C5oyhlR8fFk/0Y6gSEU
RKUg8ZwUNUhCdw4VgBn8zoSkK18D74feRoustmuMS00I/8to3iGvQ7kn3nz2fPMb
AaZ6a2K95gk9TrD4UZdHF1aPHDudn3e8g6LdSOV/NoRNI9H+fXfhvxcLwW8WCqOw
u5qfTxNUk4mPQrCs9hcDnWm9Ppuu5YeY989rKkA7j4z71dhyMOPynpOM+vXRijB8
9zRcvyls87tgDAquIZvTp6Q/oEmi60OF40DRC9MjgOw0Iw7feIc=
=VfHu
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- New parameterized test features
KUnit parameterized tests supported two primary methods for getting
parameters:
- Defining custom logic within a generate_params() function.
- Using the KUNIT_ARRAY_PARAM() and KUNIT_ARRAY_PARAM_DESC() macros
with a pre-defined static array and passing the created
*_gen_params() to KUNIT_CASE_PARAM().
These methods present limitations when dealing with dynamically
generated parameter arrays, or in scenarios where populating
parameters sequentially via generate_params() is inefficient or
overly complex.
These limitations are fixed with a parameterized test method
- Fix issues in kunit build artifacts cleanup
- Fix parsing skipped test problem in kselftest framework
- Enable PCI on UML without triggering WARN()
- a few other fixes and adds support for new configs such as MIPS
* tag 'linux_kselftest-kunit-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Extend kconfig help text for KUNIT_UML_PCI
rust: kunit: allow `cfg` on `test`s
kunit: qemu_configs: Add MIPS configurations
kunit: Enable PCI on UML without triggering WARN()
Documentation: kunit: Document new parameterized test features
kunit: Add example parameterized test with direct dynamic parameter array setup
kunit: Add example parameterized test with shared resource management using the Resource API
kunit: Enable direct registration of parameter arrays to a KUnit test
kunit: Pass parameterized test context to generate_params()
kunit: Introduce param_init/exit for parameterized test context management
kunit: Add parent kunit for parameterized test context
kunit: tool: Accept --raw_output=full as an alias of 'all'
kunit: tool: Parse skipped tests from kselftest.h
kunit: Always descend into kunit directory during build
|
|
|
|
991053178e |
Power management updates for 6.18-rc1
- Rearrange variable declarations involving __free() in the cpufreq
core and intel_pstate driver to follow common coding style (Rafael
Wysocki)
- Fix object lifecycle issue in update_qos_request(), rearrange
freq QoS updates using __free(), and adjust frequency percentage
computations in the intel_pstate driver (Rafael Wysocki)
- Update intel_pstate to allow it to enable HWP without EPP if the
new DEC (Dynamic Efficiency Control) HW feature is enabled (Rafael
Wysocki)
- Use on_each_cpu_mask() in drv_write() in the ACPI cpufreq driver
to simplify the code (Rafael Wysocki)
- Use likely() optimization in intel_pstate_sample() (Yaxiong Tian)
- Remove dead EPB-related code from intel_pstate (Srinivas Pandruvada)
- Use scope-based cleanup for cpufreq policy references in multiple
cpufreq drivers (Zihuan Zhang)
- Avoid calling get_governor() for the first policy in the cpufreq core
to simplify the initial policy path (Zihuan Zhang)
- Clean up the cpufreq core in multiple places (Zihuan Zhang)
- Use int type to store negative error codes in the cpufreq core and
update the speedstep-lib to use int for error codes (Qianfeng Rong)
- Update the efficient idle check for Intel extended Families in the
ondemand cpufreq governor (Sohil Mehta)
- Replace sscanf() with kstrtouint() in the conservative cpufreq
governor (Kaushlendra Kumar)
- Rename CpumaskVar::as[_mut]_ref to from_raw[_mut] in the cpumask
Rust code and mark CpumaskVar as transparent (Alice Ryhl, Baptiste
Lepers)
- Update ARef and AlwaysRefCounted imports from sync::aref in the OPP
Rust code (Shankari Anand)
- Add support for AN7583 SoC to the airoha cpufreq driver (Christian
Marangi)
- Enable cpufreq for ipq5424 in the qcom-nvmem cpufreq driver (Md Sadre
Alam)
- Add support for MT8196 to the mediatek-hw cpufreq driver, refactor
that driver and add mediatek,mt8196-cpufreq-hw DT binding (Nicolas
Frattaroli)
- Avoid redundant conditions in the mediatek cpufreq driver (Liao
Yuanhong)
- Add support for AM62D2 to the ti cpufreq driver and blocklist
ti,am62d2 SoC in dt-platdev (Paresh Bhagat)
- Support more speed grades on AM62Px SoC in the ti cpufreq driver,
allow all silicon revisions to support OPPs in it, and fix supported
hardware for 1GHz OPP (Judith Mendez)
- Add QCS615 compatible to DT bindings for cpufreq-qcom-hw (Taniya Das)
- Minor assorted updates of the scmi, longhaul, CPPC, and armada-37xx
cpufreq drivers (Akhilesh Patil, BowenYu, Dennis Beier, and Florian
Fainelli)
- Remove outdated cpufreq-dt.txt (Frank Li)
- Fix python gnuplot package names in the amd_pstate_tracer utility
(Kuan-Wei Chiu)
- Saravana Kannan will maintain the virtual-cpufreq driver (Saravana
Kannan)
- Prevent CPU capacity updates after registering a perf domain from
failing on a first CPU that is not present (Christian Loehle)
- Add support for the cases in which frequency alone is not sufficient
to uniquely identify an OPP (Krishna Chaitanya Chundru)
- Use to_result() for OPP error handling in Rust (Onur Özkan)
- Add support for LPDDR5 on Rockhip RK3588 SoC to rockchip-dfi devfreq
driver (Nicolas Frattaroli)
- Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X)
are reported as half by adding a cycle multiplier to the DFI driver
in rockchip-dfi devfreq-event driver (Nicolas Frattaroli)
- Fix missing error pointer dereference check of regulator instance in
the mtk-cci devfreq driver probe and remove a redundant condition from
an if () statement in that driver (Dan Carpenter, Liao Yuanhong)
- Fail cpuidle device registration if there is one already to avoid
sysfs-related issues (Rafael Wysocki)
- Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf() in
cpuidle (Vivek Yadav)
- Fix device and OF node leaks at probe in the qcom-spm cpuidle driver
and drop unnecessary initialisations from it (Johan Hovold)
- Remove unnecessary address-of operators from the intel_idle cpuidle
driver (Kaushlendra Kumar)
- Rearrange main loop in menu_select() to make the code in that funtion
easier to follow (Rafael Wysocki)
- Convert values in microseconds to ktime using us_to_ktime() where
applicable in the intel_idle power capping driver (Xichao Zhao)
- Annotate loops walking device links in the power management core
code as _srcu and add macros for walking device links to reduce the
likelihood of coding mistakes related to them (Rafael Wysocki)
- Document time units for *_time functions in the runtime PM API (Brian
Norris)
- Clear power.must_resume in noirq suspend error path to avoid resuming
a dependant device under a suspended parent or supplier (Rafael
Wysocki)
- Fix GFP mask handling during hybrid suspend and make the amdgpu
driver handle hybrid suspend correctly (Mario Limonciello, Rafael
Wysocki)
- Fix GFP mask handling after aborted hibernation in platform mode and
combine exit paths in power_down() to avoid code duplication (Rafael
Wysocki)
- Use vmalloc_array() and vcalloc() in the hibernation core to avoid
open-coded size computations (Qianfeng Rong)
- Fix typo in hibernation core code comment (Li Jun)
- Call pm_wakeup_clear() in the same place where other functions that do
bookkeeping prior to suspend_prepare() are called (Samuel Wu)
- Fix and clean up the x86_energy_perf_policy utility and update its
documentation (Len Brown, Kaushlendra Kumar)
- Fix incorrect sorting of PMT telemetry in turbostat (Kaushlendra
Kumar)
- Fix incorrect size in cpuidle_state_disable() and the error return
value of cpupower_write_sysfs() in cpupower (Kaushlendra Kumar)
-----BEGIN PGP SIGNATURE-----
iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmjafbMSHHJqd0Byand5
c29ja2kubmV0AAoJEO5fvZ0v1OO174EH/jAm4GBn1sbjMt0CybSHTTP9iryTiN6m
cXML5OpMoLbTnngfXjbEe9t52Fc0YV4awCG/S8Ufbut6ubWOEaVzInlw3zQAeE7c
V91ioxKrodrykpBxn5UFyCxpT2NZWteWl5rOEPeN7j+hqS4I4GTO0HsSo+E+1Y9F
DKELrbkLsn7rHy+ZvrOhcvq1IZE8gvINuji0QEf1cZz1VrgrLbQHUFqySpCUJw3F
/MfnA3l0kA2TXQ+UpDWJw8l1weZpXpOPJiyhYQKKeYHVA4osBwiFA/9pq+8Xb/AJ
GORHIl9y3x+JDXkEYyqmLn0k4FbHCX+4p5tpV9YDHebtP8dTLJBDF5Q=
=d4BP
-----END PGP SIGNATURE-----
Merge tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"The majority of these are cpufreq changes, which has been a recurring
pattern for a few recent cycles.
Those changes include new hardware support (AN7583 SoC support in the
airoha cpufreq driver, ipq5424 support in the qcom-nvmem cpufreq
driver, MT8196 support in the mediatek cpufreq driver, AM62D2 support
in the ti cpufreq driver), DT bindings and Rust code updates, cleanups
of the core and governors, and multiple driver fixes and cleanups.
Beyond that, there are hibernation fixes (some remaining 6.16 cycle
fallout and an issue related to hybrid suspend in the amdgpu driver),
cleanups of the PM core code, runtime PM documentation update, cpuidle
and power capping cleanups, and tooling updates.
Specifics:
- Rearrange variable declarations involving __free() in the cpufreq
core and intel_pstate driver to follow common coding style (Rafael
Wysocki)
- Fix object lifecycle issue in update_qos_request(), rearrange freq
QoS updates using __free(), and adjust frequency percentage
computations in the intel_pstate driver (Rafael Wysocki)
- Update intel_pstate to allow it to enable HWP without EPP if the
new DEC (Dynamic Efficiency Control) HW feature is enabled (Rafael
Wysocki)
- Use on_each_cpu_mask() in drv_write() in the ACPI cpufreq driver to
simplify the code (Rafael Wysocki)
- Use likely() optimization in intel_pstate_sample() (Yaxiong Tian)
- Remove dead EPB-related code from intel_pstate (Srinivas
Pandruvada)
- Use scope-based cleanup for cpufreq policy references in multiple
cpufreq drivers (Zihuan Zhang)
- Avoid calling get_governor() for the first policy in the cpufreq
core to simplify the initial policy path (Zihuan Zhang)
- Clean up the cpufreq core in multiple places (Zihuan Zhang)
- Use int type to store negative error codes in the cpufreq core and
update the speedstep-lib to use int for error codes (Qianfeng Rong)
- Update the efficient idle check for Intel extended Families in the
ondemand cpufreq governor (Sohil Mehta)
- Replace sscanf() with kstrtouint() in the conservative cpufreq
governor (Kaushlendra Kumar)
- Rename CpumaskVar::as[_mut]_ref to from_raw[_mut] in the cpumask
Rust code and mark CpumaskVar as transparent (Alice Ryhl, Baptiste
Lepers)
- Update ARef and AlwaysRefCounted imports from sync::aref in the OPP
Rust code (Shankari Anand)
- Add support for AN7583 SoC to the airoha cpufreq driver (Christian
Marangi)
- Enable cpufreq for ipq5424 in the qcom-nvmem cpufreq driver (Md
Sadre Alam)
- Add support for MT8196 to the mediatek-hw cpufreq driver, refactor
that driver and add mediatek,mt8196-cpufreq-hw DT binding (Nicolas
Frattaroli)
- Avoid redundant conditions in the mediatek cpufreq driver (Liao
Yuanhong)
- Add support for AM62D2 to the ti cpufreq driver and blocklist
ti,am62d2 SoC in dt-platdev (Paresh Bhagat)
- Support more speed grades on AM62Px SoC in the ti cpufreq driver,
allow all silicon revisions to support OPPs in it, and fix
supported hardware for 1GHz OPP (Judith Mendez)
- Add QCS615 compatible to DT bindings for cpufreq-qcom-hw (Taniya
Das)
- Minor assorted updates of the scmi, longhaul, CPPC, and armada-37xx
cpufreq drivers (Akhilesh Patil, BowenYu, Dennis Beier, and Florian
Fainelli)
- Remove outdated cpufreq-dt.txt (Frank Li)
- Fix python gnuplot package names in the amd_pstate_tracer utility
(Kuan-Wei Chiu)
- Saravana Kannan will maintain the virtual-cpufreq driver (Saravana
Kannan)
- Prevent CPU capacity updates after registering a perf domain from
failing on a first CPU that is not present (Christian Loehle)
- Add support for the cases in which frequency alone is not
sufficient to uniquely identify an OPP (Krishna Chaitanya Chundru)
- Use to_result() for OPP error handling in Rust (Onur Özkan)
- Add support for LPDDR5 on Rockhip RK3588 SoC to rockchip-dfi
devfreq driver (Nicolas Frattaroli)
- Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X)
are reported as half by adding a cycle multiplier to the DFI driver
in rockchip-dfi devfreq-event driver (Nicolas Frattaroli)
- Fix missing error pointer dereference check of regulator instance
in the mtk-cci devfreq driver probe and remove a redundant
condition from an if () statement in that driver (Dan Carpenter,
Liao Yuanhong)
- Fail cpuidle device registration if there is one already to avoid
sysfs-related issues (Rafael Wysocki)
- Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf()
in cpuidle (Vivek Yadav)
- Fix device and OF node leaks at probe in the qcom-spm cpuidle
driver and drop unnecessary initialisations from it (Johan Hovold)
- Remove unnecessary address-of operators from the intel_idle cpuidle
driver (Kaushlendra Kumar)
- Rearrange main loop in menu_select() to make the code in that
funtion easier to follow (Rafael Wysocki)
- Convert values in microseconds to ktime using us_to_ktime() where
applicable in the intel_idle power capping driver (Xichao Zhao)
- Annotate loops walking device links in the power management core
code as _srcu and add macros for walking device links to reduce the
likelihood of coding mistakes related to them (Rafael Wysocki)
- Document time units for *_time functions in the runtime PM API
(Brian Norris)
- Clear power.must_resume in noirq suspend error path to avoid
resuming a dependant device under a suspended parent or supplier
(Rafael Wysocki)
- Fix GFP mask handling during hybrid suspend and make the amdgpu
driver handle hybrid suspend correctly (Mario Limonciello, Rafael
Wysocki)
- Fix GFP mask handling after aborted hibernation in platform mode
and combine exit paths in power_down() to avoid code duplication
(Rafael Wysocki)
- Use vmalloc_array() and vcalloc() in the hibernation core to avoid
open-coded size computations (Qianfeng Rong)
- Fix typo in hibernation core code comment (Li Jun)
- Call pm_wakeup_clear() in the same place where other functions that
do bookkeeping prior to suspend_prepare() are called (Samuel Wu)
- Fix and clean up the x86_energy_perf_policy utility and update its
documentation (Len Brown, Kaushlendra Kumar)
- Fix incorrect sorting of PMT telemetry in turbostat (Kaushlendra
Kumar)
- Fix incorrect size in cpuidle_state_disable() and the error return
value of cpupower_write_sysfs() in cpupower (Kaushlendra Kumar)"
* tag 'pm-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
PM: hibernate: Combine return paths in power_down()
PM: hibernate: Restrict GFP mask in power_down()
PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage
PM: runtime: Documentation: ABI: Document time units for *_time
tools/power x86_energy_perf_policy.8: Emphasize preference for SW interfaces
tools/power x86_energy_perf_policy: Add make snapshot target
tools/power x86_energy_perf_policy: Prefer driver HWP limits
tools/power x86_energy_perf_policy: EPB access is only via sysfs
tools/power x86_energy_perf_policy: Prepare for MSR/sysfs refactoring
tools/power x86_energy_perf_policy: Enhance HWP enable
tools/power x86_energy_perf_policy: Enhance HWP enabled check
tools/power x86_energy_perf_policy: Fix incorrect fopen mode usage
tools/power turbostat: Fix incorrect sorting of PMT telemetry
drm/amd: Fix hybrid sleep
PM: hibernate: Add pm_hibernation_mode_is_suspend()
PM: hibernate: Fix hybrid-sleep
tools/cpupower: Fix incorrect size in cpuidle_state_disable()
tools/power/x86/amd_pstate_tracer: Fix python gnuplot package names
cpufreq: Replace pointer subtraction with iteration macro
cpuidle: Fail cpuidle device registration if there is one already
...
|
|
|
|
9cf9aa7b0a |
tracing: Fix race condition in kprobe initialization causing NULL pointer dereference
There is a critical race condition in kprobe initialization that can lead to
NULL pointer dereference and kernel crash.
[1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
...
[1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
[1135630.269239] pc : kprobe_perf_func+0x30/0x260
[1135630.277643] lr : kprobe_dispatcher+0x44/0x60
[1135630.286041] sp : ffffaeff4977fa40
[1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
[1135630.302837] x27: 0000000000000000 x26: 0000000000000000
[1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
[1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
[1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
[1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
[1135630.349985] x17: 0000000000000000 x16: 0000000000000000
[1135630.359285] x15: 0000000000000000 x14: 0000000000000000
[1135630.368445] x13: 0000000000000000 x12: 0000000000000000
[1135630.377473] x11: 0000000000000000 x10: 0000000000000000
[1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
[1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
[1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
[1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
[1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
[1135630.429410] Call trace:
[1135630.434828] kprobe_perf_func+0x30/0x260
[1135630.441661] kprobe_dispatcher+0x44/0x60
[1135630.448396] aggr_pre_handler+0x70/0xc8
[1135630.454959] kprobe_breakpoint_handler+0x140/0x1e0
[1135630.462435] brk_handler+0xbc/0xd8
[1135630.468437] do_debug_exception+0x84/0x138
[1135630.475074] el1_dbg+0x18/0x8c
[1135630.480582] security_file_permission+0x0/0xd0
[1135630.487426] vfs_write+0x70/0x1c0
[1135630.493059] ksys_write+0x5c/0xc8
[1135630.498638] __arm64_sys_write+0x24/0x30
[1135630.504821] el0_svc_common+0x78/0x130
[1135630.510838] el0_svc_handler+0x38/0x78
[1135630.516834] el0_svc+0x8/0x1b0
kernel/trace/trace_kprobe.c: 1308
0xffff3df8995039ec <kprobe_perf_func+0x2c>: ldr x21, [x24,#120]
include/linux/compiler.h: 294
0xffff3df8995039f0 <kprobe_perf_func+0x30>: ldr x1, [x21,x0]
kernel/trace/trace_kprobe.c
1308: head = this_cpu_ptr(call->perf_events);
1309: if (hlist_empty(head))
1310: return 0;
crash> struct trace_event_call -o
struct trace_event_call {
...
[120] struct hlist_head *perf_events; //(call->perf_event)
...
}
crash> struct trace_event_call ffffaf015340e528
struct trace_event_call {
...
perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
...
}
Race Condition Analysis:
The race occurs between kprobe activation and perf_events initialization:
CPU0 CPU1
==== ====
perf_kprobe_init
perf_trace_event_init
tp_event->perf_events = list;(1)
tp_event->class->reg (2)← KPROBE ACTIVE
Debug exception triggers
...
kprobe_dispatcher
kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
head = this_cpu_ptr(call->perf_events)(3)
(perf_events is still NULL)
Problem:
1. CPU0 executes (1) assigning tp_event->perf_events = list
2. CPU0 executes (2) enabling kprobe functionality via class->reg()
3. CPU1 triggers and reaches kprobe_dispatcher
4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
5. CPU1 calls kprobe_perf_func() and crashes at (3) because
call->perf_events is still NULL
CPU1 sees that kprobe functionality is enabled but does not see that
perf_events has been assigned.
Add pairing read and write memory barriers to guarantee that if CPU1
sees that kprobe functionality is enabled, it must also see that
perf_events has been assigned.
Link: https://lore.kernel.org/all/20251001022025.44626-1-chenyuan_fl@163.com/
Fixes:
|
|
|
|
55c0ced59f |
bpf: Reject negative offsets for ALU ops
When verifying BPF programs, the check_alu_op() function validates
instructions with ALU operations. The 'offset' field in these
instructions is a signed 16-bit integer.
The existing check 'insn->off > 1' was intended to ensure the offset is
either 0, or 1 for BPF_MOD/BPF_DIV. However, because 'insn->off' is
signed, this check incorrectly accepts all negative values (e.g., -1).
This commit tightens the validation by changing the condition to
'(insn->off != 0 && insn->off != 1)'. This ensures that any value
other than the explicitly permitted 0 and 1 is rejected, hardening the
verifier against malformed BPF programs.
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Fixes:
|
|
|
|
34904582b5 |
bpf: Skip scalar adjustment for BPF_NEG if dst is a pointer
In check_alu_op(), the verifier currently calls check_reg_arg() and
adjust_scalar_min_max_vals() unconditionally for BPF_NEG operations.
However, if the destination register holds a pointer, these scalar
adjustments are unnecessary and potentially incorrect.
This patch adds a check to skip the adjustment logic when the destination
register contains a pointer.
Reported-by: syzbot+d36d5ae81e1b0a53ef58@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d36d5ae81e1b0a53ef58
Fixes:
|
|
|
|
eb3289fc47 |
Driver core changes for 6.18-rc1
- Auxiliary:
- Drop call to dev_pm_domain_detach() in auxiliary_bus_probe()
- Optimize logic of auxiliary_match_id()
- Rust:
- Auxiliary:
- Use primitive C types from prelude
- DebugFs:
- Add debugfs support for simple read/write files and custom callbacks
through a File-type-based and directory-scope-based API
- Sample driver code for the File-type-based API
- Sample module code for the directory-scope-based API
- I/O:
- Add io::poll module and implement Rust specific read_poll_timeout()
helper
- IRQ:
- Implement support for threaded and non-threaded device IRQs based on
(&Device<Bound>, IRQ number) tuples (IrqRequest)
- Provide &Device<Bound> cookie in IRQ handlers
- PCI:
- Support IRQ requests from IRQ vectors for a specific pci::Device<Bound>
- Implement accessors for subsystem IDs, revision, devid and resource start
- Provide dedicated pci::Vendor and pci::Class types for vendor and class
ID numbers
- Implement Display to print actual vendor and class names; Debug to print
the raw ID numbers
- Add pci::DeviceId::from_class_and_vendor() helper
- Use primitive C types from prelude
- Various minor inline and (safety) comment improvements
- Platform:
- Support IRQ requests from IRQ vectors for a specific
platform::Device<Bound>
- Nova:
- Use pci::DeviceId::from_class_and_vendor() to avoid probing
non-display/compute PCI functions
- Misc:
- Add helper for cpu_relax()
- Update ARef import from sync::aref
- sysfs:
- Remove bin_attrs_new field from struct attribute_group
- Remove read_new() and write_new() from struct bin_attribute
- Misc:
- Document potential race condition in get_dev_from_fwnode()
- Constify node_group argument in software node registration functions
- Fix order of kernel-doc parameters in various functions
- Set power.no_pm flag for faux devices
- Set power.no_callbacks flag along with the power.no_pm flag
- Constify the pmu_bus bus type
- Minor spelling fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS2q/xV6QjXAdC7k+1FlHeO1qrKLgUCaNmQGwAKCRBFlHeO1qrK
LmPzAP9msIvK8eFT4CEDK4buX1gd+VBOdy8mAjAeJ2F80FIo8wEAtOdddNaaqWVF
m4ac2/a2bSRKMGPX+wIM7d2HGyC7sgY=
=XbU+
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"Auxiliary:
- Drop call to dev_pm_domain_detach() in auxiliary_bus_probe()
- Optimize logic of auxiliary_match_id()
Rust:
- Auxiliary:
- Use primitive C types from prelude
- DebugFs:
- Add debugfs support for simple read/write files and custom
callbacks through a File-type-based and directory-scope-based
API
- Sample driver code for the File-type-based API
- Sample module code for the directory-scope-based API
- I/O:
- Add io::poll module and implement Rust specific
read_poll_timeout() helper
- IRQ:
- Implement support for threaded and non-threaded device IRQs
based on (&Device<Bound>, IRQ number) tuples (IrqRequest)
- Provide &Device<Bound> cookie in IRQ handlers
- PCI:
- Support IRQ requests from IRQ vectors for a specific
pci::Device<Bound>
- Implement accessors for subsystem IDs, revision, devid and
resource start
- Provide dedicated pci::Vendor and pci::Class types for vendor
and class ID numbers
- Implement Display to print actual vendor and class names; Debug
to print the raw ID numbers
- Add pci::DeviceId::from_class_and_vendor() helper
- Use primitive C types from prelude
- Various minor inline and (safety) comment improvements
- Platform:
- Support IRQ requests from IRQ vectors for a specific
platform::Device<Bound>
- Nova:
- Use pci::DeviceId::from_class_and_vendor() to avoid probing
non-display/compute PCI functions
- Misc:
- Add helper for cpu_relax()
- Update ARef import from sync::aref
sysfs:
- Remove bin_attrs_new field from struct attribute_group
- Remove read_new() and write_new() from struct bin_attribute
Misc:
- Document potential race condition in get_dev_from_fwnode()
- Constify node_group argument in software node registration
functions
- Fix order of kernel-doc parameters in various functions
- Set power.no_pm flag for faux devices
- Set power.no_callbacks flag along with the power.no_pm flag
- Constify the pmu_bus bus type
- Minor spelling fixes"
* tag 'driver-core-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (43 commits)
rust: pci: display symbolic PCI vendor names
rust: pci: display symbolic PCI class names
rust: pci: fix incorrect platform reference in PCI driver probe doc comment
rust: pci: fix incorrect platform reference in PCI driver unbind doc comment
perf: make pmu_bus const
samples: rust: Add scoped debugfs sample driver
rust: debugfs: Add support for scoped directories
samples: rust: Add debugfs sample driver
rust: debugfs: Add support for callback-based files
rust: debugfs: Add support for writable files
rust: debugfs: Add support for read-only files
rust: debugfs: Add initial support for directories
driver core: auxiliary bus: Optimize logic of auxiliary_match_id()
driver core: auxiliary bus: Drop dev_pm_domain_detach() call
driver core: Fix order of the kernel-doc parameters
driver core: get_dev_from_fwnode(): document potential race
drivers: base: fix "publically"->"publicly"
driver core/PM: Set power.no_callbacks along with power.no_pm
driver core: faux: Set power.no_pm for faux devices
rust: pci: inline several tiny functions
...
|
|
|
|
ae28ed4578 |
bpf-next-6.18
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmjZH40ACgkQ6rmadz2v bTrG7w//X/5CyDoKIYJCqynYRdMtfqYuCe8Jhud4p5++iBVqkDyS6Y8EFLqZVyg/ UHTqaSE4Nz8/pma0WSjhUYn6Chs1AeH+Rw/g109SovE/YGkek2KNwY3o2hDrtPMX +oD0my8qF2HLKgEyteXXyZ5Ju+AaF92JFiGko4/wNTX8O99F9nyz2pTkrctS9Vl9 VwuTxrEXpmhqrhP3WCxkfNfcbs9HP+AALpgOXZKdMI6T4KI0N1gnJ0ZWJbiXZ8oT tug0MTPkNRidYMl0wHY2LZ6ZG8Q3a7Sgc+M0xFzaHGvGlJbBg1HjsDMtT6j34CrG TIVJ/O8F6EJzAnQ5Hio0FJk8IIgMRgvng5Kd5GXidU+mE6zokTyHIHOXitYkBQNH Hk+lGA7+E2cYqUqKvB5PFoyo+jlucuIH7YwrQlyGfqz+98n65xCgZKcmdVXr0hdB 9v3WmwJFtVIoPErUvBC3KRANQYhFk4eVk1eiGV/20+eIVyUuNbX6wqSWSA9uEXLy n5fm/vlk4RjZmrPZHxcJ0dsl9LTF1VvQQHkgoC1Sz/Cc+jA6k4I+ECVHAqEbk36p 1TUF52yPOD2ViaJKkj+962JaaaXlUn6+Dq7f1GMP6VuyHjz4gsI3mOo4XarqNdWd c7TnYmlGO/cGwqd4DdbmWiF1DDsrBcBzdbC8+FgffxQHLPXGzUg= =LeQi -----END PGP SIGNATURE----- Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung) Applied as a stable branch in bpf-next and net-next trees. - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki) Also a stable branch in bpf-next and net-next trees. - Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann) - Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman) This is a significant change in the verification logic. More details, motivation, long term plans are in the cover letter/merge commit. - Support signed BPF programs (KP Singh) This is another major feature that took years to materialize. Algorithm details are in the cover letter/marge commit - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich) - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan) - Fix USDT SIB argument handling in libbpf (Jiawei Zhao) - Allow uprobe-bpf program to change context registers (Jiri Olsa) - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan) - Allow access to union arguments in tracing programs (Leon Hwang) - Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong) - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko) - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi) - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi) - Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana) - Use tnums to improve is_branch_taken() logic (Paul Chaignon) - Add support for atomic operations in arena in riscv JIT (Pu Lehui) - Report arena faults to BPF error stream (Puranjay Mohan) - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet) - Add bpf_strcasecmp() kfunc (Rong Tao) - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen) * tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits) libbpf: Replace AF_ALG with open coded SHA-256 selftests/bpf: Add stress test for rqspinlock in NMI selftests/bpf: Add test case for different expected_attach_type bpf: Enforce expected_attach_type for tailcall compatibility bpftool: Remove duplicate string.h header bpf: Remove duplicate crypto/sha2.h header libbpf: Fix error when st-prefix_ops and ops from differ btf selftests/bpf: Test changing packet data from kfunc selftests/bpf: Add stacktrace map lookup_and_delete_elem test case selftests/bpf: Refactor stacktrace_map case with skeleton bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE selftests/bpf: Fix flaky bpf_cookie selftest selftests/bpf: Test changing packet data from global functions with a kfunc bpf: Emit struct bpf_xdp_sock type in vmlinux BTF selftests/bpf: Task_work selftest cleanup fixes MAINTAINERS: Delete inactive maintainers from AF_XDP bpf: Mark kfuncs as __noclone selftests/bpf: Add kprobe multi write ctx attach test selftests/bpf: Add kprobe write ctx attach test selftests/bpf: Add uprobe context ip register change test ... |
|
|
|
4b81e2eb9e |
Updates for the VDSO subsystem:
- Further consolidation of the VDSO infrastructure and the common data
store.
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaUJgTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT5VD/9l+TP42vJzwygPArefVM9QijAc4k+g
iptaz5M+Lhk0BlwMUu502Tp1zcwGq5529LlE2P72DuYoZRtZKSFCGES2mye0JiAs
A5Vk8d1zicg3IIGjYgw39shA9ghzuzV3gZdgHVqQ6024uSeKxUY3SqF156GGfPFG
azXWDV7i4RGQvKLm2ye8m7JNYCE3P9f8Wh+GvPaq7qBlW1MsMCWC4FP5qu4iStf3
mol9uA/EM2zSDwkpIpq6P5L2TpIH1dGDNGxVf/HG3J16UhNXzzXv/rxuKHKPNGWm
MQZ8WdPjF+uwdUsNXaImC6aCE0hoQ1Ga7GE2YA3pMQYfySeiJ/fA5I1KfhlUCiHD
oUgnb6PQfZjaBDrIODKcrFHAYc5pTOyJXnJ6q1Lc1yrOfoVcIpMaRXHO/phCeuwI
61gfb9ggbFEH/7cL26Z9XGSqh4BDCKVP5Z+y98OnNEzV74ScpX13/szB3VgdN6LW
tmTOeMPYCo4FQoPw8OV1xkeYOKD2JyqH+garnxDZNFjaVaeUkzlAMKz+1JUjxRZj
UpUhZ8y5KFVjgKkqZHxhucr9cnwV/xd4n+IzZeG4l6XZp4VyirG+5QpAikGYe+Jq
WFU4Ao1xvevyizrJEhoGJKznwtrNrsy6AQ+wLQxYiRR4aTnng8n9T/ry2DVvhJf8
diYWxuRTR7KMxg==
=i3UB
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:
- Further consolidation of the VDSO infrastructure and the common data
store
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests: vDSO: Drop vdso_test_clock_getres
selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
selftests: vDSO: vdso_test_abi: Use explicit indices for name array
selftests: vDSO: vdso_test_abi: Drop clock availability tests
selftests: vDSO: vdso_test_abi: Use ksft_finished()
selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
vdso: Drop kconfig GENERIC_COMPAT_VDSO
vdso: Drop kconfig GENERIC_VDSO_32
riscv: vdso: Untangle Kconfig logic
time: Build generic update_vsyscall() only with generic time vDSO
vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
vdso: Move ENABLE_COMPAT_VDSO from core to arm64
ARM: VDSO: Remove cntvct_ok global variable
vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
|
|
|
|
70de5572a8 |
Updates for the clocksource/clockevents drivers subsystem:
- Further preparations for modular clocksource/event drivers
- The usual device tree updates to support new chip variants and the
related changes to thise drivers
- Avoid a 64-bit division in the TEGRA186 driver, which caused a build
fail on 32-bit machines.
- Small fixes, improvements and cleanups all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaTgwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRwiD/9MdweTwv+Ami1i/4DAQzEcIuXqOnmO
6yDnRXuqHguiuAkmCCwaW0IoqWZbIDtVwkMHv8CxDQoHm03wFg9QbMLN3YkR3D7P
GWyWGKDMT5/ATQxvx/nZhFjpxnjRI7nFyQuooLIFgh8d9GvxBGFZOrdndudTW+dR
FdLiv0+kyG9XDRijepUBXV9TSZNxZoF6IrlEbfSE0DFdmVAQhagN47QVhRk2cNux
LhAp5wFJ/hzBRUIoQyPDAJFPpGrVC/Goz+ulgxmy4NRZ/TVoiAcdQrDTp1o8AOiV
VxpNQ/pbPVSeij6zhZF1loHxDPDJGUlg9enx6NJDRk3CBNeu2Z617cBfuoCsEb45
EmvyXLESQ8JzjIpGNzAgAuPa92OI0AJG7K4wg4mWzawUAD+Pvm3Xs+9rRareJ5t+
OngGPHNhxMyRTO0z6BbFDZub3jv5PWEs/MB1a7O6XNVXokf2Wl4WgigWyKbEn+TU
rjM34OgwQvcFVSE8nnwNEyzedpqJKAbvRG3WbHkucgkXEDtaLHZ9EWYIekcBYH39
W4nXFWfQb8E29YmZifaq2KeaQI+Nb2utEyTMTmbZchVlZW7LBRxsfrhoyeNwYqc5
byGvI4ZbvdNzDKq9JMD2H8P4//EneMrF/r13SoJTz9lGddjoobseKfTJArDrScXR
fFCOVxwsJKRGNQ==
=LZUc
-----END PGP SIGNATURE-----
Merge tag 'timers-clocksource-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
- Further preparations for modular clocksource/event drivers
- The usual device tree updates to support new chip variants and the
related changes to thise drivers
- Avoid a 64-bit division in the TEGRA186 driver, which caused a build
fail on 32-bit machines.
- Small fixes, improvements and cleanups all over the place
* tag 'timers-clocksource-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
dt-bindings: timer: exynos4210-mct: Add compatible for ARTPEC-9 SoC
clocksource/drivers/sh_cmt: Split start/stop of clock source and events
clocksource/drivers/clps711x: Fix resource leaks in error paths
clocksource/drivers/arm_global_timer: Add auto-detection for initial prescaler values
clocksource/drivers/ingenic-sysost: Convert from round_rate() to determine_rate()
clocksource/drivers/timer-tegra186: Don't print superfluous errors
clocksource/drivers/timer-rtl-otto: Simplify documentation
clocksource/drivers/timer-rtl-otto: Do not interfere with interrupts
clocksource/drivers/timer-rtl-otto: Drop set_counter function
clocksource/drivers/timer-rtl-otto: Work around dying timers
clocksource/drivers/timer-ti-dm : Capture functionality for OMAP DM timer
clocksource/drivers/arm_arch_timer_mmio: Add MMIO clocksource
clocksource/drivers/arm_arch_timer_mmio: Switch over to standalone driver
clocksource/drivers/arm_arch_timer: Add standalone MMIO driver
ACPI: GTDT: Generate platform devices for MMIO timers
clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support
dt: bindings: fsl,vf610-pit: Add compatible for s32g2 and s32g3
clocksource/drivers/vf-pit: Rename the VF PIT to NXP PIT
clocksource/drivers/vf-pit: Unify the function name for irq ack
clocksource/drivers/vf-pit: Consolidate calls to pit_*_disable/enable
...
|
|
|
|
c5448d46b3 |
Updates for the time(rs) core subsystem:
- Address the inconsistent shutdown sequence of per CPU clockevents on
CPU hotplug, which onoly removed it from the core but failed to invoke
the actual device driver shutdown callback. This keeps the timer
active, which prevents power savings and causes pointless noise in
virtualization.
- Encapsulate the open coded access to the hrtimer clock base, which is a
private implementation detail, so that the implementation can be
changed without breaking a lot of usage sites.
- Enhance the debug output of the clocksource watchdog to provide better
information for analysis.
- The usual set of cleanups and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaT+ETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT8HD/47VRqHbFKazGcwZ4+OLKfFTq0PKzIU
kIhimNMSKVqiPlmWhud/4idGX5JFqMP3dS9ZaDlp7xCtu1ScCqzH72kbCrab0l4g
wjRgneGWsHhv06PY50Ty9FusFSetkjA8XSYavFV9HZNgCRIvho958akdckpazMcF
i4V5WTIVkwhGynhcGwqBXcmHASUa7x6gEjZYBbX6wspPl2Wk5z+vn/zAVIHAozwS
9GPw8HlUeKqM72U12mWGkt4KLjy2gzoupvx9vD4giXRvqkt09rfHtRqvEdkew+/G
ZANhmTJqfA1AiihYP30fHgY5lQbNQG+9O10UKhlUyrlBKPKHKu6dIuJQRSBy0j59
Bqef4UubPBlMP4xRgTbt11SFrjuqDo68bTIDGmQNQvb1BXSXhGA08PDPbIKkFdJh
8cXwUHzjLkDk0tL6kVXRWlsZcCmjz87TVS7PufGE3EIZkh0J2Ob+g7nFS3PsORKW
65ROSRoYmh4CRh/l9CTNbPWFvz3jahIEnxlrSc70o0A4DUeeWTwT/M+MoNOtSJuV
D4qSD35JsFaEbKFfSJwF/0CybG3Nx3UPI8nvEEYQtS4D+BCJHy7INg1P/NtWw+vb
W7puAFd+WbjSN14uQfYkYWC53vQ22isW0nU8W6G5NbMRD5K4WIld4UhDksJIGCLe
2LZ0VHxv4gaXkg==
=A9Tu
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Address the inconsistent shutdown sequence of per CPU clockevents on
CPU hotplug, which only removed it from the core but failed to invoke
the actual device driver shutdown callback. This kept the timer
active, which prevented power savings and caused pointless noise in
virtualization.
- Encapsulate the open coded access to the hrtimer clock base, which is
a private implementation detail, so that the implementation can be
changed without breaking a lot of usage sites.
- Enhance the debug output of the clocksource watchdog to provide
better information for analysis.
- The usual set of cleanups and enhancements all over the place
* tag 'timers-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix spelling mistakes in comments
clocksource: Print durations for sync check unconditionally
LoongArch: Remove clockevents shutdown call on offlining
tick: Do not set device to detached state in tick_shutdown()
hrtimer: Reorder branches in hrtimer_clockid_to_base()
hrtimer: Remove hrtimer_clock_base:: Get_time
hrtimer: Use hrtimer_cb_get_time() helper
media: pwm-ir-tx: Avoid direct access to hrtimer clockbase
ALSA: hrtimer: Avoid direct access to hrtimer clockbase
lib: test_objpool: Avoid direct access to hrtimer clockbase
sched/core: Avoid direct access to hrtimer clockbase
timers/itimer: Avoid direct access to hrtimer clockbase
posix-timers: Avoid direct access to hrtimer clockbase
jiffies: Remove obsolete SHIFTED_HZ comment
|
|
|
|
c574fb2ed7 |
A set of updates for futexes and related selftests:
- Plug the ptrace_may_access() race against a concurrent exec() which
allows to pass the check before the target's process transition in
exec() by taking a read lock on signal->ext_update_lock.
- A large set of cleanups and enhancement to the futex selftests. The
bulk of changes is the conversion to the kselftest harness.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaS8QTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYof44EAC7amBzdJlaluVjLTpCV07G8K4bDkzz
iN5vOm+vrxJlg7kq9L4CffuGKS8yFj7sgNkSexOv/+JUFp/E9WpV3CT+kqJNm71x
bDPDAFR3reZQ7u27dzpciOTgtCEnq+hqeAovx2c1v2GuESvnTfqWTFSQBLfzqGW7
leCiaR6VJ87+RB10ZrlxJDdrFnVbKBa48k1dZjy97ivkw+SUS5xSEadSBG2vw7Rd
C+SQDa8Hhh7xAMJvySWkES6rfzZvC/ubkpq5DmjRZvHu/wS4wDNaKj06wEtbLxEA
5eaemJVBeEbjC6GNBgba6cUmtrT45PQG7wNFoPmgJsO0nlnfwfmm713CAa9vK+dT
MTUnGWKSpf0Vj1yHkzLeHer226jqfOe1lA4anRzD6FoZy2XaJiNCymQysfIrIWTG
BCcA8za3W4PTiUHxDe5JGsbA4f57xDV+bL0SmjaXv5wSo/FX21iJBisBI2FFl531
9XlgU3ssvUUfQfH6bPIdgcZrWD2Uc9W5aj2dZv+zN6LT8EhvLaQwcM2bxWbuJfWu
wmU07LMZS9zVrC8B6T3KvrYwnIuETWRZiPhenDy+Shtn8UM1N/MmlNNmAEvdOYue
V3o1BTm9wDWfSMJAq7q7erjl15QOgUyOpte1gTuzDbqtcWj132q31HDegtxlGIyu
NLOx8otYdLEOqQ==
=vUpN
-----END PGP SIGNATURE-----
Merge tag 'locking-futex-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:
"A set of updates for futexes and related selftests:
- Plug the ptrace_may_access() race against a concurrent exec() which
allows to pass the check before the target's process transition in
exec() by taking a read lock on signal->ext_update_lock.
- A large set of cleanups and enhancement to the futex selftests. The
bulk of changes is the conversion to the kselftest harness"
* tag 'locking-futex-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
selftest/futex: Fix spelling mistake "boundarie" -> "boundary"
selftests/futex: Remove logging.h file
selftests/futex: Drop logging.h include from futex_numa
selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h
selftests/futex: Refactor futex_priv_hash with kselftest_harness.h
selftests/futex: Refactor futex_waitv with kselftest_harness.h
selftests/futex: Refactor futex_requeue with kselftest_harness.h
selftests/futex: Refactor futex_wait with kselftest_harness.h
selftests/futex: Refactor futex_wait_private_mapped_file with kselftest_harness.h
selftests/futex: Refactor futex_wait_unitialized_heap with kselftest_harness.h
selftests/futex: Refactor futex_wait_wouldblock with kselftest_harness.h
selftests/futex: Refactor futex_wait_timeout with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi_signal_restart with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi_mismatched_ops with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi with kselftest_harness.h
selftests: kselftest: Create ksft_print_dbg_msg()
futex: Don't leak robust_list pointer on exec race
selftest/futex: Compile also with libnuma < 2.0.16
selftest/futex: Reintroduce "Memory out of range" numa_mpol's subtest
selftest/futex: Make the error check more precise for futex_numa_mpol
...
|
|
|
|
d8de3685f1 |
An update of the stale smp_call_function_many() documentation to bring it
back in sync with the actual implementation. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaTGETHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoSdrD/967aAnjFKB1l/HZh1PPvRWuejrOrlX ZaOdwXvnHtYpzTGWjS8KxAxoIZhh6YjEFenknBPZnKTYM4g+3Umw5NobKxrfOg7b BHpz90M+Rd1IpUXHwnRcf+j8lWCw/ABq5hSOD+HRxHNratcevwuhShwJJpjmfxL4 xw1wyryfRniKv7r1X8caguhXa6OgLV0zKEKs6uzPyxtQPgg4vNIaYyF2b/vJ5Mcv XGL7QotBEHaZagL6Wfrc1Z5H7h2mukPVtXyOFzVOHFMHPsM5WLr9UVn9EHaVE/aI ENYRE7kh+3MVJBJyVUOy11KCoNRuPoqVFEWDrQnam5uOzG/PVOMDMshWwvWM6ez+ g/ea9T1CAzEJK4B/ALJTWmiNDO3i+vM8oBJyE2TNNvw7j3U/M2ThYifiTSsosBcG 2u07Z6MZJ28fZmDXZTYs8D8xBCPPiIQQJs6KgApcP4JPTTIebWnGFiX5whPHNBv9 HldWt+p8F86kDgpyneA12eTeK+KLoGdI6IK5BX2TdpdAuSCH6mkQ7vkv15mynrpH uI1Oe3T53yXdJwNlGaPuAgar8pGb6YWj7au5+aJCOUCRBuAsyhJ82eXqZI2gFQjF M0+WpjbrvG4Gv/5xXlqKzxUV0GblXkvqVooDMMCufDgkOd2xxQ/UodevrDgg8DYV YGgBdunjKbz7Ig== =8bI8 -----END PGP SIGNATURE----- Merge tag 'smp-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp doc fixlet from Thomas Gleixner: "An update of the stale smp_call_function_many() documentation to bring it back in sync with the actual implementation" * tag 'smp-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Fix up and expand the smp_call_function_many() kerneldoc |
|
|
|
3b2074c77d |
A set of updates for the interrupt core subsystem:
- Introduce irq_chip_[startup|shutdown]_parent() to prepare for
addressing a few short comings in the PCI/MSI interrupt subsystem.
It allows to utilize the interrupt chip startup/shutdown callbacks for
initializing the interrupt chip hierarchy properly on certain RISCV
implementations and provides a mechanism to reduce the overhead of
masking and unmasking PCI/MSI interrupts during operation when the
underlying MSI provider can mask the interrupt.
The actual usage comes with the interrupt driver pull request.
- Add generic error handling for devm_request_*_irq()
This allows to remove the zoo of random error printk's all over the
usage sites.
- Add a mechanism to warn about long-running interrupt handlers
Long running interrupt handlers can introduce latencies and tracking
them down is a tedious task. The tracking has to be enabled with a
threshold on the kernel command line and utilizes a static branch to
remove the overhead when disabled.
- Update and extend the selftests which validate the CPU hotplug
interrupt migration logic
- Allow dropping the per CPU softirq lock on PREEMPT_RT kernels, which
causes contention and latencies all over the place.
The serialization requirements have been pushed down into the actual
affected usage sites already.
- The usual small cleanups and improvements
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaRdUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZTHD/0anRJCLFFQ8Xtuuu96MDfcvi8m6uhB
E8gF6lXrljRgEzt33hnqSB4d41PYaOTje4h/Z0nCV2xva8cg6cErzjKlbRU/SMmj
yEBV5XMHP8Nxog+A5q5OT3lXQ6pT5ZxDZ7l0wven1xX6LbyF63MQFdzFBiLrXnb1
mOZM7F89FQKhutEMDqVwVLXLmJasNtZK0kBVrdjA/vDBlAk0bYCaPX4BxW1PQIw4
56KkG1llK221ruT2YbMSv/oFmYulDYbnnM3Aw6stUsR6SPMkQX8Yke2wLQSq+GVV
RI0f4GTa6XnlIiUivQ77HnV0m0g0Ybkj1PoKeGJIjWDFJ5hqkA0dRQMBRCna68wK
c3IEWhn2zPl3YxHYMogLwlE4nJAVFUjeEtGPb/Z/mmHaBdd/Y7iqVjTgHJzDvi5Q
SwgISZEB87vZINpraTuEFNimeIwjCVdBj5ImBHT4T/Av3DFzuvqvxCSP0AlVRdhs
OdwF5/6jFdPP787+FtytEl/kFYLzf/7R9zVHLcpiCD3eYCU4pHU/ULjUIGxFkSrS
ltezvfZkGnbqiKTR33nOxWBnM2Fr63FMv0bKMJ97x49TMN6Tw4L8mra/uxuMQSIf
yRk3jRgUcx0xuigCXcb62dip+2WnsGjnaDj30yOs/d1Ab2eALrJCo0e1tgdK2JDu
VgCX3xCbvj/NzA==
=GZty
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
"A set of updates for the interrupt core subsystem:
- Introduce irq_chip_[startup|shutdown]_parent() to prepare for
addressing a few short comings in the PCI/MSI interrupt subsystem.
It allows to utilize the interrupt chip startup/shutdown callbacks
for initializing the interrupt chip hierarchy properly on certain
RISCV implementations and provides a mechanism to reduce the
overhead of masking and unmasking PCI/MSI interrupts during
operation when the underlying MSI provider can mask the interrupt.
The actual usage comes with the interrupt driver pull request.
- Add generic error handling for devm_request_*_irq()
This allows to remove the zoo of random error printk's all over the
usage sites.
- Add a mechanism to warn about long-running interrupt handlers
Long running interrupt handlers can introduce latencies and
tracking them down is a tedious task. The tracking has to be
enabled with a threshold on the kernel command line and utilizes a
static branch to remove the overhead when disabled.
- Update and extend the selftests which validate the CPU hotplug
interrupt migration logic
- Allow dropping the per CPU softirq lock on PREEMPT_RT kernels,
which causes contention and latencies all over the place.
The serialization requirements have been pushed down into the
actual affected usage sites already.
- The usual small cleanups and improvements"
* tag 'irq-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT
softirq: Provide a handshake for canceling tasklets via polling
genirq/test: Ensure CPU 1 is online for hotplug test
genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
genirq/test: Depend on SPARSE_IRQ
genirq/test: Fail early if interrupt request fails
genirq/test: Factor out fake-virq setup
genirq/test: Select IRQ_DOMAIN
genirq/test: Fix depth tests on architectures with NOREQUEST by default.
genirq: Add support for warning on long-running interrupt handlers
genirq/devres: Add error handling in devm_request_*_irq()
genirq: Add irq_chip_(startup/shutdown)_parent()
genirq: Remove GENERIC_IRQ_LEGACY
|
|
|
|
9be7e1e320 |
entry: Rename "kvm" entry code assets to "virt" to genericize APIs
Rename the "kvm" entry code files and Kconfigs to use generic "virt" nomenclature so that the code can be reused by other hypervisors (or rather, their root/dom0 partition drivers), without incorrectly suggesting the code somehow relies on and/or involves KVM. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> |
|
|
|
6d0386ea99 |
entry/kvm: KVM: Move KVM details related to signal/-EINTR into KVM proper
Move KVM's morphing of pending signals into userspace exits into KVM proper, and drop the @vcpu param from xfer_to_guest_mode_handle_work(). How KVM responds to -EINTR is a detail that really belongs in KVM itself, and invoking kvm_handle_signal_exit() from kernel code creates an inverted module dependency. E.g. attempting to move kvm_handle_signal_exit() into kvm_main.c would generate an linker error when building kvm.ko as a module. Dropping KVM details will also converting the KVM "entry" code into a more generic virtualization framework so that it can be used when running as a Hyper-V root partition. Lastly, eliminating usage of "struct kvm_vcpu" outside of KVM is also nice to have for KVM x86 developers, as keeping the details of kvm_vcpu purely within KVM allows changing the layout of the structure without having to boot into a new kernel, e.g. allows rebuilding and reloading kvm.ko with a modified kvm_vcpu structure as part of debug/development. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Liu <wei.liu@kernel.org> |
|
|
|
1d17e808cf |
Two fixes for RSEQ:
1) Protect the event mask modification against the membarrier() IPI as
otherwise the RmW operation is unprotected and events might be lost.
2) Fix the weak symbol reference in rseq selftests
The current weak RSEQ symbols definitions which were added to allow
static linkage are not working correctly as the effectively re-define
the glibc symbols leading to multiple versions of the symbols when
compiled with -fno-common. Mark them as 'extern' to convert them from
weak symbol definitions to weak symbol references. That works with
static and dynamic linkage independent of -fcommon and -fno-common.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaQc4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTj1D/0Un97b1GJRNUjmvrhSE8uFy5RxgYdX
R2dJkfuQSF0fx4MzcrUg9b2KeklwDkYtq0YkmbStQeihp4jYpgODNFAdFdB4Cfhv
e1JfgQ6iTS9zWq625ImygHNZv5NOiDBxf9jtsRTUA3lv804taoS/jHost4Qi9HwO
jO9YoBrSmxZIZA1cPO9mA/AEpAdhKFxpwZK6yNNA4lH83gVhhSXKyEOFJQ0nWlZJ
pycEmwM+v9iW67QZEbWkEBPukflnMXYtcUDLmawYDsMZ2gsB4yPYvZwHNTeiB7gu
n0dfZH4jfw4DkO7MuU1CV1xlXhTO4sJQjmRsJuu2ypnZFVPh6iR+J2DxnZ5PcgvR
LifuEa/+mN/LM5/gjohDJwny10EXysrEDJ/vZUR4BBTdTsWK6FwLdSnTuP8WF3wo
LyY+PfeUmosHYOAa6Q8pLJUJzgXHtzlJLjVhYgb61dQjlgFuRB/0ksA8VoVeVhSz
6A5v/rcVoZIhRj/fxzsJOXtfP24ghIXtRFyfeDeNsWWlqEuL/X28xYf45YBjws7F
zApbCqeg+JYwJRaWmCDxjmmLsyMvpH3yujvlPZxuit6TX2PfxYG5CNpIxt192wZ2
fFk8qcsgJ5x/9ah5mpio7wHDESiDkdaPanYA+ercdXogbzI/8Y7fUua8RgJKm4Mj
3eo3tY3e8q9A3A==
=VV+N
-----END PGP SIGNATURE-----
Merge tag 'core-rseq-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq updates from Thomas Gleixner:
"Two fixes for RSEQ:
- Protect the event mask modification against the membarrier() IPI as
otherwise the RmW operation is unprotected and events might be lost
- Fix the weak symbol reference in rseq selftests
The current weak RSEQ symbols definitions which were added to allow
static linkage are not working correctly as they effectively
re-define the glibc symbols leading to multiple versions of the
symbols when compiled with -fno-common.
Mark them as 'extern' to convert them from weak symbol definitions
to weak symbol references. That works with static and dynamic
linkage independent of -fcommon and -fno-common"
* tag 'core-rseq-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq/selftests: Use weak symbol reference, not definition, to link with glibc
rseq: Protect event mask against membarrier IPI
|