mirror of https://github.com/torvalds/linux.git
2811 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
7cd122b552 |
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
Reference is grabbed and not released, but it's not actually _stored_
anywhere. That works, but it's hard to follow and verify; among other
things, we have no way to tell _which_ of the increments is intended
to be an unpaired one. Worse, on removal we need to decide whether
the reference had already been dropped, which can be non-trivial if
that removal is on umount and we need to figure out if this dentry is
pinned due to e.g. unlink() not done. Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).
Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
marking those "leaked" dentries. Having it set claims responsibility
for +1 in refcount.
The end result this series is aiming for:
* get these unbalanced dget() and dput() replaced with new primitives that
would, in addition to adjusting refcount, set and clear persistency flag.
* instead of having kill_litter_super() mess with removing the remaining
"leaked" references (e.g. for all tmpfs files that hadn't been removed
prior to umount), have the regular shrink_dcache_for_umount() strip
DCACHE_PERSISTENT of all dentries, dropping the corresponding
reference if it had been set. After that kill_litter_super() becomes
an equivalent of kill_anon_super().
Doing that in a single step is not feasible - it would affect too many places
in too many filesystems. It has to be split into a series.
This work has really started early in 2024; quite a few preliminary pieces
have already gone into mainline. This chunk is finally getting to the
meat of that stuff - infrastructure and most of the conversions to it.
Some pieces are still sitting in the local branches, but the bulk of
that stuff is here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaTEq1wAKCRBZ7Krx/gZQ
643uAQC1rRslhw5l7OjxEpIYbGG4M+QaadN4Nf5Sr2SuTRaPJQD/W4oj/u4C2eCw
Dd3q071tqyvm/PXNgN2EEnIaxlFUlwc=
=rKq+
-----END PGP SIGNATURE-----
Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull persistent dentry infrastructure and conversion from Al Viro:
"Some filesystems use a kinda-sorta controlled dentry refcount leak to
pin dentries of created objects in dcache (and undo it when removing
those). A reference is grabbed and not released, but it's not actually
_stored_ anywhere.
That works, but it's hard to follow and verify; among other things, we
have no way to tell _which_ of the increments is intended to be an
unpaired one. Worse, on removal we need to decide whether the
reference had already been dropped, which can be non-trivial if that
removal is on umount and we need to figure out if this dentry is
pinned due to e.g. unlink() not done. Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).
Things get simpler if we introduce a new dentry flag
(DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
claims responsibility for +1 in refcount.
The end result this series is aiming for:
- get these unbalanced dget() and dput() replaced with new primitives
that would, in addition to adjusting refcount, set and clear
persistency flag.
- instead of having kill_litter_super() mess with removing the
remaining "leaked" references (e.g. for all tmpfs files that hadn't
been removed prior to umount), have the regular
shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
dropping the corresponding reference if it had been set. After that
kill_litter_super() becomes an equivalent of kill_anon_super().
Doing that in a single step is not feasible - it would affect too many
places in too many filesystems. It has to be split into a series.
This work has really started early in 2024; quite a few preliminary
pieces have already gone into mainline. This chunk is finally getting
to the meat of that stuff - infrastructure and most of the conversions
to it.
Some pieces are still sitting in the local branches, but the bulk of
that stuff is here"
* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
d_make_discardable(): warn if given a non-persistent dentry
kill securityfs_recursive_remove()
convert securityfs
get rid of kill_litter_super()
convert rust_binderfs
convert nfsctl
convert rpc_pipefs
convert hypfs
hypfs: swich hypfs_create_u64() to returning int
hypfs: switch hypfs_create_str() to returning int
hypfs: don't pin dentries twice
convert gadgetfs
gadgetfs: switch to simple_remove_by_name()
convert functionfs
functionfs: switch to simple_remove_by_name()
functionfs: fix the open/removal races
functionfs: need to cancel ->reset_work in ->kill_sb()
functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
functionfs: don't abuse ffs_data_closed() on fs shutdown
convert selinuxfs
...
|
|
|
|
6dfafbd029 |
drm-next for 6.19-rc1:
new driver:
- Arm Ethos-U65/U85 accel driver
core:
- support the drm color pipeline in vkms/amdgfx
- add support for drm colorop pipeline
- add COLOR PIPELINE plane property
- add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
- throttle dirty worker with vblank
- use drm_for_each_bridge_in_chain_scoped in drm's bridge code
- Ensure drm_client_modeset tests are enabled in UML
- add simulated vblank interrupt - use in drivers
- dumb buffer sizing helper
- move freeing of drm client memory to driver
- crtc sharpness strength property
- stop using system_wq in scheduler/drivers
- support emergency restore in drm-client
rust:
- make slice::as_flattened usable on all supported rustc
- add FromBytes::from_bytes_prefix() method
- remove redundant device ptr from Rust GEM object
- Change how AlwaysRefCounted is implemented for GEM objects
gpuvm:
- Add deferred vm_bo cleanup to GPUVM (for rust)
atomic:
- cleanup and improve state handling interfaces
buddy:
- optimize block management
dma-buf:
- heaps: Create heap per CMA reserved location
- improve userspace documentation
dp:
- add POST_LT_ADJ_REQ training sequence
- DPCD dSC quirk for synaptics panamera devices
- helpers to query branch DSC max throughput
ttm:
- Rename ttm_bo_put to ttm_bo_fini
- allow page protection flags on risc-v
- rework pipelined eviction fence handling
amdgpu:
- enable amdgpu by default for SI/CI dGPUs
- enable DC by default on SI
- refactor CIK/SI enablement
- add ABM KMS property
- Re-enable DM idle optimizations
- DC Analog encoders support
- Powerplay fixes for fiji/iceland
- Enable DC on bonaire by default
- HMM cleanup
- Add new RAS framework
- DML2.1 updates
- YCbCr420 fixes
- DC FP fixes
- DMUB fixes
- LTTPR fixes
- DTBCLK fixes
- DMU cursor offload handling
- Userq validation improvements
- Unify shutdown callback handling
- Suspend improvements
- Power limit code cleanup
- SR-IOV fixes
- AUX backlight fixes
- DCN 3.5 fixes
- HDMI compliance fixes
- DCN 4.0.1 cursor updates
- DCN interrupt fix
- DC KMS full update improvements
- Add additional HDCP traces
- DCN 3.2 fixes
- DP MST fixes
- Add support for new SR-IOV mailbox interface
- UQ reset support
- HDP flush rework
- VCE1 support
amdkfd:
- HMM cleanups
- Relax checks on save area overallocations
- Fix GPU mappings after prefetch
radeon:
- refactor CIK/SI enablement.
xe:
- Initial Xe3P support
- panic support on VRAM for display
- fix stolen size check
- Loosen used tracking restriction
- New SR-IOV debugfs structure and debugfs updates
- Hide the GPU madvise flag behind a VM_BIND flag
- Always expose VRAM provisioning data on discrete GPUs
- Allow VRAM mappings for userptr when used with SVM
- Allow pinning of p2p dma-buf
- Use per-tile debugfs where appropriate
- Add documentation for Execution Queues
- PF improvements
- VF migration recovery redesign work
- User / Kernel VRAM partitioning
- Update Tile-based messages
- Allow configfs to disable specific GT types
- VF provisioning and migration improvements
- use SVM range helpers in PT layer
- Initial CRI support
- access VF registers using dedicated MMIO view
- limit number of jobs per exec queue
- add sriov_admin sysfs tree
- more crescent island specific support
- debugfs residency counter
- SRIOV migration work
- runtime registers for GFX 35
i915:
- add initial Xe3p_LPD display version 35 support
- Enable LNL+ content adaptive sharpness filter
- Use optimized VRR guardband
- Enable Xe3p LT PHY
- enable FBC support for Xe3p_LPD display
- add display 30.02 firmware support
- refactor SKL+ watermark latency setup
- refactor fbdev handling
- call i915/xe runtime PM via function pointers
- refactor i915/xe stolen memory/display interfaces
- use display version instead of gfx version in display code
- extend i915_display_info with Type-C port details
- lots of display cleanups/refactorings
- set O_LARGEFILE in __create_shmem
- skuip guc communication warning on reset
- fix time conversions
- defeature DRRS on LNL+
- refactor intel_frontbuffer split between i915/xe/display
- convert inteL_rom interfaces to struct drm_device
- unify display register polling interfaces
- aovid lock inversion when pinning to GGTT on CHV/BXT+VTD
panel:
- Add KD116N3730A08/A12, chromebook mt8189
- JT101TM023, LQ079L1SX01,
- GLD070WX3-SL01 MIPI DSI
- Samsung LTL106AL0, Samsung LTL106AL01
- Raystar RFF500F-AWH-DNN
- Winstar WF70A8SYJHLNGA,
- Wanchanglong w552946aaa
- Samsung SOFEF00
- Lenovo X13s panel.
- ilitek-ili9881c : add rpi 5" support
- visionx-rm69299 - add backlight support
- edp - support AUI B116XAN02.0
bridge:
- improve ref counting
- ti-sn65dsi86 - add support for DP mode with HPD
- synopsis: support CEC, init timer with correct freq
- ASL CS5263 DP-to-HDMI bridge support
nova-core:
- introduce bitfield! macro
- introduce safe integer converters
- GSP inits to fully booted state on Ampere
- Use more future-proof register for GPU identification
nova-drm:
- select NOVA_CORE
- 64-bit only
nouveau:
- improve reclocking on tegra 186+
- add large page and compression support
msm:
- GPU:
- Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
- A612 support
- MDSS:
- Added support for Glymur and QCS8300 platforms
- DPU:
- Enabled Quad-Pipe support, unlocking higher resolutions support
- Added support for Glymur platform
- Documented DPU on QCS8300 platform as supported
- DisplayPort:
- Added support for Glymur platform
- Added support lame remapping inside DP block
- Documented DisplayPort controller on QCS8300 and SM6150/QCS615 as
supported
tegra:
- NVJPG driver
panfrost:
- display JM contexts over debugfs
- export JM contexts to userspace
- improve error and job handling
panthor:
- support custom ASN_HASH for mt8196
- support mali-G1 GPU
- flush shmem write before mapping buffers uncached
- make timeout per-queue instead of per-job
mediatek:
- MT8195/88 HDMIv2/DDCv2 support
rockchip:
- dsi: add support for RK3368
amdxdna:
- enhance runtime PM
- last hardware error reading uapi
- support firmware debug output
- add resource and telemetry data uapi
- preemption support
imx:
- add driver for HDMI TX Parallel audio interface
ivpu:
- add support for user-managed preemption buffer
- add userptr support
- update JSM firware API to 3.33.0
- add better alloc/free warnings
- fix page fault in unbind all bos
- rework bind/unbind of imported buffers
- enable MCA ECC signalling
- split fw runtime and global memory buffers
- add fdinfo memory statistics
tidss:
- convert to drm logging
- logging cleanup
ast:
- refactor generation init paths
- add per chip generation detect_tx_chip
- set quirks for each chip model
atmel-hlcdc:
- set LCDC_ATTRE register in plane disable
- set correct values for plane scaler
solomon:
- use drm helper for get_modes and move_valid
sitronix:
- fix output position when clearing screens
qaic:
- support dma-buf exports
- support new firmware's READ_DATA implementation
- sahara AIC200 image table update
- add sysfs support
- add coredump support
- add uevents support
- PM support
sun4i:
- layer refactors to decouple plane from output
- improve DE33 support
vc4:
- switch to generic CEC helpers
komeda:
- use drm_ logging functions
vkms:
- configfs support for display configuration
vgem:
- fix fence timer deadlock
etnaviv:
- add HWDB entry for GC8000 Nano Ultra VIP r6205
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmkv4wAACgkQDHTzWXnE
hr7GNw/9HITakN5B4DXKZg+Rz38WjsAsSbw1HdgoQR2NMYRw4oVXx3iAVTVLGbvV
vvMCpgWtFE85IEtg96KWynpFjSHOx7xtXe/lckZ5s+VMi00K4c+wMmvFqKwv1Aul
9VqPTe+28xNAzk+gglOtNjd4DRVuXQIhPMQdTELork32jQCteJorod+sZnM+RNBG
0PiiYAKc0VwsPwklHQF7+dBt1yu+8r3ZH5Fc7SjHTuf8Q45Dpayd0HjatSgt9ycn
s8uxAAY41iDLHnj4LwyYtBFYoMH65VqOAzDeaeVdM9xA/3KbIl9of8QnBpvYWhaR
9dnh2Ceve3SjOufOBwK2p6H0RXu8Xo3OLnmw7+u0KwgvbmeBTILp8kgNQuk9beIp
wvkjwpFsDnfo1rKevFMz4R+3U9W3NXP1fjU2tvmnD1lfaby2o3f3AdboaYPbX/xK
YGh79KDsuAkKH4HdMJNg/FpJWIdSKaaWoFBzZqWBZgZBEPOquR8MHUAaLkVJYK/1
wBcJZceQpHRssldsg3bBZxGHK811J46QlvMWaID6TSGIoZ8GC6+1Zt0+bwzxMyUR
49jwRvOWbtYJmZEcR2JF+k1KteMXAp+bXIYqx9e+69c5Guz4w7JW7OLQFVvXszZW
aObzfrkObPpCGYDS1r85/6vT8yqwVUWtwfPvAeDH9130sHBP8Xw=
=gpnQ
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
"There was a rather late merge of a new color pipeline feature, that
some userspace projects are blocked on, and has seen a lot of work in
amdgpu. This should have seen some time in -next. There is additional
support for this for Intel, that if it arrives in the next day or two
I'll pass it on in another pull request and you can decide if you want
to take it.
Highlights:
- Arm Ethos NPU accelerator driver
- new DRM color pipeline support
- amdgpu will now run discrete SI/CIK cards instead of radeon, which
enables vulkan support in userspace
- msm gets gen8 gpu support
- initial Xe3P support in xe
Full detail summary:
New driver:
- Arm Ethos-U65/U85 accel driver
Core:
- support the drm color pipeline in vkms/amdgfx
- add support for drm colorop pipeline
- add COLOR PIPELINE plane property
- add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
- throttle dirty worker with vblank
- use drm_for_each_bridge_in_chain_scoped in drm's bridge code
- Ensure drm_client_modeset tests are enabled in UML
- add simulated vblank interrupt - use in drivers
- dumb buffer sizing helper
- move freeing of drm client memory to driver
- crtc sharpness strength property
- stop using system_wq in scheduler/drivers
- support emergency restore in drm-client
Rust:
- make slice::as_flattened usable on all supported rustc
- add FromBytes::from_bytes_prefix() method
- remove redundant device ptr from Rust GEM object
- Change how AlwaysRefCounted is implemented for GEM objects
gpuvm:
- Add deferred vm_bo cleanup to GPUVM (for rust)
atomic:
- cleanup and improve state handling interfaces
buddy:
- optimize block management
dma-buf:
- heaps: Create heap per CMA reserved location
- improve userspace documentation
dp:
- add POST_LT_ADJ_REQ training sequence
- DPCD dSC quirk for synaptics panamera devices
- helpers to query branch DSC max throughput
ttm:
- Rename ttm_bo_put to ttm_bo_fini
- allow page protection flags on risc-v
- rework pipelined eviction fence handling
amdgpu:
- enable amdgpu by default for SI/CI dGPUs
- enable DC by default on SI
- refactor CIK/SI enablement
- add ABM KMS property
- Re-enable DM idle optimizations
- DC Analog encoders support
- Powerplay fixes for fiji/iceland
- Enable DC on bonaire by default
- HMM cleanup
- Add new RAS framework
- DML2.1 updates
- YCbCr420 fixes
- DC FP fixes
- DMUB fixes
- LTTPR fixes
- DTBCLK fixes
- DMU cursor offload handling
- Userq validation improvements
- Unify shutdown callback handling
- Suspend improvements
- Power limit code cleanup
- SR-IOV fixes
- AUX backlight fixes
- DCN 3.5 fixes
- HDMI compliance fixes
- DCN 4.0.1 cursor updates
- DCN interrupt fix
- DC KMS full update improvements
- Add additional HDCP traces
- DCN 3.2 fixes
- DP MST fixes
- Add support for new SR-IOV mailbox interface
- UQ reset support
- HDP flush rework
- VCE1 support
amdkfd:
- HMM cleanups
- Relax checks on save area overallocations
- Fix GPU mappings after prefetch
radeon:
- refactor CIK/SI enablement
xe:
- Initial Xe3P support
- panic support on VRAM for display
- fix stolen size check
- Loosen used tracking restriction
- New SR-IOV debugfs structure and debugfs updates
- Hide the GPU madvise flag behind a VM_BIND flag
- Always expose VRAM provisioning data on discrete GPUs
- Allow VRAM mappings for userptr when used with SVM
- Allow pinning of p2p dma-buf
- Use per-tile debugfs where appropriate
- Add documentation for Execution Queues
- PF improvements
- VF migration recovery redesign work
- User / Kernel VRAM partitioning
- Update Tile-based messages
- Allow configfs to disable specific GT types
- VF provisioning and migration improvements
- use SVM range helpers in PT layer
- Initial CRI support
- access VF registers using dedicated MMIO view
- limit number of jobs per exec queue
- add sriov_admin sysfs tree
- more crescent island specific support
- debugfs residency counter
- SRIOV migration work
- runtime registers for GFX 35
i915:
- add initial Xe3p_LPD display version 35 support
- Enable LNL+ content adaptive sharpness filter
- Use optimized VRR guardband
- Enable Xe3p LT PHY
- enable FBC support for Xe3p_LPD display
- add display 30.02 firmware support
- refactor SKL+ watermark latency setup
- refactor fbdev handling
- call i915/xe runtime PM via function pointers
- refactor i915/xe stolen memory/display interfaces
- use display version instead of gfx version in display code
- extend i915_display_info with Type-C port details
- lots of display cleanups/refactorings
- set O_LARGEFILE in __create_shmem
- skuip guc communication warning on reset
- fix time conversions
- defeature DRRS on LNL+
- refactor intel_frontbuffer split between i915/xe/display
- convert inteL_rom interfaces to struct drm_device
- unify display register polling interfaces
- aovid lock inversion when pinning to GGTT on CHV/BXT+VTD
panel:
- Add KD116N3730A08/A12, chromebook mt8189
- JT101TM023, LQ079L1SX01,
- GLD070WX3-SL01 MIPI DSI
- Samsung LTL106AL0, Samsung LTL106AL01
- Raystar RFF500F-AWH-DNN
- Winstar WF70A8SYJHLNGA
- Wanchanglong w552946aaa
- Samsung SOFEF00
- Lenovo X13s panel
- ilitek-ili9881c - add rpi 5" support
- visionx-rm69299 - add backlight support
- edp - support AUI B116XAN02.0
bridge:
- improve ref counting
- ti-sn65dsi86 - add support for DP mode with HPD
- synopsis: support CEC, init timer with correct freq
- ASL CS5263 DP-to-HDMI bridge support
nova-core:
- introduce bitfield! macro
- introduce safe integer converters
- GSP inits to fully booted state on Ampere
- Use more future-proof register for GPU identification
nova-drm:
- select NOVA_CORE
- 64-bit only
nouveau:
- improve reclocking on tegra 186+
- add large page and compression support
msm:
- GPU:
- Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
- A612 support
- MDSS:
- Added support for Glymur and QCS8300 platforms
- DPU:
- Enabled Quad-Pipe support, unlocking higher resolutions support
- Added support for Glymur platform
- Documented DPU on QCS8300 platform as supported
- DisplayPort:
- Added support for Glymur platform
- Added support lame remapping inside DP block
- Documented DisplayPort controller on QCS8300 and SM6150/QCS615
as supported
tegra:
- NVJPG driver
panfrost:
- display JM contexts over debugfs
- export JM contexts to userspace
- improve error and job handling
panthor:
- support custom ASN_HASH for mt8196
- support mali-G1 GPU
- flush shmem write before mapping buffers uncached
- make timeout per-queue instead of per-job
mediatek:
- MT8195/88 HDMIv2/DDCv2 support
rockchip:
- dsi: add support for RK3368
amdxdna:
- enhance runtime PM
- last hardware error reading uapi
- support firmware debug output
- add resource and telemetry data uapi
- preemption support
imx:
- add driver for HDMI TX Parallel audio interface
ivpu:
- add support for user-managed preemption buffer
- add userptr support
- update JSM firware API to 3.33.0
- add better alloc/free warnings
- fix page fault in unbind all bos
- rework bind/unbind of imported buffers
- enable MCA ECC signalling
- split fw runtime and global memory buffers
- add fdinfo memory statistics
tidss:
- convert to drm logging
- logging cleanup
ast:
- refactor generation init paths
- add per chip generation detect_tx_chip
- set quirks for each chip model
atmel-hlcdc:
- set LCDC_ATTRE register in plane disable
- set correct values for plane scaler
solomon:
- use drm helper for get_modes and move_valid
sitronix:
- fix output position when clearing screens
qaic:
- support dma-buf exports
- support new firmware's READ_DATA implementation
- sahara AIC200 image table update
- add sysfs support
- add coredump support
- add uevents support
- PM support
sun4i:
- layer refactors to decouple plane from output
- improve DE33 support
vc4:
- switch to generic CEC helpers
komeda:
- use drm_ logging functions
vkms:
- configfs support for display configuration
vgem:
- fix fence timer deadlock
etnaviv:
- add HWDB entry for GC8000 Nano Ultra VIP r6205"
* tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel: (1869 commits)
Revert "drm/amd: Skip power ungate during suspend for VPE"
drm/amdgpu: use common defines for HUB faults
drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling
drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling
drm/amdgpu: use static ids for ACP platform devs
drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix
drm/amdgpu: Forward VMID reservation errors
drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
drm/amdgpu/gmc6: Cache VM fault info
drm/amdgpu/gmc6: Don't print MC client as it's unknown
drm/amdgpu/cz_ih: Enable soft IRQ handler ring
drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
drm/amdgpu/cik_ih: Enable soft IRQ handler ring
drm/amdgpu/si_ih: Enable soft IRQ handler ring
drm/amd/display: fix typo in display_mode_core_structs.h
drm/amd/display: fix Smart Power OLED not working after S4
drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
...
|
|
|
|
2ddcf4962c |
Kbuild updates for v6.19
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
Signed-off-by: Nicolas Schier <nsc@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEh0E3p4c3JKeBvsLGB1IKcBYmEmkFAmkttSoACgkQB1IKcBYm
Emkt6g/+NLW1rJBCcONXIe3UlfHvo4MBziA+ItuLkYx7FwhhTTOivQgav68wXSQK
/JqQ5N8O0nO4Gznv5aJWoW+GUCKFjqaDbJDXW9pRdDNGQjXJfL82OHu9gPiUAiDJ
ffZwyeRfXtBaglWv6ovM5z6817OqUCLErqjESAMHZv4nXbXjFNi2YKwgJGmAjfNM
kiNUU1ieO2/dxgI/mMCW0UuK0jjiC0EsY1as3IkxBdVZi3dRPobLNaL4JiKHGFlZ
BlrtAoGHrqwqEQgFACVJE2BzWhW+Hwz7SyfBvF4c8T0eM/02ogcL43COh6scq0pV
2om7hFwvE/NBdbCvz0KB/q3khx9jJRagCm0o/+YEexHb1rn+LcCbomQZYRjIEcHU
mBc3DTD+B+jwiN6mow6e6E1uuKvpTrmPGBNVyQ5BMFpdlhfga7zsttLO5kMIXqAH
Fc8MsSx06YggrwGevc/39C+3uzI501Rcxu9BWdyHXfeZKeq0gZGNtkXVN2edzzKm
c4uFeDbbF3M0oeGXmtlm8I//jqeJuc0wK8d8tSXjKodo+vgppXj0s2okAwALY2jW
NVAlxqKwS7CnjBbXOsHIXSCDDL5IQ+np/gIaOkSbl6DXU50BvFV+KGvqW9hdjmwR
AEQXemqUhSvGFmYG92mLwewXWAivnxUvGTCifOhXipyikXhX/wI=
=Gn4V
-----END PGP SIGNATURE-----
Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
[ Editor's note: the core of this actually came in early through a
shared branch and a few other trees - Linus ]
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: add target to build a cpio containing modules
initramfs: add gen_init_cpio to hostprogs unconditionally
kbuild: allow architectures to override CC_CAN_LINK
init: deduplicate cc-can-link.sh invocations
kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
scripts: headers_install.sh: Remove two outdated config leak ignore entries
scripts/clang-tools: Handle included .c files in gen_compile_commands
kbuild: uapi: Drop types.h check from headers_check.pl
kbuild: Rename Makefile.extrawarn to Makefile.warn
MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
kbuild: uapi: reuse KBUILD_USERCFLAGS
kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
kbuild: Use objtree for module signing key path
btrfs: send: make use of -fms-extensions for defining struct fs_path
|
|
|
|
2b09f480f0 |
A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which are
caused by the related overhead. It turned out that the decision to invoke
the exit to user work was not really a decision. More or less each
context switch caused that. There is a long list of small issues which
sums up nicely and results in a 3-4% regression in I/O benchmarks.
The other detail which caused issues due to extra work in context switch
and task migration is the CID (memory context ID) management. It also
requires to use a task work to consolidate the CID space, which is
executed in the context of an arbitrary task and results in sporadic
uncontrolled exit latencies.
The rewrite addresses this by:
- Removing deprecated and long unsupported functionality
- Moving the related data into dedicated data structures which are
optimized for fast path processing.
- Caching values so actual decisions can be made
- Replacing the current implementation with a optimized inlined variant.
- Separating fast and slow path for architectures which use the generic
entry code, so that only fault and error handling goes into the
TIF_NOTIFY_RESUME handler.
- Rewriting the CID management so that it becomes mostly invisible in the
context switch path. That moves the work of switching modes into the
fork/exit path, which is a reasonable tradeoff. That work is only
required when a process creates more threads than the cpuset it is
allowed to run on or when enough threads exit after that. An artificial
thread pool benchmarks which triggers this did not degrade, it actually
improved significantly.
The main effect in migration heavy scenarios is that runqueue lock held
time and therefore contention goes down significantly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksaRYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoencEADA5he8PAFPmSRRPo6+2G5mHzWe8kIU
5ZViQStWFNAA0qqy8VXryWiJ6qqrO6la9o7K4YOXASUtlkVjquRp1DF7PabqGwuy
zshbRCXNlT51J8uqanN8VrGVjlf+bMdHDbGoI1SLkUTxG8b+kDD5PXUQE1ARelPP
Slbg9u+EMrxj6D5MDTPbuW6TqryJEkPtiNScyOz43emp9ww9+WVxenOcRqU4D+Th
mjWmrGIzkroSf4XReMoD/wg9TPTpUjXnNCwl2viY9JvBpkMfYtU4tJAGK3aNFOWy
zsAN0O9CaFGrUEFne7qUmtwhNLdtnjx5HN5pe7yZd1EhdTuQKq4jPiiQnwwm8w72
c0o6m45FNPmPoSyfaZWCkLjbTEUXonT9JF61iN35JVxim8gBDDJjHFKnLxDmLrH3
X0eESE48ReY2EneDV6Y8RJRo6oG14Fccvc39aTf/2Rw3trpmtt2agvConQzupQIg
DzANw4jhUUzFRrHrMHACNsqKFXh9ratue/S9DM3xxTpGO/bKdeK7jGIgzNf8O34M
J0O6Hvk5jMdcWlIJTx21GoGzoSkkXnR49g/71aCcp+MwdY4x9zFz5SWi8LWQRmkx
xRo6tY27Bma8/SEwMJjPpAUXDTpq6v+j3cPisybL1yGsyt9lh+p8LX7VUtwcoEqe
6ZelC5Kgw/+/kg==
=n5KT
-----END PGP SIGNATURE-----
Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq updates from Thomas Gleixner:
"A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which
are caused by the related overhead. It turned out that the decision to
invoke the exit to user work was not really a decision. More or less
each context switch caused that. There is a long list of small issues
which sums up nicely and results in a 3-4% regression in I/O
benchmarks.
The other detail which caused issues due to extra work in context
switch and task migration is the CID (memory context ID) management.
It also requires to use a task work to consolidate the CID space,
which is executed in the context of an arbitrary task and results in
sporadic uncontrolled exit latencies.
The rewrite addresses this by:
- Removing deprecated and long unsupported functionality
- Moving the related data into dedicated data structures which are
optimized for fast path processing.
- Caching values so actual decisions can be made
- Replacing the current implementation with a optimized inlined
variant.
- Separating fast and slow path for architectures which use the
generic entry code, so that only fault and error handling goes into
the TIF_NOTIFY_RESUME handler.
- Rewriting the CID management so that it becomes mostly invisible in
the context switch path. That moves the work of switching modes
into the fork/exit path, which is a reasonable tradeoff. That work
is only required when a process creates more threads than the
cpuset it is allowed to run on or when enough threads exit after
that. An artificial thread pool benchmarks which triggers this did
not degrade, it actually improved significantly.
The main effect in migration heavy scenarios is that runqueue lock
held time and therefore contention goes down significantly"
* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/mmcid: Switch over to the new mechanism
sched/mmcid: Implement deferred mode change
irqwork: Move data struct to a types header
sched/mmcid: Provide CID ownership mode fixup functions
sched/mmcid: Provide new scheduler CID mechanism
sched/mmcid: Introduce per task/CPU ownership infrastructure
sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
sched/mmcid: Provide precomputed maximal value
sched/mmcid: Move initialization out of line
signal: Move MMCID exit out of sighand lock
sched/mmcid: Convert mm CID mask to a bitmap
cpumask: Cache num_possible_cpus()
sched/mmcid: Use cpumask_weighted_or()
cpumask: Introduce cpumask_weighted_or()
sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
sched/mmcid: Move scheduler code out of global header
sched: Fixup whitespace damage
sched/mmcid: Cacheline align MM CID storage
sched/mmcid: Use proper data structures
sched/mmcid: Revert the complex CID management
...
|
|
|
|
1d18101a64 |
kernel-6.19-rc1.cred
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
orJLAP9UD+dX6cicJDkzFZowDakmoIQkR5ZSDwChSlmvLcmquwEAlSq4svVd9Bdl
7kOFUk71DqhVHrPAwO7ap0BxehokEAA=
=Cli6
-----END PGP SIGNATURE-----
Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull cred guard updates from Christian Brauner:
"This contains substantial credential infrastructure improvements
adding guard-based credential management that simplifies code and
eliminates manual reference counting in many subsystems.
Features:
- Kernel Credential Guards
Add with_kernel_creds() and scoped_with_kernel_creds() guards that
allow using the kernel credentials without allocating and copying
them. This was requested by Linus after seeing repeated
prepare_kernel_creds() calls that duplicate the kernel credentials
only to drop them again later.
The new guards completely avoid the allocation and never expose the
temporary variable to hold the kernel credentials anywhere in
callers.
- Generic Credential Guards
Add scoped_with_creds() guards for the common override_creds() and
revert_creds() pattern. This builds on earlier work that made
override_creds()/revert_creds() completely reference count free.
- Prepare Credential Guards
Add prepare credential guards for the more complex pattern of
preparing a new set of credentials and overriding the current
credentials with them:
- prepare_creds()
- modify new creds
- override_creds()
- revert_creds()
- put_cred()
Cleanups:
- Make init_cred static since it should not be directly accessed
- Add kernel_cred() helper to properly access the kernel credentials
- Fix scoped_class() macro that was introduced two cycles ago
- coredump: split out do_coredump() from vfs_coredump() for cleaner
credential handling
- coredump: move revert_cred() before coredump_cleanup()
- coredump: mark struct mm_struct as const
- coredump: pass struct linux_binfmt as const
- sev-dev: use guard for path"
* tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
trace: use override credential guard
trace: use prepare credential guard
coredump: use override credential guard
coredump: use prepare credential guard
coredump: split out do_coredump() from vfs_coredump()
coredump: mark struct mm_struct as const
coredump: pass struct linux_binfmt as const
coredump: move revert_cred() before coredump_cleanup()
sev-dev: use override credential guards
sev-dev: use prepare credential guard
sev-dev: use guard for path
cred: add prepare credential guard
net/dns_resolver: use credential guards in dns_query()
cgroup: use credential guards in cgroup_attach_permissions()
act: use credential guards in acct_write_process()
smb: use credential guards in cifs_get_spnego_key()
nfs: use credential guards in nfs_idmap_get_key()
nfs: use credential guards in nfs_local_call_write()
nfs: use credential guards in nfs_local_call_read()
erofs: use credential guards
...
|
|
|
|
415d34b92c |
namespace-6.19-rc1
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
ooKwAP4kR5kMjHlthf8jHmmCjVU3nQFO9hUZsIQL9gFJLOIQMAD+LLoTaq1WJufl
oSgZpREXZVmI1TK61eR6EZMB1YikGAo=
=TExi
-----END PGP SIGNATURE-----
Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains substantial namespace infrastructure changes including a new
system call, active reference counting, and extensive header cleanups.
The branch depends on the shared kbuild branch for -fms-extensions support.
Features:
- listns() system call
Add a new listns() system call that allows userspace to iterate
through namespaces in the system. This provides a programmatic
interface to discover and inspect namespaces, addressing
longstanding limitations:
Currently, there is no direct way for userspace to enumerate
namespaces. Applications must resort to scanning /proc/*/ns/ across
all processes, which is:
- Inefficient - requires iterating over all processes
- Incomplete - misses namespaces not attached to any running
process but kept alive by file descriptors, bind mounts, or
parent references
- Permission-heavy - requires access to /proc for many processes
- No ordering or ownership information
- No filtering per namespace type
The listns() system call solves these problems:
ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
size_t nr_ns_ids, unsigned int flags);
struct ns_id_req {
__u32 size;
__u32 spare;
__u64 ns_id;
struct /* listns */ {
__u32 ns_type;
__u32 spare2;
__u64 user_ns_id;
};
};
Features include:
- Pagination support for large namespace sets
- Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
- Filtering by owning user namespace
- Permission checks respecting namespace isolation
- Active Reference Counting
Introduce an active reference count that tracks namespace
visibility to userspace. A namespace is visible in the following
cases:
- The namespace is in use by a task
- The namespace is persisted through a VFS object (namespace file
descriptor or bind-mount)
- The namespace is a hierarchical type and is the parent of child
namespaces
The active reference count does not regulate lifetime (that's still
done by the normal reference count) - it only regulates visibility
to namespace file handles and listns().
This prevents resurrection of namespaces that are pinned only for
internal kernel reasons (e.g., user namespaces held by
file->f_cred, lazy TLB references on idle CPUs, etc.) which should
not be accessible via (1)-(3).
- Unified Namespace Tree
Introduce a unified tree structure for all namespaces with:
- Fixed IDs assigned to initial namespaces
- Lookup based solely on inode number
- Maintained list of owned namespaces per user namespace
- Simplified rbtree comparison helpers
Cleanups
- Header Reorganization:
- Move namespace types into separate header (ns_common_types.h)
- Decouple nstree from ns_common header
- Move nstree types into separate header
- Switch to new ns_tree_{node,root} structures with helper functions
- Use guards for ns_tree_lock
- Initial Namespace Reference Count Optimization
- Make all reference counts on initial namespaces a nop to avoid
pointless cacheline ping-pong for namespaces that can never go
away
- Drop custom reference count initialization for initial namespaces
- Add NS_COMMON_INIT() macro and use it for all namespaces
- pid: rely on common reference count behavior
- Miscellaneous Cleanups
- Rename exit_task_namespaces() to exit_nsproxy_namespaces()
- Rename is_initial_namespace() and make argument const
- Use boolean to indicate anonymous mount namespace
- Simplify owner list iteration in nstree
- nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
- nsfs: use inode_just_drop()
- pidfs: raise DCACHE_DONTCACHE explicitly
- pidfs: simplify PIDFD_GET__NAMESPACE ioctls
- libfs: allow to specify s_d_flags
- cgroup: add cgroup namespace to tree after owner is set
- nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
Fixes:
- setns(pidfd, ...) race condition
Fix a subtle race when using pidfds with setns(). When the target
task exits after prepare_nsset() but before commit_nsset(), the
namespace's active reference count might have been dropped. If
setns() then installs the namespaces, it would bump the active
reference count from zero without taking the required reference on
the owner namespace, leading to underflow when later decremented.
The fix resurrects the ownership chain if necessary - if the caller
succeeded in grabbing passive references, the setns() should
succeed even if the target task exits or gets reaped.
- Return EFAULT on put_user() error instead of success
- Make sure references are dropped outside of RCU lock (some
namespaces like mount namespace sleep when putting the last
reference)
- Don't skip active reference count initialization for network
namespace
- Add asserts for active refcount underflow
- Add asserts for initial namespace reference counts (both passive
and active)
- ipc: enable is_ns_init_id() assertions
- Fix kernel-doc comments for internal nstree functions
- Selftests
- 15 active reference count tests
- 9 listns() functionality tests
- 7 listns() permission tests
- 12 inactive namespace resurrection tests
- 3 threaded active reference count tests
- commit_creds() active reference tests
- Pagination and stress tests
- EFAULT handling test
- nsid tests fixes"
* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
nstree: fix kernel-doc comments for internal functions
nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
selftests/namespaces: fix nsid tests
ns: drop custom reference count initialization for initial namespaces
pid: rely on common reference count behavior
ns: add asserts for initial namespace active reference counts
ns: add asserts for initial namespace reference counts
ns: make all reference counts on initial namespace a nop
ipc: enable is_ns_init_id() assertions
fs: use boolean to indicate anonymous mount namespace
ns: rename is_initial_namespace()
ns: make is_initial_namespace() argument const
nstree: use guards for ns_tree_lock
nstree: simplify owner list iteration
nstree: switch to new structures
nstree: add helper to operate on struct ns_tree_{node,root}
nstree: move nstree types into separate header
nstree: decouple from ns_common header
ns: move namespace types into separate header
...
|
|
|
|
8cea569ca7 |
sched/mmcid: Use proper data structures
Having a lot of CID functionality specific members in struct task_struct and struct mm_struct is not really making the code easier to read. Encapsulate the CID specific parts in data structures and keep them separate from the stuff they are embedded in. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de |
|
|
|
2313598222 |
convert ramfs and tmpfs
Quite a bit is already done by infrastructure changes (simple_link(),
simple_unlink()) - all that is left is replacing d_instantiate() +
pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(),
and, in case of shmem, using simple_unlink() and simple_link() in
->unlink() and ->link() resp., instead of open-coding those there.
Since d_make_persistent() accepts (and hashes) unhashed ones, shmem
situation gets simpler - we no longer care whether ->lookup() has hashed
the sucker.
With that done, we don't need kill_litter_super() for these filesystems
anymore - by the umount time all remaining dentries will be marked
persistent and kill_litter_super() will boil down to call of
kill_anon_super().
The same goes for devtmpfs and rootfs - they are handled by
ramfs or by shmem, depending upon config.
NB: strictly speaking, both devtmpfs and rootfs ought to use
ramfs_kill_sb() if they end up using ramfs; that's a separate
story and the only impact of "just use kill_{litter,anon}_super()"
is that we fail to free their sb->s_fs_info... on reboot.
That's orthogonal to the changes in this series - kill_litter_super()
is identical to kill_anon_super() for those at this point.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
deab487e0f
|
kbuild: allow architectures to override CC_CAN_LINK
The generic test for CC_CAN_LINK assumes that all architectures use -m32 and -m64 to switch between 32-bit and 64-bit compilation. This is overly simplistic. Architectures may use other flags (-mabi, -m31, etc.) or may also require byte order handling (-mlittle-endian, -EL). Expressing all of the different possibilities will be very complicated and brittle. Instead allow architectures to supply their own logic which will be easy to understand and evolve. Both the boolean ARCH_HAS_CC_CAN_LINK and the string ARCH_USERFLAGS need to be implemented as kconfig does not allow the reuse of string options. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-3-4dee0d74d439@linutronix.de Signed-off-by: Nicolas Schier <nsc@kernel.org> |
|
|
|
80623f2c83
|
init: deduplicate cc-can-link.sh invocations
The command to invoke scripts/cc-can-link.sh is very long and new usages are about to be added. Add a helper variable to make the code easier to read and maintain. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-2-4dee0d74d439@linutronix.de Signed-off-by: Nicolas Schier <nsc@kernel.org> |
|
|
|
88622323dd |
rust: enable slice_flatten feature and provide it through an extension trait
In Rust 1.80, the previously unstable `slice::flatten` family of methods have been stabilized and renamed to `slice::as_flattened`. This creates an issue as we want to use `as_flattened`, but need to support the MSRV (which at the moment is Rust 1.78) where it is named `flatten`. Solve this by enabling the `slice_flatten` feature, and providing an `as_flattened` implementation through an extension trait for compiler versions where it is not available. The trait is then exported from the prelude, making the `as_flattened` family of methods transparently available for all supported compiler versions. This extension trait can be removed once the MSRV passes 1.80. Suggested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/all/CANiq72kK4pG=O35NwxPNoTO17oRcg1yfGcvr3==Fi4edr+sfmw@mail.gmail.com/ Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251110-gsp_boot-v9-8-8ae4058e3c0e@nvidia.com> Message-ID: <20251104-b4-as-flattened-v3-1-6cb9c26b45cd@nvidia.com> |
|
|
|
c2bbd2db52
|
ns: drop custom reference count initialization for initial namespaces
Initial namespaces don't modify their reference count anymore. They remain fixed at one so drop the custom refcount initializations. Link: https://patch.msgid.link/20251110-work-namespace-nstree-fixes-v1-16-e8a9264e0fb9@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
a6446829f8
|
init: Replace simple_strtoul() with kstrtouint() in root_delay_setup()
Replace deprecated simple_strtoul() with kstrtouint() for better error handling and input validation. Return 0 on parsing failure to indicate invalid parameter, maintaining existing behavior for valid inputs. The simple_strtoul() function is deprecated in favor of kstrtoint() family functions which provide better error handling and are recommended for new code and replacements. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Link: https://patch.msgid.link/20251103080627.1844645-1-kaushlendra.kumar@intel.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
40314c2818
|
cred: make init_cred static
There's zero need to expose struct init_cred. The very few places that need access can just go through init_task which is already exported. Link: https://patch.msgid.link/20251103-work-creds-init_cred-v1-3-cb3ec8711a6a@kernel.org Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
3db6b38dfe |
rseq: Switch to fast path processing on exit to user
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. If case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.
This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.
This results in the following improvements:
Kernel build Before After Reduction
exit to user 80692981 80514451
signal checks: 32581 121 99%
slowpath runs: 1201408 1.49% 198 0.00% 100%
fastpath runs: 675941 0.84% N/A
id updates: 1233989 1.53% 50541 0.06% 96%
cs checks: 1125366 1.39% 0 0.00% 100%
cs cleared: 1125366 100% 0 100%
cs fixup: 0 0% 0
RSEQ selftests Before After Reduction
exit to user: 386281778 387373750
signal checks: 35661203 0 100%
slowpath runs: 140542396 36.38% 100 0.00% 100%
fastpath runs:
|
|
|
|
9c37cb6e80 |
rseq: Provide static branch for runtime debugging
Config based debug is rarely turned on and is not available easily when things go wrong. Provide a static branch to allow permanent integration of debug mechanisms along with the usual toggles in Kconfig, command line and debugfs. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de |
|
|
|
5412910487 |
rseq: Expose lightweight statistics in debugfs
Analyzing the call frequency without actually using tracing is helpful for analysis of this infrastructure. The overhead is minimal as it just increments a per CPU counter associated to each operation. The debugfs readout provides a racy sum of all counters. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de |
|
|
|
0b1765830c
|
ns: use NS_COMMON_INIT() for all namespaces
Now that we have a common initializer use it for all static namespaces. Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
b2c43efc3c
|
initrd: Replace simple_strtol with kstrtoint to improve ramdisk_start_setup
Replace simple_strtol() with the recommended kstrtoint() for parsing the 'ramdisk_start=' boot parameter. Unlike simple_strtol(), which returns a a long, kstrtoint() converts the string directly to an integer and avoids implicit casting. Check the return value of kstrtoint() and reject invalid values. This adds error handling while preserving existing behavior for valid values, and removes use of the deprecated simple_strtol() helper. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
48e3694ae7 |
printk changes for 6.18
-----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy 7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC 4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4 AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ NUgzh2cOwSWWU4UHt/AiD9cd =uFn0 -----END PGP SIGNATURE----- Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add KUnit test for the printk ring buffer - Fix the check of the maximal record size which is allowed to be stored into the printk ring buffer. It prevents corruptions of the ring buffer. Note that printk() is on the safe side. The messages are limited by 1kB buffer and are always small enough for the minimal log buffer size 4kB, see CONFIG_LOG_BUF_SHIFT definition. * tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: ringbuffer: Fix data block max size check printk: kunit: support offstack cpumask printk: kunit: Fix __counted_by() in struct prbtest_rbdata printk: ringbuffer: Explain why the KUnit test ignores failed writes printk: ringbuffer: Add KUnit test |
|
|
|
e406d57be7 |
Patch series in this pull request:
- The 3 patch series "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet completes the removal of this legacy IDR API. - The 9 patch series "panic: introduce panic status function family" from Jinchao Wang provides a number of cleanups to the panic code and its various helpers, which were rather ad-hoc and scattered all over the place. - The 5 patch series "tools/delaytop: implement real-time keyboard interaction support" from Fan Yu adds a few nice user-facing usability changes to the delaytop monitoring tool. - The 3 patch series "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos Petrongonas fixes a panic which was happening with the combination of EFI and KHO. - The 2 patch series "Squashfs: performance improvement and a sanity check" from Phillip Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere 150x speedup was measured for a well-chosen microbenchmark. - Plus another 50-odd singleton patches all over the place. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN78zwAKCRDdBJ7gKXxA jhLeAQCddTv0XtSUTrvBvmrJVUBrQQeJc+LtNopMIjfAF/WAWAEAogSVKxg+HHEB GaVixx4zDriNzEqrqiCx9rm4l+YooQA= =XRe0 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet completes the removal of this legacy IDR API - "panic: introduce panic status function family" from Jinchao Wang provides a number of cleanups to the panic code and its various helpers, which were rather ad-hoc and scattered all over the place - "tools/delaytop: implement real-time keyboard interaction support" from Fan Yu adds a few nice user-facing usability changes to the delaytop monitoring tool - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos Petrongonas fixes a panic which was happening with the combination of EFI and KHO - "Squashfs: performance improvement and a sanity check" from Phillip Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere 150x speedup was measured for a well-chosen microbenchmark - plus another 50-odd singleton patches all over the place * tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits) Squashfs: reject negative file sizes in squashfs_read_inode() kallsyms: use kmalloc_array() instead of kmalloc() MAINTAINERS: update Sibi Sankar's email address Squashfs: add SEEK_DATA/SEEK_HOLE support Squashfs: add additional inode sanity checking lib/genalloc: fix device leak in of_gen_pool_get() panic: remove CONFIG_PANIC_ON_OOPS_VALUE ocfs2: fix double free in user_cluster_connect() checkpatch: suppress strscpy warnings for userspace tools cramfs: fix incorrect physical page address calculation kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit Squashfs: fix uninit-value in squashfs_get_parent kho: only fill kimage if KHO is finalized ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name() kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock coccinelle: platform_no_drv_owner: handle also built-in drivers coccinelle: of_table: handle SPI device ID tables lib/decompress: use designated initializers for struct compress_format efi: support booting with kexec handover (KHO) ... |
|
|
|
7a75a5da79 | Merge branch 'rework/ringbuffer-kunit-test' into for-linus | |
|
|
7f70725741 |
Kbuild updates for 6.18
- Extend modules.builtin.modinfo to include module aliases from MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such as kmod) can verify that a particular module alias will be handled by a builtin module. - Bump the minimum version of LLVM for building the kernel to 15.0.0. - Upgrade several userspace API checks in headers_check.pl to errors. - Unify and consolidate CONFIG_WERROR / W=e handling. - Turn assembler and linker warnings into errors with CONFIG_WERROR / W=e. - Respect CONFIG_WERROR / W=e when building userspace programs (userprogs). - Enable -Werror unconditionally when building host programs (hostprogs). - Support copy_file_range() and data segment alignment in gen_init_cpio to improve performance on filesystems that support reflinks such as btrfs and XFS. - Miscellaneous small changes to scripts and configuration files. Signed-off-by: Nathan Chancellor <nathan@kernel.org> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaNrp6QAKCRAdayaRccAa ljxRAP4hYocKXeWsiJzkTB199P4QUGWf220a9elBmtdJEed07gD/VBnCbSOxG3RO vS8qbJHwxUFL7a+mDV8RIVXSt99NpAg= =psG/ -----END PGP SIGNATURE----- Merge tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild updates from Nathan Chancellor: - Extend modules.builtin.modinfo to include module aliases from MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such as kmod) can verify that a particular module alias will be handled by a builtin module - Bump the minimum version of LLVM for building the kernel to 15.0.0 - Upgrade several userspace API checks in headers_check.pl to errors - Unify and consolidate CONFIG_WERROR / W=e handling - Turn assembler and linker warnings into errors with CONFIG_WERROR / W=e - Respect CONFIG_WERROR / W=e when building userspace programs (userprogs) - Enable -Werror unconditionally when building host programs (hostprogs) - Support copy_file_range() and data segment alignment in gen_init_cpio to improve performance on filesystems that support reflinks such as btrfs and XFS - Miscellaneous small changes to scripts and configuration files * tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits) modpost: Initialize builtin_modname to stop SIGSEGVs Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o modpost: Create modalias for builtin modules modpost: Add modname to mod_device_table alias scsi: Always define blogic_pci_tbl structure kbuild: extract modules.builtin.modinfo from vmlinux.unstripped kbuild: keep .modinfo section in vmlinux.unstripped kbuild: always create intermediate vmlinux.unstripped s390: vmlinux.lds.S: Reorder sections KMSAN: Remove tautological checks objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs riscv: Unconditionally use linker relaxation riscv: Remove version check for LTO_CLANG selects powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault() mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER arm64: Remove tautological LLVM Kconfig conditions ARM: Clean up definition of ARM_HAS_GROUP_RELOCS ... |
|
|
|
4b81e2eb9e |
Updates for the VDSO subsystem:
- Further consolidation of the VDSO infrastructure and the common data
store.
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaUJgTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT5VD/9l+TP42vJzwygPArefVM9QijAc4k+g
iptaz5M+Lhk0BlwMUu502Tp1zcwGq5529LlE2P72DuYoZRtZKSFCGES2mye0JiAs
A5Vk8d1zicg3IIGjYgw39shA9ghzuzV3gZdgHVqQ6024uSeKxUY3SqF156GGfPFG
azXWDV7i4RGQvKLm2ye8m7JNYCE3P9f8Wh+GvPaq7qBlW1MsMCWC4FP5qu4iStf3
mol9uA/EM2zSDwkpIpq6P5L2TpIH1dGDNGxVf/HG3J16UhNXzzXv/rxuKHKPNGWm
MQZ8WdPjF+uwdUsNXaImC6aCE0hoQ1Ga7GE2YA3pMQYfySeiJ/fA5I1KfhlUCiHD
oUgnb6PQfZjaBDrIODKcrFHAYc5pTOyJXnJ6q1Lc1yrOfoVcIpMaRXHO/phCeuwI
61gfb9ggbFEH/7cL26Z9XGSqh4BDCKVP5Z+y98OnNEzV74ScpX13/szB3VgdN6LW
tmTOeMPYCo4FQoPw8OV1xkeYOKD2JyqH+garnxDZNFjaVaeUkzlAMKz+1JUjxRZj
UpUhZ8y5KFVjgKkqZHxhucr9cnwV/xd4n+IzZeG4l6XZp4VyirG+5QpAikGYe+Jq
WFU4Ao1xvevyizrJEhoGJKznwtrNrsy6AQ+wLQxYiRR4aTnng8n9T/ry2DVvhJf8
diYWxuRTR7KMxg==
=i3UB
-----END PGP SIGNATURE-----
Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:
- Further consolidation of the VDSO infrastructure and the common data
store
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests: vDSO: Drop vdso_test_clock_getres
selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
selftests: vDSO: vdso_test_abi: Use explicit indices for name array
selftests: vDSO: vdso_test_abi: Drop clock availability tests
selftests: vDSO: vdso_test_abi: Use ksft_finished()
selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
vdso: Drop kconfig GENERIC_COMPAT_VDSO
vdso: Drop kconfig GENERIC_VDSO_32
riscv: vdso: Untangle Kconfig logic
time: Build generic update_vsyscall() only with generic time vDSO
vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
vdso: Move ENABLE_COMPAT_VDSO from core to arm64
ARM: VDSO: Remove cntvct_ok global variable
vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
|
|
|
|
755fa5b4fb |
cgroup: Changes for v6.18
- Extensive cpuset code cleanup and refactoring work with no functional changes: CPU mask computation logic refactoring, introducing new helpers, removing redundant code paths, and improving error handling for better maintainability. - A few bug fixes to cpuset including fixes for partition creation failures when isolcpus is in use, missing error returns, and null pointer access prevention in free_tmpmasks(). - Core cgroup changes include replacing the global percpu_rwsem with per-threadgroup rwsem when writing to cgroup.procs for better scalability, workqueue conversions to use WQ_PERCPU and system_percpu_wq to prepare for workqueue default switching from percpu to unbound, and removal of unused code including the post_attach callback. - New cgroup.stat.local time accounting feature that tracks frozen time duration. - Misc changes including selftests updates (new freezer time tests and backward compatibility fixes), documentation sync, string function safety improvements, and 64-bit division fixes. -----BEGIN PGP SIGNATURE----- iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaNb1Sg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGfLMAPwKwkvUg9DPJEuECRfM9woOOHyIWLp1DwUhpg1v Zq0lkAEAmo/+IkJXGZ7TGF+wzSj7GFIugrILu3upzLCHzgYoDgs= =39KF -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Extensive cpuset code cleanup and refactoring work with no functional changes: CPU mask computation logic refactoring, introducing new helpers, removing redundant code paths, and improving error handling for better maintainability. - A few bug fixes to cpuset including fixes for partition creation failures when isolcpus is in use, missing error returns, and null pointer access prevention in free_tmpmasks(). - Core cgroup changes include replacing the global percpu_rwsem with per-threadgroup rwsem when writing to cgroup.procs for better scalability, workqueue conversions to use WQ_PERCPU and system_percpu_wq to prepare for workqueue default switching from percpu to unbound, and removal of unused code including the post_attach callback. - New cgroup.stat.local time accounting feature that tracks frozen time duration. - Misc changes including selftests updates (new freezer time tests and backward compatibility fixes), documentation sync, string function safety improvements, and 64-bit division fixes. * tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits) cpuset: remove is_prs_invalid helper cpuset: remove impossible warning in update_parent_effective_cpumask cpuset: remove redundant special case for null input in node mask update cpuset: fix missing error return in update_cpumask cpuset: Use new excpus for nocpu error check when enabling root partition cpuset: fix failure to enable isolated partition when containing isolcpus Documentation: cgroup-v2: Sync manual toctree cpuset: use partition_cpus_change for setting exclusive cpus cpuset: use parse_cpulist for setting cpus.exclusive cpuset: introduce partition_cpus_change cpuset: refactor cpus_allowed_validate_change cpuset: refactor out validate_partition cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers cpuset: refactor CPU mask buffer parsing logic cpuset: Refactor exclusive CPU mask computation logic cpuset: change return type of is_partition_[in]valid to bool cpuset: remove unused assignment to trialcs->partition_root_state cpuset: move the root cpuset write check earlier cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock ... |
|
|
|
9cc220a422 |
s390 updates for 6.18 merge window
- Refactor SCLP memory hotplug code - Introduce common boot_panic() decompressor helper macro and use it to get rid of nearly few identical implementations - Take into account additional key generation flags and forward it to the ep11 implementation. With that allow users to modify the key generation process, e.g. provide valid combinations of XCP_BLOB_* flags - Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390 debug facility and HMC driver - Add DAX support for DCSS memory block devices - Make the compiler statement attribute "assume" available with a new __assume macro - Rework ffs() and fls() family bitops functions, including source code improvements and generated code optimizations. Use the newly introduced __assume macro for that - Enable additional network features in default configurations - Use __GFP_ACCOUNT flag for user page table allocations to add missing kmemcg accounting - Add WQ_PERCPU flag to explicitly request the use of the per-CPU workqueue for 3590 tape driver - Switch power reading to the per-CPU and the Hiperdispatch to the default workqueue - Add memory allocation profiling hooks to allow better profiling data and the /proc/allocinfo output similar to other architectures -----BEGIN PGP SIGNATURE----- iI0EABYKADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCaNpOnhccYWdvcmRlZXZA bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8LJ0AP98TkWDCXLb02dNTST36dtoNaM+ I9HoosjbZIm8oHwhngD+JisTRWFogplXnE2z+JQrJJcshWvUpFDVtkk2pCSeOQM= =Cztn -----END PGP SIGNATURE----- Merge tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Refactor SCLP memory hotplug code - Introduce common boot_panic() decompressor helper macro and use it to get rid of nearly few identical implementations - Take into account additional key generation flags and forward it to the ep11 implementation. With that allow users to modify the key generation process, e.g. provide valid combinations of XCP_BLOB_* flags - Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390 debug facility and HMC driver - Add DAX support for DCSS memory block devices - Make the compiler statement attribute "assume" available with a new __assume macro - Rework ffs() and fls() family bitops functions, including source code improvements and generated code optimizations. Use the newly introduced __assume macro for that - Enable additional network features in default configurations - Use __GFP_ACCOUNT flag for user page table allocations to add missing kmemcg accounting - Add WQ_PERCPU flag to explicitly request the use of the per-CPU workqueue for 3590 tape driver - Switch power reading to the per-CPU and the Hiperdispatch to the default workqueue - Add memory allocation profiling hooks to allow better profiling data and the /proc/allocinfo output similar to other architectures * tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits) s390/mm: Add memory allocation profiling hooks s390: Replace use of system_wq with system_dfl_wq s390/diag324: Replace use of system_wq with system_percpu_wq s390/tape: Add WQ_PERCPU to alloc_workqueue users s390/bitops: Switch to generic ffs() if supported by compiler s390/bitops: Switch to generic fls(), fls64(), etc. s390/mm: Use __GFP_ACCOUNT for user page table allocations s390/configs: Enable additional network features s390/bitops: Cleanup __flogr() s390/bitops: Use __assume() for __flogr() inline assembly return value compiler_types: Add __assume macro s390/bitops: Limit return value range of __flogr() s390/dcssblk: Add DAX support s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul() s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul() s390/pkey: Forward keygenflags to ep11_unwrapkey s390/boot: Add common boot_panic() code s390/bitops: Optimize inlining s390/bitops: Slightly optimize ffs() and fls64() s390/sclp: Move memory hotplug code for better modularity ... |
|
|
|
a5ba183bde |
hardening updates for v6.18-rc1
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNraNQAKCRA2KwveOeQk
u/DkAPwKPP5BSmVR2wkdpQaXIr3PGA+cbBYp34DMJNujZ9piIwD/WZ+HfGTLoERy
+2Q6HLj9hUdd+Rx3IZ8/w1QmnhUIUAU=
=AwV9
-----END PGP SIGNATURE-----
Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"One notable addition is the creation of the 'transitional' keyword for
kconfig so CONFIG renaming can go more smoothly.
This has been a long-standing deficiency, and with the renaming of
CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
support), this came up again.
The breadth of the diffstat is mainly this renaming.
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"
* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/string_choices: Add str_assert_deassert() helper
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
kconfig: Add transitional symbol attribute for migration support
kconfig: Fix BrokenPipeError warnings in selftests
gcc-plugins: Remove TODO_verify_il for GCC >= 16
stddef: Introduce __TRAILING_OVERLAP()
stddef: Remove token-pasting in TRAILING_OVERLAP()
lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
|
|
|
|
18b19abc37 |
namespace-6.18-rc1
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQgQAKCRCRxhvAZXjc oiFXAQCpbLvkWbld9wLgxUBhq+q+kw5NvGxzpvqIhXwJB9F9YAEA44/Wevln4xGx +kRUbP+xlRQqenIYs2dLzVHzAwAdfQ4= =EO4Y -----END PGP SIGNATURE----- Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains a larger set of changes around the generic namespace infrastructure of the kernel. Each specific namespace type (net, cgroup, mnt, ...) embedds a struct ns_common which carries the reference count of the namespace and so on. We open-coded and cargo-culted so many quirks for each namespace type that it just wasn't scalable anymore. So given there's a bunch of new changes coming in that area I've started cleaning all of this up. The core change is to make it possible to correctly initialize every namespace uniformly and derive the correct initialization settings from the type of the namespace such as namespace operations, namespace type and so on. This leaves the new ns_common_init() function with a single parameter which is the specific namespace type which derives the correct parameters statically. This also means the compiler will yell as soon as someone does something remotely fishy. The ns_common_init() addition also allows us to remove ns_alloc_inum() and drops any special-casing of the initial network namespace in the network namespace initialization code that Linus complained about. Another part is reworking the reference counting. The reference counting was open-coded and copy-pasted for each namespace type even though they all followed the same rules. This also removes all open accesses to the reference count and makes it private and only uses a very small set of dedicated helpers to manipulate them just like we do for e.g., files. In addition this generalizes the mount namespace iteration infrastructure introduced a few cycles ago. As reminder, the vfs makes it possible to iterate sequentially and bidirectionally through all mount namespaces on the system or all mount namespaces that the caller holds privilege over. This allow userspace to iterate over all mounts in all mount namespaces using the listmount() and statmount() system call. Each mount namespace has a unique identifier for the lifetime of the systems that is exposed to userspace. The network namespace also has a unique identifier working exactly the same way. This extends the concept to all other namespace types. The new nstree type makes it possible to lookup namespaces purely by their identifier and to walk the namespace list sequentially and bidirectionally for all namespace types, allowing userspace to iterate through all namespaces. Looking up namespaces in the namespace tree works completely locklessly. This also means we can move the mount namespace onto the generic infrastructure and remove a bunch of code and members from struct mnt_namespace itself. There's a bunch of stuff coming on top of this in the future but for now this uses the generic namespace tree to extend a concept introduced first for pidfs a few cycles ago. For a while now we have supported pidfs file handles for pidfds. This has proven to be very useful. This extends the concept to cover namespaces as well. It is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() apis. As with pidfs file handles, namespace file handles are exhaustive, meaning it is not required to actually hold a reference to nsfs in able to decode aka open_by_handle_at() a namespace file handle. Instead the FD_NSFS_ROOT constant can be passed which will let the kernel grab a reference to the root of nsfs internally and thus decode the file handle. Namespaces file descriptors can already be derived from pidfds which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already. It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them trivially. Permission checking is kept simple. If the caller is located in the namespace the file handle refers to they are able to open it otherwise they must hold privilege over the owning namespace of the relevant namespace. The namespace file handle layout is exposed as uapi and has a stable and extensible format. For now it simply contains the namespace identifier, the namespace type, and the inode number. The stable format means that userspace may construct its own namespace file handles without going through name_to_handle_at() as they are already allowed for pidfs and cgroup file handles" * tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits) ns: drop assert ns: move ns type into struct ns_common nstree: make struct ns_tree private ns: add ns_debug() ns: simplify ns_common_init() further cgroup: add missing ns_common include ns: use inode initializer for initial namespaces selftests/namespaces: verify initial namespace inode numbers ns: rename to __ns_ref nsfs: port to ns_ref_*() helpers net: port to ns_ref_*() helpers uts: port to ns_ref_*() helpers ipv4: use check_net() net: use check_net() net-sysfs: use check_net() user: port to ns_ref_*() helpers time: port to ns_ref_*() helpers pid: port to ns_ref_*() helpers ipc: port to ns_ref_*() helpers cgroup: port to ns_ref_*() helpers ... |
|
|
|
b7ce6fa90f |
vfs-6.18-rc1.misc
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQMQAKCRCRxhvAZXjc
omNLAQCgrwzd9sa1JTlixweu3OAxQlSEbLuMpEv7Ztm+B7Wz0AD9HtwPC44Kev03
GbMcB2DCFLC4evqYECj6IG7NBmoKsAs=
=1ICf
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add "initramfs_options" parameter to set initramfs mount options.
This allows to add specific mount options to the rootfs to e.g.,
limit the memory size
- Add RWF_NOSIGNAL flag for pwritev2()
Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
signal from being raised when writing on disconnected pipes or
sockets. The flag is handled directly by the pipe filesystem and
converted to the existing MSG_NOSIGNAL flag for sockets
- Allow to pass pid namespace as procfs mount option
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs
mount is auto-selected based on the mounting process's active
pidns, and the pidns itself is basically hidden once the mount has
been constructed)
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns
of a procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes
creates a procfs mount from an empty pidns so that user
namespaced containers can be nested (without this, the nested
containers would fail to mount procfs)
But this requires forking off a helper process because you cannot
just one-shot this using mount(2)
* Container runtimes in general need to fork into a container
before configuring its mounts, which can lead to security issues
in the case of shared-pidns containers (a privileged process in
the pidns can interact with your container runtime process)
While SUID_DUMP_DISABLE and user namespaces make this less of an
issue, the strict need for this due to a minor uAPI wart is kind
of unfortunate
Things would be much easier if there was a way for userspace to
just specify the pidns they want. So this pull request contains
changes to implement a new "pidns" argument which can be set
using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
Cleanups:
- Remove the last references to EXPORT_OP_ASYNC_LOCK
- Make file_remove_privs_flags() static
- Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
- Use try_cmpxchg() in start_dir_add()
- Use try_cmpxchg() in sb_init_done_wq()
- Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
- Remove vfs_ioctl() export
- Replace rwlock() with spinlock in epoll code as rwlock causes
priority inversion on preempt rt kernels
- Make ns_entries in fs/proc/namespaces const
- Use a switch() statement() in init_special_inode() just like we do
in may_open()
- Use struct_size() in dir_add() in the initramfs code
- Use str_plural() in rd_load_image()
- Replace strcpy() with strscpy() in find_link()
- Rename generic_delete_inode() to inode_just_drop() and
generic_drop_inode() to inode_generic_drop()
- Remove unused arguments from fcntl_{g,s}et_rw_hint()
Fixes:
- Document @name parameter for name_contains_dotdot() helper
- Fix spelling mistake
- Always return zero from replace_fd() instead of the file descriptor
number
- Limit the size for copy_file_range() in compat mode to prevent a
signed overflow
- Fix debugfs mount options not being applied
- Verify the inode mode when loading it from disk in minixfs
- Verify the inode mode when loading it from disk in cramfs
- Don't trigger automounts with RESOLVE_NO_XDEV
If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
through automounts, but could still trigger them
- Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
tracepoints
- Fix unused variable warning in rd_load_image() on s390
- Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
- Use ns_capable_noaudit() when determining net sysctl permissions
- Don't call path_put() under namespace semaphore in listmount() and
statmount()"
* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
fcntl: trim arguments
listmount: don't call path_put() under namespace semaphore
statmount: don't call path_put() under namespace semaphore
pid: use ns_capable_noaudit() when determining net sysctl permissions
fs: rename generic_delete_inode() and generic_drop_inode()
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
initramfs: Replace strcpy() with strscpy() in find_link()
initrd: Use str_plural() in rd_load_image()
initramfs: Use struct_size() helper to improve dir_add()
initrd: Fix unused variable warning in rd_load_image() on s390
fs: use the switch statement in init_special_inode()
fs/proc/namespaces: make ns_entries const
filelock: add FL_RECLAIM to show_fl_flags() macro
eventpoll: Replace rwlock with spinlock
selftests/proc: add tests for new pidns APIs
procfs: add "pidns" mount option
pidns: move is-ancestor logic to helper
openat2: don't trigger automounts with RESOLVE_NO_XDEV
namei: move cross-device check to __traverse_mounts
namei: remove LOOKUP_NO_XDEV check from handle_mounts
...
|
|
|
|
fde0ab43b9 |
Fix CC_HAS_ASM_GOTO_OUTPUT on non-x86 architectures
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.
And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).
This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.
Just use an empty asm statement instead.
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes:
|
|
|
|
4055526d35
|
ns: move ns type into struct ns_common
It's misplaced in struct proc_ns_operations and ns->ops might be NULL if the namespace is compiled out but we still want to know the type of the namespace for the initial namespace struct. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
23ef9d4397 |
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is not strictly tied to a particular compiler. In preparation for GCC supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with associated options). Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will enable CONFIG_CFI during olddefconfig. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> |
|
|
|
e2ffa15b9b |
kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17
clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:
{
__label__ local_lbl;
...
unsafe_get_user(uval, uaddr, local_lbl);
...
return 0;
local_lbl:
return -EFAULT;
}
when two such scopes exist in the same function:
error: cannot jump from this asm goto statement to one of its possible targets
There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.
That issue prevents using local labels for a cleanup based user access
mechanism.
After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.
But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.
The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:
void bar(void **);
void* baz(void);
int foo (void) {
{
asm goto("jmp %l0"::::l0);
return 0;
l0:
return 1;
}
void *x __attribute__((cleanup(bar))) = baz();
{
asm goto("jmp %l0"::::l1);
return 42;
l1:
return 0xff;
}
}
Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.
That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.
Thanks to Nathan for pointing to the relevant clang issue!
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link:
|
|
|
|
95ee3364b2 |
Linux 6.17-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmjHMcoeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5bwH/23w8iGB4hf7L/7Z e7blX42Pe9EXA1uK62iWmwEjDvBuJ7TmVfXH09qYJ56fj6/rJEdpQwtBMd4ypL81 QA/7lq5UEl0apPzMN86J8EHCzmjNzv7o+UtEd4C/hPFEZHZJa5Hqj9CBglSwSCEn fTkLk7Gl6s8SfzBQ/rXX6/ZChAB/RleVWabDlIQMDz++/+9DZ0aqphj+5bYSqysL ROQOaj4LOICuLfrup9J61hKNBoF7Dv3sO20vc+Iic0XHRPZ6/lKCnHgCUsqVIOOQ L4kDT7XKQg+n3ttjrMe84/8iHZdWtf8VMWrtniPT8e1YGYuMpavVplgIcFoFCoNm Qa7NPDs= =rZeT -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaM3AYQAKCRAdayaRccAa lkrsAQCfR0LymE8Hq+Vfk65DK4qZxigaXGTfg5n3xlPhTAh/iQEA02N0/ReHOOdH nQde8709saIFE5axIMFvdWzbFPDtWwE= =eIkf -----END PGP SIGNATURE----- Merge 6.17-rc6 into kbuild-next Commit |
|
|
|
7cf7303211
|
ns: use inode initializer for initial namespaces
Just use the common helper we have. Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
024596a4e2
|
ns: rename to __ns_ref
Make it easier to grep and rename to ns_count. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
b36c823b9a
|
time: support ns lookup
Support the generic ns lookup infrastructure to support file handles for namespaces. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
f72e2cff13 |
compiler_types: Add __assume macro
Make the statement attribute "assume" with a new __assume macro available. The assume attribute is used to indicate that a certain condition is assumed to be true. Compilers may or may not use this indication to generate optimized code. If this condition is violated at runtime, the behavior is undefined. Note that the clang documentation states that optimizers may react differently to this attribute, and this may even have a negative performance impact. Therefore this attribute should be used with care. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> |
|
|
|
7479260860
|
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
INITRAMFS_PRESERVE_MTIME is only used in init/initramfs.c and
init/initramfs_test.c. Hence add a dependency on BLK_DEV_INITRD, to
prevent asking the user about this feature when configuring a kernel
without initramfs support.
Fixes:
|
|
|
|
afd77d2050
|
initramfs: Replace strcpy() with strscpy() in find_link()
strcpy() is deprecated; use strscpy() instead. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
beb022ef92
|
initrd: Use str_plural() in rd_load_image()
Add the local variable 'nr_disks' and replace the manual ternary "s" pluralization with the standardized str_plural() helper function. Use pr_notice() instead of printk(KERN_NOTICE) to silence a checkpatch warning. No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
e60625e7ce
|
initramfs: Use struct_size() helper to improve dir_add()
Use struct_size() to calculate the number of bytes to allocate for a new directory entry. No functional changes. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org> |
|
|
|
84f1766bdb
|
initrd: Fix unused variable warning in rd_load_image() on s390
The local variables 'rotator' and 'rotate' (used for the progress
indicator) aren't used on s390. Building the kernel with W=1 generates
the following warning:
init/do_mounts_rd.c:192:17: warning: variable 'rotate' set but not used [-Wunused-but-set-variable]
192 | unsigned short rotate = 0;
| ^
1 warning generated.
Remove the preprocessor directives and use the IS_ENABLED(CONFIG_S390)
macro instead, allowing the compiler to optimize away unused variables
and avoid the warning on s390.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
|
|
e416f0ed3c |
init: handle bootloader identifier in kernel parameters
BootLoaders (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE= /boot/vmlinuz-x.y.z" to kernel parameters. But these identifiers are not recognized by the kernel itself so will be passed to userspace. However user space init program also don't recognize it. KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on some architectures. We cannot change BootLoader's behavior, because this behavior exists for many years, and there are already user space programs search BOOT_IMAGE= in /proc/cmdline to obtain the kernel image locations: https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go (search getBootOptions) https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go (search getKernelReleaseWithBootOption) So the the best way is handle (ignore) it by the kernel itself, which can avoid such boot warnings (if we use something like init=/bin/bash, bootloader identifier can even cause a crash): Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space. [chenhuacai@loongson.cn: use strstarts()] Link: https://lkml.kernel.org/r/20250815090120.1569947-1-chenhuacai@loongson.cn Link: https://lkml.kernel.org/r/20250721101343.3283480-1-chenhuacai@loongson.cn Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
4f553c1e2c |
20 hotfixes. 15 are cc:stable and the remainder address post-6.16 issues
or aren't considered necessary for -stable kernels. 14 of these fixes are
for MM.
This includes
- a 3-patch kexec series from Breno that fixes a recently introduced
use-uninitialized bug,
- e 2-patch DAMON series from Quanmin Yan that avoids div-by-zero
crashes which can occur if the operator uses poorly-chosen insmod
parameters.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaMI7WQAKCRDdBJ7gKXxA
jq3sAQDkflIN0qW3R7yqgUZfdO78T2LMmGlPW1L7F/ZXkxLk7gD/WgkWoec5cqi0
ACiL81h6btIYBLHJ+SqJuowPMhaelQg=
=fquW
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes. 15 are cc:stable and the remainder address post-6.16
issues or aren't considered necessary for -stable kernels. 14 of these
fixes are for MM.
This includes
- kexec fixes from Breno for a recently introduced
use-uninitialized bug
- DAMON fixes from Quanmin Yan to avoid div-by-zero crashes
which can occur if the operator uses poorly-chosen insmod
parameters
and misc singleton fixes"
* tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add tree entry to numa memblocks and emulation block
mm/damon/sysfs: fix use-after-free in state_show()
proc: fix type confusion in pde_set_flags()
compiler-clang.h: define __SANITIZE_*__ macros only when undefined
mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
ocfs2: fix recursive semaphore deadlock in fiemap call
mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
mm/mremap: fix regression in vrm->new_addr check
percpu: fix race on alloc failed warning limit
mm/memory-failure: fix redundant updates for already poisoned pages
s390: kexec: initialize kexec_buf struct
riscv: kexec: initialize kexec_buf struct
arm64: kexec: initialize kexec_buf struct in load_other_segments()
mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
mm/damon/core: set quota->charged_from to jiffies at first charge window
mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
init/main.c: fix boot time tracing crash
mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()
mm/khugepaged: fix the address passed to notifier on testing young
|
|
|
|
0568f89d4f |
cgroup: replace global percpu_rwsem with per threadgroup resem when writing to cgroup.procs
The static usage pattern of creating a cgroup, enabling controllers,
and then seeding it with CLONE_INTO_CGROUP doesn't require write
locking cgroup_threadgroup_rwsem and thus doesn't benefit from this
patch.
To avoid affecting other users, the per threadgroup rwsem is only used
when the favordynmods is enabled.
As computer hardware advances, modern systems are typically equipped
with many CPU cores and large amounts of memory, enabling the deployment
of numerous applications. On such systems, container creation and
deletion become frequent operations, making cgroup process migration no
longer a cold path. This leads to noticeable contention with common
process operations such as fork, exec, and exit.
To alleviate the contention between cgroup process migration and
operations like process fork, this patch modifies lock to take the write
lock on signal_struct->group_rwsem when writing pid to
cgroup.procs/threads instead of holding a global write lock.
Cgroup process migration has historically relied on
signal_struct->group_rwsem to protect thread group integrity. In commit
<1ed1328792ff> ("sched, cgroup: replace signal_struct->group_rwsem with
a global percpu_rwsem"), this was changed to a global
cgroup_threadgroup_rwsem. The advantage of using a global lock was
simplified handling of process group migrations. This patch retains the
use of the global lock for protecting process group migration, while
reducing contention by using per thread group lock during
cgroup.procs/threads writes.
The locking behavior is as follows:
write cgroup.procs/threads | process fork,exec,exit | process group migration
------------------------------------------------------------------------------
cgroup_lock() | down_read(&g_rwsem) | cgroup_lock()
down_write(&p_rwsem) | down_read(&p_rwsem) | down_write(&g_rwsem)
critical section | critical section | critical section
up_write(&p_rwsem) | up_read(&p_rwsem) | up_write(&g_rwsem)
cgroup_unlock() | up_read(&g_rwsem) | cgroup_unlock()
g_rwsem denotes cgroup_threadgroup_rwsem, p_rwsem denotes
signal_struct->group_rwsem.
This patch eliminates contention between cgroup migration and fork
operations for threads that belong to different thread groups, thereby
reducing the long-tail latency of cgroup migrations and lowering system
load.
With this patch, under heavy fork and exec interference, the long-tail
latency of cgroup migration has been reduced from milliseconds to
microseconds. Under heavy cgroup migration interference, the multi-CPU
score of the spawn test case in UnixBench increased by 9%.
tj: Update comment in cgroup_favor_dynmods() and switch WARN_ONCE() to
pr_warn_once().
Signed-off-by: Yi Tao <escape@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
|
|
b236920731 |
Rust fixes for v6.17 (2nd)
Toolchain and infrastructure:
- Two changes to prepare for the future Rust 1.91.0 release (expected
2025-10-30, currently in nightly): a target specification format
change and a renamed, soon-to-be-stabilized 'core' function.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmi8ciYACgkQGXyLc2ht
IW3Bdg/9G07hHjmW0TR+V7AExNTqaZKdCxY//1lFeIuiIIew/tasqqbMCzv6KPi0
UL/yk4f4bBp5oklRGxzvu5DKG31XCe5i1izML3zQ2Y04Klkj3lcfAG3u05OEhPZm
woelfv/s3Umy1Gl5xV2QfiVrocRfWlZvsTP/K+ho05FKhM476XjErn8p8U1U3tjA
KbB+EBKV5reZ14gMDwjCN5BnMpddMQleifaEX/4oRwNfw3zzKDcqB34ojOXjVcrS
0GfapPCx64FNHxSeHADQ2O9mzOSxT5cTEzcH9JBFU+p8owmjOImvvcPoYKFOn4My
btQeVzoCxeDwmLBrV8peR3nQHEwjxFuWvyhBg0SQy+51DIzhPJZdy/Z8RsrSywnH
qoirudXd/rchvUTk/dNxQC6NORuY1IGGi2uJAs4ZRrmC6UqbxjUdB08cyVY4gFj0
H3VMVaWlVWIGsPATrA2Oh0D1wsJBDYynf1oyG8R1NyYu4vCpYKMBAyJlVGlrGgq6
H+fU8BqV+LmXH9ogtk0vhT7sK2dHGSA64JR3q1ralOlQip1CsXVeaN9/B+wfjo+i
dL1qwJXkii0yxyVElHTbnugxI60hR2cgkXRnDMgJu5fBPGDdebOGtB1xxDwLnCjH
3yNiU1Eyyer/NbsqxO9ylYPvHeg4RXETHoK0gk+/jyyR0ryfbtE=
=j/MW
-----END PGP SIGNATURE-----
Merge tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
- Two changes to prepare for the future Rust 1.91.0 release (expected
2025-10-30, currently in nightly): a target specification format
change and a renamed, soon-to-be-stabilized 'core' function.
* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: support Rust >= 1.91.0 target spec
rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
|
|
|
|
bad53ae2dc |
vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
All architectures implementing time-related functionality in the vDSO are using the generic vDSO library which handles time namespaces properly. Remove the now unnecessary Kconfig symbol. Enables the use of time namespaces on architectures, which use the generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de |
|
|
|
669602b5b7 |
init/main.c: fix boot time tracing crash
Steven Rostedt reported a crash with "ftrace=function" kernel command
line:
[ 0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[ 0.160254] #PF: supervisor read access in kernel mode
[ 0.160975] #PF: error_code(0x0000) - not-present page
[ 0.161697] PGD 0 P4D 0
[ 0.162055] Oops: Oops: 0000 [#1] SMP PTI
[ 0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[ 0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[ 0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[ 0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[ 0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[ 0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[ 0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[ 0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[ 0.174542] FS: 0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[ 0.175684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[ 0.177483] Call Trace:
[ 0.177828] <TASK>
[ 0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[ 0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[ 0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[ 0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[ 0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[ 0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[ 0.181517] execmem_alloc_rw (mm/execmem.c:487)
[ 0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[ 0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[ 0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[ 0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[ 0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[ 0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[ 0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[ 0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[ 0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[ 0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[ 0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[ 0.189088] register_tracer (kernel/trace/trace.c:2391)
[ 0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[ 0.190204] start_kernel (init/main.c:970)
[ 0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[ 0.191381] x86_64_start_kernel (??:?)
[ 0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[ 0.192534] </TASK>
[ 0.192839] Modules linked in:
[ 0.193267] CR2: 000000000000001c
[ 0.193730] ---[ end trace 0000000000000000 ]---
The crash happens because on x86 ftrace allocations from execmem require
maple tree to be initialized.
Move maple tree initialization that depends only on slab availability
earlier in boot so that it will happen right after mm_core_init().
Link: https://lkml.kernel.org/r/20250824130759.1732736-1-rppt@kernel.org
Fixes:
|
|
|
|
c09461a0d2 |
rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
As part of the stabilization of Location::file_with_nul(), it was brought up that the with_nul() suffix usually means something else in Rust APIs, so the API is being renamed prior to stabilization [1]. Thus, use the new name on new rustc versions. Link: https://www.github.com/rust-lang/rust/pull/145928 [1] Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/r/20250827-file_as_c_str-v1-1-d3f5a3916a9c@google.com [ Kept `cfg` separation. Reworded slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> |