Commit Graph

2811 Commits

Author SHA1 Message Date
Linus Torvalds 7cd122b552 Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
 Reference is grabbed and not released, but it's not actually _stored_
 anywhere.  That works, but it's hard to follow and verify; among other
 things, we have no way to tell _which_ of the increments is intended
 to be an unpaired one.  Worse, on removal we need to decide whether
 the reference had already been dropped, which can be non-trivial if
 that removal is on umount and we need to figure out if this dentry is
 pinned due to e.g. unlink() not done.  Usually that is handled by using
 kill_litter_super() as ->kill_sb(), but there are open-coded special
 cases of the same (consider e.g. /proc/self).
 
 Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
 marking those "leaked" dentries.  Having it set claims responsibility
 for +1 in refcount.
 
 The end result this series is aiming for:
 
 * get these unbalanced dget() and dput() replaced with new primitives that
   would, in addition to adjusting refcount, set and clear persistency flag.
 * instead of having kill_litter_super() mess with removing the remaining
   "leaked" references (e.g. for all tmpfs files that hadn't been removed
   prior to umount), have the regular shrink_dcache_for_umount() strip
   DCACHE_PERSISTENT of all dentries, dropping the corresponding
   reference if it had been set.  After that kill_litter_super() becomes
   an equivalent of kill_anon_super().
 
 Doing that in a single step is not feasible - it would affect too many places
 in too many filesystems.  It has to be split into a series.
 
 This work has really started early in 2024; quite a few preliminary pieces
 have already gone into mainline.  This chunk is finally getting to the
 meat of that stuff - infrastructure and most of the conversions to it.
 
 Some pieces are still sitting in the local branches, but the bulk of
 that stuff is here.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaTEq1wAKCRBZ7Krx/gZQ
 643uAQC1rRslhw5l7OjxEpIYbGG4M+QaadN4Nf5Sr2SuTRaPJQD/W4oj/u4C2eCw
 Dd3q071tqyvm/PXNgN2EEnIaxlFUlwc=
 =rKq+
 -----END PGP SIGNATURE-----

Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
Linus Torvalds 6dfafbd029 drm-next for 6.19-rc1:
new driver:
 - Arm Ethos-U65/U85 accel driver
 
 core:
 - support the drm color pipeline in vkms/amdgfx
 - add support for drm colorop pipeline
 - add COLOR PIPELINE plane property
 - add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
 - throttle dirty worker with vblank
 - use drm_for_each_bridge_in_chain_scoped in drm's bridge code
 - Ensure drm_client_modeset tests are enabled in UML
 - add simulated vblank interrupt - use in drivers
 - dumb buffer sizing helper
 - move freeing of drm client memory to driver
 - crtc sharpness strength property
 - stop using system_wq in scheduler/drivers
 - support emergency restore in drm-client
 
 rust:
 - make slice::as_flattened usable on all supported rustc
 - add FromBytes::from_bytes_prefix() method
 - remove redundant device ptr from Rust GEM object
 - Change how AlwaysRefCounted is implemented for GEM objects
 
 gpuvm:
 - Add deferred vm_bo cleanup to GPUVM (for rust)
 
 atomic:
 - cleanup and improve state handling interfaces
 
 buddy:
 - optimize block management
 
 dma-buf:
 - heaps: Create heap per CMA reserved location
 - improve userspace documentation
 
 dp:
 - add POST_LT_ADJ_REQ training sequence
 - DPCD dSC quirk for synaptics panamera devices
 - helpers to query branch DSC max throughput
 
 ttm:
 - Rename ttm_bo_put to ttm_bo_fini
 - allow page protection flags on risc-v
 - rework pipelined eviction fence handling
 
 amdgpu:
 - enable amdgpu by default for SI/CI dGPUs
 - enable DC by default on SI
 - refactor CIK/SI enablement
 - add ABM KMS property
 - Re-enable DM idle optimizations
 - DC Analog encoders support
 - Powerplay fixes for fiji/iceland
 - Enable DC on bonaire by default
 - HMM cleanup
 - Add new RAS framework
 - DML2.1 updates
 - YCbCr420 fixes
 - DC FP fixes
 - DMUB fixes
 - LTTPR fixes
 - DTBCLK fixes
 - DMU cursor offload handling
 - Userq validation improvements
 - Unify shutdown callback handling
 - Suspend improvements
 - Power limit code cleanup
 - SR-IOV fixes
 - AUX backlight fixes
 - DCN 3.5 fixes
 - HDMI compliance fixes
 - DCN 4.0.1 cursor updates
 - DCN interrupt fix
 - DC KMS full update improvements
 - Add additional HDCP traces
 - DCN 3.2 fixes
 - DP MST fixes
 - Add support for new SR-IOV mailbox interface
 - UQ reset support
 - HDP flush rework
 - VCE1 support
 
 amdkfd:
 - HMM cleanups
 - Relax checks on save area overallocations
 - Fix GPU mappings after prefetch
 
 radeon:
 - refactor CIK/SI enablement.
 
 xe:
 - Initial Xe3P support
 - panic support on VRAM for display
 - fix stolen size check
 - Loosen used tracking restriction
 - New SR-IOV debugfs structure and debugfs updates
 - Hide the GPU madvise flag behind a VM_BIND flag
 - Always expose VRAM provisioning data on discrete GPUs
 - Allow VRAM mappings for userptr when used with SVM
 - Allow pinning of p2p dma-buf
 - Use per-tile debugfs where appropriate
 - Add documentation for Execution Queues
 - PF improvements
 - VF migration recovery redesign work
 - User / Kernel VRAM partitioning
 - Update Tile-based messages
 - Allow configfs to disable specific GT types
 - VF provisioning and migration improvements
 - use SVM range helpers in PT layer
 - Initial CRI support
 - access VF registers using dedicated MMIO view
 - limit number of jobs per exec queue
 - add sriov_admin sysfs tree
 - more crescent island specific support
 - debugfs residency counter
 - SRIOV migration work
 - runtime registers for GFX 35
 
 i915:
 - add initial Xe3p_LPD display version 35 support
 - Enable LNL+ content adaptive sharpness filter
 - Use optimized VRR guardband
 - Enable Xe3p LT PHY
 - enable FBC support for Xe3p_LPD display
 - add display 30.02 firmware support
 - refactor SKL+ watermark latency setup
 - refactor fbdev handling
 - call i915/xe runtime PM via function pointers
 - refactor i915/xe stolen memory/display interfaces
 - use display version instead of gfx version in display code
 - extend i915_display_info with Type-C port details
 - lots of display cleanups/refactorings
 - set O_LARGEFILE in __create_shmem
 - skuip guc communication warning on reset
 - fix time conversions
 - defeature DRRS on LNL+
 - refactor intel_frontbuffer split between i915/xe/display
 - convert inteL_rom interfaces to struct drm_device
 - unify display register polling interfaces
 - aovid lock inversion when pinning to GGTT on CHV/BXT+VTD
 
 panel:
 - Add KD116N3730A08/A12, chromebook mt8189
 - JT101TM023, LQ079L1SX01,
 - GLD070WX3-SL01 MIPI DSI
 - Samsung LTL106AL0, Samsung LTL106AL01
 - Raystar RFF500F-AWH-DNN
 - Winstar WF70A8SYJHLNGA,
 - Wanchanglong w552946aaa
 - Samsung SOFEF00
 - Lenovo X13s panel.
 - ilitek-ili9881c : add rpi 5" support
 - visionx-rm69299 - add backlight support
 - edp - support AUI B116XAN02.0
 
 bridge:
 - improve ref counting
 - ti-sn65dsi86 - add support for DP mode with HPD
 - synopsis: support CEC, init timer with correct freq
 - ASL CS5263 DP-to-HDMI bridge support
 
 nova-core:
 - introduce bitfield! macro
 - introduce safe integer converters
 - GSP inits to fully booted state on Ampere
 - Use more future-proof register for GPU identification
 
 nova-drm:
 - select NOVA_CORE
 - 64-bit only
 
 nouveau:
 - improve reclocking on tegra 186+
 - add large page and compression support
 
 msm:
 - GPU:
   - Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
   - A612 support
 - MDSS:
   - Added support for Glymur and QCS8300 platforms
 - DPU:
   - Enabled Quad-Pipe support, unlocking higher resolutions support
   - Added support for Glymur platform
   - Documented DPU on QCS8300 platform as supported
 - DisplayPort:
   - Added support for Glymur platform
   - Added support lame remapping inside DP block
   - Documented DisplayPort controller on QCS8300 and SM6150/QCS615 as
     supported
 
 tegra:
 - NVJPG driver
 
 panfrost:
 - display JM contexts over debugfs
 - export JM contexts to userspace
 - improve error and job handling
 
 panthor:
 - support custom ASN_HASH for mt8196
 - support mali-G1 GPU
 - flush shmem write before mapping buffers uncached
 - make timeout per-queue instead of per-job
 
 mediatek:
 - MT8195/88 HDMIv2/DDCv2 support
 
 rockchip:
 - dsi: add support for RK3368
 
 amdxdna:
 - enhance runtime PM
 - last hardware error reading uapi
 - support firmware debug output
 - add resource and telemetry data uapi
 - preemption support
 
 imx:
 - add driver for HDMI TX Parallel audio interface
 
 ivpu:
 - add support for user-managed preemption buffer
 - add userptr support
 - update JSM firware API to 3.33.0
 - add better alloc/free warnings
 - fix page fault in unbind all bos
 - rework bind/unbind of imported buffers
 - enable MCA ECC signalling
 - split fw runtime and global memory buffers
 - add fdinfo memory statistics
 
 tidss:
 - convert to drm logging
 - logging cleanup
 
 ast:
 - refactor generation init paths
 - add per chip generation detect_tx_chip
 - set quirks for each chip model
 
 atmel-hlcdc:
 - set LCDC_ATTRE register in plane disable
 - set correct values for plane scaler
 
 solomon:
 - use drm helper for get_modes and move_valid
 
 sitronix:
 - fix output position when clearing screens
 
 qaic:
 - support dma-buf exports
 - support new firmware's READ_DATA implementation
 - sahara AIC200 image table update
 - add sysfs support
 - add coredump support
 - add uevents support
 - PM support
 
 sun4i:
 - layer refactors to decouple plane from output
 - improve DE33 support
 
 vc4:
 - switch to generic CEC helpers
 
 komeda:
 - use drm_ logging functions
 
 vkms:
 - configfs support for display configuration
 
 vgem:
 - fix fence timer deadlock
 
 etnaviv:
 - add HWDB entry for GC8000 Nano Ultra VIP r6205
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmkv4wAACgkQDHTzWXnE
 hr7GNw/9HITakN5B4DXKZg+Rz38WjsAsSbw1HdgoQR2NMYRw4oVXx3iAVTVLGbvV
 vvMCpgWtFE85IEtg96KWynpFjSHOx7xtXe/lckZ5s+VMi00K4c+wMmvFqKwv1Aul
 9VqPTe+28xNAzk+gglOtNjd4DRVuXQIhPMQdTELork32jQCteJorod+sZnM+RNBG
 0PiiYAKc0VwsPwklHQF7+dBt1yu+8r3ZH5Fc7SjHTuf8Q45Dpayd0HjatSgt9ycn
 s8uxAAY41iDLHnj4LwyYtBFYoMH65VqOAzDeaeVdM9xA/3KbIl9of8QnBpvYWhaR
 9dnh2Ceve3SjOufOBwK2p6H0RXu8Xo3OLnmw7+u0KwgvbmeBTILp8kgNQuk9beIp
 wvkjwpFsDnfo1rKevFMz4R+3U9W3NXP1fjU2tvmnD1lfaby2o3f3AdboaYPbX/xK
 YGh79KDsuAkKH4HdMJNg/FpJWIdSKaaWoFBzZqWBZgZBEPOquR8MHUAaLkVJYK/1
 wBcJZceQpHRssldsg3bBZxGHK811J46QlvMWaID6TSGIoZ8GC6+1Zt0+bwzxMyUR
 49jwRvOWbtYJmZEcR2JF+k1KteMXAp+bXIYqx9e+69c5Guz4w7JW7OLQFVvXszZW
 aObzfrkObPpCGYDS1r85/6vT8yqwVUWtwfPvAeDH9130sHBP8Xw=
 =gpnQ
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "There was a rather late merge of a new color pipeline feature, that
  some userspace projects are blocked on, and has seen a lot of work in
  amdgpu. This should have seen some time in -next. There is additional
  support for this for Intel, that if it arrives in the next day or two
  I'll pass it on in another pull request and you can decide if you want
  to take it.

  Highlights:
   - Arm Ethos NPU accelerator driver
   - new DRM color pipeline support
   - amdgpu will now run discrete SI/CIK cards instead of radeon, which
     enables vulkan support in userspace
   - msm gets gen8 gpu support
   - initial Xe3P support in xe

  Full detail summary:

  New driver:
   - Arm Ethos-U65/U85 accel driver

  Core:
   - support the drm color pipeline in vkms/amdgfx
   - add support for drm colorop pipeline
   - add COLOR PIPELINE plane property
   - add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
   - throttle dirty worker with vblank
   - use drm_for_each_bridge_in_chain_scoped in drm's bridge code
   - Ensure drm_client_modeset tests are enabled in UML
   - add simulated vblank interrupt - use in drivers
   - dumb buffer sizing helper
   - move freeing of drm client memory to driver
   - crtc sharpness strength property
   - stop using system_wq in scheduler/drivers
   - support emergency restore in drm-client

  Rust:
   - make slice::as_flattened usable on all supported rustc
   - add FromBytes::from_bytes_prefix() method
   - remove redundant device ptr from Rust GEM object
   - Change how AlwaysRefCounted is implemented for GEM objects

  gpuvm:
   - Add deferred vm_bo cleanup to GPUVM (for rust)

  atomic:
   - cleanup and improve state handling interfaces

  buddy:
   - optimize block management

  dma-buf:
   - heaps: Create heap per CMA reserved location
   - improve userspace documentation

  dp:
   - add POST_LT_ADJ_REQ training sequence
   - DPCD dSC quirk for synaptics panamera devices
   - helpers to query branch DSC max throughput

  ttm:
   - Rename ttm_bo_put to ttm_bo_fini
   - allow page protection flags on risc-v
   - rework pipelined eviction fence handling

  amdgpu:
   - enable amdgpu by default for SI/CI dGPUs
   - enable DC by default on SI
   - refactor CIK/SI enablement
   - add ABM KMS property
   - Re-enable DM idle optimizations
   - DC Analog encoders support
   - Powerplay fixes for fiji/iceland
   - Enable DC on bonaire by default
   - HMM cleanup
   - Add new RAS framework
   - DML2.1 updates
   - YCbCr420 fixes
   - DC FP fixes
   - DMUB fixes
   - LTTPR fixes
   - DTBCLK fixes
   - DMU cursor offload handling
   - Userq validation improvements
   - Unify shutdown callback handling
   - Suspend improvements
   - Power limit code cleanup
   - SR-IOV fixes
   - AUX backlight fixes
   - DCN 3.5 fixes
   - HDMI compliance fixes
   - DCN 4.0.1 cursor updates
   - DCN interrupt fix
   - DC KMS full update improvements
   - Add additional HDCP traces
   - DCN 3.2 fixes
   - DP MST fixes
   - Add support for new SR-IOV mailbox interface
   - UQ reset support
   - HDP flush rework
   - VCE1 support

  amdkfd:
   - HMM cleanups
   - Relax checks on save area overallocations
   - Fix GPU mappings after prefetch

  radeon:
   - refactor CIK/SI enablement

  xe:
   - Initial Xe3P support
   - panic support on VRAM for display
   - fix stolen size check
   - Loosen used tracking restriction
   - New SR-IOV debugfs structure and debugfs updates
   - Hide the GPU madvise flag behind a VM_BIND flag
   - Always expose VRAM provisioning data on discrete GPUs
   - Allow VRAM mappings for userptr when used with SVM
   - Allow pinning of p2p dma-buf
   - Use per-tile debugfs where appropriate
   - Add documentation for Execution Queues
   - PF improvements
   - VF migration recovery redesign work
   - User / Kernel VRAM partitioning
   - Update Tile-based messages
   - Allow configfs to disable specific GT types
   - VF provisioning and migration improvements
   - use SVM range helpers in PT layer
   - Initial CRI support
   - access VF registers using dedicated MMIO view
   - limit number of jobs per exec queue
   - add sriov_admin sysfs tree
   - more crescent island specific support
   - debugfs residency counter
   - SRIOV migration work
   - runtime registers for GFX 35

  i915:
   - add initial Xe3p_LPD display version 35 support
   - Enable LNL+ content adaptive sharpness filter
   - Use optimized VRR guardband
   - Enable Xe3p LT PHY
   - enable FBC support for Xe3p_LPD display
   - add display 30.02 firmware support
   - refactor SKL+ watermark latency setup
   - refactor fbdev handling
   - call i915/xe runtime PM via function pointers
   - refactor i915/xe stolen memory/display interfaces
   - use display version instead of gfx version in display code
   - extend i915_display_info with Type-C port details
   - lots of display cleanups/refactorings
   - set O_LARGEFILE in __create_shmem
   - skuip guc communication warning on reset
   - fix time conversions
   - defeature DRRS on LNL+
   - refactor intel_frontbuffer split between i915/xe/display
   - convert inteL_rom interfaces to struct drm_device
   - unify display register polling interfaces
   - aovid lock inversion when pinning to GGTT on CHV/BXT+VTD

  panel:
   - Add KD116N3730A08/A12, chromebook mt8189
   - JT101TM023, LQ079L1SX01,
   - GLD070WX3-SL01 MIPI DSI
   - Samsung LTL106AL0, Samsung LTL106AL01
   - Raystar RFF500F-AWH-DNN
   - Winstar WF70A8SYJHLNGA
   - Wanchanglong w552946aaa
   - Samsung SOFEF00
   - Lenovo X13s panel
   - ilitek-ili9881c - add rpi 5" support
   - visionx-rm69299 - add backlight support
   - edp - support AUI B116XAN02.0

  bridge:
   - improve ref counting
   - ti-sn65dsi86 - add support for DP mode with HPD
   - synopsis: support CEC, init timer with correct freq
   - ASL CS5263 DP-to-HDMI bridge support

  nova-core:
   - introduce bitfield! macro
   - introduce safe integer converters
   - GSP inits to fully booted state on Ampere
   - Use more future-proof register for GPU identification

  nova-drm:
   - select NOVA_CORE
   - 64-bit only

  nouveau:
   - improve reclocking on tegra 186+
   - add large page and compression support

  msm:
   - GPU:
      - Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
      - A612 support
   - MDSS:
      - Added support for Glymur and QCS8300 platforms
   - DPU:
      - Enabled Quad-Pipe support, unlocking higher resolutions support
      - Added support for Glymur platform
      - Documented DPU on QCS8300 platform as supported
   - DisplayPort:
      - Added support for Glymur platform
      - Added support lame remapping inside DP block
      - Documented DisplayPort controller on QCS8300 and SM6150/QCS615
        as supported

  tegra:
   - NVJPG driver

  panfrost:
   - display JM contexts over debugfs
   - export JM contexts to userspace
   - improve error and job handling

  panthor:
   - support custom ASN_HASH for mt8196
   - support mali-G1 GPU
   - flush shmem write before mapping buffers uncached
   - make timeout per-queue instead of per-job

  mediatek:
   - MT8195/88 HDMIv2/DDCv2 support

  rockchip:
   - dsi: add support for RK3368

  amdxdna:
   - enhance runtime PM
   - last hardware error reading uapi
   - support firmware debug output
   - add resource and telemetry data uapi
   - preemption support

  imx:
   - add driver for HDMI TX Parallel audio interface

  ivpu:
   - add support for user-managed preemption buffer
   - add userptr support
   - update JSM firware API to 3.33.0
   - add better alloc/free warnings
   - fix page fault in unbind all bos
   - rework bind/unbind of imported buffers
   - enable MCA ECC signalling
   - split fw runtime and global memory buffers
   - add fdinfo memory statistics

  tidss:
   - convert to drm logging
   - logging cleanup

  ast:
   - refactor generation init paths
   - add per chip generation detect_tx_chip
   - set quirks for each chip model

  atmel-hlcdc:
   - set LCDC_ATTRE register in plane disable
   - set correct values for plane scaler

  solomon:
   - use drm helper for get_modes and move_valid

  sitronix:
   - fix output position when clearing screens

  qaic:
   - support dma-buf exports
   - support new firmware's READ_DATA implementation
   - sahara AIC200 image table update
   - add sysfs support
   - add coredump support
   - add uevents support
   - PM support

  sun4i:
   - layer refactors to decouple plane from output
   - improve DE33 support

  vc4:
   - switch to generic CEC helpers

  komeda:
   - use drm_ logging functions

  vkms:
   - configfs support for display configuration

  vgem:
   - fix fence timer deadlock

  etnaviv:
   - add HWDB entry for GC8000 Nano Ultra VIP r6205"

* tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel: (1869 commits)
  Revert "drm/amd: Skip power ungate during suspend for VPE"
  drm/amdgpu: use common defines for HUB faults
  drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling
  drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling
  drm/amdgpu: use static ids for ACP platform devs
  drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix
  drm/amdgpu: Forward VMID reservation errors
  drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
  drm/amdgpu/gmc6: Cache VM fault info
  drm/amdgpu/gmc6: Don't print MC client as it's unknown
  drm/amdgpu/cz_ih: Enable soft IRQ handler ring
  drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
  drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
  drm/amdgpu/cik_ih: Enable soft IRQ handler ring
  drm/amdgpu/si_ih: Enable soft IRQ handler ring
  drm/amd/display: fix typo in display_mode_core_structs.h
  drm/amd/display: fix Smart Power OLED not working after S4
  drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
  ...
2025-12-04 08:53:30 -08:00
Linus Torvalds 2ddcf4962c Kbuild updates for v6.19
- Enable -fms-extensions, allowing anonymous use of tagged struct or
     union in struct/union (tag kbuild-ms-extensions-6.19).  An exemplary
     conversion patch is added here, too (btrfs).
 
   - Introduce architecture-specific CC_CAN_LINK and flags for userprogs
 
   - Add new packaging target 'modules-cpio-pkg' for building a initramfs
     cpio w/ kmods
 
   - Handle included .c files in gen_compile_commands
 
   - Minor kbuild changes:
     - Use objtree for module signing key path, fixing oot kmod signing
     - Improve documentation of KBUILD_BUILD_TIMESTAMP
     - Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
     - Rename scripts/Makefile.extrawarn to Makefile.warn
     - Drop obsolete types.h check from headers_check.pl
     - Remove outdated config leak ignore entries
 
 Signed-off-by: Nicolas Schier <nsc@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0E3p4c3JKeBvsLGB1IKcBYmEmkFAmkttSoACgkQB1IKcBYm
 Emkt6g/+NLW1rJBCcONXIe3UlfHvo4MBziA+ItuLkYx7FwhhTTOivQgav68wXSQK
 /JqQ5N8O0nO4Gznv5aJWoW+GUCKFjqaDbJDXW9pRdDNGQjXJfL82OHu9gPiUAiDJ
 ffZwyeRfXtBaglWv6ovM5z6817OqUCLErqjESAMHZv4nXbXjFNi2YKwgJGmAjfNM
 kiNUU1ieO2/dxgI/mMCW0UuK0jjiC0EsY1as3IkxBdVZi3dRPobLNaL4JiKHGFlZ
 BlrtAoGHrqwqEQgFACVJE2BzWhW+Hwz7SyfBvF4c8T0eM/02ogcL43COh6scq0pV
 2om7hFwvE/NBdbCvz0KB/q3khx9jJRagCm0o/+YEexHb1rn+LcCbomQZYRjIEcHU
 mBc3DTD+B+jwiN6mow6e6E1uuKvpTrmPGBNVyQ5BMFpdlhfga7zsttLO5kMIXqAH
 Fc8MsSx06YggrwGevc/39C+3uzI501Rcxu9BWdyHXfeZKeq0gZGNtkXVN2edzzKm
 c4uFeDbbF3M0oeGXmtlm8I//jqeJuc0wK8d8tSXjKodo+vgppXj0s2okAwALY2jW
 NVAlxqKwS7CnjBbXOsHIXSCDDL5IQ+np/gIaOkSbl6DXU50BvFV+KGvqW9hdjmwR
 AEQXemqUhSvGFmYG92mLwewXWAivnxUvGTCifOhXipyikXhX/wI=
 =Gn4V
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild updates from Nicolas Schier:

  - Enable -fms-extensions, allowing anonymous use of tagged struct or
    union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
    conversion patch is added here, too (btrfs).

    [ Editor's note: the core of this actually came in early through a
      shared branch and a few other trees    - Linus ]

  - Introduce architecture-specific CC_CAN_LINK and flags for userprogs

  - Add new packaging target 'modules-cpio-pkg' for building a initramfs
    cpio w/ kmods

  - Handle included .c files in gen_compile_commands

  - Minor kbuild changes:
     - Use objtree for module signing key path, fixing oot kmod signing
     - Improve documentation of KBUILD_BUILD_TIMESTAMP
     - Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
     - Rename scripts/Makefile.extrawarn to Makefile.warn
     - Drop obsolete types.h check from headers_check.pl
     - Remove outdated config leak ignore entries

* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kbuild: add target to build a cpio containing modules
  initramfs: add gen_init_cpio to hostprogs unconditionally
  kbuild: allow architectures to override CC_CAN_LINK
  init: deduplicate cc-can-link.sh invocations
  kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
  scripts: headers_install.sh: Remove two outdated config leak ignore entries
  scripts/clang-tools: Handle included .c files in gen_compile_commands
  kbuild: uapi: Drop types.h check from headers_check.pl
  kbuild: Rename Makefile.extrawarn to Makefile.warn
  MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
  kbuild: uapi: reuse KBUILD_USERCFLAGS
  kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
  kbuild: Use objtree for module signing key path
  btrfs: send: make use of -fms-extensions for defining struct fs_path
2025-12-03 14:42:21 -08:00
Linus Torvalds 2b09f480f0 A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which are
   caused by the related overhead. It turned out that the decision to invoke
   the exit to user work was not really a decision. More or less each
   context switch caused that. There is a long list of small issues which
   sums up nicely and results in a 3-4% regression in I/O benchmarks.
 
   The other detail which caused issues due to extra work in context switch
   and task migration is the CID (memory context ID) management. It also
   requires to use a task work to consolidate the CID space, which is
   executed in the context of an arbitrary task and results in sporadic
   uncontrolled exit latencies.
 
   The rewrite addresses this by:
 
   - Removing deprecated and long unsupported functionality
 
   - Moving the related data into dedicated data structures which are
     optimized for fast path processing.
 
   - Caching values so actual decisions can be made
 
   - Replacing the current implementation with a optimized inlined variant.
 
   - Separating fast and slow path for architectures which use the generic
     entry code, so that only fault and error handling goes into the
     TIF_NOTIFY_RESUME handler.
 
   - Rewriting the CID management so that it becomes mostly invisible in the
     context switch path. That moves the work of switching modes into the
     fork/exit path, which is a reasonable tradeoff. That work is only
     required when a process creates more threads than the cpuset it is
     allowed to run on or when enough threads exit after that. An artificial
     thread pool benchmarks which triggers this did not degrade, it actually
     improved significantly.
 
     The main effect in migration heavy scenarios is that runqueue lock held
     time and therefore contention goes down significantly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksaRYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoencEADA5he8PAFPmSRRPo6+2G5mHzWe8kIU
 5ZViQStWFNAA0qqy8VXryWiJ6qqrO6la9o7K4YOXASUtlkVjquRp1DF7PabqGwuy
 zshbRCXNlT51J8uqanN8VrGVjlf+bMdHDbGoI1SLkUTxG8b+kDD5PXUQE1ARelPP
 Slbg9u+EMrxj6D5MDTPbuW6TqryJEkPtiNScyOz43emp9ww9+WVxenOcRqU4D+Th
 mjWmrGIzkroSf4XReMoD/wg9TPTpUjXnNCwl2viY9JvBpkMfYtU4tJAGK3aNFOWy
 zsAN0O9CaFGrUEFne7qUmtwhNLdtnjx5HN5pe7yZd1EhdTuQKq4jPiiQnwwm8w72
 c0o6m45FNPmPoSyfaZWCkLjbTEUXonT9JF61iN35JVxim8gBDDJjHFKnLxDmLrH3
 X0eESE48ReY2EneDV6Y8RJRo6oG14Fccvc39aTf/2Rw3trpmtt2agvConQzupQIg
 DzANw4jhUUzFRrHrMHACNsqKFXh9ratue/S9DM3xxTpGO/bKdeK7jGIgzNf8O34M
 J0O6Hvk5jMdcWlIJTx21GoGzoSkkXnR49g/71aCcp+MwdY4x9zFz5SWi8LWQRmkx
 xRo6tY27Bma8/SEwMJjPpAUXDTpq6v+j3cPisybL1yGsyt9lh+p8LX7VUtwcoEqe
 6ZelC5Kgw/+/kg==
 =n5KT
 -----END PGP SIGNATURE-----

Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull rseq updates from Thomas Gleixner:
 "A large overhaul of the restartable sequences and CID management:

  The recent enablement of RSEQ in glibc resulted in regressions which
  are caused by the related overhead. It turned out that the decision to
  invoke the exit to user work was not really a decision. More or less
  each context switch caused that. There is a long list of small issues
  which sums up nicely and results in a 3-4% regression in I/O
  benchmarks.

  The other detail which caused issues due to extra work in context
  switch and task migration is the CID (memory context ID) management.
  It also requires to use a task work to consolidate the CID space,
  which is executed in the context of an arbitrary task and results in
  sporadic uncontrolled exit latencies.

  The rewrite addresses this by:

   - Removing deprecated and long unsupported functionality

   - Moving the related data into dedicated data structures which are
     optimized for fast path processing.

   - Caching values so actual decisions can be made

   - Replacing the current implementation with a optimized inlined
     variant.

   - Separating fast and slow path for architectures which use the
     generic entry code, so that only fault and error handling goes into
     the TIF_NOTIFY_RESUME handler.

   - Rewriting the CID management so that it becomes mostly invisible in
     the context switch path. That moves the work of switching modes
     into the fork/exit path, which is a reasonable tradeoff. That work
     is only required when a process creates more threads than the
     cpuset it is allowed to run on or when enough threads exit after
     that. An artificial thread pool benchmarks which triggers this did
     not degrade, it actually improved significantly.

     The main effect in migration heavy scenarios is that runqueue lock
     held time and therefore contention goes down significantly"

* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/mmcid: Switch over to the new mechanism
  sched/mmcid: Implement deferred mode change
  irqwork: Move data struct to a types header
  sched/mmcid: Provide CID ownership mode fixup functions
  sched/mmcid: Provide new scheduler CID mechanism
  sched/mmcid: Introduce per task/CPU ownership infrastructure
  sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
  sched/mmcid: Provide precomputed maximal value
  sched/mmcid: Move initialization out of line
  signal: Move MMCID exit out of sighand lock
  sched/mmcid: Convert mm CID mask to a bitmap
  cpumask: Cache num_possible_cpus()
  sched/mmcid: Use cpumask_weighted_or()
  cpumask: Introduce cpumask_weighted_or()
  sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
  sched/mmcid: Move scheduler code out of global header
  sched: Fixup whitespace damage
  sched/mmcid: Cacheline align MM CID storage
  sched/mmcid: Use proper data structures
  sched/mmcid: Revert the complex CID management
  ...
2025-12-02 08:48:53 -08:00
Linus Torvalds 1d18101a64 kernel-6.19-rc1.cred
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 orJLAP9UD+dX6cicJDkzFZowDakmoIQkR5ZSDwChSlmvLcmquwEAlSq4svVd9Bdl
 7kOFUk71DqhVHrPAwO7ap0BxehokEAA=
 =Cli6
 -----END PGP SIGNATURE-----

Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull cred guard updates from Christian Brauner:
 "This contains substantial credential infrastructure improvements
  adding guard-based credential management that simplifies code and
  eliminates manual reference counting in many subsystems.

  Features:

   - Kernel Credential Guards

     Add with_kernel_creds() and scoped_with_kernel_creds() guards that
     allow using the kernel credentials without allocating and copying
     them. This was requested by Linus after seeing repeated
     prepare_kernel_creds() calls that duplicate the kernel credentials
     only to drop them again later.

     The new guards completely avoid the allocation and never expose the
     temporary variable to hold the kernel credentials anywhere in
     callers.

   - Generic Credential Guards

     Add scoped_with_creds() guards for the common override_creds() and
     revert_creds() pattern. This builds on earlier work that made
     override_creds()/revert_creds() completely reference count free.

   - Prepare Credential Guards

     Add prepare credential guards for the more complex pattern of
     preparing a new set of credentials and overriding the current
     credentials with them:
      - prepare_creds()
      - modify new creds
      - override_creds()
      - revert_creds()
      - put_cred()

  Cleanups:

   - Make init_cred static since it should not be directly accessed

   - Add kernel_cred() helper to properly access the kernel credentials

   - Fix scoped_class() macro that was introduced two cycles ago

   - coredump: split out do_coredump() from vfs_coredump() for cleaner
     credential handling

   - coredump: move revert_cred() before coredump_cleanup()

   - coredump: mark struct mm_struct as const

   - coredump: pass struct linux_binfmt as const

   - sev-dev: use guard for path"

* tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits)
  trace: use override credential guard
  trace: use prepare credential guard
  coredump: use override credential guard
  coredump: use prepare credential guard
  coredump: split out do_coredump() from vfs_coredump()
  coredump: mark struct mm_struct as const
  coredump: pass struct linux_binfmt as const
  coredump: move revert_cred() before coredump_cleanup()
  sev-dev: use override credential guards
  sev-dev: use prepare credential guard
  sev-dev: use guard for path
  cred: add prepare credential guard
  net/dns_resolver: use credential guards in dns_query()
  cgroup: use credential guards in cgroup_attach_permissions()
  act: use credential guards in acct_write_process()
  smb: use credential guards in cifs_get_spnego_key()
  nfs: use credential guards in nfs_idmap_get_key()
  nfs: use credential guards in nfs_local_call_write()
  nfs: use credential guards in nfs_local_call_read()
  erofs: use credential guards
  ...
2025-12-01 13:45:41 -08:00
Linus Torvalds 415d34b92c namespace-6.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaSmOZQAKCRCRxhvAZXjc
 ooKwAP4kR5kMjHlthf8jHmmCjVU3nQFO9hUZsIQL9gFJLOIQMAD+LLoTaq1WJufl
 oSgZpREXZVmI1TK61eR6EZMB1YikGAo=
 =TExi
 -----END PGP SIGNATURE-----

Merge tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains substantial namespace infrastructure changes including a new
  system call, active reference counting, and extensive header cleanups.
  The branch depends on the shared kbuild branch for -fms-extensions support.

  Features:

   - listns() system call

     Add a new listns() system call that allows userspace to iterate
     through namespaces in the system. This provides a programmatic
     interface to discover and inspect namespaces, addressing
     longstanding limitations:

     Currently, there is no direct way for userspace to enumerate
     namespaces. Applications must resort to scanning /proc/*/ns/ across
     all processes, which is:
      - Inefficient - requires iterating over all processes
      - Incomplete - misses namespaces not attached to any running
        process but kept alive by file descriptors, bind mounts, or
        parent references
      - Permission-heavy - requires access to /proc for many processes
      - No ordering or ownership information
      - No filtering per namespace type

     The listns() system call solves these problems:

       ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
                      size_t nr_ns_ids, unsigned int flags);

       struct ns_id_req {
             __u32 size;
             __u32 spare;
             __u64 ns_id;
             struct /* listns */ {
                     __u32 ns_type;
                     __u32 spare2;
                     __u64 user_ns_id;
             };
       };

     Features include:
      - Pagination support for large namespace sets
      - Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
      - Filtering by owning user namespace
      - Permission checks respecting namespace isolation

   - Active Reference Counting

     Introduce an active reference count that tracks namespace
     visibility to userspace. A namespace is visible in the following
     cases:
      - The namespace is in use by a task
      - The namespace is persisted through a VFS object (namespace file
        descriptor or bind-mount)
      - The namespace is a hierarchical type and is the parent of child
        namespaces

     The active reference count does not regulate lifetime (that's still
     done by the normal reference count) - it only regulates visibility
     to namespace file handles and listns().

     This prevents resurrection of namespaces that are pinned only for
     internal kernel reasons (e.g., user namespaces held by
     file->f_cred, lazy TLB references on idle CPUs, etc.) which should
     not be accessible via (1)-(3).

   - Unified Namespace Tree

     Introduce a unified tree structure for all namespaces with:
      - Fixed IDs assigned to initial namespaces
      - Lookup based solely on inode number
      - Maintained list of owned namespaces per user namespace
      - Simplified rbtree comparison helpers

   Cleanups

    - Header Reorganization:
      - Move namespace types into separate header (ns_common_types.h)
      - Decouple nstree from ns_common header
      - Move nstree types into separate header
      - Switch to new ns_tree_{node,root} structures with helper functions
      - Use guards for ns_tree_lock

   - Initial Namespace Reference Count Optimization
      - Make all reference counts on initial namespaces a nop to avoid
        pointless cacheline ping-pong for namespaces that can never go
        away
      - Drop custom reference count initialization for initial namespaces
      - Add NS_COMMON_INIT() macro and use it for all namespaces
      - pid: rely on common reference count behavior

   - Miscellaneous Cleanups
      - Rename exit_task_namespaces() to exit_nsproxy_namespaces()
      - Rename is_initial_namespace() and make argument const
      - Use boolean to indicate anonymous mount namespace
      - Simplify owner list iteration in nstree
      - nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
      - nsfs: use inode_just_drop()
      - pidfs: raise DCACHE_DONTCACHE explicitly
      - pidfs: simplify PIDFD_GET__NAMESPACE ioctls
      - libfs: allow to specify s_d_flags
      - cgroup: add cgroup namespace to tree after owner is set
      - nsproxy: fix free_nsproxy() and simplify create_new_namespaces()

  Fixes:

   - setns(pidfd, ...) race condition

     Fix a subtle race when using pidfds with setns(). When the target
     task exits after prepare_nsset() but before commit_nsset(), the
     namespace's active reference count might have been dropped. If
     setns() then installs the namespaces, it would bump the active
     reference count from zero without taking the required reference on
     the owner namespace, leading to underflow when later decremented.

     The fix resurrects the ownership chain if necessary - if the caller
     succeeded in grabbing passive references, the setns() should
     succeed even if the target task exits or gets reaped.

   - Return EFAULT on put_user() error instead of success

   - Make sure references are dropped outside of RCU lock (some
     namespaces like mount namespace sleep when putting the last
     reference)

   - Don't skip active reference count initialization for network
     namespace

   - Add asserts for active refcount underflow

   - Add asserts for initial namespace reference counts (both passive
     and active)

   - ipc: enable is_ns_init_id() assertions

   - Fix kernel-doc comments for internal nstree functions

   - Selftests
      - 15 active reference count tests
      - 9 listns() functionality tests
      - 7 listns() permission tests
      - 12 inactive namespace resurrection tests
      - 3 threaded active reference count tests
      - commit_creds() active reference tests
      - Pagination and stress tests
      - EFAULT handling test
      - nsid tests fixes"

* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
  pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
  nstree: fix kernel-doc comments for internal functions
  nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
  selftests/namespaces: fix nsid tests
  ns: drop custom reference count initialization for initial namespaces
  pid: rely on common reference count behavior
  ns: add asserts for initial namespace active reference counts
  ns: add asserts for initial namespace reference counts
  ns: make all reference counts on initial namespace a nop
  ipc: enable is_ns_init_id() assertions
  fs: use boolean to indicate anonymous mount namespace
  ns: rename is_initial_namespace()
  ns: make is_initial_namespace() argument const
  nstree: use guards for ns_tree_lock
  nstree: simplify owner list iteration
  nstree: switch to new structures
  nstree: add helper to operate on struct ns_tree_{node,root}
  nstree: move nstree types into separate header
  nstree: decouple from ns_common header
  ns: move namespace types into separate header
  ...
2025-12-01 09:47:41 -08:00
Thomas Gleixner 8cea569ca7 sched/mmcid: Use proper data structures
Having a lot of CID functionality specific members in struct task_struct
and struct mm_struct is not really making the code easier to read.

Encapsulate the CID specific parts in data structures and keep them
separate from the stuff they are embedded in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de
2025-11-20 12:14:52 +01:00
Al Viro 2313598222 convert ramfs and tmpfs
Quite a bit is already done by infrastructure changes (simple_link(),
simple_unlink()) - all that is left is replacing d_instantiate() +
pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(),
and, in case of shmem, using simple_unlink() and simple_link() in
->unlink() and ->link() resp., instead of open-coding those there.
Since d_make_persistent() accepts (and hashes) unhashed ones, shmem
situation gets simpler - we no longer care whether ->lookup() has hashed
the sucker.

With that done, we don't need kill_litter_super() for these filesystems
anymore - by the umount time all remaining dentries will be marked
persistent and kill_litter_super() will boil down to call of
kill_anon_super().

The same goes for devtmpfs and rootfs - they are handled by
ramfs or by shmem, depending upon config.

NB: strictly speaking, both devtmpfs and rootfs ought to use
ramfs_kill_sb() if they end up using ramfs; that's a separate
story and the only impact of "just use kill_{litter,anon}_super()"
is that we fail to free their sb->s_fs_info... on reboot.
That's orthogonal to the changes in this series - kill_litter_super()
is identical to kill_anon_super() for those at this point.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:02 -05:00
Thomas Weißschuh deab487e0f
kbuild: allow architectures to override CC_CAN_LINK
The generic test for CC_CAN_LINK assumes that all architectures use -m32
and -m64 to switch between 32-bit and 64-bit compilation. This is overly
simplistic. Architectures may use other flags (-mabi, -m31, etc.) or may
also require byte order handling (-mlittle-endian, -EL). Expressing all
of the different possibilities will be very complicated and brittle.
Instead allow architectures to supply their own logic which will be
easy to understand and evolve.

Both the boolean ARCH_HAS_CC_CAN_LINK and the string ARCH_USERFLAGS need
to be implemented as kconfig does not allow the reuse of string options.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-3-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:35 +01:00
Thomas Weißschuh 80623f2c83
init: deduplicate cc-can-link.sh invocations
The command to invoke scripts/cc-can-link.sh is very long and new usages
are about to be added.

Add a helper variable to make the code easier to read and maintain.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-2-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
2025-11-14 20:20:34 +01:00
Alexandre Courbot 88622323dd rust: enable slice_flatten feature and provide it through an extension trait
In Rust 1.80, the previously unstable `slice::flatten` family of methods
have been stabilized and renamed to `slice::as_flattened`.

This creates an issue as we want to use `as_flattened`, but need to
support the MSRV (which at the moment is Rust 1.78) where it is named
`flatten`.

Solve this by enabling the `slice_flatten` feature, and providing an
`as_flattened` implementation through an extension trait for compiler
versions where it is not available.

The trait is then exported from the prelude, making the `as_flattened`
family of methods transparently available for all supported compiler
versions.

This extension trait can be removed once the MSRV passes 1.80.

Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/all/CANiq72kK4pG=O35NwxPNoTO17oRcg1yfGcvr3==Fi4edr+sfmw@mail.gmail.com/
Acked-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Message-ID: <20251110-gsp_boot-v9-8-8ae4058e3c0e@nvidia.com>
Message-ID: <20251104-b4-as-flattened-v3-1-6cb9c26b45cd@nvidia.com>
2025-11-14 20:25:57 +09:00
Christian Brauner c2bbd2db52
ns: drop custom reference count initialization for initial namespaces
Initial namespaces don't modify their reference count anymore.
They remain fixed at one so drop the custom refcount initializations.

Link: https://patch.msgid.link/20251110-work-namespace-nstree-fixes-v1-16-e8a9264e0fb9@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-11 10:01:32 +01:00
Kaushlendra Kumar a6446829f8
init: Replace simple_strtoul() with kstrtouint() in root_delay_setup()
Replace deprecated simple_strtoul() with kstrtouint() for better error
handling and input validation. Return 0 on parsing failure to indicate
invalid parameter, maintaining existing behavior for valid inputs.

The simple_strtoul() function is deprecated in favor of kstrtoint()
family functions which provide better error handling and are recommended
for new code and replacements.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Link: https://patch.msgid.link/20251103080627.1844645-1-kaushlendra.kumar@intel.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-05 12:49:38 +01:00
Christian Brauner 40314c2818
cred: make init_cred static
There's zero need to expose struct init_cred. The very few places that
need access can just go through init_task which is already exported.

Link: https://patch.msgid.link/20251103-work-creds-init_cred-v1-3-cb3ec8711a6a@kernel.org
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04 12:36:02 +01:00
Thomas Gleixner 3db6b38dfe rseq: Switch to fast path processing on exit to user
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. If case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.

This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.

This results in the following improvements:

  Kernel build	       Before		  After		      Reduction

  exit to user         80692981		  80514451
  signal checks:          32581		       121	       99%
  slowpath runs:        1201408   1.49%	       198 0.00%      100%
  fastpath runs:			    675941 0.84%       N/A
  id updates:           1233989   1.53%	     50541 0.06%       96%
  cs checks:            1125366   1.39%	         0 0.00%      100%
    cs cleared:         1125366      100%	 0            100%
    cs fixup:                 0        0%	 0

  RSEQ selftests      Before		  After		      Reduction

  exit to user:       386281778		  387373750
  signal checks:       35661203		          0           100%
  slowpath runs:      140542396 36.38%	        100  0.00%    100%
  fastpath runs:			    9509789  2.51%     N/A
  id updates:         176203599 45.62%	    9087994  2.35%     95%
  cs checks:          175587856 45.46%	    4728394  1.22%     98%
    cs cleared:       172359544   98.16%    1319307   27.90%   99%
    cs fixup:           3228312    1.84%    3409087   72.10%

The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to
user invocations, they are relative to the actual 'cs check' invocations.

While some of this could have been avoided in the original code, like the
obvious clearing of CS when it's already clear, the main problem of going
through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
notify handler is invoked more than once before going out to user
space. Doing this once when everything has stabilized is the only solution
to avoid this.

The initial attempt to completely decouple it from the TIF work turned out
to be suboptimal for workloads, which do a lot of quick and short system
calls. Even if the fast path decision is only 4 instructions (including a
conditional branch), this adds up quickly and becomes measurable when the
rate for actually having to handle rseq is in the low single digit
percentage range of user/kernel transitions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de
2025-11-04 08:34:39 +01:00
Thomas Gleixner 9c37cb6e80 rseq: Provide static branch for runtime debugging
Config based debug is rarely turned on and is not available easily when
things go wrong.

Provide a static branch to allow permanent integration of debug mechanisms
along with the usual toggles in Kconfig, command line and debugfs.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de
2025-11-04 08:32:49 +01:00
Thomas Gleixner 5412910487 rseq: Expose lightweight statistics in debugfs
Analyzing the call frequency without actually using tracing is helpful for
analysis of this infrastructure. The overhead is minimal as it just
increments a per CPU counter associated to each operation.

The debugfs readout provides a racy sum of all counters.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de
2025-11-04 08:32:41 +01:00
Christian Brauner 0b1765830c
ns: use NS_COMMON_INIT() for all namespaces
Now that we have a common initializer use it for all static namespaces.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-03 17:41:16 +01:00
Thorsten Blum b2c43efc3c
initrd: Replace simple_strtol with kstrtoint to improve ramdisk_start_setup
Replace simple_strtol() with the recommended kstrtoint() for parsing the
'ramdisk_start=' boot parameter. Unlike simple_strtol(), which returns a
a long, kstrtoint() converts the string directly to an integer and
avoids implicit casting.

Check the return value of kstrtoint() and reject invalid values. This
adds error handling while preserving existing behavior for valid values,
and removes use of the deprecated simple_strtol() helper.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-31 10:12:32 +01:00
Linus Torvalds 48e3694ae7 printk changes for 6.18
-----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmjeOX8bFIAAAAAABAAO
 bWFudTIsMi41KzEuMTEsMiwyAAoJEFKgDEdIgJTyM7MP/0mD0oJa/8DRiZ0r1qLM
 xt/mCZIXhbqfJdCIaLEpfL35qPI59PWvrECGxwzL7CXOHlORxoCW98LoUDkigRKW
 e93mv8xSBaac11QrMSy32cEMSpzXmcNjz5u/02AUveXpxERr82nLgGwKY8eF0eAy
 7wF+N4bGHPShEy2J7kVFVrgZompFQLDgPB3/GycSdKsZd7ss9XiKa/EWvPJbEGOD
 CNC0MlAd5QDxVXxhioC9g9tZaUENZrqxys1vfROFx8RlaobRa2Vt/TEJQ5WCzmKo
 JO8JdojD03h0U2DdFE/lgTjf/lz8hTiaRaTifNnhUofqvzFKupzcv9L2xq/zmpfC
 4dMW6YOObD/FVeNVsgVC16/1WDKl+XfyogAmtTBjSNoPd6WoSczZsfubxGc30lW4
 AqJ2OpPa+CXq2elL+Qb0dLVekMhNytjfYdyFK/FP0DXrkoV4FMHdfNLW58TR7Pcq
 iN1MNEkEs4dWSbn2xJzKlFQyVq2/t6A6l5N/kodZ1xLWiwFYzeA3FTOVzrOlXnde
 IAYh3iH2WoTiHjZB/oomhnXKCMeTIvMYsGQQl4y+XqRHIwpEjSjcm8/M3hvlMyJE
 m0D6cpW8IRLoZ0raxqJPVpXR3WZsLJm2InGMrfY/tPWTU7L5uxmF51GPXYKdUmpZ
 NUgzh2cOwSWWU4UHt/AiD9cd
 =uFn0
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Add KUnit test for the printk ring buffer

 - Fix the check of the maximal record size which is allowed to be
   stored into the printk ring buffer. It prevents corruptions of the
   ring buffer.

   Note that printk() is on the safe side. The messages are limited by
   1kB buffer and are always small enough for the minimal log buffer
   size 4kB, see CONFIG_LOG_BUF_SHIFT definition.

* tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: ringbuffer: Fix data block max size check
  printk: kunit: support offstack cpumask
  printk: kunit: Fix __counted_by() in struct prbtest_rbdata
  printk: ringbuffer: Explain why the KUnit test ignores failed writes
  printk: ringbuffer: Add KUnit test
2025-10-04 11:13:11 -07:00
Linus Torvalds e406d57be7 Patch series in this pull request:
- The 3 patch series "ida: Remove the ida_simple_xxx() API" from
   Christophe Jaillet completes the removal of this legacy IDR API.
 
 - The 9 patch series "panic: introduce panic status function family"
   from Jinchao Wang provides a number of cleanups to the panic code and
   its various helpers, which were rather ad-hoc and scattered all over the
   place.
 
 - The 5 patch series "tools/delaytop: implement real-time keyboard
   interaction support" from Fan Yu adds a few nice user-facing usability
   changes to the delaytop monitoring tool.
 
 - The 3 patch series "efi: Fix EFI boot with kexec handover (KHO)" from
   Evangelos Petrongonas fixes a panic which was happening with the
   combination of EFI and KHO.
 
 - The 2 patch series "Squashfs: performance improvement and a sanity
   check" from Phillip Lougher teaches squashfs's lseek() about
   SEEK_DATA/SEEK_HOLE.  A mere 150x speedup was measured for a well-chosen
   microbenchmark.
 
 - Plus another 50-odd singleton patches all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaN78zwAKCRDdBJ7gKXxA
 jhLeAQCddTv0XtSUTrvBvmrJVUBrQQeJc+LtNopMIjfAF/WAWAEAogSVKxg+HHEB
 GaVixx4zDriNzEqrqiCx9rm4l+YooQA=
 =XRe0
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet
   completes the removal of this legacy IDR API

 - "panic: introduce panic status function family" from Jinchao Wang
   provides a number of cleanups to the panic code and its various
   helpers, which were rather ad-hoc and scattered all over the place

 - "tools/delaytop: implement real-time keyboard interaction support"
   from Fan Yu adds a few nice user-facing usability changes to the
   delaytop monitoring tool

 - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos
   Petrongonas fixes a panic which was happening with the combination of
   EFI and KHO

 - "Squashfs: performance improvement and a sanity check" from Phillip
   Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere
   150x speedup was measured for a well-chosen microbenchmark

 - plus another 50-odd singleton patches all over the place

* tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits)
  Squashfs: reject negative file sizes in squashfs_read_inode()
  kallsyms: use kmalloc_array() instead of kmalloc()
  MAINTAINERS: update Sibi Sankar's email address
  Squashfs: add SEEK_DATA/SEEK_HOLE support
  Squashfs: add additional inode sanity checking
  lib/genalloc: fix device leak in of_gen_pool_get()
  panic: remove CONFIG_PANIC_ON_OOPS_VALUE
  ocfs2: fix double free in user_cluster_connect()
  checkpatch: suppress strscpy warnings for userspace tools
  cramfs: fix incorrect physical page address calculation
  kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
  Squashfs: fix uninit-value in squashfs_get_parent
  kho: only fill kimage if KHO is finalized
  ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name()
  kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths
  sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock
  coccinelle: platform_no_drv_owner: handle also built-in drivers
  coccinelle: of_table: handle SPI device ID tables
  lib/decompress: use designated initializers for struct compress_format
  efi: support booting with kexec handover (KHO)
  ...
2025-10-02 18:44:54 -07:00
Petr Mladek 7a75a5da79 Merge branch 'rework/ringbuffer-kunit-test' into for-linus 2025-10-02 10:33:08 +02:00
Linus Torvalds 7f70725741 Kbuild updates for 6.18
- Extend modules.builtin.modinfo to include module aliases from
   MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such
   as kmod) can verify that a particular module alias will be handled by
   a builtin module.
 
 - Bump the minimum version of LLVM for building the kernel to 15.0.0.
 
 - Upgrade several userspace API checks in headers_check.pl to errors.
 
 - Unify and consolidate CONFIG_WERROR / W=e handling.
 
 - Turn assembler and linker warnings into errors with CONFIG_WERROR /
   W=e.
 
 - Respect CONFIG_WERROR / W=e when building userspace programs
   (userprogs).
 
 - Enable -Werror unconditionally when building host programs
   (hostprogs).
 
 - Support copy_file_range() and data segment alignment in gen_init_cpio
   to improve performance on filesystems that support reflinks such as
   btrfs and XFS.
 
 - Miscellaneous small changes to scripts and configuration files.
 
 Signed-off-by: Nathan Chancellor <nathan@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaNrp6QAKCRAdayaRccAa
 ljxRAP4hYocKXeWsiJzkTB199P4QUGWf220a9elBmtdJEed07gD/VBnCbSOxG3RO
 vS8qbJHwxUFL7a+mDV8RIVXSt99NpAg=
 =psG/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild updates from Nathan Chancellor:

 - Extend modules.builtin.modinfo to include module aliases from
   MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such
   as kmod) can verify that a particular module alias will be handled by
   a builtin module

 - Bump the minimum version of LLVM for building the kernel to 15.0.0

 - Upgrade several userspace API checks in headers_check.pl to errors

 - Unify and consolidate CONFIG_WERROR / W=e handling

 - Turn assembler and linker warnings into errors with CONFIG_WERROR /
   W=e

 - Respect CONFIG_WERROR / W=e when building userspace programs
   (userprogs)

 - Enable -Werror unconditionally when building host programs
   (hostprogs)

 - Support copy_file_range() and data segment alignment in gen_init_cpio
   to improve performance on filesystems that support reflinks such as
   btrfs and XFS

 - Miscellaneous small changes to scripts and configuration files

* tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits)
  modpost: Initialize builtin_modname to stop SIGSEGVs
  Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds
  kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o
  modpost: Create modalias for builtin modules
  modpost: Add modname to mod_device_table alias
  scsi: Always define blogic_pci_tbl structure
  kbuild: extract modules.builtin.modinfo from vmlinux.unstripped
  kbuild: keep .modinfo section in vmlinux.unstripped
  kbuild: always create intermediate vmlinux.unstripped
  s390: vmlinux.lds.S: Reorder sections
  KMSAN: Remove tautological checks
  objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY
  lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
  riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs
  riscv: Unconditionally use linker relaxation
  riscv: Remove version check for LTO_CLANG selects
  powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault()
  mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER
  arm64: Remove tautological LLVM Kconfig conditions
  ARM: Clean up definition of ARM_HAS_GROUP_RELOCS
  ...
2025-10-01 20:58:51 -07:00
Linus Torvalds 4b81e2eb9e Updates for the VDSO subsystem:
- Further consolidation of the VDSO infrastructure and the common data
     store.
 
   - Simplification of the related Kconfig logic
 
   - Improve the VDSO selftest suite
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmjaUJgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoT5VD/9l+TP42vJzwygPArefVM9QijAc4k+g
 iptaz5M+Lhk0BlwMUu502Tp1zcwGq5529LlE2P72DuYoZRtZKSFCGES2mye0JiAs
 A5Vk8d1zicg3IIGjYgw39shA9ghzuzV3gZdgHVqQ6024uSeKxUY3SqF156GGfPFG
 azXWDV7i4RGQvKLm2ye8m7JNYCE3P9f8Wh+GvPaq7qBlW1MsMCWC4FP5qu4iStf3
 mol9uA/EM2zSDwkpIpq6P5L2TpIH1dGDNGxVf/HG3J16UhNXzzXv/rxuKHKPNGWm
 MQZ8WdPjF+uwdUsNXaImC6aCE0hoQ1Ga7GE2YA3pMQYfySeiJ/fA5I1KfhlUCiHD
 oUgnb6PQfZjaBDrIODKcrFHAYc5pTOyJXnJ6q1Lc1yrOfoVcIpMaRXHO/phCeuwI
 61gfb9ggbFEH/7cL26Z9XGSqh4BDCKVP5Z+y98OnNEzV74ScpX13/szB3VgdN6LW
 tmTOeMPYCo4FQoPw8OV1xkeYOKD2JyqH+garnxDZNFjaVaeUkzlAMKz+1JUjxRZj
 UpUhZ8y5KFVjgKkqZHxhucr9cnwV/xd4n+IzZeG4l6XZp4VyirG+5QpAikGYe+Jq
 WFU4Ao1xvevyizrJEhoGJKznwtrNrsy6AQ+wLQxYiRR4aTnng8n9T/ry2DVvhJf8
 diYWxuRTR7KMxg==
 =i3UB
 -----END PGP SIGNATURE-----

Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull VDSO updates from Thomas Gleixner:

 - Further consolidation of the VDSO infrastructure and the common data
   store

 - Simplification of the related Kconfig logic

 - Improve the VDSO selftest suite

* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests: vDSO: Drop vdso_test_clock_getres
  selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
  selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
  selftests: vDSO: vdso_test_abi: Use explicit indices for name array
  selftests: vDSO: vdso_test_abi: Drop clock availability tests
  selftests: vDSO: vdso_test_abi: Use ksft_finished()
  selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
  selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
  vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
  vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
  vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
  vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
  vdso: Drop kconfig GENERIC_COMPAT_VDSO
  vdso: Drop kconfig GENERIC_VDSO_32
  riscv: vdso: Untangle Kconfig logic
  time: Build generic update_vsyscall() only with generic time vDSO
  vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
  vdso: Move ENABLE_COMPAT_VDSO from core to arm64
  ARM: VDSO: Remove cntvct_ok global variable
  vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30 16:58:21 -07:00
Linus Torvalds 755fa5b4fb cgroup: Changes for v6.18
- Extensive cpuset code cleanup and refactoring work with no functional
   changes: CPU mask computation logic refactoring, introducing new helpers,
   removing redundant code paths, and improving error handling for better
   maintainability.
 
 - A few bug fixes to cpuset including fixes for partition creation failures
   when isolcpus is in use, missing error returns, and null pointer access
   prevention in free_tmpmasks().
 
 - Core cgroup changes include replacing the global percpu_rwsem with
   per-threadgroup rwsem when writing to cgroup.procs for better scalability,
   workqueue conversions to use WQ_PERCPU and system_percpu_wq to prepare for
   workqueue default switching from percpu to unbound, and removal of unused
   code including the post_attach callback.
 
 - New cgroup.stat.local time accounting feature that tracks frozen time
   duration.
 
 - Misc changes including selftests updates (new freezer time tests and
   backward compatibility fixes), documentation sync, string function safety
   improvements, and 64-bit division fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCaNb1Sg4cdGpAa2VybmVs
 Lm9yZwAKCRCxYfJx3gVYGfLMAPwKwkvUg9DPJEuECRfM9woOOHyIWLp1DwUhpg1v
 Zq0lkAEAmo/+IkJXGZ7TGF+wzSj7GFIugrILu3upzLCHzgYoDgs=
 =39KF
 -----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

 - Extensive cpuset code cleanup and refactoring work with no functional
   changes: CPU mask computation logic refactoring, introducing new
   helpers, removing redundant code paths, and improving error handling
   for better maintainability.

 - A few bug fixes to cpuset including fixes for partition creation
   failures when isolcpus is in use, missing error returns, and null
   pointer access prevention in free_tmpmasks().

 - Core cgroup changes include replacing the global percpu_rwsem with
   per-threadgroup rwsem when writing to cgroup.procs for better
   scalability, workqueue conversions to use WQ_PERCPU and
   system_percpu_wq to prepare for workqueue default switching from
   percpu to unbound, and removal of unused code including the
   post_attach callback.

 - New cgroup.stat.local time accounting feature that tracks frozen time
   duration.

 - Misc changes including selftests updates (new freezer time tests and
   backward compatibility fixes), documentation sync, string function
   safety improvements, and 64-bit division fixes.

* tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits)
  cpuset: remove is_prs_invalid helper
  cpuset: remove impossible warning in update_parent_effective_cpumask
  cpuset: remove redundant special case for null input in node mask update
  cpuset: fix missing error return in update_cpumask
  cpuset: Use new excpus for nocpu error check when enabling root partition
  cpuset: fix failure to enable isolated partition when containing isolcpus
  Documentation: cgroup-v2: Sync manual toctree
  cpuset: use partition_cpus_change for setting exclusive cpus
  cpuset: use parse_cpulist for setting cpus.exclusive
  cpuset: introduce partition_cpus_change
  cpuset: refactor cpus_allowed_validate_change
  cpuset: refactor out validate_partition
  cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers
  cpuset: refactor CPU mask buffer parsing logic
  cpuset: Refactor exclusive CPU mask computation logic
  cpuset: change return type of is_partition_[in]valid to bool
  cpuset: remove unused assignment to trialcs->partition_root_state
  cpuset: move the root cpuset write check earlier
  cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock
  cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock
  ...
2025-09-30 09:55:41 -07:00
Linus Torvalds 9cc220a422 s390 updates for 6.18 merge window
- Refactor SCLP memory hotplug code
 
 - Introduce common boot_panic() decompressor helper macro and
   use it to get rid of nearly few identical implementations
 
 - Take into account additional key generation flags and forward
   it to the ep11 implementation. With that allow users to modify
   the key generation process, e.g. provide valid combinations of
   XCP_BLOB_* flags
 
 - Replace kmalloc() + copy_from_user() with memdup_user_nul()
   in s390 debug facility and HMC driver
 
 - Add DAX support for DCSS memory block devices
 
 - Make the compiler statement attribute "assume" available with
   a new __assume macro
 
 - Rework ffs() and fls() family bitops functions, including
   source code improvements and generated code optimizations.
   Use the newly introduced __assume macro for that
 
 - Enable additional network features in default configurations
 
 - Use __GFP_ACCOUNT flag for user page table allocations to add
   missing kmemcg accounting
 
 - Add WQ_PERCPU flag to explicitly request the use of the per-CPU
   workqueue for 3590 tape driver
 
 - Switch power reading to the per-CPU and the Hiperdispatch to the
   default workqueue
 
 - Add memory allocation profiling hooks to allow better profiling
   data and the /proc/allocinfo output similar to other architectures
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYKADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCaNpOnhccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8LJ0AP98TkWDCXLb02dNTST36dtoNaM+
 I9HoosjbZIm8oHwhngD+JisTRWFogplXnE2z+JQrJJcshWvUpFDVtkk2pCSeOQM=
 =Cztn
 -----END PGP SIGNATURE-----

Merge tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Alexander Gordeev:

 - Refactor SCLP memory hotplug code

 - Introduce common boot_panic() decompressor helper macro and use it to
   get rid of nearly few identical implementations

 - Take into account additional key generation flags and forward it to
   the ep11 implementation. With that allow users to modify the key
   generation process, e.g. provide valid combinations of XCP_BLOB_*
   flags

 - Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
   debug facility and HMC driver

 - Add DAX support for DCSS memory block devices

 - Make the compiler statement attribute "assume" available with a new
   __assume macro

 - Rework ffs() and fls() family bitops functions, including source code
   improvements and generated code optimizations. Use the newly
   introduced __assume macro for that

 - Enable additional network features in default configurations

 - Use __GFP_ACCOUNT flag for user page table allocations to add missing
   kmemcg accounting

 - Add WQ_PERCPU flag to explicitly request the use of the per-CPU
   workqueue for 3590 tape driver

 - Switch power reading to the per-CPU and the Hiperdispatch to the
   default workqueue

 - Add memory allocation profiling hooks to allow better profiling data
   and the /proc/allocinfo output similar to other architectures

* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
  s390/mm: Add memory allocation profiling hooks
  s390: Replace use of system_wq with system_dfl_wq
  s390/diag324: Replace use of system_wq with system_percpu_wq
  s390/tape: Add WQ_PERCPU to alloc_workqueue users
  s390/bitops: Switch to generic ffs() if supported by compiler
  s390/bitops: Switch to generic fls(), fls64(), etc.
  s390/mm: Use __GFP_ACCOUNT for user page table allocations
  s390/configs: Enable additional network features
  s390/bitops: Cleanup __flogr()
  s390/bitops: Use __assume() for __flogr() inline assembly return value
  compiler_types: Add __assume macro
  s390/bitops: Limit return value range of __flogr()
  s390/dcssblk: Add DAX support
  s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/pkey: Forward keygenflags to ep11_unwrapkey
  s390/boot: Add common boot_panic() code
  s390/bitops: Optimize inlining
  s390/bitops: Slightly optimize ffs() and fls64()
  s390/sclp: Move memory hotplug code for better modularity
  ...
2025-09-29 19:14:25 -07:00
Linus Torvalds a5ba183bde hardening updates for v6.18-rc1
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
 
 - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
   (Junjie Cao)
 
 - Add str_assert_deassert() helper (Lad Prabhakar)
 
 - gcc-plugins: Remove TODO_verify_il for GCC >= 16
 
 - kconfig: Fix BrokenPipeError warnings in selftests
 
 - kconfig: Add transitional symbol attribute for migration support
 
 - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaNraNQAKCRA2KwveOeQk
 u/DkAPwKPP5BSmVR2wkdpQaXIr3PGA+cbBYp34DMJNujZ9piIwD/WZ+HfGTLoERy
 +2Q6HLj9hUdd+Rx3IZ8/w1QmnhUIUAU=
 =AwV9
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
 "One notable addition is the creation of the 'transitional' keyword for
  kconfig so CONFIG renaming can go more smoothly.

  This has been a long-standing deficiency, and with the renaming of
  CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
  support), this came up again.

  The breadth of the diffstat is mainly this renaming.

   - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)

   - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
     (Junjie Cao)

   - Add str_assert_deassert() helper (Lad Prabhakar)

   - gcc-plugins: Remove TODO_verify_il for GCC >= 16

   - kconfig: Fix BrokenPipeError warnings in selftests

   - kconfig: Add transitional symbol attribute for migration support

   - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"

* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/string_choices: Add str_assert_deassert() helper
  kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
  kconfig: Add transitional symbol attribute for migration support
  kconfig: Fix BrokenPipeError warnings in selftests
  gcc-plugins: Remove TODO_verify_il for GCC >= 16
  stddef: Introduce __TRAILING_OVERLAP()
  stddef: Remove token-pasting in TRAILING_OVERLAP()
  lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29 17:48:27 -07:00
Linus Torvalds 18b19abc37 namespace-6.18-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQgQAKCRCRxhvAZXjc
 oiFXAQCpbLvkWbld9wLgxUBhq+q+kw5NvGxzpvqIhXwJB9F9YAEA44/Wevln4xGx
 +kRUbP+xlRQqenIYs2dLzVHzAwAdfQ4=
 =EO4Y
 -----END PGP SIGNATURE-----

Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull namespace updates from Christian Brauner:
 "This contains a larger set of changes around the generic namespace
  infrastructure of the kernel.

  Each specific namespace type (net, cgroup, mnt, ...) embedds a struct
  ns_common which carries the reference count of the namespace and so
  on.

  We open-coded and cargo-culted so many quirks for each namespace type
  that it just wasn't scalable anymore. So given there's a bunch of new
  changes coming in that area I've started cleaning all of this up.

  The core change is to make it possible to correctly initialize every
  namespace uniformly and derive the correct initialization settings
  from the type of the namespace such as namespace operations, namespace
  type and so on. This leaves the new ns_common_init() function with a
  single parameter which is the specific namespace type which derives
  the correct parameters statically. This also means the compiler will
  yell as soon as someone does something remotely fishy.

  The ns_common_init() addition also allows us to remove ns_alloc_inum()
  and drops any special-casing of the initial network namespace in the
  network namespace initialization code that Linus complained about.

  Another part is reworking the reference counting. The reference
  counting was open-coded and copy-pasted for each namespace type even
  though they all followed the same rules. This also removes all open
  accesses to the reference count and makes it private and only uses a
  very small set of dedicated helpers to manipulate them just like we do
  for e.g., files.

  In addition this generalizes the mount namespace iteration
  infrastructure introduced a few cycles ago. As reminder, the vfs makes
  it possible to iterate sequentially and bidirectionally through all
  mount namespaces on the system or all mount namespaces that the caller
  holds privilege over. This allow userspace to iterate over all mounts
  in all mount namespaces using the listmount() and statmount() system
  call.

  Each mount namespace has a unique identifier for the lifetime of the
  systems that is exposed to userspace. The network namespace also has a
  unique identifier working exactly the same way. This extends the
  concept to all other namespace types.

  The new nstree type makes it possible to lookup namespaces purely by
  their identifier and to walk the namespace list sequentially and
  bidirectionally for all namespace types, allowing userspace to iterate
  through all namespaces. Looking up namespaces in the namespace tree
  works completely locklessly.

  This also means we can move the mount namespace onto the generic
  infrastructure and remove a bunch of code and members from struct
  mnt_namespace itself.

  There's a bunch of stuff coming on top of this in the future but for
  now this uses the generic namespace tree to extend a concept
  introduced first for pidfs a few cycles ago. For a while now we have
  supported pidfs file handles for pidfds. This has proven to be very
  useful.

  This extends the concept to cover namespaces as well. It is possible
  to encode and decode namespace file handles using the common
  name_to_handle_at() and open_by_handle_at() apis.

  As with pidfs file handles, namespace file handles are exhaustive,
  meaning it is not required to actually hold a reference to nsfs in
  able to decode aka open_by_handle_at() a namespace file handle.
  Instead the FD_NSFS_ROOT constant can be passed which will let the
  kernel grab a reference to the root of nsfs internally and thus decode
  the file handle.

  Namespaces file descriptors can already be derived from pidfds which
  means they aren't subject to overmount protection bugs. IOW, it's
  irrelevant if the caller would not have access to an appropriate
  /proc/<pid>/ns/ directory as they could always just derive the
  namespace based on a pidfd already.

  It has the same advantage as pidfds. It's possible to reliably and for
  the lifetime of the system refer to a namespace without pinning any
  resources and to compare them trivially.

  Permission checking is kept simple. If the caller is located in the
  namespace the file handle refers to they are able to open it otherwise
  they must hold privilege over the owning namespace of the relevant
  namespace.

  The namespace file handle layout is exposed as uapi and has a stable
  and extensible format. For now it simply contains the namespace
  identifier, the namespace type, and the inode number. The stable
  format means that userspace may construct its own namespace file
  handles without going through name_to_handle_at() as they are already
  allowed for pidfs and cgroup file handles"

* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private
  ns: add ns_debug()
  ns: simplify ns_common_init() further
  cgroup: add missing ns_common include
  ns: use inode initializer for initial namespaces
  selftests/namespaces: verify initial namespace inode numbers
  ns: rename to __ns_ref
  nsfs: port to ns_ref_*() helpers
  net: port to ns_ref_*() helpers
  uts: port to ns_ref_*() helpers
  ipv4: use check_net()
  net: use check_net()
  net-sysfs: use check_net()
  user: port to ns_ref_*() helpers
  time: port to ns_ref_*() helpers
  pid: port to ns_ref_*() helpers
  ipc: port to ns_ref_*() helpers
  cgroup: port to ns_ref_*() helpers
  ...
2025-09-29 11:20:29 -07:00
Linus Torvalds b7ce6fa90f vfs-6.18-rc1.misc
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaNZQMQAKCRCRxhvAZXjc
 omNLAQCgrwzd9sa1JTlixweu3OAxQlSEbLuMpEv7Ztm+B7Wz0AD9HtwPC44Kev03
 GbMcB2DCFLC4evqYECj6IG7NBmoKsAs=
 =1ICf
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29 09:03:07 -07:00
Linus Torvalds fde0ab43b9 Fix CC_HAS_ASM_GOTO_OUTPUT on non-x86 architectures
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.

And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).

This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.

Just use an empty asm statement instead.

Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e2ffa15b9b ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17")
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29 08:54:12 -07:00
Christian Brauner 4055526d35
ns: move ns type into struct ns_common
It's misplaced in struct proc_ns_operations and ns->ops might be NULL if
the namespace is compiled out but we still want to know the type of the
namespace for the initial namespace struct.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-25 09:23:54 +02:00
Kees Cook 23ef9d4397 kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).

Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-24 14:29:14 -07:00
Thomas Gleixner e2ffa15b9b kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17
clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:

     {
     	__label__ local_lbl;
	...
	unsafe_get_user(uval, uaddr, local_lbl);
	...
	return 0;
	local_lbl:
		return -EFAULT;
     }

when two such scopes exist in the same function:

  error: cannot jump from this asm goto statement to one of its possible targets

There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.

That issue prevents using local labels for a cleanup based user access
mechanism.

After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.

But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.

The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:

void bar(void **);
void* baz(void);

int  foo (void) {
    {
	    asm goto("jmp %l0"::::l0);
	    return 0;
l0:
	    return 1;
    }
    void *x __attribute__((cleanup(bar))) = baz();
    {
	    asm goto("jmp %l0"::::l1);
	    return 42;
l1:
	    return 0xff;
    }
}

Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.

That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.

Thanks to Nathan for pointing to the relevant clang issue!

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: f023f5cdb2
2025-09-24 09:29:15 +02:00
Nathan Chancellor 95ee3364b2 Linux 6.17-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmjHMcoeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5bwH/23w8iGB4hf7L/7Z
 e7blX42Pe9EXA1uK62iWmwEjDvBuJ7TmVfXH09qYJ56fj6/rJEdpQwtBMd4ypL81
 QA/7lq5UEl0apPzMN86J8EHCzmjNzv7o+UtEd4C/hPFEZHZJa5Hqj9CBglSwSCEn
 fTkLk7Gl6s8SfzBQ/rXX6/ZChAB/RleVWabDlIQMDz++/+9DZ0aqphj+5bYSqysL
 ROQOaj4LOICuLfrup9J61hKNBoF7Dv3sO20vc+Iic0XHRPZ6/lKCnHgCUsqVIOOQ
 L4kDT7XKQg+n3ttjrMe84/8iHZdWtf8VMWrtniPT8e1YGYuMpavVplgIcFoFCoNm
 Qa7NPDs=
 =rZeT
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR74yXHMTGczQHYypIdayaRccAalgUCaM3AYQAKCRAdayaRccAa
 lkrsAQCfR0LymE8Hq+Vfk65DK4qZxigaXGTfg5n3xlPhTAh/iQEA02N0/ReHOOdH
 nQde8709saIFE5axIMFvdWzbFPDtWwE=
 =eIkf
 -----END PGP SIGNATURE-----

Merge 6.17-rc6 into kbuild-next

Commit bd7c231212 ("pinctrl: meson: Fix typo in device table macro")
is needed in kbuild-next to avoid a build error with a future change.

While at it, address the conflict between commit 41f9049cff ("riscv:
Only allow LTO with CMODEL_MEDANY") and commit 6578a1ff6a ("riscv:
Remove version check for LTO_CLANG selects"), as reported by Stephen
Rothwell [1].

Link: https://lore.kernel.org/20250908134913.68778b7b@canb.auug.org.au/ [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-09-19 13:43:11 -07:00
Christian Brauner 7cf7303211
ns: use inode initializer for initial namespaces
Just use the common helper we have.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 16:22:38 +02:00
Christian Brauner 024596a4e2
ns: rename to __ns_ref
Make it easier to grep and rename to ns_count.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 16:22:38 +02:00
Christian Brauner b36c823b9a
time: support ns lookup
Support the generic ns lookup infrastructure to support file handles for
namespaces.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 14:26:15 +02:00
Heiko Carstens f72e2cff13 compiler_types: Add __assume macro
Make the statement attribute "assume" with a new __assume macro available.

The assume attribute is used to indicate that a certain condition is
assumed to be true. Compilers may or may not use this indication to
generate optimized code. If this condition is violated at runtime, the
behavior is undefined.

Note that the clang documentation states that optimizers may react
differently to this attribute, and this may even have a negative
performance impact. Therefore this attribute should be used with care.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-09-18 14:06:40 +02:00
Geert Uytterhoeven 7479260860
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
INITRAMFS_PRESERVE_MTIME is only used in init/initramfs.c and
init/initramfs_test.c.  Hence add a dependency on BLK_DEV_INITRD, to
prevent asking the user about this feature when configuring a kernel
without initramfs support.

Fixes: 1274aea127 ("initramfs: add INITRAMFS_PRESERVE_MTIME Kconfig option")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 15:02:17 +02:00
Thorsten Blum afd77d2050
initramfs: Replace strcpy() with strscpy() in find_link()
strcpy() is deprecated; use strscpy() instead.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:52:02 +02:00
Thorsten Blum beb022ef92
initrd: Use str_plural() in rd_load_image()
Add the local variable 'nr_disks' and replace the manual ternary "s"
pluralization with the standardized str_plural() helper function.

Use pr_notice() instead of printk(KERN_NOTICE) to silence a checkpatch
warning.

No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:47:14 +02:00
Thorsten Blum e60625e7ce
initramfs: Use struct_size() helper to improve dir_add()
Use struct_size() to calculate the number of bytes to allocate for a new
directory entry.  No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:42:56 +02:00
Thorsten Blum 84f1766bdb
initrd: Fix unused variable warning in rd_load_image() on s390
The local variables 'rotator' and 'rotate' (used for the progress
indicator) aren't used on s390. Building the kernel with W=1 generates
the following warning:

init/do_mounts_rd.c:192:17: warning: variable 'rotate' set but not used [-Wunused-but-set-variable]
 192 |         unsigned short rotate = 0;
     |                        ^
1 warning generated.

Remove the preprocessor directives and use the IS_ENABLED(CONFIG_S390)
macro instead, allowing the compiler to optimize away unused variables
and avoid the warning on s390.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:28:37 +02:00
Huacai Chen e416f0ed3c init: handle bootloader identifier in kernel parameters
BootLoaders (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE=
/boot/vmlinuz-x.y.z" to kernel parameters.  But these identifiers are not
recognized by the kernel itself so will be passed to userspace.  However
user space init program also don't recognize it.

KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on
some architectures.

We cannot change BootLoader's behavior, because this behavior exists for
many years, and there are already user space programs search BOOT_IMAGE=
in /proc/cmdline to obtain the kernel image locations:

https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go
(search getBootOptions)
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go
(search getKernelReleaseWithBootOption) So the the best way is handle
(ignore) it by the kernel itself, which can avoid such boot warnings (if
we use something like init=/bin/bash, bootloader identifier can even cause
a crash):

Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space.

[chenhuacai@loongson.cn: use strstarts()]
  Link: https://lkml.kernel.org/r/20250815090120.1569947-1-chenhuacai@loongson.cn
Link: https://lkml.kernel.org/r/20250721101343.3283480-1-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:45 -07:00
Linus Torvalds 4f553c1e2c 20 hotfixes. 15 are cc:stable and the remainder address post-6.16 issues
or aren't considered necessary for -stable kernels.  14 of these fixes are
 for MM.
 
 This includes
 
 - a 3-patch kexec series from Breno that fixes a recently introduced
   use-uninitialized bug,
 
 - e 2-patch DAMON series from Quanmin Yan that avoids div-by-zero
   crashes which can occur if the operator uses poorly-chosen insmod
   parameters.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCaMI7WQAKCRDdBJ7gKXxA
 jq3sAQDkflIN0qW3R7yqgUZfdO78T2LMmGlPW1L7F/ZXkxLk7gD/WgkWoec5cqi0
 ACiL81h6btIYBLHJ+SqJuowPMhaelQg=
 =fquW
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "20 hotfixes. 15 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 14 of these
  fixes are for MM.

  This includes

   - kexec fixes from Breno for a recently introduced
     use-uninitialized bug

   - DAMON fixes from Quanmin Yan to avoid div-by-zero crashes
     which can occur if the operator uses poorly-chosen insmod
     parameters

   and misc singleton fixes"

* tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add tree entry to numa memblocks and emulation block
  mm/damon/sysfs: fix use-after-free in state_show()
  proc: fix type confusion in pde_set_flags()
  compiler-clang.h: define __SANITIZE_*__ macros only when undefined
  mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
  ocfs2: fix recursive semaphore deadlock in fiemap call
  mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
  mm/mremap: fix regression in vrm->new_addr check
  percpu: fix race on alloc failed warning limit
  mm/memory-failure: fix redundant updates for already poisoned pages
  s390: kexec: initialize kexec_buf struct
  riscv: kexec: initialize kexec_buf struct
  arm64: kexec: initialize kexec_buf struct in load_other_segments()
  mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
  mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
  mm/damon/core: set quota->charged_from to jiffies at first charge window
  mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
  init/main.c: fix boot time tracing crash
  mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()
  mm/khugepaged: fix the address passed to notifier on testing young
2025-09-10 21:19:34 -07:00
Yi Tao 0568f89d4f cgroup: replace global percpu_rwsem with per threadgroup resem when writing to cgroup.procs
The static usage pattern of creating a cgroup, enabling controllers,
and then seeding it with CLONE_INTO_CGROUP doesn't require write
locking cgroup_threadgroup_rwsem and thus doesn't benefit from this
patch.

To avoid affecting other users, the per threadgroup rwsem is only used
when the favordynmods is enabled.

As computer hardware advances, modern systems are typically equipped
with many CPU cores and large amounts of memory, enabling the deployment
of numerous applications. On such systems, container creation and
deletion become frequent operations, making cgroup process migration no
longer a cold path. This leads to noticeable contention with common
process operations such as fork, exec, and exit.

To alleviate the contention between cgroup process migration and
operations like process fork, this patch modifies lock to take the write
lock on signal_struct->group_rwsem when writing pid to
cgroup.procs/threads instead of holding a global write lock.

Cgroup process migration has historically relied on
signal_struct->group_rwsem to protect thread group integrity. In commit
<1ed1328792ff> ("sched, cgroup: replace signal_struct->group_rwsem with
a global percpu_rwsem"), this was changed to a global
cgroup_threadgroup_rwsem. The advantage of using a global lock was
simplified handling of process group migrations. This patch retains the
use of the global lock for protecting process group migration, while
reducing contention by using per thread group lock during
cgroup.procs/threads writes.

The locking behavior is as follows:

write cgroup.procs/threads  | process fork,exec,exit | process group migration
------------------------------------------------------------------------------
cgroup_lock()               | down_read(&g_rwsem)    | cgroup_lock()
down_write(&p_rwsem)        | down_read(&p_rwsem)    | down_write(&g_rwsem)
critical section            | critical section       | critical section
up_write(&p_rwsem)          | up_read(&p_rwsem)      | up_write(&g_rwsem)
cgroup_unlock()             | up_read(&g_rwsem)      | cgroup_unlock()

g_rwsem denotes cgroup_threadgroup_rwsem, p_rwsem denotes
signal_struct->group_rwsem.

This patch eliminates contention between cgroup migration and fork
operations for threads that belong to different thread groups, thereby
reducing the long-tail latency of cgroup migrations and lowering system
load.

With this patch, under heavy fork and exec interference, the long-tail
latency of cgroup migration has been reduced from milliseconds to
microseconds. Under heavy cgroup migration interference, the multi-CPU
score of the spawn test case in UnixBench increased by 9%.

tj: Update comment in cgroup_favor_dynmods() and switch WARN_ONCE() to
    pr_warn_once().

Signed-off-by: Yi Tao <escape@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-10 07:44:51 -10:00
Linus Torvalds b236920731 Rust fixes for v6.17 (2nd)
Toolchain and infrastructure:
 
  - Two changes to prepare for the future Rust 1.91.0 release (expected
    2025-10-30, currently in nightly): a target specification format
    change and a renamed, soon-to-be-stabilized 'core' function.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmi8ciYACgkQGXyLc2ht
 IW3Bdg/9G07hHjmW0TR+V7AExNTqaZKdCxY//1lFeIuiIIew/tasqqbMCzv6KPi0
 UL/yk4f4bBp5oklRGxzvu5DKG31XCe5i1izML3zQ2Y04Klkj3lcfAG3u05OEhPZm
 woelfv/s3Umy1Gl5xV2QfiVrocRfWlZvsTP/K+ho05FKhM476XjErn8p8U1U3tjA
 KbB+EBKV5reZ14gMDwjCN5BnMpddMQleifaEX/4oRwNfw3zzKDcqB34ojOXjVcrS
 0GfapPCx64FNHxSeHADQ2O9mzOSxT5cTEzcH9JBFU+p8owmjOImvvcPoYKFOn4My
 btQeVzoCxeDwmLBrV8peR3nQHEwjxFuWvyhBg0SQy+51DIzhPJZdy/Z8RsrSywnH
 qoirudXd/rchvUTk/dNxQC6NORuY1IGGi2uJAs4ZRrmC6UqbxjUdB08cyVY4gFj0
 H3VMVaWlVWIGsPATrA2Oh0D1wsJBDYynf1oyG8R1NyYu4vCpYKMBAyJlVGlrGgq6
 H+fU8BqV+LmXH9ogtk0vhT7sK2dHGSA64JR3q1ralOlQip1CsXVeaN9/B+wfjo+i
 dL1qwJXkii0yxyVElHTbnugxI60hR2cgkXRnDMgJu5fBPGDdebOGtB1xxDwLnCjH
 3yNiU1Eyyer/NbsqxO9ylYPvHeg4RXETHoK0gk+/jyyR0ryfbtE=
 =j/MW
 -----END PGP SIGNATURE-----

Merge tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:

 - Two changes to prepare for the future Rust 1.91.0 release (expected
   2025-10-30, currently in nightly): a target specification format
   change and a renamed, soon-to-be-stabilized 'core' function.

* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: support Rust >= 1.91.0 target spec
  rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
2025-09-06 12:33:09 -07:00
Thomas Weißschuh bad53ae2dc vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
All architectures implementing time-related functionality in the vDSO are
using the generic vDSO library which handles time namespaces properly.

Remove the now unnecessary Kconfig symbol.

Enables the use of time namespaces on architectures, which use the
generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de
2025-09-04 11:23:50 +02:00
Mike Rapoport (Microsoft) 669602b5b7 init/main.c: fix boot time tracing crash
Steven Rostedt reported a crash with "ftrace=function" kernel command
line:

[    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[    0.160254] #PF: supervisor read access in kernel mode
[    0.160975] #PF: error_code(0x0000) - not-present page
[    0.161697] PGD 0 P4D 0
[    0.162055] Oops: Oops: 0000 [#1] SMP PTI
[    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[    0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[    0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[    0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[    0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[    0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[    0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[    0.174542] FS:  0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[    0.175684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[    0.177483] Call Trace:
[    0.177828]  <TASK>
[    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[    0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[    0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[    0.181517] execmem_alloc_rw (mm/execmem.c:487)
[    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[    0.189088] register_tracer (kernel/trace/trace.c:2391)
[    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[    0.190204] start_kernel (init/main.c:970)
[    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[    0.191381] x86_64_start_kernel (??:?)
[    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[    0.192534]  </TASK>
[    0.192839] Modules linked in:
[    0.193267] CR2: 000000000000001c
[    0.193730] ---[ end trace 0000000000000000 ]---

The crash happens because on x86 ftrace allocations from execmem require
maple tree to be initialized.

Move maple tree initialization that depends only on slab availability
earlier in boot so that it will happen right after mm_core_init().

Link: https://lkml.kernel.org/r/20250824130759.1732736-1-rppt@kernel.org
Fixes: 5d79c2be50 ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250820184743.0302a8b5@gandalf.local.home/
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-03 17:10:36 -07:00
Alice Ryhl c09461a0d2 rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
As part of the stabilization of Location::file_with_nul(), it was brought
up that the with_nul() suffix usually means something else in Rust APIs,
so the API is being renamed prior to stabilization [1].

Thus, use the new name on new rustc versions.

Link: https://www.github.com/rust-lang/rust/pull/145928 [1]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250827-file_as_c_str-v1-1-d3f5a3916a9c@google.com
[ Kept `cfg` separation. Reworded slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-08-31 23:34:34 +02:00