Compare commits

...

660 Commits

Author SHA1 Message Date
Linus Torvalds 619f4edc8d Thermal control updates for 6.19-rc1

Merge tag 'thermal-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
 "These add Nova Lake processor support to the Intel thermal drivers and
  DPTF code, update thermal control documentation, simplify the ACPI
  DPTF code related to thermal control, add QCS8300 compatible to the
  tsens thermal DT bindings, add DT bindings for NXP i.MX91 thermal
  module and add support for it to the imx91 thermal driver, update a
  few other thermal drivers and fix a format string issue in a thermal
  utility:

   - Add Nova Lake processor thermal device to the int340x
     processor_thermal driver, add DLVR support for Nova Lake to it, add
     Nova Lake support to the ACPI DPTF code, document thermal
     throttling on Intel platforms, and update workload type hint
     interface documentation (Srinivas Pandruvada)

   - Remove int340x thermal scan handler from the ACPI DPTF code because
     it turned out to be unnecessary (Slawomir Rosek)

   - Clean up the Intel int340x thermal driver (Kaushlendra Kumar)

   - Document the RZ/V2H TSU DT bindings (Ovidiu Panait)

   - Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
     Pallikunhi)

   - Document R-Car Gen4 and RZ/G2 support in driver comment (Marek
     Vasut)

   - Convert to DEFINE_SIMPLE_DEV_PM_OPS() in R-Car [Gen3] (Geert
     Uytterhoeven)

   - Fix format string bug in thermal-engine (Malaya Kumar Rout)

   - Make ipq5018 tsens standalone compatible (George Moussalem)

   - Add the QCS8300 compatible for QCom Tsens (Gaurav Kohli)

   - Add support for the NXP i.MX91 thermal module, including the DT
     bindings (Pengfei Li)"

* tag 'thermal-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unit
  dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal module
  dt-bindings: thermal: tsens: Add QCS8300 compatible
  dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatible
  tools/thermal/thermal-engine: Fix format string bug in thermal-engine
  docs: driver-api/thermal/intel_dptf: Add new workload type hint
  thermal/drivers/rcar_gen3: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  thermal/drivers/rcar: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  Documentation: thermal: Document thermal throttling on Intel platforms
  ACPI: DPTF: Support Nova Lake
  thermal: intel: int340x: Add DLVR support for Nova Lake
  thermal: int340x: processor_thermal: Add Nova Lake processor thermal device
  thermal: intel: int340x: Replace sprintf() with sysfs_emit()
  thermal: intel: int340x: Use symbolic constant for UUID comparison
  thermal/drivers/rcar_gen3: Document R-Car Gen4 and RZ/G2 support in driver comment
  dt-bindings: thermal: qcom-tsens: document the Kaanapali Temperature Sensor
  dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2H TSU
  ACPI: DPTF: Remove int340x thermal scan handler
  thermal: intel: Select INT340X_THERMAL from INTEL_SOC_DTS_THERMAL
2025-12-02 17:49:12 -08:00
Linus Torvalds d348c22394 Power management updates for 6.19-rc1

Merge tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "There are quite a few interesting things here, including new hardware
  support, new features, some bug fixes and documentation updates. In
addition, there is the usual bunch of minor fixes and cleanups all
  over.

  In the new hardware support category, there are intel_pstate and
  intel_rapl driver updates to support new processors, Panther Lake,
Wildcat Lake, Nova Lake, and Diamond Rapids in the OOB mode, OPP and
  bandwidth allocation support in the tegra186 cpufreq driver, and
  JH7110S SOC support in dt-platdev cpufreq.

  The new features are the PM QoS CPU latency limit for suspend-to-idle,
  the netlink support for the energy model management, support for
  terminating system suspend via a wakeup event during the sync of file
  systems, configurable number of hibernation compression threads, the
  runtime PM auto-cleanup macros, and the "poweroff" PM event that is
  expected to be used during system shutdown.

  Bugs are mostly fixed in cpuidle governors, but there are also fixes
  elsewhere, like in the amd-pstate cpufreq driver.

  Documentation updates include, but are not limited to, a new doc on
  debugging shutdown hangs, cross-referencing fixes and cleanups in the
  intel_pstate documentation, and updates of comments in the core
  hibernation code.

  Specifics:

   - Introduce and document a QoS limit on CPU exit latency during
     wakeup from suspend-to-idle (Ulf Hansson)

   - Add support for building libcpupower statically (Zuo An)

   - Add support for sending netlink notifications to user space on
     energy model updates (Changwoo Min, Peng Fan)

   - Minor improvements to the Rust OPP interface (Tamir Duberstein)

   - Fixes to scope-based pointers in the OPP library (Viresh Kumar)

   - Use residency threshold in polling state override decisions in the
     menu cpuidle governor (Aboorva Devarajan)

   - Add sanity check for exit latency and target residency in the
     cpuidle core (Rafael Wysocki)

   - Use this_cpu_ptr() where possible in the teo governor (Christian
     Loehle)

   - Rework the handling of tick wakeups in the teo cpuidle governor to
     increase the likelihood of stopping the scheduler tick in the cases
     when tick wakeups can be counted as non-timer ones (Rafael Wysocki)

   - Fix a reverse condition in the teo cpuidle governor and drop a
     misguided target residency check from it (Rafael Wysocki)

   - Clean up multiple minor defects in the teo cpuidle governor (Rafael
     Wysocki)

   - Update header inclusion to make it follow the Include What You Use
     principle (Andy Shevchenko)

   - Enable MSR-based RAPL PMU support in the intel_rapl power capping
     driver and arrange for using it on the Panther Lake and Wildcat
     Lake processors (Kuppuswamy Sathyanarayanan)

   - Add support for Nova Lake and Wildcat Lake processors to the
     intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
     Pandruvada)

   - Add OPP and bandwidth support for Tegra186 (Aaron Kling)

   - Optimizations for parameter array handling in the amd-pstate
     cpufreq driver (Mario Limonciello)

   - Fix for mode changes with offline CPUs in the amd-pstate cpufreq
     driver (Gautham Shenoy)

   - Preserve freq_table_sorted across suspend/hibernate in the cpufreq
     core (Zihuan Zhang)

   - Adjust energy model rules for Intel hybrid platforms in the
     intel_pstate cpufreq driver and improve printing of debug messages
     in it (Rafael Wysocki)

   - Replace deprecated strcpy() in cpufreq_unregister_governor()
     (Thorsten Blum)

   - Fix duplicate hyperlink target errors in the intel_pstate cpufreq
     driver documentation and use :ref: directive for internal linking
     in it (Swaraj Gaikwad, Bagas Sanjaya)

   - Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
     driver (Kuppuswamy Sathyanarayanan)

   - Use mutex guard for driver locking in the intel_pstate driver and
     eliminate some code duplication from it (Rafael Wysocki)

   - Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
     Kumar)

   - Minor improvements to various cpufreq drivers (Christian Marangi,
     Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)

   - Replace snprintf() with scnprintf() in show_trace_dev_match()
     (Kaushlendra Kumar)

   - Fix memory allocation error handling in pm_vt_switch_required()
     (Malaya Kumar Rout)

   - Introduce CALL_PM_OP() macro and use it to simplify code in generic
     PM operations (Kaushlendra Kumar)

   - Add module param to backtrace all CPUs in the device power
     management watchdog (Sergey Senozhatsky)

   - Rework message printing in swsusp_save() (Rafael Wysocki)

   - Make it possible to change the number of hibernation compression
     threads (Xueqin Luo)

   - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

   - Add document on debugging shutdown hangs to PM documentation and
     correct a mistaken configuration option in it (Mario Limonciello)

   - Shut down wakeup source timer before removing the wakeup source
     from the list (Kaushlendra Kumar, Rafael Wysocki)

   - Introduce new PMSG_POWEROFF event for system shutdown handling with
     the help of PM device callbacks (Mario Limonciello)

   - Make pm_test delay interruptible by wakeup events (Riwen Lu)

   - Clean up kernel-doc comment style usage in the core hibernation
     code and remove useless comments from it (Sunday Adelodun, Rafael
     Wysocki)

   - Add support for handling wakeup events and aborting the suspend
     process while it is syncing file systems (Samuel Wu, Rafael
     Wysocki)

   - Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)

   - Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
     them in the PCI core and the ACPI TAD driver (Rafael Wysocki)

   - Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)

   - Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)

   - Fix typos in runtime.c comments (Malaya Kumar Rout)

   - Move governor.h from devfreq under include/linux/ and rename to
     devfreq-governor.h to allow devfreq governor definitions outside of
     drivers/devfreq/ (Dmitry Baryshkov)

   - Use min() to improve readability in tegra30-devfreq.c (Thorsten
     Blum)

   - Fix potential use-after-free issue of OPP handling in
     hisi_uncore_freq.c (Pengjie Zhang)

   - Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
     governor_simpleondemand.c in devfreq (Riwen Lu)"
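
  On the udelay() replacement: udelay() busy-waits and burns CPU time,
  while usleep_range() sleeps on an hrtimer and gives the kernel a
  window for coalescing wakeups, so it is preferred wherever the code
  may sleep. A sketch with illustrative values, not the actual ACPI
  cpufreq change:

      /* Before: spins for 10us even though sleeping is allowed here */
      udelay(10);

      /* After: sleeps; the min/max range lets the wakeup be merged
       * with other pending timers. */
      usleep_range(10, 20);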

* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
  PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
  cpuidle: Warn instead of bailing out if target residency check fails
  cpuidle: Update header inclusion
  Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
  cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
  sched: idle: Respect the CPU system wakeup QoS limit for s2idle
  pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
  pmdomain: Respect the CPU system wakeup QoS limit for s2idle
  PM: QoS: Introduce a CPU system wakeup QoS limit
  cpuidle: governors: teo: Add missing space to the description
  PM: hibernate: Extra cleanup of comments in swap handling code
  PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
  PM / devfreq: hisi: Fix potential UAF in OPP handling
  PM / devfreq: Move governor.h to a public header location
  powercap: intel_rapl: Enable MSR-based RAPL PMU support
  powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
  cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
  PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
  PM: sleep: Add support for wakeup during filesystem sync
  cpufreq: ACPI: Replace udelay() with usleep_range()
  ...
2025-12-02 17:31:22 -08:00
Linus Torvalds 959bfe496b ACPI support updates for 6.19-rc1

Merge tag 'acpi-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These add Microsoft fan extensions support to the ACPI fan driver, fix
  a bug in ACPICA, update other ACPI drivers (processor, time and alarm
  device), update ACPI power management code and ACPI device properties
  management, and fix an ACPI utility:

   - Avoid walking the ACPI namespace in the AML interpreter if the
     starting node cannot be determined (Cryolitia PukNgae)

   - Use min() instead of min_t() in the ACPI device properties handling
     code to avoid discarding significant bits (David Laight)

   - Fix potential fwnode refcount leak in
     acpi_fwnode_graph_parse_endpoint() that may prevent the parent
     fwnode from being released (Haotian Zhang)

   - Rework acpi_graph_get_next_endpoint() to use ACPI functions only,
     remove unnecessary conditionals from it to make it easier to
     follow, and make acpi_get_next_subnode() static (Sakari Ailus)

   - Drop unused function acpi_get_lps0_constraint(), make some
     Low-Power S0 callback functions for suspend-to-idle static, and
     rearrange the code retrieving Low-Power S0 constraints so it only
     runs when the constraints are actually used (Rafael Wysocki)

   - Drop redundant locking from the ACPI battery driver (Rafael
     Wysocki)

   - Improve runtime PM in the ACPI time and alarm device (TAD) driver
     using guard macros and rearrange code related to runtime PM in
     acpi_tad_remove() (Rafael Wysocki)

   - Add support for Microsoft fan extensions to the ACPI fan driver
     along with notification support and work around a 64-bit firmware
     bug in that driver (Armin Wolf)

   - Use ACPI_FREE() to free ACPI buffer in the ACPI DPTF code
     (Kaushlendra Kumar)

   - Fix a memory leak and a resource leak in the ACPI pfrut utility
     (Malaya Kumar Rout)

   - Replace `core::mem::zeroed` with `pin_init::zeroed` in the ACPI
     Rust code (Siyuan Huang)

   - Update the ACPI code to use the new style of allocating workqueues
     and new global workqueues (Marco Crivellari)

   - Fix two spelling mistakes in the ACPI code (Chu Guangqing)

   - Fix ISAPNP to generate uevents to auto-load modules (René Rebe)

   - Relocate the state flags initialization in the ACPI processor idle
     driver and drop redundant C-state count checks from it (Huisong Li)

   - Fix map_x2apic_id() in the ACPI processor core driver for
     amd-pstate on am4 (René Rebe)"
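
  The min()-vs-min_t() item is about silent truncation: min_t() casts
  both arguments to the requested type before comparing, which can
  discard high bits. A self-contained sketch with made-up values, not
  the actual ACPI property code:

      size_t a = 300, b = 400;

      /* min_t(u8, ...) truncates first: 300 -> 44 and 400 -> 144, so
       * the "minimum" is computed from mangled values. */
      u8 wrong = min_t(u8, a, b);     /* 44 */

      /* min() compares in the arguments' own type and complains about
       * mismatched types at build time. */
      size_t right = min(a, b);       /* 300 */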

* tag 'acpi-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits)
  ACPI: PM: Fix a spelling mistake
  ACPI: LPSS: Fix a spelling mistake
  ACPI: processor_core: fix map_x2apic_id for amd-pstate on am4
  ACPICA: Avoid walking the Namespace if start_node is NULL
  ACPI: tools: pfrut: fix memory leak and resource leak in pfrut.c
  ACPI: property: use min() instead of min_t()
  PNP: Fix ISAPNP to generate uevents to auto-load modules
  ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
  ACPI: DPTF: Use ACPI_FREE() for ACPI buffer deallocation
  ACPI: processor: idle: Drop redundant C-state count checks
  ACPI: thermal: Add WQ_PERCPU to alloc_workqueue() users
  ACPI: OSL: Add WQ_PERCPU to alloc_workqueue() users
  ACPI: EC: Add WQ_PERCPU to alloc_workqueue() users
  ACPI: OSL: replace use of system_wq with system_percpu_wq
  ACPI: scan: replace use of system_unbound_wq with system_dfl_wq
  ACPI: fan: Add support for Microsoft fan extensions
  ACPI: fan: Add hwmon notification support
  ACPI: fan: Add basic notification support
  ACPI: TAD: Improve runtime PM using guard macros
  ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
  ...
2025-12-02 17:24:03 -08:00
Linus Torvalds 44fc84337b arm64 updates for 6.19:

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "These are the arm64 updates for 6.19.

  The biggest part is the Arm MPAM driver under drivers/resctrl/.
  There's a patch touching mm/ to handle spurious faults for huge pmd
  (similar to the pte version). The corresponding arm64 part allows us
  to avoid the TLB maintenance if a (huge) page is reused after a write
  fault. There's EFI refactoring to allow runtime services with
  preemption enabled and the rest is the usual perf/PMU updates and
  several cleanups/typos.

  Summary:

  Core features:

   - Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
     driver under drivers/resctrl/ which makes use of the fs/resctrl/ API

  Perf and PMU:

   - Avoid cycle counter on multi-threaded CPUs

   - Extend CSPMU device probing and add additional filtering support
     for NVIDIA implementations

   - Add support for the PMUs on the NoC S3 interconnect

   - Add additional compatible strings for new Cortex and C1 CPUs

   - Add support for data source filtering to the SPE driver

   - Add support for i.MX8QM and "DB" PMU in the imx PMU driver

  Memory management:

   - Avoid broadcast TLBI if page reused in write fault

   - Elide TLB invalidation if the old PTE was not valid

   - Drop redundant cpu_set_*_tcr_t0sz() macros

   - Propagate pgtable_alloc() errors outside of __create_pgd_mapping()

   - Propagate return value from __change_memory_common()

  ACPI and EFI:

   - Call EFI runtime services without disabling preemption

   - Remove unused ACPI function

  Miscellaneous:

   - ptrace support to disable streaming on SME-only systems

   - Improve sysreg generation to include a 'Prefix' descriptor

   - Replace __ASSEMBLY__ with __ASSEMBLER__

   - Align register dumps in the kselftest zt-test

   - Remove some no longer used macros/functions

   - Various spelling corrections"
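
  On the __ASSEMBLY__ to __ASSEMBLER__ switch: __ASSEMBLY__ was a
  kernel convention passed via -D when assembling .S files, while
  __ASSEMBLER__ is predefined by GCC and Clang themselves, so shared
  headers no longer rely on the build system defining the guard. The
  guard pattern itself is unchanged; the declarations below are only
  placeholders:

      #ifndef __ASSEMBLER__
      /* C-only declarations, hidden when included from assembly */
      struct cpu_context;
      void cpu_do_idle(void);
      #endif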

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
  arm64/pageattr: Propagate return value from __change_memory_common
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
  Documentation/arm64: Fix the typo of register names
  ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
  perf: arm_spe: Add support for filtering on data source
  perf: Add perf_event_attr::config4
  perf/imx_ddr: Add support for PMU in DB (system interconnects)
  perf/imx_ddr: Get and enable optional clks
  perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
  dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
  arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
  arm64: mm: use untagged address to calculate page index
  MAINTAINERS: new entry for MPAM Driver
  arm_mpam: Add kunit tests for props_mismatch()
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add helper to reset saved mbwu state
  ...
2025-12-02 17:03:55 -08:00
Linus Torvalds 2547f79b0b s390 updates for 6.19 merge window

Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - Provide a new interface for dynamic configuration and deconfiguration
   of hotplug memory, allowing configurations both with and without
   memmap_on_memory support. This makes the way memory hotplug is
   handled on s390 much more similar to other architectures

 - Remove compat support. There shouldn't be any compat user space
   around anymore, therefore get rid of a lot of code which also doesn't
   need to be tested anymore

 - Add stackprotector support. GCC 16 will get new compiler options,
   which make it possible to generate the code required for kernel
   stackprotector support

 - Merge pai_crypto and pai_ext PMU drivers into a new driver. This
   removes a lot of duplicated code. The new driver is also extensible
   and allows new PMUs to be supported

 - Add driver override support for AP queues

 - Rework and extend zcrypt and AP trace events to allow for tracing of
   crypto requests

 - Support block sizes larger than 65535 bytes for CCW tape devices

 - Since the rework of the virtual kernel address space the module area
   and the kernel image are within the same 4GB area. This eliminates
   the need for weak per-CPU variables. Get rid of
   ARCH_MODULE_NEEDS_WEAK_PER_CPU

 - Various other small improvements and fixes
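
  The AP driver_override support follows the long-standing driver-core
  pattern used by e.g. the platform bus: an override written to the
  device's sysfs driver_override attribute beats ordinary id matching.
  A sketch of that pattern with a hypothetical fallback helper, not
  the actual s390 code:

      static int ap_bus_match(struct device *dev,
                              const struct device_driver *drv)
      {
          /* An explicit driver_override binds the device to exactly
           * the named driver. */
          if (dev->driver_override)
              return strcmp(dev->driver_override, drv->name) == 0;

          return ap_id_table_match(dev, drv);   /* hypothetical */
      }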

* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
  watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
  s390/entry: Use lay instead of aghik
  s390/vdso: Get rid of -m64 flag handling
  s390/vdso: Rename vdso64 to vdso
  s390: Rename head64.S to head.S
  s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
  s390: Add stackprotector support
  s390/modules: Simplify module_finalize() slightly
  s390: Remove KMSG_COMPONENT macro
  s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
  s390/ap: Restrict driver_override versus apmask and aqmask use
  s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
  s390/ap: Support driver_override for AP queue devices
  s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
  s390/debug: Update description of resize operation
  s390/syscalls: Switch to generic system call table generation
  s390/syscalls: Remove system call table pointer from thread_struct
  s390/uapi: Remove 31 bit support from uapi header files
  s390: Remove compat support
  tools: Remove s390 compat support
  ...
2025-12-02 16:37:00 -08:00
Linus Torvalds 4a21d1b33f m68k updates for v6.19

Merge tag 'm68k-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k update from Geert Uytterhoeven:

 - defconfig update

* tag 'm68k-for-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: defconfig: Update defconfigs for v6.18-rc1
2025-12-02 16:32:02 -08:00
Linus Torvalds d61f1cc5db * Enable Linear Address Space Separation (LASS)

Merge tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CPU feature updates from Dave Hansen:
 "The biggest thing of note here is Linear Address Space Separation
  (LASS). It represents the first time I can think of that the
  upper=>kernel/lower=>user address space convention is actually
  recognized by the hardware on x86. It ensures that userspace can not
  even get the hardware to _start_ page walks for the kernel address
  space. This, of course, is a really nice generic side channel defense.

  This is really only a down payment on LASS support. There are still
  some details to work out in its interaction with EFI calls and
  vsyscall emulation. For now, LASS is disabled if either of those
  features is compiled in (which is almost always the case).

  There's also one straggler commit in here which converts an
  under-utilized AMD CPU feature leaf into a generic Linux-defined leaf
   so more features can be packed in there.

  Summary:

   - Enable Linear Address Space Separation (LASS)

   - Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined"
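
  The LASS enable path itself is conceptually tiny: once the feature
  bit is enumerated (and its SMAP dependency is satisfied), a CR4 bit
  switches it on. A minimal sketch assuming the X86_FEATURE_LASS and
  X86_CR4_LASS names from the series; the real code also honors the
  vsyscall and EFI constraints mentioned above:

      static void setup_lass(struct cpuinfo_x86 *c)
      {
          if (cpu_feature_enabled(X86_FEATURE_LASS))
              cr4_set_bits(X86_CR4_LASS);
          /* From here on, a user-mode reference to the upper (kernel)
           * half of the address space faults before any page walk
           * starts. */
      }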

* tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Enable LASS during CPU initialization
  selftests/x86: Update the negative vsyscall tests to expect a #GP
  x86/traps: Communicate a LASS violation in #GP message
  x86/kexec: Disable LASS during relocate kernel
  x86/alternatives: Disable LASS when patching kernel code
  x86/asm: Introduce inline memcpy and memset
  x86/cpu: Add an LASS dependency on SMAP
  x86/cpufeatures: Enumerate the LASS feature bits
  x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
2025-12-02 14:48:08 -08:00
Linus Torvalds a7610b8465 * Fix 64-bit identifier structure member name in fred_ss

Merge tag 'x86_entry_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 entry update from Dave Hansen:
 "This one is pretty trivial: fix a badly-named FRED data structure
  member"

* tag 'x86_entry_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fred: Fix 64bit identifier in fred_ss
2025-12-02 14:24:21 -08:00
Linus Torvalds e2aa39b368 * Make MSR-induced taint easier for users to track down

Merge tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 updates from Dave Hansen:
 "The most significant are some changes to ensure that symbols exported
  for KVM are used only by KVM modules themselves, along with some
  related cleanups.

  In true x86/misc fashion, the other patch is completely unrelated and
  just enhances an existing pr_warn() to make it clear to users how they
  have tainted their kernel when something is mucking with MSRs.

  Summary:

   - Make MSR-induced taint easier for users to track down

   - Restrict KVM-specific exports to KVM itself"
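
  The export restriction builds on the EXPORT_SYMBOL_GPL_FOR_MODULES()
  macro, which limits a symbol to an explicit allow-list of module
  names. A sketch with a hypothetical symbol:

      /* Before: any GPL module could link against this */
      EXPORT_SYMBOL_GPL(x86_virt_helper);

      /* After: only the named KVM modules can resolve it at load time */
      EXPORT_SYMBOL_GPL_FOR_MODULES(x86_virt_helper,
                                    "kvm,kvm-intel,kvm-amd");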

* tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible
  x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs"
  x86/mtrr: Drop unnecessary export of "mtrr_state"
  x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base"
  x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
2025-12-02 14:16:42 -08:00
Linus Torvalds 54de197c9a * Allow security version (SVN) updates so enclaves can attest
to new microcode.

Merge tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Dave Hansen:
 "The main content here is adding support for the new EUPDATESVN SGX
  ISA. Before this, folks who updated microcode had to reboot before
  enclaves could attest to the new microcode. The new functionality lets
  them do this without a reboot.

  The rest are some nice, but relatively mundane comment and kernel-doc
  fixups.

  Summary:

   - Allow security version (SVN) updates so enclaves can attest to new
     microcode

   - Fix kernel docs typos"
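
  The kernel-doc fixups revolve around markup conventions: '@' refers
  to function parameters and struct members, while '%' marks constant
  values. An abridged example in that style, not the exact comment
  from the tree:

      /**
       * struct sgx_enclave_run - the enclave execution context
       * @tcs:          address of the TCS page to enter
       * @user_handler: callback run on enclave exit; returning %0
       *                ends the vDSO call
       */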

* tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute
  x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h
  x86/sgx: Document structs and enums with '@', not '%'
  x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler
  x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run"
  x86/sgx: Enable automatic SVN updates for SGX enclaves
  x86/sgx: Implement ENCLS[EUPDATESVN]
  x86/sgx: Define error codes for use by ENCLS[EUPDATESVN]
  x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag
  x86/sgx: Introduce functions to count the sgx_(vepc_)open()
2025-12-02 14:03:05 -08:00
Linus Torvalds c76431e3b5 - Use the proper accessors when reading CR3 as part of the page level
transitions (5-level to 4-level, the use case being kexec) so that
   only the physical address in CR3 is picked up and not flags which are
   above the physical mask shift

Merge tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Borislav Petkov:

 - Use the proper accessors when reading CR3 as part of the page level
   transitions (5-level to 4-level, the use case being kexec) so that
   only the physical address in CR3 is picked up and not flags which are
   above the physical mask shift

 - Clean up and unify __phys_addr_symbol() definitions
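
  The CR3 fix boils down to masking: CR3 holds PCID and flag bits next
  to the page-table base, so code chasing page tables must keep only
  the physical address. Roughly, using the generic accessor and mask
  rather than the exact boot-stub code:

      unsigned long cr3 = __native_read_cr3();

      /* Keep the page-table base only: PCID sits in the low 12 bits
       * and other flags live above the physical address mask. */
      unsigned long pgd_phys = cr3 & CR3_ADDR_MASK;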

* tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/libstub: Fix page table access in 5-level to 4-level paging transition
  x86/boot: Fix page table access in 5-level to 4-level paging transition
  x86/mm: Unify __phys_addr_symbol()
2025-12-02 13:32:52 -08:00
Linus Torvalds a9a10e920e - Convert the tsx= cmdline parsing to use early_param()

Merge tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 CPU mitigation updates from Borislav Petkov:

 - Convert the tsx= cmdline parsing to use early_param()

 - Cleanup forward declarations gunk in bugs.c
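
  early_param() registers a handler that runs while the early command
  line is parsed, well before ordinary __setup() handlers. The tsx=
  handler takes roughly this shape (sketch only; the enum values are
  assumed from the existing tsx code):

      static int __init tsx_parse_cmdline(char *str)
      {
          if (!str)
              return -EINVAL;

          if (!strcmp(str, "on"))
              tsx_ctrl_state = TSX_CTRL_ENABLE;
          else if (!strcmp(str, "off"))
              tsx_ctrl_state = TSX_CTRL_DISABLE;

          return 0;
      }
      early_param("tsx", tsx_parse_cmdline);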

* tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Get rid of the forward declarations
  x86/tsx: Get the tsx= command line parameter with early_param()
  x86/tsx: Make tsx_ctrl_state static
2025-12-02 13:27:09 -08:00
Linus Torvalds cb502f0e5e - Largely cleanups along with a change to save XSS to the GHCB (Guest-Host
Communication Block) in SEV-ES guests so that the hypervisor can determine
    the guest's XSAVES buffer size properly and thus support shadow stacks in
    AMD confidential guests

Merge tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SEV updates from Borislav Petkov:

 - Largely cleanups along with a change to save XSS to the GHCB
   (Guest-Host Communication Block) in SEV-ES guests so that the
   hypervisor can determine the guest's XSAVES buffer size properly
   and thus support shadow stacks in AMD confidential guests
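
  The has_cpuflag() conversion swaps a local boot-time helper for the
  common cpu_feature_enabled() predicate, which folds to a compile-time
  constant or a patched static branch in the core kernel. Schematically,
  with an illustrative flag and callee:

      /* Before: open-coded boot helper */
      if (has_cpuflag(X86_FEATURE_SEV_SNP))
          snp_setup();    /* illustrative */

      /* After: common helper shared with the rest of the kernel */
      if (cpu_feature_enabled(X86_FEATURE_SEV_SNP))
          snp_setup();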

* tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cc: Fix enum spelling to fix kernel-doc warnings
  x86/boot: Drop unused sev_enable() fallback
  x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled()
  x86/sev: Include XSS value in GHCB CPUID request
  x86/boot: Move boot_*msr helpers to asm/shared/msr.h
2025-12-02 13:07:53 -08:00
Linus Torvalds d748981834 - The mandatory pile of cleanups the cat drags in every merge window

Merge tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cleanups from Borislav Petkov:

 - The mandatory pile of cleanups the cat drags in every merge window

* tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Clean up whitespace in a20.c
  x86/mm: Delete disabled debug code
  x86/{boot,mtrr}: Remove unused function declarations
  x86/percpu: Use BIT_WORD() and BIT_MASK() macros
  x86/cpufeatures: Correct LKGS feature flag description
  x86/idtentry: Add missing '*' to kernel-doc lines
2025-12-02 12:17:47 -08:00
Linus Torvalds 2ae20d6510 - Add support for AMD's Smart Data Cache Injection feature which allows
for direct insertion of data from I/O devices into the L3 cache, thus
   bypassing DRAM and saving its bandwidth; the resctrl side of the feature
   allows the size of the L3 used for data injection to be controlled

Merge tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 - Add support for AMD's Smart Data Cache Injection feature which allows
   for direct insertion of data from I/O devices into the L3 cache, thus
   bypassing DRAM and saving its bandwidth; the resctrl side of the
   feature allows the size of the L3 used for data injection to be
   controlled

 - Add Intel Clearwater Forest to the list of CPUs which support
   Sub-NUMA clustering

 - Other fixes and cleanups

* tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fs/resctrl: Update bit_usage to reflect io_alloc
  fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks
  fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID
  fs/resctrl: Introduce interface to display io_alloc CBMs
  fs/resctrl: Add user interface to enable/disable io_alloc feature
  fs/resctrl: Introduce interface to display "io_alloc" support
  x86,fs/resctrl: Implement "io_alloc" enable/disable handlers
  x86,fs/resctrl: Detect io_alloc feature
  x86/resctrl: Add SDCIAE feature in the command line options
  x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement
  fs/resctrl: Consider sparse masks when initializing new group's allocation
  x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
2025-12-02 11:55:58 -08:00
Linus Torvalds 2a47c26e55 - Add microcode staging support on Intel: it moves the loading of the
microcode blobs themselves to a non-critical path so that microcode
   loading latencies are kept at a minimum. The actual "directing" of
   the hardware to load microcode is the only step done on the critical
   path. This scheme is also opportunistic: on a failure, the machinery
   falls back to normal loading

Merge tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode loading updates from Borislav Petkov:

 - Add microcode staging support on Intel: it moves the loading of the
   microcode blobs themselves to a non-critical path so that microcode
   loading latencies are kept at a minimum. The actual "directing" of
   the hardware to load microcode is the only step done on the critical
   path.

   This scheme is also opportunistic: on a failure, the machinery falls
   back to normal loading

 - Add the capability to the AMD side of the loader to select one of two
   per-family/model/stepping patches: one is pre-Entrysign and the other
   is post-Entrysign; with the goal to take care of machines which
   haven't updated their BIOS yet - something they should absolutely do
   as this is the only proper Entrysign fix

 - Other small cleanups and fixlets

* tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Mark early_parse_cmdline() as __init
  x86/microcode/AMD: Select which microcode patch to load
  x86/microcode/intel: Enable staging when available
  x86/microcode/intel: Support mailbox transfer
  x86/microcode/intel: Implement staging handler
  x86/microcode/intel: Define staging state struct
  x86/microcode/intel: Establish staging control logic
  x86/microcode: Introduce staging step to reduce late-loading time
  x86/cpu/topology: Make primary thread mask available with SMP=n
2025-12-02 11:35:49 -08:00
Linus Torvalds a61288200e - The second part of the AMD MCA interrupts rework after the last-minute
show-stopper from the last merge window was sorted out. After this,
   the AMD MCA deferred errors, thresholding and corrected errors
   interrupt handlers use common MCA code and are tightly integrated
   into the core MCA code, thereby getting rid of considerable
   duplication. All of this culminates in allowing CMCI error thresholding
   storms to be detected on AMD too, using the common infrastructure
 
 - Add support for two new MCA bank bits on AMD Zen6 which denote whether
   the error address logged is a system physical address, which obviates
   the need for it to be translated before further error recovery can be
   done
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmktlV8ACgkQEsHwGGHe
 VUrfGRAAsoVknP8SPap1dFpT82+avi7knEnZ56zuwCxjXOSlXDbvsrAUFsS8Io4o
 sf60gyUnFEFLW551qXUJoSnuSjf0S63tKmnX6ebUXtxe6mVC5Y0l3VGHz8/ymbCV
 8tLFF1yx6qMEwE2WutuIIeKGdZjn4lpg2lvhtaZnzeUSBk/BQcANjPaVYKQZPx/Q
 mXqpfvJnEBxkP6gy9VlrKxkpPyR0obD2/RFcN1M5dEbk0q52KNtcwyjblYR2XmNB
 7SVmwAcRkH+7Icp14XgHZamAs9NMdAShaQ7Rov7OjEucTnot+Q5BO/3ftvFOzvGu
 GHiY4rSew6QtKv4MWIYVHGrxIm6o6Sco7EFmESEC9UDX/Ck60WAj1LY6v6jKEF0g
 nnbqxO1hoD0ygNApBXMYleut8eqiriJlXCrImlaldkG8iQqsmf11kEHagS9EVtk0
 X28/eCoyD14a90NqmY13hBf2xscU41jy+LxdYy7sisL3LC4rhGgBpE/5vd/Ynnlf
 HELeQA8/5bIOgcbVvOIFxQGC+pBwhrHxIIOF0Z6pJZzznUO2cTUepJaLgWXdne7P
 EFE30+tDfeIy/bbB6CmkPV19NW3jNlkZib28t7L9uMCShPKiaza+Qv0SgzfeEy6t
 IERhzgmPxJiJ/7fOtdCUDL8YTlisiZ9t9RbSKNbriL54JHjX+Mc=
 =TY7F
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 RAS updates from Borislav Petkov:

 - The second part of the AMD MCA interrupts rework after the
   last-minute show-stopper from the last merge window was sorted out.
   After this, the AMD MCA deferred errors, thresholding and corrected
   errors interrupt handlers use common MCA code and are tightly
   integrated into the core MCA code, thereby getting rid of
   considerable duplication. All of this culminates in allowing CMCI
   error thresholding storms to be detected on AMD too, using the
   common infrastructure

 - Add support for two new MCA bank bits on AMD Zen6 which denote
   whether the error address logged is a system physical address, which
   obviates the need for it to be translated before further error
   recovery can be done
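
 [ Editor's note: the effect of the new bank bits can be pictured with
   a short sketch; the flag and helper names below are hypothetical,
   not the actual Zen6 bit definitions:

       static u64 mce_usable_addr(struct mce *m)
       {
               /* Hypothetical bank bit: MCA_ADDR already holds a system
                * physical address, so no translation is needed. */
               if (m->status & MCI_STATUS_SYSPHYS)
                       return m->addr;

               /* Otherwise the logged address still has to go through
                * the address translation library. */
               return norm_to_sys_addr(m->addr);
       }
 ]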

* tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Handle AMD threshold interrupt storms
  x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems
  x86/mce: Add support for physical address valid bit
  x86/mce: Save and use APEI corrected threshold limit
  x86/mce/amd: Define threshold restart function for banks
  x86/mce/amd: Remove redundant reset_block()
  x86/mce/amd: Support SMCA Corrected Error Interrupt
  x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
  x86/mce: Unify AMD DFR handler with MCA Polling
  x86/mce: Unify AMD THR handler with MCA Polling
2025-12-02 11:04:37 -08:00
Linus Torvalds 49219bba01 - imh_edac: Add a new EDAC driver for Intel Diamond Rapids and
future incarnations of this memory controller architecture
 
 - amd64_edac: Remove the legacy csrow sysfs interface which has been
   deprecated and unused (we assume) for at least a decade
 
 - Add the capability to fall back to BIOS-provided address translation
   functionality (ACPI PRM) which can be used on systems unsupported by
   the current AMD address translation library
 
 - The usual fixes, fixlets, cleanups and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmktdyMACgkQEsHwGGHe
 VUpXTxAAhdQxn1v1tYKya6YHxBS3T3Y3+4fec+LeKgoY1YnoFHMse3TAU+G67opR
 1xnEKHKrkX4v1FAwe7eD2G6qyz2ytqcApv4XGxmQ1WgldFWuPl/lI3ngPNMCHMog
 dqeQFRQ7MXsk0no0cjMA6NjafFpYOGGGhIzdU3wvgZawH4hG9wHLS6Urvn2SfWj6
 Pf/449qS7XoPU5G22qWPqqixRHpc9BPkJfKMIYeaWbxldePlwbh9cOMLqwsZo1QV
 v5cv/3CAIVFzRvNVIx05kDhRrwqTjIZL+u9IYHg2g9DA45GQuktYQwd1KksbVpUn
 CijhpKMoSnQHN+ZLW84XzvEH2rvroSTZl28d5suY1GHXG3ePc9HpmTVbVElFXWKZ
 dq0X2RIbMEbSxneePFHJ4ESUfNN2HbPSfh/sXN4epxcMQI0VWVhXYs5+Ek4UV1+E
 hvhCS/kuAypODzEi0cULoMcXdyKr2V1zpaAHNlZshepp/kUzY46b3cBhxKiL3Fsd
 x+IhZgow9a+iMJfMpCJhMABKEkoZRgS3gs5nWMJ6t0EvulvknG+aovGB/Q0VaIIa
 H69Fn+R2ewnEuZf1JGZDMit1y+wjGgeamk+uWTym+tCyNH1eHaSq48POribajcYF
 UtcobK4kG7hPodsbwwD4MhqtSLhuyIcXTHbI3x4+r+LLAgdAPKM=
 =NidS
 -----END PGP SIGNATURE-----

Merge tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - imh_edac: Add a new EDAC driver for Intel Diamond Rapids and future
   incarnations of this memory controller architecture

 - amd64_edac: Remove the legacy csrow sysfs interface which has been
   deprecated and unused (we assume) for at least a decade

 - Add the capability to fall back to BIOS-provided address translation
   functionality (ACPI PRM) which can be used on systems unsupported by
   the current AMD address translation library (sketched below)

 - The usual fixes, fixlets, cleanups and improvements all over the
   place
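
 [ Editor's note: a rough illustration of the PRM fallback mentioned
   above. acpi_prm_handler_available() is added in this series (see the
   shortlog below); its exact signature and the other helper names are
   assumed for the sketch:

       static int translate_norm_addr(u64 norm_addr, u64 *sys_addr)
       {
               /* Prefer the native AMD address translation library. */
               int ret = amd_atl_norm_to_sys(norm_addr, sys_addr);

               /* On unsupported systems, let the BIOS do the translation
                * via the ACPI PRM handler instead of failing outright. */
               if (ret == -EOPNOTSUPP && acpi_prm_handler_available())
                       ret = prm_norm_to_sys(norm_addr, sys_addr);

               return ret;
       }
 ]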

* tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16()
  EDAC/igen6: Fix error handling in igen6_edac driver
  EDAC/imh: Setup 'imh_test' debugfs testing node
  EDAC/{skx_comm,imh}: Detect 2-level memory configuration
  EDAC/skx_common: Extend the maximum number of DRAM chip row bits
  EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers
  EDAC/skx_common: Prepare for skx_set_hi_lo()
  EDAC/skx_common: Prepare for skx_get_edac_list()
  EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev
  EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error()
  EDAC/ie31200: Fix error handling in ie31200_register_mci
  RAS/CEC: Replace use of system_wq with system_percpu_wq
  EDAC: Remove the legacy EDAC sysfs interface
  EDAC/amd64: Remove NUM_CONTROLLERS macro
  EDAC/amd64: Generate ctl_name string at runtime
  RAS/AMD/ATL: Require PRM support for future systems
  ACPI: PRM: Add acpi_prm_handler_available()
  RAS/AMD/ATL: Return error codes from helper functions
2025-12-02 10:45:50 -08:00
Linus Torvalds 7f8d5f70ff Treewide cleanup of the remaining users of in_irq(), which was replaced
by in_hardirq() and marked deprecated in 2020.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmkvDhUTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoW+XD/959KAIm2JpcEYUWuNBmlhEyuYWvPLw
 ZyOiraLYBNyWmfCO/Yz4Ff8VZSR9gdWQoNfvBb8uxkbSXa0UOEUhCbzWsuoTnqR5
 ObTIHCJ9QmPlRiFDvs4Sf5TGmy/4nXh6/PoH3JykNdlD3rZMTxiAz/k6QuO/S2iu
 ykA+DNtNL7jDkQHzrWa3rf597BkBN1Z+hUD8zHRt8LYKRfmLYWjCMggjPLMnuqcn
 240fnV/FubCLd9f5ZgNxHQMQCQH2qB7GYMk08YwXwCZQqIIXWqbNnhedkkNO3kWq
 Sws4TEO6yg9pgTFqkuiDU5QgYEboRY4pDT45KSkdTHHGZl2OAAl3eVIGCto72UEI
 Eyzn4k900hZ1iI/Rad5mx3D4XJZEXFgEbXhjph0odn6jVvmSj+Fmg3J67u1niO2a
 obzB+xeaIkbGNQIgJFy8+A9SSnZckvuPlXdZdUxS2S95zH7f9+vBY8HWJMuyursa
 3AJAKa82mN1i3A9FdSuMTdttQWkDmrwPKVzxvixs1mBu7kB70XaRIKsPjZj7LH6X
 CiqP9Kt5FO0hVA7K+nKTeUA5DdjB4HzYzOgMqzFUhExY3hksVsj8rQEO6B0bCp9t
 CfITA3BvU7GXxhXZHOq3dABQ21J/ZHgeuK3QdQSnOxSQOv2ElYIdKvYirJy2QdS1
 tSM3O3GXb4zWDg==
 =6LKf
 -----END PGP SIGNATURE-----

Merge tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core irq cleanup from Thomas Gleixner:
 "Tree wide cleanup of the remaining users of in_irq() which got
  replaced by in_hardirq() and marked deprecated in 2020"
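
 [ Editor's note: the conversion is mechanical; a typical hunk in an
   illustrative driver looks like this:

       /* Before (in_irq() has been deprecated since 2020): */
       if (in_irq())
               tasklet_schedule(&dev->done_task);

       /* After (same semantics, unambiguous name): */
       if (in_hardirq())
               tasklet_schedule(&dev->done_task);
 ]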

* tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide: Remove in_irq()
2025-12-02 10:18:49 -08:00
Linus Torvalds d42e504a55 Updates for the time/timers core:
- Prevent a thundering herd problem when the timekeeper CPU is delayed
     and a large number of CPUs compete to acquire jiffies_lock to do the
     update. Limit it to one CPU with a separate "uncontended" atomic
     variable.
 
   - A set of improvements for the timer migration mechanism:
 
     - Support imbalanced NUMA trees correctly
 
     - Support dynamic exclusion of CPUs from the migrator duty to allow
       the cpuset/isolation mechanism to exclude them from handling
       timers of remote idle CPUs.
 
   - The usual small updates, cleanups and enhancements
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmks7doTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaxrD/40nxx+8cEXsVbVLIkP2PQbd2Y8+7sk
 YbNu/Cb7j7Bg7R8YIs4p5GHk+7Yt/hNsW77SmbAzRPUyYYG6L3bUYlBa3yQlvIuo
 xRPbzGA+RJies9skIGHbQ8z6ig1zUASRJPcBYiuaVIAuQhCfLNc4Nii9cEWtjZ24
 +5gfRwV+vy74ArWwRkwaGejDK1tav+gd62OkFQZC8WtjQ08ozGZ6VBJNg7nYq/gH
 FYO1rH2tQ/ZyjlO/x5NF8gFcjYD8iv5PDp8oH35MPx+XTdDccf0G3QB7ug0ffVdV
 b4gA6lZTAmpsu/NHb6ByN4i/kf3wf8la/i+EaAh/Ov7NW078gunvVKVA7jStcbBl
 ZgG5SRHiKRvQF/WXLGVQAnilRDZwRuS0nmJlqfExa44v23l5o3768RwdRYwQlv8g
 X5KSRl0jlVgVtZHgNBlZtgX9+rnQSr9sB5sVGBP2a6a1WhVXQV/2kp0wjdnU0mPw
 jLCnSdsHqBlSf9V7O/na823WCnBFb7blrLBXUoSbHBnICqtVFzhE1kBXWw3S7Kqh
 CiaWM+S4WfR0HRnUlWMTS8BZ82MgiDnd7nGUXWwXBbdqWmoj/9CoU6SZRjbMBkzi
 EY1XvmoYf6eSzdxfydI1hFi0/bbb8K9umHQlrpW3HeN9uXnVc0/+TroVPLuaKUdi
 53ClqXjzE+CpJg==
 =lQKn
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer core updates from Thomas Gleixner:

 - Prevent a thundering herd problem when the timekeeper CPU is delayed
   and a large number of CPUs compete to acquire jiffies_lock to do the
   update. Limit it to one CPU with a separate "uncontended" atomic
   variable.

 - A set of improvements for the timer migration mechanism:

     - Support imbalanced NUMA trees correctly

     - Support dynamic exclusion of CPUs from the migrator duty to allow
       the cpuset/isolation mechanism to exclude them from handling
       timers of remote idle CPUs

 - The usual small updates, cleanups and enhancements
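
 [ Editor's note: a minimal sketch of the single-updater idea from the
   first item above; the variable and function names are illustrative:

       static atomic_t jiffies_update_claimed = ATOMIC_INIT(0);

       static void try_update_jiffies64(ktime_t now)
       {
               /* Cheap uncontended claim: all CPUs but one bail out here
                * instead of piling up on jiffies_lock. */
               if (atomic_cmpxchg(&jiffies_update_claimed, 0, 1) != 0)
                       return;

               raw_spin_lock(&jiffies_lock);
               /* ... advance jiffies_64 based on 'now' ... */
               raw_spin_unlock(&jiffies_lock);

               atomic_set(&jiffies_update_claimed, 0);
       }
 ]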

* tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Exclude isolated cpus from hierarchy
  cpumask: Add initialiser to use cleanup helpers
  sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
  cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
  timers/migration: Use scoped_guard on available flag set/clear
  timers/migration: Add mask for CPUs available in the hierarchy
  timers/migration: Rename 'online' bit to 'available'
  selftests/timers/nanosleep: Add tests for return of remaining time
  selftests/timers: Clean up kernel version check in posix_timers
  time: Fix a few typos in time[r] related code comments
  time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
  hrtimer: Store time as ktime_t in restart block
  timers/migration: Remove dead code handling idle CPU checking for remote timers
  timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
  timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
  timers/migration: Fix imbalanced NUMA trees
  timers/migration: Remove locking on group connection
  timers/migration: Convert "while" loops to use "for"
  tick/sched: Limit non-timekeeper CPUs calling jiffies update
2025-12-02 09:58:33 -08:00
Linus Torvalds 5028f42416 Updates for clocksource and clockevent drivers:
- A new driver for the Realtek system timer
 
  - Prevent the unbinding of timers when the drivers do not support that.
 
  - Expand the timer counter readout for the SPRD driver to 64 bits to
    allow IoT device suspend times of more than 36 hours, which is the
    current limit of the 32-bit readout
 
  - The usual small cleanups, fixes and enhancements all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksxAATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobIuD/9HNzi+SDKiWiwuhZEfwrk4IJY1k4uM
 yRrxQHt8yODKPq13M1eKNiXro3Tbhq6cLhECdQ6Rsf/g4Q0x+TeAl1M2CfHLMOJ5
 +VYNqAx7b63bkZIp1pJk8HJfn4e9itDKnqEgi0M20tIoG3K8fLtZfyIdiuqOsTia
 USWOdOqnPtwIOtVvMLPCmjYTh2FFHFxxcrQgoAW+1ACwOq/AkSSSAqKNcjEB7edH
 7C9IZpm6rCl+13ywMiHS5UsOYFWz1fOgSmQ1c7KPqx9PquMaJ7oZFAQgb2FF0xXJ
 S8DwTMKlwCO2Tq15XjmmCPLlvsGzZgVJkXhDsqyrDAZzOowqjHuT/HTrENLcE3K3
 /gS721vahsLWfJp229whKkT11RDgQOO2c/3cplsL2joUyrkDzW4sloYuu00gqWrJ
 mR9srdA7F3HeSACPb6rX64Rzg3m63P/zJ20h2uJt/JblIkZd+3kBTELM30GZRQbn
 z176KwiRPy0TDbN8pW1I4I1sLtG7zYhaEsASGZM9yH9uKYU1cLej1SmmbLqDs3oO
 e0+QyK+A4OzR43LiRltN4X3dJJ59uf+zru12WGjV85WxJsA4rN4/5q/S0xcpWR7b
 eQNXn/YZwppdlwxTg+n2RWSTzOFtvNm8nfnepxB2UqffOAa1Ah87AT3rPaUrCULj
 NwI9Fy4AY4IvVQ==
 =0426
 -----END PGP SIGNATURE-----

Merge tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull clocksource updates from Thomas Gleixner:
 "Updates for clocksource and clockevent drivers:

   - A new driver for the Realtek system timer

   - Prevent the unbinding of timers when the drivers do not support
     that

   - Expand the timer counter readout for the SPRD driver to 64 bits
     to allow IoT device suspend times of more than 36 hours, which
     is the current limit of the 32-bit readout

   - The usual small cleanups, fixes and enhancements all over the
     place"

* tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers: Add Realtek system timer driver
  dt-bindings: timer: Add Realtek SYSTIMER
  clocksource/drivers/stm32-lp: Drop unused module alias
  clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC
  clocksource/drivers/nxp-stm: Prevent driver unbind
  clocksource/drivers/nxp-pit: Prevent driver unbind
  clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbind
  clocksource/drivers/nxp-stm: Fix section mismatches
  clocksource/drivers/sh_cmt: Always leave device running after probe
  clocksource/drivers/stm: Fix double deregistration on probe failure
  clocksource/drivers/ralink: Fix resource leaks in init error path
  clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock source is not registered
  clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit
2025-12-02 09:54:27 -08:00
Linus Torvalds 9ce62ebbb7 Updates for [PCI] MSI related code:
- Remove one variant of PCI/MSI management as all users have been
    converted to use per device domains. That reduces the variants to two:
 
    The modern one and the really archaic legacy variant, which keeps the usual
    suspects in the museum category alive.
 
  - Rework the platform MSI device ID detection mechanism in the ARM GIC
    world to address resource leaks, duplicated code and other details. This
    requires a corresponding preparatory step in the PCI/iproc driver.
 
  - Trivial core code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmkswn0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYpuD/wKT7d6I6AqnJVF/RhiJ+/d6vuX/aFW
 g6E7XAkMLKhmxunSNFfPzXsHy2a0oJroYKmDJH4C8GWGo/gXa+QvmDt2491k9rdV
 zM+CBodBu3/bXWvTW+o1fbyAvG+p2C3+iSRW/gGqzPdcY8gQiRnNOZS1j7zusMjO
 A6pz5SvLSPWQUnVl9PygJBuNX5TFHPnY3AySRpW11CvqB5/8gqGz+O6lT/Q+5hov
 GUC57hskbQd1PsYhTNRaUR4z7VMolPHqscp8DYVCWjOMP/r5quC6dlsn91yxuATU
 8D7oRiW8xkCaTJplY/rA6r/VxUthZ3EgIxzev3rGaWBdPxHcFfftf2oxyFFAf3lf
 3rEdfGBcNgApx+MCcoT5/3mf3KJfn2/bE6bZhwv94+dtbTlHguztyMD3vnGTS73i
 zPWQ5ae4M5sqc8kCNMRaBfU8yQEHEKs3gia67vStZyn5R/uUNVKRo67LBPZKVDcJ
 2511Ylnm62yG6PtdPGIFHY1i75uPpxXuS7F0BJignzM3iPvVvwLPZLDORr3/pR4q
 CmswZTA2obue6+nwz/LUacxzONsZ2Z8pzGY6rrT9sfj0Z4mk6xrfEPfjfmVoMpyk
 Dk4B8lIVYwcR7d/Sw+FIwYst8iw+L77Yn7kN8yCbh4lAOxBUUvtS5KAP6uPGe3D1
 30Q/DbBVlEvg/g==
 =VAtQ
 -----END PGP SIGNATURE-----

Merge tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull MSI updates from Thomas Gleixner:
 "Updates for [PCI] MSI related code:

   - Remove one variant of PCI/MSI management as all users have been
     converted to use per device domains. That reduces the variants to
     two:

     The modern one and the really archaic legacy variant, which keeps the
     usual suspects in the museum category alive.

   - Rework the platform MSI device ID detection mechanism in the ARM
     GIC world to address resource leaks, duplicated code and other
     details. This requires a corresponding preparatory step in the
     PCI/iproc driver.

   - Trivial core code cleanups"

* tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-its: Rework platform MSI deviceID detection
  PCI: iproc: Implement MSI controller node detection with of_msi_xlate()
  genirq/msi: Slightly simplify msi_domain_alloc()
  PCI/MSI: Delete pci_msi_create_irq_domain()
2025-12-02 09:35:59 -08:00
Linus Torvalds 15b87bec89 Boring updates for interrupt drivers:
- Support for a couple of new ARM64 and RISCV SoC variants and their
     magic interrupt controllers which can either reuse existing code or
     require quirks due to a botched hardware implementation.
 
   - More section mismatch fixes.
 
   - The usual cleanups and fixes all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmkswMYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZvSEACZdCx7vO2XX7oef7DxQ6EKFA/NQvd0
 xJGFTBlrxIucp26yUxWKkuDVdhu8WYe13zJG6+LVl9IxH3IIBa2duQ4HhIyqxuz6
 z74IDjBlOcKAHu2xLFJmBIS4vGTd6UPOg1KvSrIFd9oiuMXikphbnFgyrFGAFiSQ
 J1gP7mKZATUH08mTXK5k1pmBIbMjEHpyyTdBEJKoVgiN/MB/qsq95dy0Oxal+C13
 1cOKBaFreTMdX+77U5RucBcGaLHW4SdoaAVaqc/UXw2c2TAezbt/gPYexRpkdVaG
 2tuYTWIfCUuHbjUoOOYwI+ILnuiBMzjxlIUx3uSvcvtUVO4YuMDR4JOWVsevtfgI
 uUV+4OPq9kBI6PNqAyo16NhDdZ9rmjg3q14F9oyidQfR5gRbsZPPDmtCB/M2jbE1
 n3LlsHUJt0UYo8ZqCPrGhiw9hkGXv4wsEl10FKkyoNrQ0Y4SCUrdzGdr6vwhAAub
 yxMe1+BrFQT23R9l+qVrUZmDmpV9tlFNr6rPwtucrQX3PMWEfAeCc6a/vjY3eqJl
 sZ4pGyFEx0cwfKzHu1/SmNpnjSNdyc7niiN8HAQ7AnxzRW13fDdGQuuVGsKxHyJc
 Tke9wJsyUO4MxpSQDI+cmpsF8OeJDHuRDKMBdLFxlLPhABECdLUO0qKq9l0Ry/Ji
 uDkc3WvM14zKpw==
 =kdyt
 -----END PGP SIGNATURE-----

Merge tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq driver updates from Thomas Gleixner:
 "Boring updates for interrupt drivers:

   - Support for a couple of new ARM64 and RISCV SoC variants and their
     magic interrupt controllers which can either reuse existing code or
     require quirks due to a botched hardware implementation

   - More section mismatch fixes

   - The usual cleanups and fixes all over the place"

* tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  irqchip/meson-gpio: Add support for Amlogic S6 S7 and S7D SoCs
  dt-bindings: interrupt-controller: Add support for Amlogic S6 S7 and S7D SoCs
  dt-bindings: interrupt-controller: aspeed,ast2700: Correct #interrupt-cells and interrupts count
  irqchip/aclint-sswi: Add Nuclei UX900 support
  dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT SSWI
  dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT MSWI
  dt-bindings: interrupt-controller: Add Anlogic DR1V90 PLIC
  irqchip/irq-bcm7038-l1: Remove unused reg_mask_status()
  irqchip/sifive-plic: Fix call to __plic_toggle() in M-Mode code path
  irqchip/sifive-plic: Add support for UltraRISC DP1000 PLIC
  irqchip/sifive-plic: Cache the interrupt enable state
  dt-bindings: interrupt-controller: Add UltraRISC DP1000 PLIC
  dt-bindings: vendor-prefixes: Add UltraRISC
  irqchip/qcom-irq-combiner: Rename driver structure
  irqchip/riscv-imsic: Inline imsic_vector_from_local_id()
  irqchip/riscv-imsic: Embed the vector array in lpriv
  irqchip/riscv-imsic: Remove redundant irq_data lookups
  irqchip/ts4800: Drop unused module alias
  irqchip/mvebu-pic: Drop unused module alias
  irqchip/meson-gpio: Drop unused module alias
  ...
2025-12-02 09:32:53 -08:00
Linus Torvalds 6863c8385c Updates for the interrupt core and treewide cleanups:
- Rework of the Per Processor Interrupt (PPI) management on ARM[64].
 
     PPI support was built under the assumption that the systems are
     homogeneous so that the same CPU local device types are connected to
     them. That's unfortunately wishful thinking and created horrible
     workarounds.
 
     This rework provides affinity management for PPIs so that they can be
     individually configured in the firmware tables and mops up the related
     drivers all over the place.
 
   - Prevent CPUSET/isolation changes from arbitrarily affining interrupt
     threads to random CPUs, which would ignore user or driver settings.
 
   - Plug a harmless race in the interrupt affinity proc interface, which
     allowed a half-updated mask to be observed
 
   - Adjust the priority of secondary interrupt threads on RT, so that the
     combination of primary and secondary thread emulates the hardware
     interrupt plus thread scenario. Having them at the same priority can
     cause starvation issues in some drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksv3oTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoe5+D/wNnBaX9LRajuLOF+zaYw5WZxkzp6U7
 X4AP3cLny8xynI1kM5V8M1ym3Fspk0hiqxNX2LLXrSZzBR+3O4uGCyCceBXeHKo2
 vW4auUXG4MB+2sZyudQXaBpNK4A2YBubycTUcRECjkjDkBPAWgN7J+Oz2lXUSUcH
 zlitlHNo48hnZQPAJr4PDpi5q9+rChn+8/s+K1d8NlEf9HOXC98qzyMuMq+jHdJE
 AQ6tKoHkA5lHjHAUY3AbWptoHo1Wp+p5PSqsrFr6nbKuPlhUqRNEPXX0Z8q7aUTj
 NgdkvIHJVJ0C+T40FIWCNzUYOUk4gTQXBSPvptwJSHAmf9ovp+Kg2ltVZBzyL2iI
 R0EZSQAQU8iJcRrqjcAYqI36LkmwwVT6RD1zFa98xJT/AjsMpAt/U1pEMDtkoTKe
 Lv7ZQ/hloc+4wV4xS4zEtoV/ukdUfA9aEdXsh5hNH/07tvatpKO2LgortsiI+lCK
 76vAULcGvbMr5Jr63snjICgstahunpNMRn2HmnGAjmdZf4+g+TDvZR4DI6bswtuO
 jp5G6OM30Z9zKheAr1VioV1XAKr6Y4jDKVjfFy/n1k5pDwYaSJopmZxSD35aas4e
 VqWizAzc5dAVCYRlzr6S1lrMQ2JJRg0RpIn+sMS8dhf9SK7hs5ilGSOsgX1fgVat
 1N3WXvYM8vSW+g==
 =zrA1
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq core updates from Thomas Gleixner:
 "Updates for the interrupt core and treewide cleanups:

   - Rework of the Per Processor Interrupt (PPI) management on ARM[64]

     PPI support was built under the assumption that the systems are
     homogeneous so that the same CPU local device types are connected to
     them. That's unfortunately wishful thinking and created horrible
     workarounds.

     This rework provides affinity management for PPIs so that they can
     be individually configured in the firmware tables and mops up the
     related drivers all over the place.

   - Prevent CPUSET/isolation changes from arbitrarily affining interrupt
     threads to random CPUs, which would ignore user or driver settings.

   - Plug a harmless race in the interrupt affinity proc interface,
     which allowed a half-updated mask to be observed

   - Adjust the priority of secondary interrupt threads on RT, so that
     the combination of primary and secondary thread emulates the
     hardware interrupt plus thread scenario. Having them at the same
     priority can cause starvation issues in some drivers"
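
 [ Editor's note: what the PPI rework means for a driver can be
   sketched as below; the signature of request_percpu_irq_affinity()
   is assumed here purely for illustration:

       /* Old world: the same per-CPU device was assumed to exist on
        * every CPU. New world: a driver whose device is present only
        * on some CPUs requests the percpu_devid interrupt with an
        * explicit affinity (hypothetical signature): */
       err = request_percpu_irq_affinity(irq, my_pmu_handler, "my_pmu",
                                         supported_cpus, percpu_dev_id);
 ]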

* tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  genirq: Remove cpumask availability check on kthread affinity setting
  genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
  genirq: Prevent early spurious wake-ups of interrupt threads
  genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier()
  genirq/manage: Reduce priority of forced secondary interrupt handler
  genirq/proc: Fix race in show_irq_affinity()
  genirq: Fix percpu_devid irq affinity documentation
  perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer
  irqdomain: Kill of_node_to_fwnode() helper
  genirq: Kill irq_{g,s}et_percpu_devid_partition()
  irqchip: Kill irq-partition-percpu
  irqchip/apple-aic: Drop support for custom PMU irq partitions
  irqchip/gic-v3: Drop support for custom PPI partitions
  coresight: trbe: Request specific affinities for per CPU interrupts
  perf: arm_spe_pmu: Request specific affinities for per CPU interrupts
  perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts
  genirq: Add request_percpu_irq_affinity() helper
  genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
  genirq: Update request_percpu_nmi() to take an affinity
  genirq: Add affinity to percpu_devid interrupt requests
  ...
2025-12-02 09:14:26 -08:00
Linus Torvalds 312f5b1866 Two small updates for debugobjects:
- Allow pool refill on RT enabled kernels before the scheduler is up
       and running to prevent pool exhaustion
 
     - Correct the lockdep override to prevent false positives.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksu/UTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoScuD/9g3OtG29VZ2uNkOhgvGKuAThxZ+Y4d
 8YPNT/X9SuSevuwBCc9zXgpc5S4Af40ndRbcsiZ38t/xrInE6+J6qPJ7BbKCTiac
 cYvNz4ibMx1qz35BPtym4RnJyZA2EHX2hVIFGCfdh4MkILI7r3OPjemX542epZAW
 8MdKu5WZJNa8KYIvUE1UZdDtH1imxU7jdYBkr1ockN66+HMjRKHxcPwrhTCFJeCT
 N4DHOQ+hf9NzipHpRppDmqwkzQCOyKrOojXht00rG92QXIzmZRepH93cCFi/nW1d
 8aUjHU6myNQa65VkFDM2I2bpzCzlK7HpBU3iNXEkXPLZ8bVrYMP9koK+SXIa+Gj0
 icXdJwBe9uOKQOaG6MRSO2hn3fHO0m+PjZGtQFg7EqFCaY0J+8tv9k3WttDDpfMg
 hjXjyJ0U9T+/YUuSDBLdPczIJZr8eGh960SF0OTshHGGVOCGJt4dlvoC0NtUxN8x
 WQ/he9K/Cyz7U6yr1aNO6hAfqX/+6c0ZhD3OONuC9xgxHUkjPdlEe1ntLbdfn92z
 VygbJaguvRdzkAeaAlXNNU5WTNvm3ZeLPqDnnUHUlDW1f7hF0KwCrfZUW0PqdB76
 +94ptMeIlCv53zIEKamHuALGp7WtGddGGzaZLH8rUnPxfiff+JiMhXtV0ioMuUNG
 jpdlyBMXK+s0PA==
 =dUo2
 -----END PGP SIGNATURE-----

Merge tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull debugobjects update from Thomas Gleixner:
 "Two small updates for debugobjects:

   - Allow pool refill on RT enabled kernels before the scheduler is up
     and running to prevent pool exhaustion

   - Correct the lockdep override to prevent false positives"

* tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP
  debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING
2025-12-02 09:07:48 -08:00
Linus Torvalds 2b09f480f0 A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which are
   caused by the related overhead. It turned out that the decision to invoke
   the exit-to-user work was not really a decision. More or less each
   context switch caused that. There is a long list of small issues which
   sums up nicely and results in a 3-4% regression in I/O benchmarks.
 
   The other detail which caused issues due to extra work in context switch
   and task migration is the CID (memory context ID) management. It also
   requires to use a task work to consolidate the CID space, which is
   executed in the context of an arbitrary task and results in sporadic
   uncontrolled exit latencies.
 
   The rewrite addresses this by:
 
   - Removing deprecated and long unsupported functionality
 
   - Moving the related data into dedicated data structures which are
     optimized for fast path processing.
 
   - Caching values so actual decisions can be made
 
   - Replacing the current implementation with an optimized inlined variant.
 
   - Separating fast and slow path for architectures which use the generic
     entry code, so that only fault and error handling goes into the
     TIF_NOTIFY_RESUME handler.
 
   - Rewriting the CID management so that it becomes mostly invisible in the
     context switch path. That moves the work of switching modes into the
     fork/exit path, which is a reasonable tradeoff. That work is only
     required when a process creates more threads than the cpuset it is
   allowed to run on or when enough threads exit after that. An artificial
   thread pool benchmark which triggers this did not degrade; it actually
   improved significantly.
 
     The main effect in migration heavy scenarios is that runqueue lock held
     time and therefore contention goes down significantly.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksaRYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoencEADA5he8PAFPmSRRPo6+2G5mHzWe8kIU
 5ZViQStWFNAA0qqy8VXryWiJ6qqrO6la9o7K4YOXASUtlkVjquRp1DF7PabqGwuy
 zshbRCXNlT51J8uqanN8VrGVjlf+bMdHDbGoI1SLkUTxG8b+kDD5PXUQE1ARelPP
 Slbg9u+EMrxj6D5MDTPbuW6TqryJEkPtiNScyOz43emp9ww9+WVxenOcRqU4D+Th
 mjWmrGIzkroSf4XReMoD/wg9TPTpUjXnNCwl2viY9JvBpkMfYtU4tJAGK3aNFOWy
 zsAN0O9CaFGrUEFne7qUmtwhNLdtnjx5HN5pe7yZd1EhdTuQKq4jPiiQnwwm8w72
 c0o6m45FNPmPoSyfaZWCkLjbTEUXonT9JF61iN35JVxim8gBDDJjHFKnLxDmLrH3
 X0eESE48ReY2EneDV6Y8RJRo6oG14Fccvc39aTf/2Rw3trpmtt2agvConQzupQIg
 DzANw4jhUUzFRrHrMHACNsqKFXh9ratue/S9DM3xxTpGO/bKdeK7jGIgzNf8O34M
 J0O6Hvk5jMdcWlIJTx21GoGzoSkkXnR49g/71aCcp+MwdY4x9zFz5SWi8LWQRmkx
 xRo6tY27Bma8/SEwMJjPpAUXDTpq6v+j3cPisybL1yGsyt9lh+p8LX7VUtwcoEqe
 6ZelC5Kgw/+/kg==
 =n5KT
 -----END PGP SIGNATURE-----

Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull rseq updates from Thomas Gleixner:
 "A large overhaul of the restartable sequences and CID management:

  The recent enablement of RSEQ in glibc resulted in regressions which
  are caused by the related overhead. It turned out that the decision to
  invoke the exit-to-user work was not really a decision. More or less
  each context switch caused that. There is a long list of small issues
  which sums up nicely and results in a 3-4% regression in I/O
  benchmarks.

  The other detail which caused issues due to extra work in context
  switch and task migration is the CID (memory context ID) management.
   It also requires using a task work to consolidate the CID space,
  which is executed in the context of an arbitrary task and results in
  sporadic uncontrolled exit latencies.

  The rewrite addresses this by:

   - Removing deprecated and long unsupported functionality

   - Moving the related data into dedicated data structures which are
     optimized for fast path processing.

   - Caching values so actual decisions can be made

   - Replacing the current implementation with an optimized inlined
     variant.

   - Separating fast and slow path for architectures which use the
     generic entry code, so that only fault and error handling goes into
     the TIF_NOTIFY_RESUME handler.

   - Rewriting the CID management so that it becomes mostly invisible in
     the context switch path. That moves the work of switching modes
     into the fork/exit path, which is a reasonable tradeoff. That work
     is only required when a process creates more threads than the
     cpuset it is allowed to run on or when enough threads exit after
     that. An artificial thread pool benchmark which triggers this did
     not degrade; it actually improved significantly.

     The main effect in migration heavy scenarios is that runqueue lock
     held time and therefore contention goes down significantly"
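
 [ Editor's note: the "caching values so actual decisions can be made"
   point can be pictured with a purely conceptual sketch; none of these
   names are the kernel's actual data structures:

       struct rseq_cache {
               u32  cpu_cid;           /* last CPU/CID combination seen */
               bool event_pending;
       };

       static inline void rseq_note_switch(struct rseq_cache *c, u32 cpu_cid)
       {
               /* Only raise exit-to-user work when something the user's
                * rseq area depends on actually changed. */
               if (c->cpu_cid != cpu_cid) {
                       c->cpu_cid = cpu_cid;
                       c->event_pending = true;
               }
       }
 ]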

* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/mmcid: Switch over to the new mechanism
  sched/mmcid: Implement deferred mode change
  irqwork: Move data struct to a types header
  sched/mmcid: Provide CID ownership mode fixup functions
  sched/mmcid: Provide new scheduler CID mechanism
  sched/mmcid: Introduce per task/CPU ownership infrastructure
  sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
  sched/mmcid: Provide precomputed maximal value
  sched/mmcid: Move initialization out of line
  signal: Move MMCID exit out of sighand lock
  sched/mmcid: Convert mm CID mask to a bitmap
  cpumask: Cache num_possible_cpus()
  sched/mmcid: Use cpumask_weighted_or()
  cpumask: Introduce cpumask_weighted_or()
  sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
  sched/mmcid: Move scheduler code out of global header
  sched: Fixup whitespace damage
  sched/mmcid: Cacheline align MM CID storage
  sched/mmcid: Use proper data structures
  sched/mmcid: Revert the complex CID management
  ...
2025-12-02 08:48:53 -08:00
Linus Torvalds 1dce50698a Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
    CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in
    generic code with [unsafe_]get_user(). All other architectures and ARM
    variants provide the relevant accessors already.
 
  - Ensure that ASM GOTO jump label usage in the user mode access helpers
    always goes through a local C scope label indirection inside the
    helpers. This is required because compilers do not support an ASM
    GOTO target leaving an auto cleanup scope. GCC silently fails to
    emit the cleanup invocation and CLANG fails the build.
 
    This provides generic wrapper macros and the conversion of affected
    architecture code to use them.
 
  - Scoped user mode access with auto cleanup
 
    Access to user mode memory can be required in hot code paths, but if it
    has to be done with user controlled pointers, the access is shielded
    with a speculation barrier, so that the CPU cannot speculate around the
    address range check. Those speculation barriers impact performance quite
    significantly. This can be avoided by "masking" the provided pointer so
    it is guaranteed to be in the valid user memory access range and
    otherwise to point to a guaranteed unpopulated address space. This has
    to be done without branches so it creates an address dependency for the
    access, which the CPU cannot speculate ahead.
 
    This results in repeating and error prone programming patterns:
 
             if (can_do_masked_user_access())
                     from = masked_user_read_access_begin((from));
             else if (!user_read_access_begin(from, sizeof(*from)))
                     return -EFAULT;
             unsafe_get_user(val, from, Efault);
             user_read_access_end();
             return 0;
       Efault:
             user_read_access_end();
             return -EFAULT;
 
     which can be replaced with scopes and automatic cleanup:
 
             scoped_user_read_access(from, Efault)
                     unsafe_get_user(val, from, Efault);
             return 0;
        Efault:
             return -EFAULT;
 
  - Convert code which implements the above pattern over to
    scoped_user_*_access(). This also corrects a couple of imbalanced
    masked_*_begin() instances which are harmless on most architectures, but
    prevent PowerPC from implementing the masking optimization.
 
  - Add a missing speculation barrier in copy_from_user_iter()
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksRfITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVhBEACEySjWcyCrD1e0ZFMFAOJZFI2BShav
 reotzCzmHYQdpVukDRxc64BgM2vN4yB04xnyMhi2o4hSTiIJhz1NzbKggsQJhVoA
 psYz+xEI161HuLZnUBUBuF9RRko/HVsbGqO2JFCuOKor4GCycvjVgupR3EIN9h5T
 HZEWGIgaTmN7MBj0QRrJgJkaaSTnPKOwWaNMV/F9pfk27zuB7vuV8WM9P3FaJYG+
 JGa9td7VGaBpWavxgMJqfdvXWBCVDDfZ1dunWx8tPTnLxKZZZD6HlfQXhZTr2n1e
 rtJpGgfVBx5Uqxn4RrhS0I7QeK1b9rrt3IU7EkFoaa3Z8LU5B7cHlm7KyicyoHhy
 SzFFUszssznT/0OhA5fmgPRlqI295HynW2p1L4Xy9hC0EZ2vXJPG5rO6X3x6QwSR
 asjRB7x/6JzWQUzE7/nhXd9KcB66wvQxhnjp7GqulF74aPBCtIdXXDD68YEDYkbi
 dPC3NRBr0ePbsGVGWbYvYIPWcvo1u814C2io1zKwmVbiN6lCYURgQK861vfAZUP8
 oP5D2a6ENgezDKoJo6eJ82inuDu64qZy7OOkU/aO3cbOuWGVyY9CjYD11x85Nr0k
 UNabSOfvcmhmobtYUiAgLLrjX1grQUG3F74ZQTw513mwgMObuDAAoS11GPjY6HL6
 b99WUJRv8jP66A==
 =6no0
 -----END PGP SIGNATURE-----

Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scoped user access updates from Thomas Gleixner:
 "Scoped user mode access and related changes:

   - Implement the missing u64 user access function on ARM when
     CONFIG_CPU_SPECTRE=n.

     This makes it possible to access a 64bit value in generic code with
     [unsafe_]get_user(). All other architectures and ARM variants
     provide the relevant accessors already.

   - Ensure that ASM GOTO jump label usage in the user mode access
     helpers always goes through a local C scope label indirection
     inside the helpers.

      This is required because compilers do not support an ASM GOTO
      target leaving an auto cleanup scope. GCC silently fails to emit
      the cleanup invocation and CLANG fails the build.

     [ Editor's note: gcc-16 will have fixed the code generation issue
       in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
       goto jumps [PR122835]"). But we obviously have to deal with clang
       and older versions of gcc, so.. - Linus ]

     This provides generic wrapper macros and the conversion of affected
     architecture code to use them.

   - Scoped user mode access with auto cleanup

     Access to user mode memory can be required in hot code paths, but
     if it has to be done with user controlled pointers, the access is
     shielded with a speculation barrier, so that the CPU cannot
     speculate around the address range check. Those speculation
     barriers impact performance quite significantly.

     This cost can be avoided by "masking" the provided pointer so it is
     guaranteed to be in the valid user memory access range and
     otherwise to point to a guaranteed unpopulated address space. This
     has to be done without branches so it creates an address dependency
     for the access, which the CPU cannot speculate ahead.

     This results in repeating and error prone programming patterns:

              if (can_do_masked_user_access())
                      from = masked_user_read_access_begin((from));
              else if (!user_read_access_begin(from, sizeof(*from)))
                      return -EFAULT;
              unsafe_get_user(val, from, Efault);
              user_read_access_end();
              return 0;
        Efault:
              user_read_access_end();
              return -EFAULT;

      which can be replaced with scopes and automatic cleanup:

              scoped_user_read_access(from, Efault)
                      unsafe_get_user(val, from, Efault);
              return 0;
         Efault:
              return -EFAULT;

   - Convert code which implements the above pattern over to
      scoped_user_*_access(). This also corrects a couple of imbalanced
     masked_*_begin() instances which are harmless on most
     architectures, but prevent PowerPC from implementing the masking
     optimization.

   - Add a missing speculation barrier in copy_from_user_iter()"
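
 [ Editor's note: put together as a complete function, the scoped form
   quoted above reads as follows; a sketch built on the helpers this
   series introduces:

       static int read_user_val(u64 __user *from, u64 *val)
       {
               /* The scope handles begin/end, including the masked fast
                * path; the Efault label only returns the error. */
               scoped_user_read_access(from, Efault)
                       unsafe_get_user(*val, from, Efault);
               return 0;
       Efault:
               return -EFAULT;
       }
 ]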

* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
  scm: Convert put_cmsg() to scoped user access
  iov_iter: Add missing speculation barrier to copy_from_user_iter()
  iov_iter: Convert copy_from_user_iter() to masked user access
  select: Convert to scoped user access
  x86/futex: Convert to scoped user access
  futex: Convert to get/put_user_inline()
  uaccess: Provide put/get_user_inline()
  uaccess: Provide scoped user access regions
  arm64: uaccess: Use unsafe wrappers for ASM GOTO
  s390/uaccess: Use unsafe wrappers for ASM GOTO
  riscv/uaccess: Use unsafe wrappers for ASM GOTO
  powerpc/uaccess: Use unsafe wrappers for ASM GOTO
  x86/uaccess: Use unsafe wrappers for ASM GOTO
  uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
  ARM: uaccess: Implement missing __get_user_asm_dword()
2025-12-02 08:01:39 -08:00
Borislav Petkov (AMD) e2349c5811 Merge remote-tracking branches 'ras/edac-amd-atl', 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-12-01 12:06:08 +01:00
Harry Fellowes d911fe6e94 x86/boot: Clean up whitespace in a20.c
Remove trailing whitespace on empty lines.

No functional changes.

  [ bp: Massage commit message. ]

Signed-off-by: Harry Fellowes <harryfellowes1@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20250825192832.6444-3-harryfellowes1@gmail.com
2025-11-28 20:29:52 +01:00
Rafael J. Wysocki 7cede21e9f Merge branches 'pm-qos' and 'pm-tools'
Merge PM QoS updates and a cpupower utility update for 6.19-rc1:

 - Introduce and document a QoS limit on CPU exit latency during wakeup
   from suspend-to-idle (Ulf Hansson)

 - Add support for building libcpupower statically (Zuo An)

* pm-qos:
  Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
  cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
  sched: idle: Respect the CPU system wakeup QoS limit for s2idle
  pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
  pmdomain: Respect the CPU system wakeup QoS limit for s2idle
  PM: QoS: Introduce a CPU system wakeup QoS limit

* pm-tools:
  tools/power/cpupower: Support building libcpupower statically
2025-11-28 16:50:45 +01:00
Catalin Marinas edde060637 Merge branch 'for-next/set_memory' into for-next/core
* for-next/set_memory:
  : Fix + documentation for the arm64 change_memory_common()
  arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
  arm64/pageattr: Propagate return value from __change_memory_common
2025-11-28 15:48:03 +00:00
Catalin Marinas 52c4d1d624 Merge branch 'for-next/sysreg' into for-next/core
* for-next/sysreg:
  : arm64 sysreg updates/cleanups
  arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
  KVM: arm64: selftests: Consider all 7 possible levels of cache
  KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
  arm64/sysreg: Add ICH_VMCR_EL2
  arm64/sysreg: Move generation of RES0/RES1/UNKN to function
  arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
  arm64/sysreg: Fix checks for incomplete sysreg definitions
  arm64/sysreg: Replace TCR_EL1 field macros
2025-11-28 15:47:53 +00:00
Catalin Marinas 17c05cb0ef Merge branches 'for-next/misc', 'for-next/kselftest', 'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: arm_spe: Add support for filtering on data source
  perf: Add perf_event_attr::config4
  perf/imx_ddr: Add support for PMU in DB (system interconnects)
  perf/imx_ddr: Get and enable optional clks
  perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
  dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
  arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
  perf/arm-ni: Fix and optimise register offset calculation
  perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
  perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
  perf/arm-ni: Add NoC S3 support
  perf/arm_cspmu: nvidia: Add pmevfiltr2 support
  perf/arm_cspmu: nvidia: Add revision id matching
  perf/arm_cspmu: Add pmpidr support
  perf/arm_cspmu: Add callback to reset filter config
  perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores

* for-next/misc:
  : Miscellaneous patches
  arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
  arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
  arm64: mm: use untagged address to calculate page index
  arm64: mm: make linear mapping permission update more robust for patial range
  arm64/mm: Elide TLB flush in certain pte protection transitions
  arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
  arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
  arm64: add unlikely hint to MTE async fault check in el0_svc_common
  arm64: acpi: add newline to deferred APEI warning
  arm64: entry: Clean out some indirection
  arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
  arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
  arm64: remove unused ARCH_PFN_OFFSET
  arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
  arm64: Remove assertion on CONFIG_VMAP_STACK

* for-next/kselftest:
  : arm64 kselftest patches
  kselftest/arm64: Align zt-test register dumps

* for-next/efi-preempt:
  : arm64: Make EFI calls preemptible
  arm64/efi: Call EFI runtime services without disabling preemption
  arm64/efi: Move uaccess en/disable out of efi_set_pgd()
  arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
  arm64/fpsimd: Permit kernel mode NEON with IRQs off
  arm64/fpsimd: Don't warn when EFI execution context is preemptible
  efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
  efi: Add missing static initializer for efi_mm::cpus_allowed_lock

* for-next/assembler-macro:
  : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
  arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
  arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers

* for-next/typos:
  : Random typo/spelling fixes
  arm64: Fix double word in comments
  arm64: Fix typos and spelling errors in comments

* for-next/sme-ptrace-disable:
  : Support disabling streaming mode via ptrace on SME only systems
  kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
  kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
  arm64/sme: Support disabling streaming mode via ptrace on SME only systems

* for-next/local-tlbi-page-reused:
  : arm64, mm: avoid TLBI broadcast if page reused in write fault
  arm64, tlbflush: don't TLBI broadcast if page reused in write fault
  mm: add spurious fault fixing support for huge pmd

* for-next/mpam: (34 commits)
  : Basic Arm MPAM driver (more to follow)
  MAINTAINERS: new entry for MPAM Driver
  arm_mpam: Add kunit tests for props_mismatch()
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add helper to reset saved mbwu state
  arm_mpam: Use long MBWU counters if supported
  arm_mpam: Probe for long/lwd mbwu counters
  arm_mpam: Consider overflow in bandwidth counter state
  arm_mpam: Track bandwidth counter state for power management
  arm_mpam: Add mpam_msmon_read() to read monitor value
  arm_mpam: Add helpers to allocate monitors
  arm_mpam: Probe and reset the rest of the features
  arm_mpam: Allow configuration to be applied and restored during cpu online
  arm_mpam: Use a static key to indicate when mpam is enabled
  arm_mpam: Register and enable IRQs
  arm_mpam: Extend reset logic to allow devices to be reset any time
  arm_mpam: Add a helper to touch an MSC from any CPU
  arm_mpam: Reset MSC controls from cpuhp callbacks
  arm_mpam: Merge supported features during mpam_enable() into mpam_class
  arm_mpam: Probe the hardware features resctrl supports
  arm_mpam: Add helpers for managing the locking around the mon_sel registers
  ...

* for-next/acpi:
  : arm64 acpi updates
  ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()

* for-next/documentation:
  : arm64 Documentation updates
  Documentation/arm64: Fix the typo of register names
2025-11-28 15:47:12 +00:00
Rafael J. Wysocki 638757c9c9 Merge branches 'pm-em' and 'pm-opp'
Merge energy model management updates and operating performance points
(OPP) library changes for 6.19-rc1:

 - Add support for sending netlink notifications to user space on energy
   model updates (Changwoo Min, Peng Fan)

 - Minor improvements to the Rust OPP interface (Tamir Duberstein)

 - Fixes to scope-based pointers in the OPP library (Viresh Kumar)

* pm-em:
  PM: EM: Add to em_pd_list only when no failure
  PM: EM: Notify an event when the performance domain changes
  PM: EM: Implement em_notify_pd_created/updated()
  PM: EM: Implement em_notify_pd_deleted()
  PM: EM: Implement em_nl_get_pd_table_doit()
  PM: EM: Implement em_nl_get_pds_doit()
  PM: EM: Add an iterator and accessor for the performance domain
  PM: EM: Add a skeleton code for netlink notification
  PM: EM: Add em.yaml and autogen files
  PM: EM: Expose the ID of a performance domain via debugfs
  PM: EM: Assign a unique ID when creating a performance domain

* pm-opp:
  rust: opp: simplify callers of `to_c_str_array`
  OPP: Initialize scope-based pointers inline
  rust: opp: fix broken rustdoc link
2025-11-28 16:44:00 +01:00
Dev Jain 0c2988aaa4 arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
Consider the following code path:

(1) vmalloc -> (2) set_vm_flush_reset_perms -> (3) set_memory_ro/set_memory_rox
-> .... (4) use the mapping .... -> (5) vfree -> (6) vm_reset_perms
-> (7) set_area_direct_map.
Or, it may happen that we encounter failure at (3) and directly jump to (5).

In both cases, (7) may fail due to linear map split failure. But, we care
about its success *only* for the region which got successfully changed by
(3). Such a region is guaranteed to be pte-mapped.

The TLDR is that (7) will surely succeed for the regions we care about.

Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-28 15:36:40 +00:00
Dev Jain e5efd56fa1 arm64/pageattr: Propagate return value from __change_memory_common
The rodata=on security measure requires that any code path which does
vmalloc -> set_memory_ro/set_memory_rox must protect the linear map alias
too. Therefore, if such a call fails, we must abort set_memory_* and caller
must take appropriate action; currently we are suppressing the error, and
there is a real chance of such an error arising post commit a166563e7e
("arm64: mm: support large block mapping when rodata=full"). Therefore,
propagate any error to the caller.

Fixes: a166563e7e ("arm64: mm: support large block mapping when rodata=full")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-28 15:36:40 +00:00
Rafael J. Wysocki bf7ae1773e Merge branches 'pm-cpuidle' and 'pm-powercap'
Merge cpuidle and power capping updates for 6.19-rc1:

 - Use residency threshold in polling state override decisions in the
   menu cpuidle governor (Aboorva Devarajan)

 - Add sanity check for exit latency and target residency in the cpuidle
   core (Rafael Wysocki)

 - Use this_cpu_ptr() where possible in the teo governor (Christian
   Loehle)

 - Rework the handling of tick wakeups in the teo cpuidle governor to
   increase the likelihood of stopping the scheduler tick in the cases
   when tick wakeups can be counted as non-timer ones (Rafael Wysocki)

 - Fix a reverse condition in the teo cpuidle governor and drop a
   misguided target residency check from it (Rafael Wysocki)

 - Clean up multiple minor defects in the teo cpuidle governor (Rafael
   Wysocki)

 - Update header inclusion to make it follow the Include What You Use
   principle (Andy Shevchenko)

 - Enable MSR-based RAPL PMU support in the intel_rapl power capping
   driver and arrange for using it on the Panther Lake and Wildcat Lake
   processors (Kuppuswamy Sathyanarayanan)

 - Add support for Nova Lake and Wildcat Lake processors to the
   intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
   Pandruvada)

* pm-cpuidle:
  cpuidle: Warn instead of bailing out if target residency check fails
  cpuidle: Update header inclusion
  cpuidle: governors: teo: Add missing space to the description
  cpuidle: governors: teo: Simplify intercepts-based state lookup
  cpuidle: governors: teo: Fix tick_intercepts handling in teo_update()
  cpuidle: governors: teo: Rework the handling of tick wakeups
  cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
  cpuidle: governors: teo: Use s64 consistently in teo_update()
  cpuidle: governors: teo: Drop redundant function parameter
  cpuidle: governors: teo: Drop misguided target residency check
  cpuidle: teo: Use this_cpu_ptr() where possible
  cpuidle: Add sanity check for exit latency and target residency
  cpuidle: menu: Use residency threshold in polling state override decisions

* pm-powercap:
  powercap: intel_rapl: Enable MSR-based RAPL PMU support
  powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
  powercap: intel_rapl: Add support for Nova Lake processors
  powercap: intel_rapl: Add support for Wildcat Lake platform
2025-11-28 16:29:41 +01:00
Rafael J. Wysocki 1fe2523713 Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.19-rc1:

 - Add OPP and bandwidth support for Tegra186 (Aaron Kling)

 - Optimizations for parameter array handling in the amd-pstate cpufreq
   driver (Mario Limonciello)

 - Fix for mode changes with offline CPUs in the amd-pstate cpufreq
   driver (Gautham Shenoy)

 - Preserve freq_table_sorted across suspend/hibernate in the cpufreq
   core (Zihuan Zhang)

 - Adjust energy model rules for Intel hybrid platforms in the
   intel_pstate cpufreq driver and improve printing of debug messages
   in it (Rafael Wysocki)

 - Replace deprecated strcpy() in cpufreq_unregister_governor()
   (Thorsten Blum)

 - Fix duplicate hyperlink target errors in the intel_pstate cpufreq
   driver documentation and use :ref: directive for internal linking in
   it (Swaraj Gaikwad, Bagas Sanjaya)

 - Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
   driver (Kuppuswamy Sathyanarayanan)

 - Use mutex guard for driver locking in the intel_pstate driver and
   eliminate some code duplication from it (Rafael Wysocki)

 - Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
   Kumar)

 - Minor improvements to various cpufreq drivers (Christian Marangi, Hal
   Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)

* pm-cpufreq: (27 commits)
  cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
  cpufreq: ACPI: Replace udelay() with usleep_range()
  cpufreq: intel_pstate: Eliminate some code duplication
  cpufreq: intel_pstate: Use mutex guard for driver locking
  cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUs
  cpufreq/amd-pstate: Add static asserts for EPP indices
  cpufreq/amd-pstate: Fix some whitespace issues
  cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status()
  cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULL
  cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_string
  cpufreq/amd-pstate: Use sysfs_match_string() for epp
  cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users
  cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM
  Documentation: intel-pstate: Use :ref: directive for internal linking
  cpufreq: intel_pstate: Add Diamond Rapids OOB mode support
  Documentation: intel_pstate: fix duplicate hyperlink target errors
  cpufreq: CPPC: Don't warn if FIE init fails to read counters
  cpufreq: nforce2: fix reference count leak in nforce2
  cpufreq: tegra186: add OPP support and set bandwidth
  cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
  ...
2025-11-28 16:15:38 +01:00
Rafael J. Wysocki f086594adb Merge branch 'pm-sleep'
Merge updates related to system suspend and hibernation for 6.19-rc1:

 - Replace snprintf() with scnprintf() in show_trace_dev_match()
   (Kaushlendra Kumar)

 - Fix memory allocation error handling in pm_vt_switch_required()
   (Malaya Kumar Rout)

 - Introduce CALL_PM_OP() macro and use it to simplify code in
   generic PM operations (Kaushlendra Kumar)

 - Add module param to backtrace all CPUs in the device power management
   watchdog (Sergey Senozhatsky)

 - Rework message printing in swsusp_save() (Rafael Wysocki)

 - Make it possible to change the number of hibernation compression
   threads (Xueqin Luo)

 - Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)

 - Add document on debugging shutdown hangs to PM documentation and
   correct a mistaken configuration option in it (Mario Limonciello)

 - Shut down wakeup source timer before removing the wakeup source from
   the list (Kaushlendra Kumar, Rafael Wysocki)

 - Introduce new PMSG_POWEROFF event for system shutdown handling with
   the help of PM device callbacks (Mario Limonciello)

 - Make pm_test delay interruptible by wakeup events (Riwen Lu)

 - Clean up kernel-doc comment style usage in the core hibernation
   code and remove unuseful comments from it (Sunday Adelodun, Rafael
   Wysocki)

 - Add support for handling wakeup events and aborting the suspend
   process while it is syncing file systems (Samuel Wu, Rafael Wysocki)

* pm-sleep: (21 commits)
  PM: hibernate: Extra cleanup of comments in swap handling code
  PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
  PM: sleep: Add support for wakeup during filesystem sync
  PM: hibernate: Clean up kernel-doc comment style usage
  PM: suspend: Make pm_test delay interruptible by wakeup events
  usb: sl811-hcd: Add PM_EVENT_POWEROFF into suspend callbacks
  scsi: Add PM_EVENT_POWEROFF into suspend callbacks
  PM: Introduce new PMSG_POWEROFF event
  PM: wakeup: Update after recent wakeup source removal ordering change
  PM: wakeup: Delete timer before removing wakeup source from list
  Documentation: power: Correct a mistaken configuration option
  Documentation: power: Add document on debugging shutdown hangs
  freezer: Clarify that only cgroup1 freezer uses PM freezer
  PM: hibernate: add sysfs interface for hibernate_compression_threads
  PM: hibernate: make compression threads configurable
  PM: hibernate: dynamically allocate crc->unc_len/unc for configurable threads
  PM: hibernate: Rework message printing in swsusp_save()
  PM: dpm_watchdog: add module param to backtrace all CPUs
  PM: sleep: Introduce CALL_PM_OP() macro to simplify code
  PM: console: Fix memory allocation error handling in pm_vt_switch_required()
  ...
2025-11-28 16:01:13 +01:00
Rafael J. Wysocki 60d69a7ed1 Merge branches 'pm-core' and 'pm-runtime'
Merge a core power management update and runtime PM framework updates
for 6.19-rc1:

 - Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)

 - Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
   them in the PCI core and the ACPI TAD driver (Rafael Wysocki)

 - Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)

 - Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)

 - Fix typos in runtime.c comments (Malaya Kumar Rout)

* pm-core:
  PM: WQ_UNBOUND added to pm_wq workqueue

* pm-runtime:
  PCI/sysfs: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
  ACPI: TAD: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
  PM: runtime: Wrapper macros for ACQUIRE()/ACQUIRE_ERR()
  PM: runtime: fix typos in runtime.c comments
  ACPI: TAD: Improve runtime PM using guard macros
  ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
  PM: runtime: docs: Update pm_runtime_allow/forbid() documentation
2025-11-28 15:56:09 +01:00
Rafael J. Wysocki af47d98064 Merge branches 'acpi-misc' and 'pnp'
Merge miscellaneous ACPI support updates and a PNP update for 6.19-rc1:

 - Replace `core::mem::zeroed` with `pin_init::zeroed` in the ACPI Rust
   code (Siyuan Huang)

 - Update the ACPI code to use the new style of allocating workqueues
   and new global workqueues (Marco Crivellari)

 - Fix two spelling mistakes in the ACPI code (Chu Guangqing)

 - Fix ISAPNP to generate uevents to auto-load modules (René Rebe)

* acpi-misc:
  ACPI: PM: Fix a spelling mistake
  ACPI: LPSS: Fix a spelling mistake
  ACPI: thermal: Add WQ_PERCPU to alloc_workqueue() users
  ACPI: OSL: Add WQ_PERCPU to alloc_workqueue() users
  ACPI: EC: Add WQ_PERCPU to alloc_workqueue() users
  ACPI: OSL: replace use of system_wq with system_percpu_wq
  ACPI: scan: replace use of system_unbound_wq with system_dfl_wq
  rust: acpi: replace `core::mem::zeroed` with `pin_init::zeroed`

* pnp:
  PNP: Fix ISAPNP to generate uevents to auto-load modules
2025-11-28 15:08:38 +01:00
Rafael J. Wysocki ba9aeba053 Merge branches 'acpi-tad', 'acpi-fan', 'acpi-dptf' and 'acpi-tools'
Merge updates of the ACPI time and alarm device (TAD) driver, ACPI fan
driver, ACPI DPTF code and an ACPI utility update for 6.19-rc1:

 - Improve runtime PM in the ACPI time and alarm device (TAD) driver
   using guard macros and rearrange code related to runtime PM in
   acpi_tad_remove() (Rafael Wysocki)

 - Add support for Microsoft fan extensions to the ACPI fan driver along
   with notification support and work around a 64-bit firmware bug in
   that driver (Armin Wolf)

 - Use ACPI_FREE() to free ACPI buffer in the ACPI DPTF code (Kaushlendra
   Kumar)

 - Fix a memory leak and a resource leak in the ACPI pfrut utility (Malaya
   Kumar Rout)

* acpi-tad:
  ACPI: TAD: Improve runtime PM using guard macros
  ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()

* acpi-fan:
  ACPI: fan: Add support for Microsoft fan extensions
  ACPI: fan: Add hwmon notification support
  ACPI: fan: Add basic notification support
  ACPI: fan: Workaround for 64-bit firmware bug

* acpi-dptf:
  ACPI: DPTF: Use ACPI_FREE() for ACPI buffer deallocation

* acpi-tools:
  ACPI: tools: pfrut: fix memory leak and resource leak in pfrut.c
2025-11-28 15:01:07 +01:00
Rafael J. Wysocki 24d268add6 Merge branches 'acpica', 'acpi-property', 'acpi-pm' and 'acpi-battery'
Merge an ACPICA change, device ACPI properties handling update, ACPI
power management updates, and an ACPI battery driver update for
6.19-rc1:

 - Avoid walking the ACPI namespace in the AML interpreter if the
   starting node cannot be determined (Cryolitia PukNgae)

 - Use min() instead of min_t() in the ACPI device properties handling
   code to avoid discarding significant bits (David Laight)

 - Fix potential fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
   that may prevent the parent fwnode from being released (Haotian Zhang)

 - Rework acpi_graph_get_next_endpoint() to use ACPI functions only, remove
   unnecessary conditionals from it to make it easier to follow, and make
   acpi_get_next_subnode() static (Sakari Ailus)

 - Drop unused function acpi_get_lps0_constraint(), make some Low-Power
   S0 callback functions for suspend-to-idle static, and rearrange the
   code retrieving Low-Power S0 constraints so it only runs when the
   constraints are actually used (Rafael Wysocki)

 - Drop redundant locking from the ACPI battery driver (Rafael Wysocki)

* acpica:
  ACPICA: Avoid walking the Namespace if start_node is NULL

* acpi-property:
  ACPI: property: use min() instead of min_t()
  ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
  ACPI: property: Rework acpi_graph_get_next_endpoint()
  ACPI: property: Use ACPI functions in acpi_graph_get_next_endpoint() only
  ACPI: property: Make acpi_get_next_subnode() static

* acpi-pm:
  ACPI: PM: s2idle: Only retrieve constraints when needed
  ACPI: PM: s2idle: Staticise LPS0 callback functions
  ACPI: PM: s2idle: Drop acpi_get_lps0_constraint()

* acpi-battery:
  ACPI: battery: Drop redundant locking
2025-11-28 14:44:36 +01:00
Rafael J. Wysocki 63d26c3811 - Document the RZ/V2H TSU DT bindings (Ovidiu Panait)
- Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
   Pallikunhi)
 
 - Document R-Car Gen4 and RZ/G2 support in driver comment (Marek
   Vasut)
 
 - Convert to DEFINE_SIMPLE_DEV_PM_OPS() in R-Car [Gen3] (Geert
   Uytterhoeven)
 
 - Fix format string bug in thermal-engine (Malaya Kumar Rout)
 
 - Make ipq5018 tsens standalone compatible (George Moussalem)
 
 - Add the QCS8300 compatible for the QCom Tsens (Gaurav Kohli)
 
 - Add support for the NXP i.MX91 thermal module, including the DT
   bindings (Pengfei Li)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmkocvMACgkQqDIjiipP
 6E/vOQgAncWodGutUfBwvXePKEiR6vgGJK1MExTpgwhPmbJRttEtQhXcG6/Rk76e
 hA4q+/Cags5rU/ydK5Qor8PcAmC/Y98Xs9fepWoygKB1fClWejVftTBSANgHDLnD
 Txz12/F9FXxy957mm0NEKgVUkMYt/cipzj0c0y5c4xPVyVFYenVG0YTitCNlmTsI
 ouBZgKXCZ1lYnGsRwebNcODnkvMKzjOeN5PetcI5ThSGF7SWcgJb9bKRTKS+kJ+r
 1j68xJrjh/viYv7qrpaOCMaYbQcb38yPncPOWHLMAx2AIWVceOcmKWEypA2dT63z
 agHB9jkhf/8kUuymB6tWSA4zjACLsw==
 =fLGN
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal control changes for 6.19-rc1 from Daniel Lezcano:

"- Document the RZ/V2H TSU DT bindings (Ovidiu Panait)

 - Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
   Pallikunhi)

 - Document R-Car Gen4 and RZ/G2 support in driver comment (Marek Vasut)

 - Convert to DEFINE_SIMPLE_DEV_PM_OPS() in R-Car [Gen3] (Geert
   Uytterhoeven)

 - Fix format string bug in thermal-engine (Malaya Kumar Rout)

 - Make ipq5018 tsens standalone compatible (George Moussalem)

 - Add the QCS8300 compatible for the QCom Tsens (Gaurav Kohli)

 - Add support for the NXP i.MX91 thermal module, including the DT
   bindings (Pengfei Li)

* tag 'thermal-v6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unit
  dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal module
  dt-bindings: thermal: tsens: Add QCS8300 compatible
  dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatible
  tools/thermal/thermal-engine: Fix format string bug in thermal-engine
  thermal/drivers/rcar_gen3: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  thermal/drivers/rcar: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  thermal/drivers/rcar_gen3: Document R-Car Gen4 and RZ/G2 support in driver comment
  dt-bindings: thermal: qcom-tsens: document the Kaanapali Temperature Sensor
  dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2H TSU
2025-11-28 13:02:50 +01:00
Rafael J. Wysocki 9fac2a114b Merge back ACPI processor driver changes for 6.19 2025-11-28 12:59:01 +01:00
Ben Horgan 27abb1ee5a arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
The define ARM64_FEATURE_FIELD_BITS is now unused and feature id
fields don't always have 4 bits. Remove it.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:17:59 +00:00
Ben Horgan 4138cc63d3 KVM: arm64: selftests: Consider all 7 possible levels of cache
In test_clidr() if an empty cache level is not found then the TEST_ASSERT
will not fire. Fix this by considering all 7 possible levels when iterating
through the hierarchy. Found by inspection.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:16:46 +00:00
Ben Horgan bf09ee9180 KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
ARM64_FEATURE_FIELD_BITS is set to 4 but not all ID register fields are 4
bits. See for instance ID_AA64SMFR0_EL1. The last user of this define,
ARM64_FEATURE_FIELD_BITS, is the set_id_regs selftest. Its logic assumes
the fields aren't a single bit; assert that's the case and stop using the
define. As there are no more users, ARM64_FEATURE_FIELD_BITS is removed
from the arm64 tools sysreg.h header. A separate commit removes this from
the kernel version of the header.

Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:16:46 +00:00
Seongsu Park c86d9f8764 arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
The ATOMIC_FETCH_OP_AND and ATOMIC64_FETCH_OP_AND macros accept 'mb' and
'cl' parameters but never use them in their implementation. These macros
simply delegate to the corresponding andnot functions, which handle the
actual atomic operations and memory barriers.

Signed-off-by: Seongsu Park <sgsu.park@samsung.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27 18:15:24 +00:00
Sebastian Andrzej Siewior 37de2dbc31 debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP
fill_pool_map is used to suppress nesting violations caused by acquiring
a spinlock_t (from within the memory allocator) while holding a
raw_spinlock_t. The annotation used for this is wrong.

LD_WAIT_SLEEP is for always-sleeping lock types such as mutex_t.
LD_WAIT_CONFIG is for lock types which sleep on PREEMPT_RT but spin
otherwise, such as spinlock_t.

Use LD_WAIT_CONFIG as override.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251127153652.291697-3-bigeasy@linutronix.de
2025-11-27 16:55:34 +01:00
Sebastian Andrzej Siewior 06e0ae988f debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING
The pool of free objects is refilled on several occasions such as object
initialisation. On PREEMPT_RT refilling is limited to preemptible
sections due to sleeping locks used by the memory allocator. The system
boots with disabled interrupts so the pool can not be refilled.

If too many objects are initialized and the pool gets empty then
debugobjects disables itself.

Refilling can also happen early in the boot with disabled interrupts as
long as the scheduler is not operational. If the scheduler cannot
preempt a task then a sleeping lock cannot be contended.

Additionally, allow the pool to be refilled while the scheduler is not
operational.
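
A minimal sketch of the resulting refill condition, with an illustrative
helper name (the actual patch may structure this differently):

    static bool pool_refill_allowed(void)
    {
            /*
             * Before SYSTEM_SCHEDULING the scheduler cannot preempt a
             * task, so a sleeping lock cannot be contended and the
             * allocator may be called even with interrupts disabled.
             */
            return !IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() ||
                   system_state < SYSTEM_SCHEDULING;
    }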

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251127153652.291697-2-bigeasy@linutronix.de
2025-11-27 16:55:34 +01:00
Rafael J. Wysocki 5e8b7b58b2 Update devfreq next for v6.19
Detailed description for this pull request:
 - Move governor.h under include/linux/ and rename it to devfreq-governor.h
   in order to allow devfreq governor definitions outside of drivers/devfreq/.
 
 - Fix potential use-after-free issue in OPP handling in hisi_uncore_freq.c
 
 - Use min() to improve readability in tegra30-devfreq.c
 
 - Fix typo in DFSO_DOWNDIFFERENTIAL macro name in governor_simpleondemand.c
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEsSpuqBtbWtRe4rLGnM3fLN7rz1MFAmkoZ+wACgkQnM3fLN7r
 z1NDrhAAtHlmhFgMQdGO8gfvKHRWQ4qzrJkiNGlU2iwXznCB38bNIGHdXAV5hPaL
 p0N+HeW2JgQRD2fjkB3gez5IJXDPo7xu3YJQ6uTuAY/k/AFzoz6HJR97luBZhPH3
 38a10cAJM9wbUd/ssrSHQMjBGr4S+teAuT99fBn+VpNzUzNs7XGfcqvm3cwP48QX
 4sv77IGZn1noXkVppq2UjBxjL4f4njcTEuOnWyVzEsdgYqDVc34MR1wImUx1ypFy
 uzxyOvOWfE0J955wisBltBPdb3WCRH9ZoRdmcCq+KUlgjtRvfK9B0KUccOgyACIq
 WTIar61Ct2hY1bLD/P57g7qH/jO/3/7usNjeaHr+OacKie4tAFqj8IdYB7RcwDfu
 iqYzTD+C+10kAUaLLlqwYap7pmW+0I7gLK/oUVRy1qNvfkA13v/i2uMdUgXm6flJ
 /4cjPDTq1P8Y5ZOpu4FKSho7Lj2e2e1Oz6bspJ3Ny0PZZupN7LlW19h0TwMIX4g+
 1PaKOsxeM9Tj6SG8IdxYcvKnfmxeY4KFBCenmURVqhy+xizkBl/ekD1gPyuKF467
 XWgppWWU3RQ0P5VXcFxvi8q6a9yOhFeqF5RJGmpJA0LXYeI6riBMYeI4VTgfiOeO
 ABii33JZt7dktydlD7RDS6hECfzoGYqR5wFk4K+lmzLP4BPLcZU=
 =dEyH
 -----END PGP SIGNATURE-----

Merge tag 'devfreq-next-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux

Pull devfreq changes for v6.19 from Chanwoo Choi:

"- Move governor.h under include/linux/ and rename to devfreq-governor.h
   in order to allow devfreq governor definitions in out of drivers/devfreq/.

 - Fix potential use-after-free issue of OPP handling on hisi_uncore_freq.c

 - Use min() to improve the readability on tegra30-devfreq.c

 - Fix typo in DFSO_DOWNDIFFERENTIAL macro name on governor_simpleondemand.c"

* tag 'devfreq-next-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
  PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
  PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
  PM / devfreq: hisi: Fix potential UAF in OPP handling
  PM / devfreq: Move governor.h to a public header location
2025-11-27 16:28:16 +01:00
Brendan Jackman 3d1f108845 x86/mm: Delete disabled debug code
This code doesn't run. Since 2008:

  4f9c11dd49 ("x86, 64-bit: adjust mapping of physical pagetables to work with Xen")

the kernel has gained more flexible logging and tracing capabilities;
presumably if anyone wanted to take advantage of this log message they would
have got rid of the "if (0)" so they could use these capabilities.

Since they haven't, just delete it.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251003-x86-init-cleanup-v1-1-f2b7994c2ad6@google.com
2025-11-27 14:32:16 +01:00
Chu Guangqing a508939e15 ACPI: PM: Fix a spelling mistake
Fix spelling by replacing "interrups" with "interrupts".

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251125022403.2614-1-chuguangqing@inspur.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-27 14:11:19 +01:00
Chu Guangqing 037dada8bb ACPI: LPSS: Fix a spelling mistake
Fix spelling by replacing "successfull" with "successful".

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251125021431.2243-1-chuguangqing@inspur.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-27 14:08:26 +01:00
René Rebe 17e7972979 ACPI: processor_core: fix map_x2apic_id for amd-pstate on am4
On all AMD AM4 systems I have seen, e.g. ASUS X470-i, Pro WS X570 Ace
and equivalent Gigabyte, amd-pstate does not initialize when the
x2apic is enabled in the BIOS. Kernel debug messages include:

[    0.315438] acpi LNXCPU:00: Failed to get CPU physical ID.
[    0.354756] ACPI CPPC: No CPC descriptor for CPU:0
[    0.714951] amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled

I tracked this down to map_x2apic_id() checking device_declaration
passed in via the type argument of acpi_get_phys_id() via
map_madt_entry() while map_lapic_id() does not.

It appears these BIOSes use Processor statements for declaring the CPUs
in the ACPI namespace instead of processor device objects (which should
have been used). CPU declarations via Processor statements were
deprecated in ACPI 6.0 that was released 10 years ago. They should not
be used any more in any contemporary platform firmware.

I tried to contact Asus support multiple times, but never received a
reply nor did any BIOS update ever change this.

Fix amd-pstate w/ x2apic on am4 by allowing map_x2apic_id() to work with
CPUs declared via Processor statements for IDs less than 255, which is
consistent with ACPI 5.0 that still allowed Processor statements to be
used for declaring CPUs.
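
A sketch of the relaxed check; the names are simplified from the
description above, not copied from the actual diff:

    /* In map_x2apic_id(): accept Processor-statement CPUs too, but only
     * for IDs below 255, as permitted by ACPI 5.0. */
    if (!device_declaration && acpi_id >= 255)
            return -ENODEV;
    /* Otherwise fall through and match the x2APIC UID as before. */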

Fixes: 7237d3de78 ("x86, ACPI: add support for x2apic ACPI extensions")
Signed-off-by: René Rebe <rene@exactco.de>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251126.165513.1373131139292726554.rene@exactco.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-26 18:06:25 +01:00
Heiko Carstens 283f90b50d watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message
catalog" from 2008 [1] which never made it upstream.

The macro was added to s390 code to allow for an out-of-tree patch which
used this to generate unique message ids. Also, this out-of-tree patch
doesn't exist anymore.

Remove the macro in order to get rid of a pointless indirection.

[1] https://lwn.net/Articles/292650/

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-26 17:34:52 +01:00
Pengfei Li c411d8bf06 thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unit
Introduce support for the i.MX91 thermal monitoring unit, which features a
single sensor for the CPU. The register layout differs from other chips,
necessitating the creation of a dedicated file for this.

This sensor provides a resolution of 1/64°C (6-bit fraction). For actual
accuracy, refer to the datasheet, as it varies depending on the chip grade.
The unit provides an interrupt for end of measurement and threshold
violation, and contains temperature threshold comparators, in normal and
secure address space, with direction and threshold programmability.

Datasheet Link: https://www.nxp.com/docs/en/data-sheet/IMX91CEC.pdf

Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251020-imx91tmu-v7-2-48d7d9f25055@nxp.com
2025-11-26 15:51:28 +01:00
Pengfei Li f32aedc575 dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal module
Add bindings documentation for i.MX91 thermal modules.

Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251020-imx91tmu-v7-1-48d7d9f25055@nxp.com
2025-11-26 15:51:16 +01:00
Gaurav Kohli 1ee90870ce dt-bindings: thermal: tsens: Add QCS8300 compatible
Add compatibility string for the thermal sensors on QCS8300 platform.

Signed-off-by: Gaurav Kohli <quic_gkohli@quicinc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Link: https://patch.msgid.link/20250822042316.1762153-2-quic_gkohli@quicinc.com
2025-11-26 15:50:59 +01:00
Thomas Gleixner 2437f79880 - Use 64-bits for timer compensation for IoT usage where the suspend
time is much longer than what 32-bits can provide (Enlin Mu)
 
 - Add delay support on sp804 for ARM32 platforms (Stephen Eta Zhou)
 
 - Fix missing resource release on error in the probe path of the
   ralink driver (Haotian Zhang)
 
 - Fix double deregistration on probe failure in the NXP STM driver
   (Johan Hovold)
 
 - Disable runtime PM for the Renesas SH CMT timer because it is
   incompatible with PREEMPT_RT=y (Niklas Söderlund)
 
 - Fix section mismatches in the NXP STM driver (Johan Hovold)
 
 - Prevent unbinding the NXP PIT, STM and MMIO ARM Arch timers as
   the code does not support bind/unbind (Johan Hovold)
 
 - Use the clocksource instead of ticks on the RDA8810PL platform
   (Enlin Mu)
 
 - Drop the unused module alias for the STM32-LP (Johan Hovold)
 
 - Add Realtek system timer driver (Hao-Wen Ting)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmkm6t4ACgkQqDIjiipP
 6E8xqQgAlXnV3vRJmEbjd3ILECvbKMLI2haHV2eA+75P+DvbfriL+ePMHkfkOPI6
 CC5UhCSy410cQLO88tzy5+9K8Po2KnHxb+lVS2P6zzcdefL5ZWMZ9Q+CAOwSo1s9
 An1A4nUgcTB52mAR+jlz++SF1VV/fMvskMrtiTg8bSIScSc+xi4sEC3GaZR09qSG
 RODtzmVsyeoHQ1u6ziRJen8GzpX1q6vUP0eAAr+vXqTUXdCuUL8P20h2mwzxPJWH
 mFo53OuKVbTMOoY1Av7euvO1ZZ1tsHsS4NxJfD1qatq+eh1As1dxYodB4dp44qZt
 jjnVuj0QrE40VB6EnHAA4kKb6WWpow==
 =WXsR
 -----END PGP SIGNATURE-----

Merge tag 'timers-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/daniel.lezcano/linux into timers/clocksource

Pull clocksource/event changes from Daniel Lezcano:

    - Use 64-bits for timer compensation for IoT usage where the suspend
      time is much longer than what 32-bits can provide (Enlin Mu)

    - Add delay support on sp804 for ARM32 platforms (Stephen Eta Zhou)

    - Fix missing resource release on error in the probe path of the
      ralink driver (Haotian Zhang)

    - Fix double deregistration on probe failure in the NXP STM driver
      (Johan Hovold)

    - Disable runtime PM for the Renesas SH CMT timer because it is
      incompatible with PREEMPT_RT=y (Niklas Söderlund)

    - Fix section mismatches in the NXP STM driver (Johan Hovold)

    - Prevent unbinding the NXP PIT, STM and MMIO ARM Arch timers as
      the code does not support bind/unbind (Johan Hovold)

    - Use the clocksource instead of ticks on the RDA8810PL platform
      (Enlin Mu)

    - Drop the unused module alias for the STM32-LP (Johan Hovold)

    - Add Realtek system timer driver (Hao-Wen Ting)

Link: https://lore.kernel.org/all/9303b790-28d4-4bd9-b01d-28fb05493596@linaro.org
2025-11-26 15:36:52 +01:00
Rafael J. Wysocki 6e757fd548 Merge back ACPI processor driver changes for 6.19 2025-11-26 13:56:30 +01:00
Heiko Carstens 1c93edfd50 s390/entry: Use lay instead of aghik
Use the lay instruction instead of aghik. aghik is only available since
z196, therefore compiling the kernel for z10 results in this error:

   arch/s390/kernel/entry.S: Assembler messages:
   arch/s390/kernel/entry.S:165: Error: Unrecognized opcode: `aghik'

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511261518.nBbQN5h7-lkp@intel.com/
Fixes: f5730d44e0 ("s390: Add stackprotector support")
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-26 12:28:23 +01:00
Hao-Wen Ting d1780dce95 clocksource/drivers: Add Realtek system timer driver
Add a system timer driver for Realtek SoCs.

This driver registers the 1 MHz global hardware counter on Realtek
platforms as a clock event device. Since this hardware counter starts
counting automatically after SoC power-on, no clock initialization is
required. Because the counter does not stop or get affected by CPU power
down, and it supports oneshot mode, it is typically used as a tick
broadcast timer.

Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251126060110.198330-3-haowen.ting@realtek.com
2025-11-26 11:25:15 +01:00
Hao-Wen Ting 40caba2bd0 dt-bindings: timer: Add Realtek SYSTIMER
The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter
operating at a fixed 1MHz frequency. Thanks to its compare match
interrupt capability, the timer natively supports oneshot mode for tick
broadcast functionality.

Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://patch.msgid.link/20251126060110.198330-2-haowen.ting@realtek.com
2025-11-26 11:25:15 +01:00
Johan Hovold ed92a968a9 clocksource/drivers/stm32-lp: Drop unused module alias
The driver cannot be built as a module so drop the unused platform
module alias.

Note that platform aliases are not needed for OF probing should it ever
become possible to build the driver as a module.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251111154516.1698-1-johan@kernel.org
2025-11-26 11:25:15 +01:00
Enlin Mu 627f3f3716 clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC
The current system log timestamp accuracy is tick based, which cannot
meet the usage requirements; nanosecond resolution is needed.
Therefore, a sched_clock_register() call needs to be added.
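
A minimal sketch of such a registration; the callback, bit width and rate
below are placeholders, not the driver's actual values:

    static u64 notrace rda_sched_clock_read(void)
    {
            return rda_hwtimer_read_counter();  /* hypothetical helper */
    }

    sched_clock_register(rda_sched_clock_read, 32, RDA_TIMER_FREQ);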

[ dlezcano: Fixed typos ]

Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251107063347.3692-1-enlin.mu@linux.dev
2025-11-26 11:25:11 +01:00
Johan Hovold 6a2416892e clocksource/drivers/nxp-stm: Prevent driver unbind
Clockevents cannot be deregistered so suppress the bind attributes to
prevent the driver from being unbound and releasing the underlying
resources after registration.

Even if the driver can currently only be built-in, also switch to
builtin_platform_driver() to prevent it from being unloaded should
modular builds ever be enabled.
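
The pattern, sketched with illustrative names (the exact structure in the
patch may differ):

    static struct platform_driver nxp_stm_driver = {
            .probe = nxp_stm_probe,
            .driver = {
                    .name = "nxp-stm",
                    /* Hide the sysfs bind/unbind attributes. */
                    .suppress_bind_attrs = true,
            },
    };
    /* Built-in only: no module exit path, so it can never be unloaded. */
    builtin_platform_driver(nxp_stm_driver);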

Fixes: cec32ac758 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251111153226.579-4-johan@kernel.org
2025-11-26 11:25:03 +01:00
Johan Hovold e25f964cf4 clocksource/drivers/nxp-pit: Prevent driver unbind
The driver does not support unbinding (e.g. as clockevents cannot be
deregistered) so suppress the bind attributes to prevent the driver from
being unbound and rebound after registration (and disabling the timer
when reprobing fails).

Even if the driver can currently only be built-in, also switch to
builtin_platform_driver() to prevent it from being unloaded should
modular builds ever be enabled.

Fixes: bee33f22d7 ("clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251111153226.579-3-johan@kernel.org
2025-11-26 11:24:57 +01:00
Johan Hovold 6aa10f0e2e clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbind
Clockevents cannot be deregistered so suppress the bind attributes to
prevent the driver from being unbound and releasing the underlying
resources after registration.

Fixes: 4891f01527 ("clocksource/drivers/arm_arch_timer: Add standalone MMIO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20251111153226.579-2-johan@kernel.org
2025-11-26 11:24:47 +01:00
Johan Hovold b452d2c97e clocksource/drivers/nxp-stm: Fix section mismatches
Platform drivers can be probed after their init sections have been
discarded (e.g. on probe deferral or manual rebind through sysfs) so the
probe function must not live in init. Device managed resource actions
similarly cannot be discarded.

The "_probe" suffix of the driver structure name prevents modpost from
warning about this so replace it to catch any similar future issues.

Fixes: cec32ac758 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: stable@vger.kernel.org	# 6.16
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251017054943.7195-1-johan@kernel.org
2025-11-26 11:24:44 +01:00
Niklas Söderlund 62524f285c clocksource/drivers/sh_cmt: Always leave device running after probe
The CMT device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart and power itself on and off, as
well as enabling and disabling its clock when it's not in operation.
This behavior is slightly altered if the CMT is used as an early
platform device in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.

This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. As the CMT registers itself
as a clockevent provider, via clockevents_register_device(), it needs to
use raw spinlocks internally as this is the context in which the clockevent
framework interacts with the CMT driver. However in the context of
holding a raw spinlock the CMT driver can't really manage its power
state or clock with calls to pm_runtime_*() and clk_*() as these calls
end up in other platform drivers using regular spinlocks to control
power and clocks.

This mix of spinlock contexts trips a lockdep warning.

    =============================
    [ BUG: Invalid wait context ]
    6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 Not tainted
    -----------------------------
    swapper/1/0 is trying to lock:
    ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
    ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
    other info that might help us debug this:
    ccree e6601000.crypto: ARM ccree device initialized
    context-{5:5}
    2 locks held by swapper/1/0:
     #0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
     #1: ffff0000089a5858 (&ch->lock){....}-{2:2}
    usbcore: registered new interface driver usbhid
    , at: sh_cmt_start+0x30/0x364
    stack backtrace:
    CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 PREEMPT
    Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
    Call trace:
     show_stack+0x14/0x1c (C)
     dump_stack_lvl+0x6c/0x90
     dump_stack+0x14/0x1c
     __lock_acquire+0x904/0x1584
     lock_acquire+0x220/0x34c
     _raw_spin_lock_irqsave+0x58/0x80
     __pm_runtime_resume+0x38/0x88
     sh_cmt_start+0x54/0x364
     sh_cmt_clock_event_set_oneshot+0x64/0xb8
     clockevents_switch_state+0xfc/0x13c
     tick_broadcast_set_event+0x30/0xa4
     __tick_broadcast_oneshot_control+0x1e0/0x3a8
     tick_broadcast_oneshot_control+0x30/0x40
     cpuidle_enter_state+0x40c/0x680
     cpuidle_enter+0x30/0x40
     do_idle+0x1f4/0x26c
     cpu_startup_entry+0x34/0x40
     secondary_start_kernel+0x11c/0x13c
     __secondary_switched+0x74/0x78

For non-PREEMPT_RT builds this is not really an issue, but for
PREEMPT_RT builds where normal spinlocks can sleep this might be an
issue. Be cautious and always leave the power and clock running after
probe.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se
2025-11-26 11:24:40 +01:00
Johan Hovold 6b38a8b31e clocksource/drivers/stm: Fix double deregistration on probe failure
The purpose of the devm_add_action_or_reset() helper is to call the
action function in case adding an action ever fails so drop the clock
source deregistration from the error path to avoid deregistering twice.
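
The intended calling pattern, with illustrative names:

    err = devm_add_action_or_reset(dev, stm_clocksource_unregister, cs);
    if (err)
            /* The action has already run; do not unregister again. */
            return err;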

Fixes: cec32ac758 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251017055039.7307-1-johan@kernel.org
2025-11-26 11:24:37 +01:00
Haotian Zhang 2ba8e2aae1 clocksource/drivers/ralink: Fix resource leaks in init error path
The ralink_systick_init() function does not release all acquired resources
on its error paths. If irq_of_parse_and_map() or a subsequent call fails,
the previously created I/O memory mapping and IRQ mapping are leaked.

Add goto-based error handling labels to ensure that all allocated
resources are correctly freed.
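
The general shape of such goto-based unwinding, sketched with hypothetical
names rather than the driver's actual ones:

    base = of_iomap(np, 0);
    if (!base)
            return -ENXIO;

    irq = irq_of_parse_and_map(np, 0);
    if (!irq) {
            ret = -EINVAL;
            goto err_unmap;
    }

    ret = request_irq(irq, systick_interrupt, IRQF_TIMER, "systick", NULL);
    if (ret)
            goto err_dispose;

    return 0;

    err_dispose:
            irq_dispose_mapping(irq);
    err_unmap:
            iounmap(base);
            return ret;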

Fixes: 1f2acc5a8a ("MIPS: ralink: Add support for systick timer found on newer ralink SoC")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251030090710.1603-1-vulab@iscas.ac.cn
2025-11-26 11:24:34 +01:00
Stephen Eta Zhou 640594a04f clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock source is not registered
Register a valid read_current_timer() function for the
SP804 timer on ARM32.

On ARM32 platforms, when the SP804 timer is selected as the clocksource,
the driver does not register a valid read_current_timer() function.
As a result, features that rely on this API—such as rdseed—consistently
return incorrect values.

To fix this, a delay_timer structure is registered during the SP804
driver's initialization. The read_current_timer() function is implemented
using the existing sp804_read() logic, and the timer frequency is reused
from the already-initialized clocksource.
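
A sketch of that registration, reusing the sp804_read() logic mentioned
above (structure and variable names are illustrative):

    static unsigned long sp804_read_current_timer(void)
    {
            return sp804_read();
    }

    static struct delay_timer sp804_delay_timer = {
            .read_current_timer = sp804_read_current_timer,
    };

    /* At init time, once the clocksource rate is known: */
    sp804_delay_timer.freq = rate;
    register_current_timer_delay(&sp804_delay_timer);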

Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20250525-sp804-fix-read_current_timer-v4-1-87a9201fa4ec@gmail.com
2025-11-26 11:24:32 +01:00
Enlin Mu 576c564ec3 clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit
Using 32 bits for suspend compensation, the max compensation time is 36
hours (working clock is 32 kHz). In some IoT devices, the suspend time
may be long, even exceeding 36 hours. Therefore, a 64-bit timer counter
is needed for counting.
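
For reference, at a 32768 Hz working clock a 32-bit counter wraps after
2^32 / 32768 = 131072 seconds, which is about 36.4 hours and matches the
limit above; a 64-bit counter extends that by a factor of 2^32, far beyond
any realistic suspend time.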

Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://patch.msgid.link/20251106021830.34846-1-enlin.mu@linux.dev
2025-11-26 11:24:26 +01:00
Riwen Lu d9600d5766 PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
Correct the spelling error in the DFSO_DOWNDIFFERENTIAL macro
definition and update the corresponding variable assignment.

The macro was previously misspelled as DFSO_DOWNDIFFERENCTIAL.
This change ensures consistent and correct spelling throughout
the simpleondemand governor implementation.

Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251118032339.2799230-1-luriwen@kylinos.cn/
2025-11-26 13:58:59 +09:00
Cryolitia PukNgae 9d6c58dae8 ACPICA: Avoid walking the Namespace if start_node is NULL
Although commit 0c9992315e ("ACPICA: Avoid walking the ACPI Namespace
if it is not there") fixed the situation when both start_node and
acpi_gbl_root_node are NULL, the mainline Linux kernel still crashes
on the Honor Magicbook 14 Pro [1].

That happens due to the access to the parent_node member in
acpi_ns_get_next_node().  The NULL pointer dereference will always
happen, no matter whether or not the start_node is equal to
ACPI_ROOT_OBJECT, so move the check of start_node being NULL
out of the if block.
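
The shape of the reordered check, sketched with real ACPICA identifiers
but not necessarily matching the final diff:

    if (!start_node)
            return_ACPI_STATUS(AE_NO_NAMESPACE);

    if (start_node == ACPI_ROOT_OBJECT)
            start_node = acpi_gbl_root_node;

This way the NULL check happens before any parent_node dereference,
regardless of whether start_node equals ACPI_ROOT_OBJECT.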

Unfortunately, all the attempts to contact Honor have failed, they
refused to provide any technical support for Linux.

The bad DSDT table's dump could be found on GitHub [2].

DMI: HONOR FMB-P/FMB-P-PCB, BIOS 1.13 05/08/2025

Link: 1c1b57b9eb
Link: https://gist.github.com/Cryolitia/a860ffc97437dcd2cd988371d5b73ed7 [1]
Link: https://github.com/denis-bb/honor-fmb-p-dsdt [2]
Signed-off-by: Cryolitia PukNgae <cryolitia.pukngae@linux.dev>
Reviewed-by: WangYuli <wangyl5933@chinaunicom.cn>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20251125-acpica-v1-1-99e63b1b25f8@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 22:14:11 +01:00
Thomas Gleixner 653fda7ae7 sched/mmcid: Switch over to the new mechanism
Now that all pieces are in place, change the implementations of
sched_mm_cid_fork() and sched_mm_cid_exit() to adhere to the new strict
ownership scheme and switch context_switch() over to use the new
mm_cid_schedin() functionality.

The common case is that there is no mode change required, which makes
fork() and exit() just update the user count and the constraints.

In case that a new user would exceed the CID space limit the fork() context
handles the transition to per CPU mode with mm::mm_cid::mutex held. exit()
handles the transition back to per task mode when the user count drops
below the switch back threshold. fork() might also be forced to handle a
deferred switch back to per task mode, when a affinity change increased the
number of allowed CPUs enough.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.280380631@linutronix.de
2025-11-25 19:45:42 +01:00
Thomas Gleixner 9da6ccbcea sched/mmcid: Implement deferred mode change
When affinity changes cause an increase of the number of CPUs allowed for
tasks which are related to a MM, that might result in a situation where
the ownership mode can go back from per CPU mode to per task mode.

As affinity changes happen with runqueue lock held there is no way to do
the actual mode change and required fixup right there.

Add the infrastructure to defer it to a workqueue. The scheduled work can
race with a fork() or exit(). Whatever happens first takes care of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.216484739@linutronix.de
2025-11-25 19:45:42 +01:00
Thomas Gleixner c809f081fe irqwork: Move data struct to a types header
... to avoid header recursion hell.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.152813625@linutronix.de
2025-11-25 19:45:41 +01:00
Thomas Gleixner fbd0e71dc3 sched/mmcid: Provide CID ownership mode fixup functions
CIDs are either owned by tasks or by CPUs. The ownership mode depends on
the number of tasks related to a MM and the number of CPUs on which these
tasks are theoretically allowed to run on. Theoretically because that
number is the superset of CPU affinities of all tasks which only grows and
never shrinks.

Switching to per CPU mode happens when the user count becomes greater than
the maximum number of CIDs, which is calculated by:

	opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users);
	max_cids = min(1.25 * opt_cids, nr_cpu_ids);

The +25% allowance is useful for tight CPU masks in scenarios where only a
few threads are created and destroyed, as it avoids frequent mode
switches. This allowance shrinks, though, the closer opt_cids gets to
nr_cpu_ids, which is the (unfortunate) hard ABI limit.

At the point of switching to per CPU mode the new user is not yet visible
in the system, so the task which initiated the fork() runs the fixup
function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either
transfers each task's owned CID to the CPU the task runs on or drops it into
the CID pool if a task is not on a CPU at that point in time. Tasks which
schedule in before the task walk reaches them do the handover in
mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
guaranteed that no task related to that MM owns a CID anymore.

Switching back to task mode happens when the user count goes below the
threshold which was recorded on the per CPU mode switch:

	pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);

This threshold is updated when an affinity change increases the number of
allowed CPUs for the MM, which might cause a switch back to per task mode.
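
Put together, the sizing math quoted above reads in C roughly as follows,
assuming a structure holding the counts (the mc-> names are illustrative):

    opt_cids  = min(mc->nr_cpus_allowed, mc->users);
    max_cids  = min(opt_cids + opt_cids / 4, nr_cpu_ids);    /* +25% */
    pcpu_thrs = min(opt_cids - opt_cids / 4, nr_cpu_ids / 2);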

If the switch back was initiated by an exiting task, then that task runs the
fixup function. If it was initiated by an affinity change, then it's run
either in the deferred update function in the context of a workqueue or by a
task which forks a new one or by a task which exits. Whatever happens
first. mm_cid_fixup_cpus_to_task() walks through the possible CPUs and
either transfers the CPU owned CIDs to a related task which runs on the CPU
or drops it into the pool. Tasks which schedule in on a CPU which the walk
did not cover yet do the handover themselves.

This transition from CPU to per task ownership happens in two phases:

 1) mm::mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
    CID and denotes that the CID is only temporarily owned by the
    task. When it schedules out the task drops the CID back into the
    pool if this bit is set.

 2) The initiating context walks the per CPU space and after completion
    clears mm::mm_cid.transit. After that point the CIDs are strictly
    task owned again.

This two phase transition is required to prevent CID space exhaustion
during the transition as a direct transfer of ownership would fail if
two tasks are scheduled in on the same CPU before the fixup freed per
CPU CIDs.

When mm_cid_fixup_cpus_to_tasks() completes it's guaranteed that no CID
related to that MM is owned by a CPU anymore.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.088189028@linutronix.de
2025-11-25 19:45:41 +01:00
Thomas Gleixner 9a723ed7fa sched/mmcid: Provide new scheduler CID mechanism
The MM CID management has two fundamental requirements:

  1) It has to guarantee that at no given point in time the same CID is
     used by concurrent tasks in userspace.

  2) The CID space must not exceed the number of possible CPUs in a
     system. While most allocators (glibc, tcmalloc, jemalloc) do not
     care about that, there seems to be at least some LTTng library
     depending on it.

The CID space compaction itself is not a functional correctness
requirement, it is only a useful optimization mechanism to reduce the
memory foot print in unused user space pools.

The optimal CID space is:

    min(nr_tasks, nr_cpus_allowed);

Where @nr_tasks is the number of actual user space threads associated to
the mm and @nr_cpus_allowed is the superset of all task affinities. It is
growth only as it would be insane to take a racy snapshot of all task
affinities when the affinity of one task changes just do redo it 2
milliseconds later when the next task changes it's affinity.

That means that as long as the number of tasks is lower or equal than the
number of CPUs allowed, each task owns a CID. If the number of tasks
exceeds the number of CPUs allowed it switches to per CPU mode, where the
CPUs own the CIDs and the tasks borrow them as long as they are scheduled
in.

For transition periods CIDs can go beyond the optimal space as long as they
don't go beyond the number of possible CPUs.

The current upstream implementation adds overhead into task migration to
keep the CID with the task. It also has to do the CID space consolidation
work from a task work in the exit to user space path. As that work is
assigned to a random task related to a MM this can inflict unwanted exit
latencies.

Implement the context switch parts of a strict ownership mechanism to
address this.

This removes most of the work from the task which schedules out. Only
during transitioning from per CPU to per task ownership it is required to
drop the CID when leaving the CPU to prevent CID space exhaustion. Other
than that scheduling out is just a single check and branch.

The task which schedules in has to check whether:

    1) The ownership mode changed
    2) The CID is within the optimal CID space

In stable situations this results in zero work. The only short disruption
is when ownership mode changes or when the associated CID is not in the
optimal CID space. The latter only happens when tasks exit and therefore
the optimal CID space shrinks.

That mechanism is strictly optimized for the common case where no change
happens. The only case where it actually causes a temporary one time spike
is on mode changes when and only when a lot of tasks related to a MM
schedule exactly at the same time and have eventually to compete on
allocating a CID from the bitmap.

In the sysbench test case which triggered the spinlock contention in the
initial CID code, __schedule() drops significantly in perf top on a 128
Core (256 threads) machine when running sysbench with 255 threads, which
fits into the task mode limit of 256 together with the parent thread:

  Upstream  rseq/perf branch  +CID rework
  0.42%     0.37%             0.32%          [k] __schedule

Increasing the number of threads to 256, which puts the test process into
per CPU mode looks about the same.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.023984859@linutronix.de
2025-11-25 19:45:41 +01:00
Thomas Gleixner 23343b6b09 sched/mmcid: Introduce per task/CPU ownership infrastructure
The MM CID management has two fundamental requirements:

  1) It has to guarantee that at no given point in time the same CID is
     used by concurrent tasks in userspace.

  2) The CID space must not exceed the number of possible CPUs in a
     system. While most allocators (glibc, tcmalloc, jemalloc) do not care
     about that, there seems to be at least librseq depending on it.

The CID space compaction itself is not a functional correctness
requirement, it is only a useful optimization mechanism to reduce the
memory foot print in unused user space pools.

The optimal CID space is:

    min(nr_tasks, nr_cpus_allowed);

Where @nr_tasks is the number of actual user space threads associated to
the mm and @nr_cpus_allowed is the superset of all task affinities. It is
growth-only, as it would be insane to take a racy snapshot of all task
affinities when the affinity of one task changes, just to redo it 2
milliseconds later when the next task changes its affinity.

That means that as long as the number of tasks is lower or equal than the
number of CPUs allowed, each task owns a CID. If the number of tasks
exceeds the number of CPUs allowed it switches to per CPU mode, where the
CPUs own the CIDs and the tasks borrow them as long as they are scheduled
in.

For transition periods CIDs can go beyond the optimal space as long as they
don't go beyond the number of possible CPUs.

The current upstream implementation adds overhead into task migration to
keep the CID with the task. It also has to do the CID space consolidation
work from a task work in the exit to user space path. As that work is
assigned to a random task related to a MM this can inflict unwanted exit
latencies.

This can be done differently by implementing a strict CID ownership
mechanism. Either the CIDs are owned by the tasks or by the CPUs. The
latter provides less locality when tasks are heavily migrating, but there
is no justification to optimize for overcommit scenarios and thereby
penalizing everyone else.

Provide the basic infrastructure to implement this:

  - Change the UNSET marker to BIT(31) from ~0U
  - Add the ONCPU marker as BIT(30)
  - Add the TRANSIT marker as BIT(29)

That allows to check for ownership trivially and provides a simple check for
UNSET as well. The TRANSIT marker is required to prevent CID space
exhaustion when switching from per CPU to per task mode.
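
A sketch of those markers and the resulting trivial checks; the macro and
helper names are guesses based on the description, not the patch itself:

    #define MM_CID_UNSET    BIT(31)
    #define MM_CID_ONCPU    BIT(30)
    #define MM_CID_TRANSIT  BIT(29)

    static inline bool mm_cid_is_unset(unsigned int cid)
    {
            return cid & MM_CID_UNSET;
    }

    static inline bool mm_cid_cpu_owned(unsigned int cid)
    {
            return cid & MM_CID_ONCPU;
    }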

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251119172549.960252358@linutronix.de
2025-11-25 19:45:41 +01:00
Thomas Gleixner 51dd92c71a sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
Prepare for the new CID management scheme which puts the CID ownership
transition into the fork() and exit() slow path by serializing
sched_mm_cid_fork()/exit() with a mutex, so task list and cpu mask walks can be
done in interruptible and preemptible code.

The contention on it is not worse than on other concurrency controls in the
fork()/exit() machinery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.895826703@linutronix.de
2025-11-25 19:45:41 +01:00
Thomas Gleixner b0c3d51b54 sched/mmcid: Provide precomputed maximal value
Reading mm::mm_users and mm::mm_cid::nr_cpus_allowed every time to compute
the maximal CID value is just wasteful as that value is only changing on
fork(), exit() and eventually when the affinity changes.

So it can be easily precomputed at those points and provided in mm::mm_cid
for consumption in the hot path.

But there is an issue with using mm::mm_users for accounting because that
does not necessarily reflect the number of user space tasks as other kernel
code can take temporary references on the MM which skew the picture.

Solve that by adding a users counter to struct mm_mm_cid, which is modified
by fork() and exit() and used for precomputing under mm_mm_cid::lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.832764634@linutronix.de
2025-11-25 19:45:40 +01:00
Thomas Gleixner bf070520e3 sched/mmcid: Move initialization out of line
It's getting bigger soon, so just move it out of line to the rest of the
code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.769636491@linutronix.de
2025-11-25 19:45:40 +01:00
Thomas Gleixner 2b1642b881 signal: Move MMCID exit out of sighand lock
There is no need anymore to keep this under sighand lock as the current
code and the upcoming replacement are not depending on the exit state of a
task anymore.

That allows to use a mutex in the exit path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.706439391@linutronix.de
2025-11-25 19:45:40 +01:00
Thomas Gleixner 539115f08c sched/mmcid: Convert mm CID mask to a bitmap
This is truly a bitmap and just conveniently uses a cpumask because the
maximum size of the bitmap is nr_cpu_ids.

But that prevents searches for a zero bit in a limited range, which would
help to provide an efficient mechanism to consolidate the CID space
when the number of users decreases.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.642866767@linutronix.de
2025-11-25 19:45:40 +01:00
Thomas Gleixner 35a5c37cb9 cpumask: Cache num_possible_cpus()
Reevaluating num_possible_cpus() over and over does not make sense. That
becomes a constant after init as cpu_possible_mask is marked ro_after_init.

Cache the value during initialization and provide that for consumption.
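
A minimal sketch of the idea; the variable and function names are
illustrative, not the patch's actual identifiers:

    static unsigned int cached_possible_cpus __ro_after_init;

    void __init cache_num_possible_cpus(void)
    {
            cached_possible_cpus = cpumask_weight(cpu_possible_mask);
    }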

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20251119172549.578653738@linutronix.de
2025-11-25 19:45:40 +01:00
Rafael J. Wysocki 4bf944f3fc cpuidle: Warn instead of bailing out if target residency check fails
It turns out that the change in commit 76934e495c ("cpuidle: Add
sanity check for exit latency and target residency") goes too far
because there are systems in the field on which the check introduced
by that commit does not pass.

For this reason, change __cpuidle_driver_init() return type back to void
and make it print a warning when the check mentioned above does not
pass.

Fixes: 76934e495c ("cpuidle: Add sanity check for exit latency and target residency")
Reported-by: Val Packett <val@packett.cool>
Closes: https://lore.kernel.org/linux-pm/20251121010756.6687-1-val@packett.cool/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2808566.mvXUDI8C0e@rafael.j.wysocki
2025-11-25 19:06:38 +01:00
Andy Shevchenko 6d96ceff9a cpuidle: Update header inclusion
While cleaning up some headers, I got a build error on this file:

drivers/cpuidle/poll_state.c:52:2: error: call to undeclared library function 'snprintf' with type 'int (char *restrict, unsigned long, const char *restrict, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]

Update header inclusions to follow IWYU (Include What You Use)
principle.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124205752.1328701-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:04:29 +01:00
Ulf Hansson c19dfb267c Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
Let's document how the new CPU system wakeup latency QoS limit can be used
from user space, along with how the constraint is taken into account for
s2idle and cpuidle.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-7-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:30 +01:00
Ulf Hansson 2b8d594742 cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
The CPU system wakeup QoS limit must be respected for the regular cpuidle
state selection. Therefore, let's extend the common governor helper
cpuidle_governor_latency_req(), to take the constraint into account.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-6-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Ulf Hansson 99b42445f4 sched: idle: Respect the CPU system wakeup QoS limit for s2idle
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle, let's start taking the QoS limit into account.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-5-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Ulf Hansson e2e4695f01 pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
The CPU system wakeup QoS limit must be respected for the regular cpuidle
state selection. Therefore, let's extend the genpd governor for CPUs to
take the constraint into account when it selects a domain idle state for
the corresponding PM domain.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-4-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Ulf Hansson 8e7de6dc42 pmdomain: Respect the CPU system wakeup QoS limit for s2idle
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle through genpd, let's extend the corresponding genpd governor for
CPUs. More precisely, during s2idle let the genpd governor select a
suitable domain idle state by taking the QoS limit into account.

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-3-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Ulf Hansson a4e6512a79 PM: QoS: Introduce a CPU system wakeup QoS limit
Some platforms support multiple low power states for CPUs that can be used
when entering system-wide suspend. Currently we are always selecting the
deepest possible state for the CPUs, which can break the system wakeup
latency constraint that may be required for a use case.

Let's take the first step towards addressing this problem by introducing
an interface for user space that allows us to specify the CPU system
wakeup QoS limit. Subsequent changes will start taking into account the new
QoS limit.
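
The node name is not spelled out in this log; a hedged user space sketch,
assuming the interface mirrors the classic /dev/cpu_dma_latency PM QoS
pattern (device node name hypothetical):

  #include <fcntl.h>
  #include <stdint.h>
  #include <unistd.h>

  int request_wakeup_latency(int32_t limit_us)
  {
  	/* Node name assumed; the request holds while the fd stays open. */
  	int fd = open("/dev/cpu_wakeup_latency", O_WRONLY);

  	if (fd < 0)
  		return -1;
  	if (write(fd, &limit_us, sizeof(limit_us)) != sizeof(limit_us)) {
  		close(fd);
  		return -1;
  	}
  	return fd;	/* caller keeps it open; closing drops the request */
  }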

Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-2-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25 19:01:29 +01:00
Rafael J. Wysocki 30a8e0a32e linux-cpupower-6.19-rc1
Adds support for building libcpupower statically when STATIC=true is
 specified during build.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmkk6sIACgkQCwJExA0N
 QxyUOxAAvI0losir97aizng6pyo5RM1OEsI10mEhBo80Vbko9d0NOk8kdTiatYgO
 X4dBIZ/WZLFZ+t63LDWRx0a8QjBi4UjZjv+ZIUcfvwLME7zZXuJyophfJhs1mgVQ
 c/X0aUCYTalziOn+zDlJWPtQVD86dxGBikE8RGZuj+LgQBNF0frB5aWvV/yUTdTj
 o6e+vNRQGKKx6TpKdRuS+fk+fBcGIY1y5fVoYGcXA3gqBTJkj2eNQCbpxnUxqlj8
 sZgHh24Gb9rPrVQK48C8SIEMQuPcHNJ2C966OI/okOBPMLb54oH0OW9DXv3mZFug
 +fn13mIfmTixRI6QH3ZHuZLujtEoNmUnmAhbyng2HJM+ei/U3wskRaOrstnWkRFM
 42iYF0DzSeqxF8V3YVt81FUMnXttyooCiiq3H1o2JCmua2tkXf7eK1QdVidc7CAj
 k2DVj0Fu0o0TXjV2v4V+HCDBSqOxC9oNRsxcRn0agckyhRjfXvRIuKRjmK/Jgene
 8E4ImRW4H2ZQCCrWROhQuzLTy0CVJ7bGB81YQIO/t8ifdycUS9ok7qJtaXJBEk9e
 Jn25zgAYXmeu8aYbNFEBVAhcXLimvIbQtHc9B2Qo0RTnenHIxjZbRQhsf6OKp1Fg
 YyagHyFUntj5+kyAXzJ77zsVfp9Ehi9whGfOarwciN91UZIxZ+8=
 =RS7N
 -----END PGP SIGNATURE-----

Merge tag 'linux-cpupower-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux

Pull a cpupower utility update for 6.19-rc1 from Shuah Khan:

"Adds support for building libcpupower statically when STATIC=true is
 specified during build."

* tag 'linux-cpupower-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
  tools/power/cpupower: Support building libcpupower statically
2025-11-25 17:11:45 +01:00
Rafael J. Wysocki 8dfa8bb652 OPP updates for 6.19
- Minor improvements to the Rust interface (Tamir Duberstein).
 
 - Fixes to scope-based pointers (Viresh Kumar).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmklXvMACgkQ0rkcPK6B
 EhzYWBAAsX+K/K9hqMS89kgpetKQZiscU10xYjlZcfXufoh/iJtDFpUFizT79Hpf
 xWRf4azkKIIfJJcdPEYwuzgdz8Smt2cin7o/UG8ycDzjL4+m6dVdD9lNfkjNjqYc
 Mn3Ypry8qWxaNGLN15GdV6RIGpkYwBAVbIOAdMu565qN1FMVikQLovXp8dKOsDlz
 CRgTSk8bYFOli5xbgVEYgcy0GgufKlS1IY0gyrWOKh4gzJOpx/kwTCSCVv3U8tUD
 DLOGFYg66+qlIlPk5VjFCtBuBcVRfqze62ezgL3lstrhfuWEi0zHt0kwyG2Fp31r
 KAOM812fGVj2Y+CRYdUIRsntSC4VyEywP4CH90if0ONXbRjWr3YQBr1e2mK0obiJ
 p4Axh8r71/6E1ZKPajxE1RJqpCEWA+cFc4zJ2QABSaXkQ576gV9OLO08nUeN3sb4
 V/Z0ZzpCxNlG3q7ZpVGqCyGRp/S5L802vMtKITdqsiOyqH3p+d9oG5tRe131pXPQ
 KWPS/F/LBeXKfpIlasaFfcqhBUQXphm91kPEDn+X4bD67GCpA1mue6F+NFbhmX6I
 zVR8ioswk8bSYnX50zT4kAHHO0qEgBGsUHQSCZiHTjbFIWwyCsf6w9JLLeNtkHzT
 kQI537YjStualv2SnY2ujVUdg0horMMJxwatUTv7syHKFlFeiHI=
 =aAUV
 -----END PGP SIGNATURE-----

Merge tag 'opp-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull OPP updates for 6.19 from Viresh Kumar:

"- Minor improvements to the Rust interface (Tamir Duberstein).

 - Fixes to scope-based pointers (Viresh Kumar)."

* tag 'opp-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  rust: opp: simplify callers of `to_c_str_array`
  OPP: Initialize scope-based pointers inline
  rust: opp: fix broken rustdoc link
2025-11-25 17:08:06 +01:00
Rafael J. Wysocki ded4feb14d CPUFreq updates for 6.19
- tegra186: Add OPP / bandwidth support for Tegra186 (Aaron Kling).
 
 - Minor improvements to various cpufreq drivers (Christian Marangi, Hal
   Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmklXiwACgkQ0rkcPK6B
 Ehw4zw//Qf6p1HXJME/o6AhPA2B+obbqBOxqPN8xw1yvnr3MIn1q7wCEv7M/B1zP
 3wdW/Tchg6kgwsNJ0IKduHz5Dhx06xH6PzF9YXblS46nehMtErjqrF2K/BN/1faU
 cLrAcs31dlK9Nv7qCxaXvwl6G6bsN52V4qbdbssONbN2IRtILcs2KiU1eb8WUIMv
 yjll91trQoQ3eSufKV73FbKzmIwLynYvErwko20WyJODlOO5TgltkRSICNbaDbJh
 G95eJnMIeXjOT4rf4BX7mMx45GzZwKWVfF/AZBQNTFBVRw+g7vDiuC8SbEY1VMws
 KxkgWlTdJfv5Hh2UEP3XYodhT6WJehiPdO5RAgLjSkfSw5XEbNTnAvAnhhDqWw7J
 Tk122tgw/2SRh7bJQoetL4dggKKh9aT9vaRFklrUukmDjzs3GdZKw0xXakc8GjMY
 dHXgiOpqU3eGY2YGiU5DZnTxS/vDga/D2+jGcX/4C+vq5t8LJ6MiyUP7vuizPCNq
 eZ0DyFzQiuOe3Kgq2cXRE3wMz0/wNU7JmTdMIuTtpjJm5cewE+7QqS0ksA5zQ35r
 fj0WrIaqVXOpxoXsOAYB4z8JdgB0P/CH1A2tYfQ5go18sSHUOJsoQhTOT8JMxEx5
 4ZsA0WgfzSyALExjjNy6Y06XghpGz4ZUOSeQR6Zzyb88FBZcZ8U=
 =Nkvn
 -----END PGP SIGNATURE-----

Merge tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull CPUFreq updates for 6.19 from Viresh Kumar:

"- tegra186: Add OPP / bandwidth support for Tegra186 (Aaron Kling).

 - Minor improvements to various cpufreq drivers (Christian Marangi, Hal
   Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)."

* tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
  cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users
  cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM
  cpufreq: CPPC: Don't warn if FIE init fails to read counters
  cpufreq: nforce2: fix reference count leak in nforce2
  cpufreq: tegra186: add OPP support and set bandwidth
  cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
  cpufreq: s5pv210: fix refcount leak
2025-11-25 17:06:04 +01:00
George Moussalem 8d6f8d5c58 dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatible
The tsens IP found in the IPQ5018 SoC should not use qcom,tsens-v1 as a
fallback: it has no RPM and, as such, must deviate from the standard v1
init routine, since this version of tsens needs to be explicitly reset
and enabled by the driver.

So let's make qcom,ipq5018-tsens a standalone compatible in the bindings.

Fixes: 77c6d28192 ("dt-bindings: thermal: qcom-tsens: Add ipq5018 compatible")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20250818-ipq5018-tsens-fix-v1-1-0f08cf09182d@outlook.com
2025-11-25 16:07:57 +01:00
Heiko Carstens 509c34924d s390/vdso: Get rid of -m64 flag handling
The compiler/assembler flag -m64 is added and removed at two locations.
This pointless exercise is a leftover from keeping the 31 and 64 bit vdso
Makefiles as symmetrical as possible. Given that the 31 bit vdso code
does not exist anymore, remove the -m64 flag handling.

Suggested-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25 15:28:08 +01:00
Heiko Carstens c0087d807a s390/vdso: Rename vdso64 to vdso
Since compat is gone there is only a 64 bit vdso left.
Remove the superfluous "64" suffix everywhere.

Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25 15:28:07 +01:00
Heiko Carstens b3bdfdf1f9 s390: Rename head64.S to head.S
All the code is 64 bit, therefore remove the superfluous suffix.

Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25 15:28:07 +01:00
Jens Remus 5e811b922e s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
This simplifies the vDSO linker script. The ELF_DETAILS macro was not
used in addition (as done on arm64 and powerpc), since that would
introduce an empty .modinfo section.

Note that this rearranges the .comment section to follow after all of
the debug sections.

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25 15:28:07 +01:00
Zenon Xiu 4b7a59fa70 Documentation/arm64: Fix the typo of register names
The register names 'HWFGWTR_EL2' and 'HWFGRTR_EL2' are wrong; they should be 'HFGWTR_EL2' and 'HFGRTR_EL2'.
The register descriptions can be found on the Arm website:
https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGWTR-EL2--Hypervisor-Fine-Grained-Write-Trap-Register
https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGRTR-EL2--Hypervisor-Fine-Grained-Read-Trap-Register?lang=en

Signed-off-by: Zenon Xiu <zenonxiu@outlook.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-25 12:26:31 +00:00
Marc Zyngier 155f8d4ef0 ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
Since 0f67b56d84 ("clocksource/drivers/arm_arch_timer_mmio: Switch
over to standalone driver"), acpi_arch_timer_mem_init() is unused.

Remove it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-25 11:55:13 +00:00
Malaya Kumar Rout 16e802667e tools/thermal/thermal-engine: Fix format string bug in thermal-engine
The error message in the daemon() failure path uses the %p format specifier
without providing a corresponding pointer argument, resulting in undefined
behavior and printing garbage values.

Replace %p with %m to properly print the errno error message, which is
the intended behavior when daemon() fails.

This fix ensures proper error reporting when daemonization fails.
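
Illustrative diff (the logging helper and message wording are assumptions;
only the %p -> %m change is from the patch):

 - err("Failed to daemonize: %p\n");
 + err("Failed to daemonize: %m\n");	/* %m expands to strerror(errno) */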

Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251124104401.374856-1-mrout@redhat.com
2025-11-25 11:00:28 +01:00
Randy Dunlap 73029e73cc x86/cc: Fix enum spelling to fix kernel-doc warnings
Make the enum name in kernel-doc match the code to prevent kernel-doc warnings:

  Warning: include/linux/cc_platform.h:106 Enum value
   'CC_ATTR_GUEST_SEV_SNP' not described in enum 'cc_attr'
  Warning: include/linux/cc_platform.h:106 Excess enum value
   '%CC_ATTR_SEV_SNP' description in 'cc_attr'

Fixes: f742b90e61 ("x86/mm: Extend cc_attr to include AMD SEV-SNP")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251125022730.3163679-1-rdunlap@infradead.org
2025-11-25 09:17:13 +01:00
Malaya Kumar Rout 8974573ba4 ACPI: tools: pfrut: fix memory leak and resource leak in pfrut.c
Static analysis found issues in pfrut.c.

cppcheck output before this patch:
tools/power/acpi/tools/pfrut/pfrut.c:225:3: error: Resource leak: fd_update [resourceLeak]
tools/power/acpi/tools/pfrut/pfrut.c:269:3: error: Resource leak: fd_update [resourceLeak]
tools/power/acpi/tools/pfrut/pfrut.c:269:3: error: Resource leak: fd_update_log [resourceLeak]
tools/power/acpi/tools/pfrut/pfrut.c:365:4: error: Memory leak: addr_map_capsule [memleak]
tools/power/acpi/tools/pfrut/pfrut.c:424:4: error: Memory leak: log_buf [memleak]

cppcheck output after this patch:
No resource leaks found

Fix by closing file descriptors and freeing allocated memory.

Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Link: https://patch.msgid.link/20251120170001.251968-1-mrout@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-24 20:50:15 +01:00
David Laight c964081d60 ACPI: property: use min() instead of min_t()
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.

In this case the 'unsigned long' value is small enough that the result
is ok.

Detected by an extra check added to min_t().
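
A small example of the truncation hazard (illustrative values):

  unsigned long a = 0x100000001UL;	/* does not fit in unsigned int */
  unsigned int  b = 5;

  min_t(unsigned int, a, b);	/* a truncates to 1: result 1, silently wrong */
  min(a, b);			/* b promotes to unsigned long: result 5 */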

Signed-off-by: David Laight <david.laight.linux@gmail.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251119224140.8616-14-david.laight.linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-24 20:45:38 +01:00
Rafael J. Wysocki 15bfdadd61 cpuidle: governors: teo: Add missing space to the description
There is a missing space in the governor description comment, so add it.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5059034.31r3eYUQgx@rafael.j.wysocki
2025-11-24 20:43:32 +01:00
Rafael J. Wysocki c03aef8833 PM: hibernate: Extra cleanup of comments in swap handling code
Continue recent cleanups of comments in the swap handling code.

Unify the use of white space in the comments, drop some useless
comments outside function bodies, and move some other comments into
function bodies.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5943864.DvuYhMxLoT@rafael.j.wysocki
2025-11-24 20:41:06 +01:00
Nikolay Borisov 69acbdbbef RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16()
Doing hweight16() and checking whether the lsb is set is functionally
equivalent to what bitwise_xor_bits() does. In addition, it results in better
generated code: before, gcc would inline the function 4 times. With
hweight16(), the resulting code boils down to two instructions, POPCNT and
AND, and all relevant CPUs support POPCNT.

An alternative would have been to use the __builtin_parity() function provided
by both Clang/GCC, however under some circumstances the compiler can choose not
to inline it but generate a library call which is unsupported in the kernel.
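
The replacement pattern, as described above:

  /* Parity of a 16-bit value: XOR-folding all bits == popcount LSB. */
  static inline bool parity16(u16 val)
  {
  	return hweight16(val) & 1;
  }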

No functional changes.

  [ bp: Massage commit message. ]

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251124142517.1708451-1-nik.borisov@suse.com
2025-11-24 17:00:37 +01:00
James Clark e6a27290d8 perf: arm_spe: Add support for filtering on data source
SPE_FEAT_FDS adds the ability to filter on the data source of packets.
Like the other existing filters, enable filtering with PMSFCR_EL1.FDS
when any of the filter bits are set.

Each bit position of the 64 bit filter maps to numerical data sources
0-63 described by bits[0:5] in the data source packet (although the full
data source field is 16 bits wide, so higher-valued data sources can't be
filtered on). The filter is an OR of all the filter bits, so for example
clearing filter bits 0 and 3 only includes packets from data sources 0
OR 3.

Invert the filter given by userspace so that the default value of 0 is
equivalent to including all values (no filtering). This allows us to
skip adding a new format bit to enable filtering and still support
excluding all data sources which would have been a filter value of 0 if
not for the inversion.
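
A minimal sketch of the inversion, with hypothetical names (the user value
arrives via the perf_event_attr::config4 field added in the companion
patch below):

  /* Invert the user-supplied mask so the default 0 means "no filtering". */
  static u64 arm_spe_fds_to_hw(u64 user_mask)
  {
  	/*
  	 * user_mask == 0 (default)  -> all data sources included;
  	 * user_mask == ~0ULL        -> all data sources excluded.
  	 */
  	return ~user_mask;
  }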

Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24 15:59:18 +00:00
James Clark cbbfba4847 perf: Add perf_event_attr::config4
Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
packet using another 64 bits of event filtering control. As the existing
perf_event_attr::configN fields are all used up for the SPE PMU, an
additional field is needed. Add a new 'config4' field.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24 15:59:18 +00:00
Joakim Zhang 11abb4e87b perf/imx_ddr: Add support for PMU in DB (system interconnects)
There is a PMU in the DB which has the same function as the PMU in the
DDR subsystem; the difference is that the DB PMU only supports the
cycles, axid-read and axid-write events.

e.g.
perf stat -a -e imx8_db0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_port=0xPP,axi_channel=0xH/ cmd
perf stat -a -e imx8_db0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_port=0xPP,axi_channel=0xH/ cmd

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24 15:39:05 +00:00
Frank Li 037e8cf671 perf/imx_ddr: Get and enable optional clks
Get and enable optional clks because fsl,imx8dxl-db-pmu has two clocks.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24 15:39:05 +00:00
Frank Li 66db99ffdf perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
Move ida_alloc() from helper ddr_perf_init() into ddr_perf_probe() to
clarify why ida_free() must be called on the error path.

Add return value check for ida_alloc().

Rename label 'cpuhp_state_err' to 'idr_free' to make the code clearer,
since two error paths now jump to this label.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24 15:39:05 +00:00
Frank Li de8209e554 dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
Add compatible strings fsl,imx8qm-ddr-pmu and fsl,imx8qxp-ddr-pmu, which
fall back to fsl,imx8-ddr-pmu, and fsl,imx8dxl-db-pmu (for the data bus fabric).

Add clocks, clock-names for fsl,imx8dxl-db-pmu and keep the same
restriction for existing compatible strings.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24 15:39:05 +00:00
Heiko Carstens f5730d44e0 s390: Add stackprotector support
Stackprotector support was previously unavailable on s390 because by
default compilers generate code which is not suitable for the kernel:
the canary value is accessed via thread local storage, where the address
of thread local storage is within access registers 0 and 1.

Using those registers also for the kernel would come with a significant
performance impact and more complicated kernel entry/exit code, since
access registers contents would have to be exchanged on every kernel entry
and exit.

With the upcoming gcc 16 release new compiler options will become available
which allow generating code suitable for the kernel. [1]

Compiler option -mstack-protector-guard=global instructs gcc to generate
stackprotector code that refers to a global stackprotector canary value via
symbol __stack_chk_guard. Access to this value is guaranteed to occur via
larl and lgrl instructions.

Furthermore, compiler option -mstack-protector-guard-record generates a
section containing all code addresses that reference the canary value.

To allow for per task canary values the instructions which load the address
of __stack_chk_guard are patched so they access a lowcore field instead: a
per task canary value is available within the task_struct of each task, and
is written to the per-cpu lowcore location on each context switch.

Also add sanity checks and debugging option to be consistent with other
kernel code patching mechanisms.

Full debugging output can be enabled with the following kernel command line
options:

debug_stackprotector
bootdebug
ignore_loglevel
earlyprintk
dyndbg="file stackprotector.c +p"

Example debug output:

stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240

where "<insn address>: <old insn> -> <new insn>".

[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:45:21 +01:00
Heiko Carstens 1d7764cfe3 s390/modules: Simplify module_finalize() slightly
Preinitialize the return value, and break out of the for loop in
module_finalize() in case of an error to get rid of an ifdef.

This makes it easier to add additional code, which may also depend
on config options.
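
The shape of the change, sketched with hypothetical helpers:

  static int finalize_sections(struct module *me)
  {
  	int rc = 0;	/* preinitialized return value */
  	int i;

  	for (i = 0; i < nr_sections(me); i++) {	/* helper hypothetical */
  		rc = apply_section(me, i);	/* helper hypothetical */
  		if (rc)
  			break;	/* single exit path, no #ifdef needed */
  	}
  	return rc;
  }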

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:45:21 +01:00
Heiko Carstens c3d17464f0 s390: Remove KMSG_COMPONENT macro
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel
message catalog" which never made it upstream.

Remove the macro in order to get rid of a pointless indirection. Replace
all users with the string it defines. In almost all cases this leads to a
simple replacement like this:

 - #define KMSG_COMPONENT "appldata"
 - #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
 + #define pr_fmt(fmt) "appldata: " fmt

Except for some special cases this is just mechanical/scripted work.

Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:45:21 +01:00
Heiko Carstens e950d1f84d s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
Since the rework of the kernel virtual address space [1] the module area
and the kernel image are within the same 4GB area. Therefore there is no
need for the weak per cpu workaround for modules anymore. Remove it.

[1] commit c98d2ecae0 ("s390/mm: Uncouple physical vs virtual address spaces")

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:45:20 +01:00
Heiko Carstens f555d885bf Merge branch 'ap-driver-override' into features
Harald Freudenberger says:

====================

Support for driver override on AP queues.

Add a new sysfs attribute driver_override to the AP queue's directory.
Writing a string to it overrides the default driver determination and the
drivers are matched against this string instead. This overrules the driver
binding determined by the apmask/aqmask bitmask fields. When the attribute
is written, a check is done whether the queue is in use by an mdev device.
If this is true, the write is aborted and EBUSY is returned.

As there exists some tooling for this kind of driver_override
(see the driverctl package), the AP bus behavior for re-binding
should be compatible with it. The steps for a driver_override are:
 1) unbind the current driver from the device. For example
    echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind
 2) set the new driver for this device in the sysfs
    driver_override attribute. For example
    echo "vfio_ap" > /sys//devices/ap/card17/17.0005/driver_override
 3) trigger a bus reprobe of this device. For example
    echo "17.0005" > /sys/bus/ap/drivers_probe
With the driverctl package this is more convenient and
the settings are persisted:
  driverctl -b ap set-override 17.0005 vfio_ap
and unset with
  driverctl -b ap unset-override 17.0005

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:43:19 +01:00
Harald Freudenberger 46030379f1 s390/ap: Restrict driver_override versus apmask and aqmask use
Introduce a restriction for the driver_override feature versus apmask
and aqmask:
- driver_override is only allowed when the apmask and aqmask values
  both are default (=0xffff..ffff).
- apmask and aqmask modifications are only allowed when there is no
  driver_override on any AP device active.
So in the end the user has to choose to either use apmask/aqmask to
divide the AP devices into host owned and vfio owned, or use the
driver_override feature, but not mix these two approaches.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:43:06 +01:00
Harald Freudenberger 8babcc2b6a s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
The mutex ap_perms_mutex was already used not only for protection
of the struct ap_perms ap_perms variable but also for a consistent
update of the AP bus sysfs attributes apmask and aqmask.

So rename this mutex to ap_attr_mutex which better reflects the
current use. This is also a preparation for an upcoming patch which
will use this mutex to lock updates on a new sysfs attribute.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:43:06 +01:00
Harald Freudenberger d38a87d7c0 s390/ap: Support driver_override for AP queue devices
Add a new sysfs attribute driver_override to the AP queue's
directory. Writing a string to it overrides the default driver
determination and the drivers are matched against this string
instead. This overrules the driver binding determined by the
apmask/aqmask bitmask fields.

According to the common understanding of how the driver_override
behavior shall work, there is no further checking done: neither of
the string which is given as the override driver, nor of whether this
device is currently in use by an mdev device. Another patch may limit this
behavior to refuse a mixed usage of the driver_override and
apmask/aqmask feature.

As there exists some tooling for this kind of driver_override
(see the driverctl package), the AP bus behavior for re-binding
should be compatible with it. The steps for a driver_override are:
 1) unbind the current driver from the device. For example
    echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind
 2) set the new driver for this device in the sysfs
    driver_override attribute. For example
    echo "vfio_ap" > /sys//devices/ap/card17/17.0005/driver_override
 3) trigger a bus reprobe of this device. For example
    echo "17.0005" > /sys/bus/ap/drivers_probe
With the driverctl package this is more convenient and
the settings are persisted:
  driverctl -b ap set-override 17.0005 vfio_ap
and unset with
  driverctl -b ap unset-override 17.0005

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:43:05 +01:00
Harald Freudenberger 6917f434fd s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
For the in_use() check of an updated apmask the host's aqmask
was provided to the vfio function. Similarly, on an update of the
aqmask the host's apmask was provided to the vfio in_use()
function. This led to false results in the check for apmask or
aqmask updates. For example, with only one APQN, when exactly
this card was to be re-assigned back to the host, the in_use()
check did not complain.

The correct behavior is achieved by providing a full aqmask when
an adapter is to be checked and, similarly, a full apmask when a
domain is to be checked for usage.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24 11:43:05 +01:00
Geert Uytterhoeven aaf4e92341 m68k: defconfig: Update defconfigs for v6.18-rc1
- Drop CONFIG_SCTP_COOKIE_HMAC_SHA1=y (removed in commit
    2f3dd6ec90 ("sctp: Convert cookie authentication to use
    HMAC-SHA256")),
  - Drop CONFIG_BATMAN_ADV_NC=y (removed in commit 87b95082db
    ("batman-adv: remove network coding support")),
  - Enable modular build of the SHA-1 secure hash algorithm (no longer
    auto-enabled since commit 2f3dd6ec90 ("sctp: Convert cookie
    authentication to use HMAC-SHA256")).

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/65e00bcb7b2980278bb087986ee405627aa32d8b.1760360254.git.geert@linux-m68k.org
2025-11-24 11:03:50 +01:00
Thorsten Blum dc30fe7a0a PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
Use min() to improve the readability of actmon_cpu_to_emc_rate() and
remove any unnecessary curly braces.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251112172121.3741-2-thorsten.blum@linux.dev/
2025-11-24 00:02:07 +09:00
Pengjie Zhang 26dd44a400 PM / devfreq: hisi: Fix potential UAF in OPP handling
Ensure all required data is acquired before calling dev_pm_opp_put(opp)
to maintain correct resource acquisition and release order.
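
The general pattern, as a generic sketch (not the HiSilicon code itself):

  static unsigned long read_opp_freq(struct dev_pm_opp *opp)
  {
  	/* Read everything needed from the OPP before dropping the ref. */
  	unsigned long freq = dev_pm_opp_get_freq(opp);

  	dev_pm_opp_put(opp);	/* opp must not be used after this */
  	return freq;
  }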

Fixes: 7da2fdaaa1 ("PM / devfreq: Add HiSilicon uncore frequency scaling driver")
Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com>
Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20250915062135.748653-1-zhangpengjie2@huawei.com/
2025-11-24 00:02:07 +09:00
Dmitry Baryshkov 447c4e8338 PM / devfreq: Move governor.h to a public header location
Some device drivers (and out-of-tree modules) might want to define
device-specific devfreq governors. Rather than restricting all of them to
be a part of drivers/devfreq/ (which is not possible for out-of-tree
drivers anyway) move governor.h to include/linux/devfreq-governor.h and
update all drivers to use it.

The devfreq_cpu_data is only used internally, by the passive governor,
so it is moved to the driver source rather than being a part of the
public interface.
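
For in-tree users the conversion amounts to an include change along these
lines:

 - #include "governor.h"
 + #include <linux/devfreq-governor.h>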

Reported-by: Robie Basak <robibasa@qti.qualcomm.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251030-governor-public-v2-1-432a11a9975a@oss.qualcomm.com/
2025-11-24 00:02:01 +09:00
Yue Haibing e6a11a526e x86/{boot,mtrr}: Remove unused function declarations
Commits

  28be1b454c ("x86/boot: Remove unused copy_*_gs() functions")
  34d2819f20 ("x86, mtrr: Remove unused mtrr/state.c")

removed the functions but left the prototypes. Remove them.

  [ bp: Merge into a single patch. ]

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251120121037.1479334-1-yuehaibing@huawei.com
2025-11-22 21:26:36 +01:00
Lorenzo Pieralisi 9c1fbc56ca irqchip/gic-its: Rework platform MSI deviceID detection
The current code retrieving the MSI devID of platform devices in the GIC
ITS MSI parent helpers suffers from some minor issues:

- It leaks a struct device_node reference
- It is duplicated between GICv3 and GICv5 for no good reason
- It does not use the OF phandle iterator code that simplifies
  the msi-parent property parsing

Consolidate GIC v3 and v5 deviceID retrieval in a function that addresses
the full set of issues in one go by merging GIC v3 and v5 code and
converting the msi-parent parsing loop to the more modern OF phandle
iterator API, fixing the struct device_node reference leak in the process.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20251021124103.198419-6-lpieralisi@kernel.org
2025-11-22 17:09:03 +01:00
Lorenzo Pieralisi 4f32612f6a PCI: iproc: Implement MSI controller node detection with of_msi_xlate()
The functionality implemented in the iproc driver in order to detect an
OF MSI controller node is now fully implemented in of_msi_xlate().

Replace the current msi-map/msi-parent parsing code with of_msi_xlate().

Since of_msi_xlate() is also a deviceID mapping API, pass in a fictitious
0 as deviceID - the driver only requires detecting the OF MSI controller
node not the deviceID mapping per-se (of_msi_xlate() return value is
ignored for the same reason).

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251021124103.198419-5-lpieralisi@kernel.org
2025-11-22 17:09:03 +01:00
Thomas Gleixner ebb922c920 Linux 6.18-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmj+p+UeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGKsIH/1EFGYZDVJ7pTOcO
 qJY/xfu5YNd4ezZTGMW5SgJK+lAdJwkmbu8PUlcOhXKRVvACG9Tud/+pZzw966C5
 pk9pF9vpCXq2Zz6dk3/XGFARUPUlDA4uJ/jiPTNVA8yy+V18u+Ds55Y+rhv9MkcW
 n/Fi+fiYfjqAaqP328mWH9z51ibRqH3WQfqVdjzClzoSC31BuJUVEZi9s5FZ7C9Q
 OCvRLp8WvTpcQ7ab7WH/wCgznXEKyRM/OxaNtXWztod9GLqOmWoFiHUxWfEQ/gg+
 KzgbgQOeXI6q7U8xJZ/711ZFzGLR9VBEPN0HnqxRNr8fCpzJ9FKFGTFD2HcBgUjy
 F9JH3nk=
 =YBEg
 -----END PGP SIGNATURE-----

Merge tag 'v6.18-rc3' into irq/msi

Pick up OF changes to resolve dependencies
2025-11-22 17:07:57 +01:00
Babu Moger ac7de456a3 fs/resctrl: Update bit_usage to reflect io_alloc
The "shareable_bits" and "bit_usage" resctrl files associated with cache
resources give insight into how instances of a cache are used.

Update the annotated capacity bitmasks displayed by "bit_usage" to include the
cache portions allocated for I/O via the "io_alloc" feature. "shareable_bits"
is a global bitmask of cache shareable with I/O and thus cannot present the
per-domain I/O allocations possible with the "io_alloc" feature. Revise the
"shareable_bits" documentation to direct users to "bit_usage" for accurate
cache usage information.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/e02a0d424129fd7f3e45822a559b1c614ae4652a.1762995456.git.babu.moger@amd.com
2025-11-22 14:30:34 +01:00
Babu Moger 28fa2cce7a fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks
The io_alloc feature in resctrl enables system software to configure the
portion of the cache allocated for I/O traffic. When supported, the
io_alloc_cbm file in resctrl provides access to capacity bitmasks (CBMs)
allocated for I/O devices.

Enable users to modify io_alloc CBMs by writing to the io_alloc_cbm resctrl
file when the io_alloc feature is enabled.

Mirror the CBMs between CDP_CODE and CDP_DATA when CDP is enabled to present
consistent I/O allocation information to user space.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/67609641b03ccfba18a8ee0bf9dbd1f3dcbecda3.1762995456.git.babu.moger@amd.com
2025-11-22 14:28:31 +01:00
Babu Moger af1242eeca fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID
parse_cbm() requires resource group mode and CLOSID to validate the capacity
bitmask (CBM). It is passed via struct rdtgroup in struct rdt_parse_data.

The io_alloc feature also uses CBMs to indicate which portions of cache are
allocated for I/O traffic. The CBMs are provided by user space and need to be
validated the same as CBMs provided for general (CPU) cache allocation.
parse_cbm() cannot be used as-is since io_alloc does not have rdtgroup context.

Pass the resource group mode and CLOSID directly to parse_cbm() via struct
rdt_parse_data, instead of through the rdtgroup struct, to facilitate calling
parse_cbm() to verify the CBM of the io_alloc feature.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/f8ec6ab5cf594d906a3fe75f56793d5fbd63f38f.1762995456.git.babu.moger@amd.com
2025-11-22 13:10:12 +01:00
Babu Moger 77b6623262 fs/resctrl: Introduce interface to display io_alloc CBMs
Introduce the "io_alloc_cbm" resctrl file to display the capacity bitmasks
(CBMs) that represent the portions of each cache instance allocated
for I/O traffic on a cache resource that supports the "io_alloc" feature.

io_alloc_cbm resides in the info directory of a cache resource, for example,
/sys/fs/resctrl/info/L3/. Since the resource name is part of the path, it
is not necessary to display the resource name as done in the schemata file.

When CDP is enabled, io_alloc routes traffic using the highest CLOSID
associated with the CDP_CODE resource and that CLOSID becomes unusable for
the CDP_DATA resource. The highest CLOSID of CDP_CODE and CDP_DATA resources
will be kept in sync to ensure consistent user interface. In preparation for
this, access the CBMs for I/O traffic through highest CLOSID of either
CDP_CODE or CDP_DATA resource.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/55a3ff66a70e7ce8239f022e62b334e9d64af604.1762995456.git.babu.moger@amd.com
2025-11-22 11:37:21 +01:00
Frederic Weisbecker 3de5e46e50 genirq: Remove cpumask availability check on kthread affinity setting
Failing to allocate the affinity mask of an interrupt descriptor fails the
whole descriptor initialization. It is then guaranteed that the cpumask is
always available whenever the related interrupt objects are alive, such as
the kthread handler.

Therefore remove the superfluous check since it is merely a historical
leftover. Also get rid of the comments above it, which are obsolete and
useless.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251121143500.42111-4-frederic@kernel.org
2025-11-22 09:26:18 +01:00
Frederic Weisbecker 801afdfbfc genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
When a cpuset isolated partition is created / updated or destroyed, the
interrupt threads are affined blindly to all the non-isolated CPUs. This
happens without taking into account the interrupt threads initial affinity
that becomes ignored.

For example in a system with 8 CPUs, if an interrupt and its kthread are
initially affine to CPU 5, creating an isolated partition with only CPU 2
inside will eventually end up affining the interrupt kthread to all CPUs
but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for CPU 5.

Besides the blind re-affining, this doesn't take care of the actual low
level interrupt which isn't migrated. As of today the only way to isolate
non managed interrupts, along with their kthreads, is to overwrite their
affinity separately, for example through /proc/irq/

To avoid doing that manually, future development should focus on updating
the interrupt's affinity whenever cpuset isolated partitions are updated.

In the meantime, cpuset shouldn't fiddle with interrupt threads directly.
To prevent that, set the PF_NO_SETAFFINITY flag on them.

This is done through kthread_bind_mask() by affining them initially to all
possible CPUs as at that point the interrupt is not started up which means
the affinity of the hard interrupt is not known. The thread will adjust
that once it reaches the handler, which is guaranteed to happen after the
initial affinity of the hard interrupt is established.
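
The mechanism, sketched (setup context simplified):

  /* Bind the new irq thread to all possible CPUs; kthread_bind_mask()
   * also sets PF_NO_SETAFFINITY, so cpuset keeps its hands off it. */
  kthread_bind_mask(t, cpu_possible_mask);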

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251121143500.42111-3-frederic@kernel.org
2025-11-22 09:26:18 +01:00
Frederic Weisbecker 68775ca79a genirq: Prevent early spurious wake-ups of interrupt threads
During initialization, the interrupt thread is created before the interrupt
is enabled. The interrupt enablement happens before the actual kthread wake
up point. Once the interrupt is enabled the hardware can raise an interrupt
and once setup_irq() drops the descriptor lock a interrupt wake-up can
happen.

Even when such an interrupt can be considered premature, this is not a
problem in general because at the point where the descriptor lock is
dropped and the wakeup can happen, the data which is used by the thread is
fully initialized.

Though from the perspective of least surprise, the initial wakeup really
should be performed by the setup code and not randomly by a premature
interrupt.

Prevent this by performing a wake-up only if the target is in state
TASK_INTERRUPTIBLE, which the thread uses in wait_for_interrupt().

If the thread is still in state TASK_UNINTERRUPTIBLE, the wake-up is not
lost because after the setup code completed the initial wake-up the thread
will observe the IRQTF_RUNTHREAD and proceed with the handling.
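
A sketch of the conditional wake-up (assuming wake_up_state() is the
primitive used):

  /* Only wake the thread if it is already parked in wait_for_interrupt()
   * (TASK_INTERRUPTIBLE); a thread still in TASK_UNINTERRUPTIBLE setup
   * will see IRQTF_RUNTHREAD later and proceed anyway. */
  wake_up_state(action->thread, TASK_INTERRUPTIBLE);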

[ tglx: Simplified the changes and extended the changelog. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251121143500.42111-2-frederic@kernel.org
2025-11-22 09:26:18 +01:00
Babu Moger 9445c7059c fs/resctrl: Add user interface to enable/disable io_alloc feature
AMD's SDCIAE forces all SDCI lines to be placed into the L3 cache portions
identified by the highest-supported L3_MASK_n register, where n is the maximum
supported CLOSID.

To support this, when the io_alloc resctrl feature is enabled, reserve the
highest CLOSID exclusively for I/O allocation traffic, making it no longer
available for general CPU cache allocation.

Introduce a user interface to enable/disable the io_alloc feature and encourage users
to enable io_alloc only when running workloads that can benefit from this
functionality. On enable, initialize the io_alloc CLOSID with all usable CBMs
across all the domains.

Since CLOSIDs are managed by resctrl fs, it is least invasive to make "io_alloc
is supported by maximum supported CLOSID" part of the initial resctrl fs
support for io_alloc. Take care to minimally (only in error messages) expose
this use of CLOSID for io_alloc to user space so that this is not required from
other architectures that may support io_alloc differently in the future.

When resctrl is mounted with "-o cdp" to enable code/data prioritization,
there are two L3 resources that can support I/O allocation: L3CODE and
L3DATA. From the resctrl fs perspective the two resources share a CLOSID and
the architecture's available CLOSIDs are halved to support this.

The architecture's underlying CLOSID used by SDCIAE when CDP is enabled is the
CLOSID associated with the CDP_CODE resource, but from resctrl's perspective
there is only one CLOSID for both CDP_CODE and CDP_DATA. CDP_DATA is thus not
usable for general (CPU) cache allocation nor I/O allocation.

Keep the CDP_CODE and CDP_DATA I/O alloc status in sync to avoid any confusion
to user space.  That is, enabling io_alloc on CDP_CODE does so on CDP_DATA and
vice-versa, and keep the I/O allocation CBMs of CDP_CODE and CDP_DATA in sync.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/c7d3037795e653e22b02d8fc73ca80d9b075031c.1762995456.git.babu.moger@amd.com
2025-11-21 23:01:54 +01:00
Babu Moger 48068e5650 fs/resctrl: Introduce interface to display "io_alloc" support
Introduce the "io_alloc" resctrl file to the "info" area of a cache resource,
for example /sys/fs/resctrl/info/L3/io_alloc. "io_alloc" indicates support for
the "io_alloc" feature that allows direct insertion of data from I/O
devices into the cache.

Restrict exposing support for "io_alloc" to the L3 resource that is the only
resource where this feature can be backed by AMD's L3 Smart Data Cache
Injection Allocation Enforcement (SDCIAE). With that, the "io_alloc" file is
only visible to user space if the L3 resource supports "io_alloc".

Doing so makes the file visible for all cache resources though, for example
also the L2 cache (if it supports cache allocation). As a consequence, add the
capability for the file to report the expected "enabled" and "disabled", as well as
"not supported".

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/e8b116a8f424128b227734bb1d433c14af478d90.1762995456.git.babu.moger@amd.com
2025-11-21 22:49:42 +01:00
Babu Moger 556d2892aa x86,fs/resctrl: Implement "io_alloc" enable/disable handlers
"io_alloc" is the generic name of the new resctrl feature that enables system
software to configure the portion of cache allocated for I/O traffic. On AMD
systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache
Injection Allocation Enforcement (SDCIAE).

Introduce the architecture-specific functions that resctrl fs should call to
enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state
by setting (to enable) or clearing (to disable) bit 1 of
MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com
2025-11-21 22:35:22 +01:00
Babu Moger 7923ae7698 x86,fs/resctrl: Detect io_alloc feature
AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software
to control the portions of L3 cache used for direct insertion of data from I/O
devices into the L3 cache.

Introduce a generic resctrl cache resource property "io_alloc_capable" as the
first part of the new "io_alloc" resctrl feature that will support AMD's
SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if
a portion of the cache can be allocated for I/O traffic.

Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD)
systems that support SDCIAE.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com
2025-11-21 22:04:59 +01:00
Babu Moger 4d4840b125 x86/resctrl: Add SDCIAE feature in the command line options
Add a kernel command-line parameter to enable or disable the exposure of
the L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE) hardware
feature to resctrl.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/c623edf7cb369ba9da966de47d9f1b666778a40e.1762995456.git.babu.moger@amd.com
2025-11-21 22:03:23 +01:00
Babu Moger 3767def18f x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement
Smart Data Cache Injection (SDCI) is a mechanism that enables direct insertion
of data from I/O devices into the L3 cache. By directly caching data from I/O
devices rather than first storing the I/O data in DRAM, SDCI reduces demands on
DRAM bandwidth and reduces latency to the processor consuming the I/O data.

The SDCIAE (SDCI Allocation Enforcement) PQE feature allows system software to
control the portion of the L3 cache used for SDCI.

When enabled, SDCIAE forces all SDCI lines to be placed into the L3 cache
partitions identified by the highest-supported L3_MASK_n register, where n is
the maximum supported CLOSID.

Add CPUID feature bit that can be used to configure SDCIAE.

The SDCIAE feature details are documented in:

  AMD64 Architecture Programmer's Manual Volume 2: System Programming
     Publication # 24593 Revision 3.41 section 19.4.7 L3 Smart Data Cache
    Injection Allocation Enforcement (SDCIAE).

available at https://bugzilla.kernel.org/show_bug.cgi?id=206537

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/83ca10d981c48e86df2c3ad9658bb3ba3544c763.1762995456.git.babu.moger@amd.com
2025-11-21 22:03:07 +01:00
Kuppuswamy Sathyanarayanan 748d6ba43a powercap: intel_rapl: Enable MSR-based RAPL PMU support
Currently, RAPL PMU support requires adding CPU model entries to
arch/x86/events/rapl.c for each new generation. However, RAPL MSRs are
not architectural and require platform-specific customization, making
arch/x86 an inappropriate location for this functionality.

The powercap subsystem already handles RAPL functionality and is the
natural place to consolidate all RAPL features. The powercap RAPL
driver already includes PMU support for TPMI-based RAPL interfaces,
making it straightforward to extend this support to MSR-based RAPL
interfaces as well.

This consolidation eliminates the need to maintain RAPL support in
multiple subsystems and provides a unified approach for both TPMI and
MSR-based RAPL implementations.

The MSR-based PMU support includes the following updates:

 1. Register MSR-based PMU support for the supported platforms
    and unregister it when no online CPUs remain in the package.

 2. Remove existing checks that restrict RAPL PMU support to TPMI-based
    interfaces and extend the logic to allow MSR-based RAPL interfaces.

 3. Define a CPU model list to determine which processors should
    register RAPL PMU interface through the powercap driver for
    MSR-based RAPL, excluding those that support TPMI interface.
    This list prevents conflicts with existing arch/x86 PMU code
    that already registers RAPL PMU for some processors. Add
    Panther Lake & Wildcat Lake to the CPU models list.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251121000539.386069-3-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-21 21:47:08 +01:00
Kuppuswamy Sathyanarayanan 1d6c915819 powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
The current read_raw() implementation of the TPMI, MMIO and MSR
interfaces does not distinguish between atomic and non-atomic callers.

rapl_msr_read_raw() uses rdmsrq_safe_on_cpu(), which can sleep and
issue cross CPU calls. When MSR-based RAPL PMU support is enabled, PMU
event handlers can invoke this function from atomic context where
sleeping or rescheduling is not allowed. In atomic context, the caller
is already executing on the target CPU, so a direct rdmsrq() is
sufficient.

To support such usage, introduce an atomic flag to the read_raw()
interface to allow callers to pass the context information. Modify the
common RAPL code to propagate this flag, and set the flag to reflect
the calling contexts.

Utilize the atomic flag in rapl_msr_read_raw() to perform a direct MSR
read with rdmsrq() when running in atomic context, and add a sanity check
to ensure the target CPU matches the current CPU in such cases.
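
A hedged sketch of the resulting MSR path (function name, signature and
flag plumbing assumed):

  static int rapl_msr_read(int cpu, u32 msr, u64 *val, bool atomic)
  {
  	if (atomic) {
  		/* PMU context: must already be on the target CPU. */
  		if (WARN_ON_ONCE(cpu != smp_processor_id()))
  			return -EINVAL;
  		rdmsrq(msr, *val);
  		return 0;
  	}
  	/* May sleep and cross-call to the target CPU. */
  	return rdmsrq_safe_on_cpu(cpu, msr, val);
  }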

The TPMI and MMIO implementations do not require special atomic
handling, so the flag is ignored in those paths.

This is a preparatory patch for adding MSR-based RAPL PMU support.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251121000539.386069-2-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-21 21:47:08 +01:00
Smita Koralahalli 5c4663ed1e x86/mce: Handle AMD threshold interrupt storms
Extend the logic of handling CMCI storms to AMD threshold interrupts.

Rely on a similar approach to Intel's CMCI to mitigate storms per CPU and
per bank. But, unlike CMCI, do not set thresholds and reduce the interrupt
rate on a storm. Rather, disable the interrupt on the corresponding CPU and
bank. Re-enable the interrupts if enough consecutive polls of the bank show
no corrected errors (30, as programmed by Intel).

Turning off the threshold interrupts would be a better solution on AMD systems
as other error severities will still be handled even if the threshold
interrupts are disabled.

  [ Tony: Small tweak because mce_handle_storm() isn't a pointer now ]
  [ Yazen: Rebase and simplify ]
  [ Avadhut: Remove check to not clear bank's bit in mce_poll_banks and fix
    checkpatch warnings. ]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251121190542.2447913-3-avadhut.naik@amd.com
2025-11-21 20:41:10 +01:00
Avadhut Naik d7ac083f09 x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems
Currently, when a CMCI storm detected on a Machine Check bank subsides, the
bank's corresponding bit in the mce_poll_banks per-CPU variable is cleared
unconditionally by cmci_storm_end().

On AMD SMCA systems, this essentially disables polling on that particular bank
on that CPU. Consequently, any subsequent correctable errors or storms will not
be logged.

Since AMD SMCA systems allow banks to be managed by both polling and
interrupts, the polling banks bitmap for a CPU, i.e., mce_poll_banks, should
not be modified when a storm subsides.

Fixes: 7eae17c4ad ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251121190542.2447913-2-avadhut.naik@amd.com
2025-11-21 20:33:12 +01:00
Ma Ke ef1b6d9049 EDAC/igen6: Fix error handling in igen6_edac driver
The igen6_edac driver calls device_initialize() for all memory
controllers in igen6_register_mci(), but misses corresponding
put_device() calls in error paths and during normal shutdown in
igen6_unregister_mcis().

Adding the missing put_device() calls improves code readability and
ensures proper reference counting for the device structure.

Found by code review.

Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251105090244.23327-1-make24@iscas.ac.cn
2025-11-21 10:20:51 -08:00
Qiuxu Zhuo 5f40ea7f41 EDAC/imh: Setup 'imh_test' debugfs testing node
Set up the following debugfs testing node to enable fake memory error
address decoding tests for the imh_edac driver.

  /sys/kernel/debug/edac/imh_test/addr

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-8-qiuxu.zhuo@intel.com
2025-11-21 10:20:51 -08:00
Qiuxu Zhuo f619613f30 EDAC/{skx_comm,imh}: Detect 2-level memory configuration
Detect 2-level memory configurations and notify the 'skx_common' library
to enable ADXL 2-level memory error decoding.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-7-qiuxu.zhuo@intel.com
2025-11-21 10:20:51 -08:00
Qiuxu Zhuo 39abdcbdad EDAC/skx_common: Extend the maximum number of DRAM chip row bits
The maximum number of row bits for DRAM chips in the Diamond
Rapids server processor is 19. Extend the current maximum row
bits from 18 to 19.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-6-qiuxu.zhuo@intel.com
2025-11-21 10:20:51 -08:00
Qiuxu Zhuo 9fc67b1170 EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers
Intel Diamond Rapids CPUs include Integrated Memory and I/O Hubs (IMH).
The memory controllers within the IMHs provide memory stacks to the
processor. Create a new driver for these IMH-based memory controllers
rather than applying additional patches to the existing i10nm_edac.c
for the following reasons:

1) The memory controllers are not presented as PCI devices; instead,
   the detection and all their registers have been transitioned to
   MMIO-based memory spaces.

2) Validation processes are costly. Modifications to i10nm_edac would
   require extensive validation checks against multiple platforms,
   including Ice Lake, Sapphire Rapids, Emerald Rapids, Granite Rapids,
   Sierra Forest, and Grand Ridge.

3) Future Intel CPUs will likely only need patches on top of this new
   EDAC driver. Validation can be limited to Diamond Rapids servers
   and future Intel CPU generations.

[Tony: Fix kerneldoc for struct local_reg]
[randconfig: Added dependencies on NFIT and DMI]

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-5-qiuxu.zhuo@intel.com
2025-11-21 10:19:43 -08:00
Avadhut Naik 821f5fe4db x86/mce: Add support for physical address valid bit
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new bits
in the MCA_STATUS and MCA_CONFIG MSRs. These bits indicate whether a valid
System Physical Address (SPA) is present in MCA_ADDR.

The PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states whether the PhysAddrV bit (MCA_STATUS[54]) is Reserved
or whether it indicates the validity of the SPA in MCA_ADDR.

The PhysAddrV bit (MCA_STATUS[54]) advertises whether MCA_ADDR contains a
valid SPA or whether its contents are implementation specific.

Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
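Sketched with illustrative macro names (the bit positions are the
architectural ones quoted above):

  #define MCA_CONFIG_PADDRV_SUPPORTED    BIT_ULL(11)
  #define MCA_STATUS_PADDRV              BIT_ULL(54)

  static bool spa_valid(u64 mca_config, u64 mca_status)
  {
      /* PhysAddrV is only meaningful when the architectural
       * indicator in MCA_CONFIG says it is implemented. */
      if (mca_config & MCA_CONFIG_PADDRV_SUPPORTED)
          return mca_status & MCA_STATUS_PADDRV;

      return false;    /* fall back to the legacy address checks */
  }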

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251118191731.181269-1-avadhut.naik@amd.com
2025-11-21 10:32:28 +01:00
Yazen Ghannam eeb3f76d73 x86/mce: Save and use APEI corrected threshold limit
The MCA threshold limit generally is not something that needs to change during
runtime. It is common for a system administrator to decide on a policy for
their managed systems.

If MCA thresholding is OS-managed, then the threshold limit must be set at
every boot. However, many systems allow the user to set a value in their BIOS,
and this is reported through an APEI HEST entry even if thresholding is not in
FW-First mode.

Use this value, if available, to set the OS-managed threshold limit.  Users
can still override it through sysfs if desired for testing or debug.

APEI is parsed after MCE is initialized, so reset the thresholding blocks
later to pick up the threshold limit.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-21 10:32:28 +01:00
Christian Marangi c3852d2ca4 cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
If CONFIG_OF is not enabled, of_match_node() is defined as NULL and
qcom_cpufreq_ipq806x_match_list won't be used, causing a compilation
warning.

Flag qcom_cpufreq_ipq806x_match_list as __maybe_unused to fix the
compilation warning.

While at it, also flag it as __initconst as it's used only in probe context
and can be freed after probe.

This follows the pattern of the usual of_device_id variables.
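The resulting pattern looks roughly like this (compatible strings are
illustrative):

  static const struct of_device_id qcom_cpufreq_ipq806x_match_list[] __maybe_unused = {
      { .compatible = "qcom,ipq8064" },
      { /* sentinel */ }
  };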

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511202119.6zvvFMup-lkp@intel.com/
Fixes: 58f5d39d5e ("cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
[ Viresh: Drop __initconst ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-11-21 10:21:13 +05:30
Samuel Wu 8e2d57e653 PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
Replace the direct calls to ksys_sync_helper() with the new
pm_sleep_fs_sync() in suspend and hibernation code paths.

This enables the new mechanism allowing the filesystem sync phase
to be interrupted.

Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Samuel Wu <wusamuel@google.com>
Co-developed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Subject and changelog edits, tags adjustment ]
Link: https://patch.msgid.link/20251119171426.4086783-3-wusamuel@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 22:29:40 +01:00
Samuel Wu bf8867eae1 PM: sleep: Add support for wakeup during filesystem sync
Add helper function pm_sleep_fs_sync() and related data structures
as a preparation for allowing system suspend and hibernation to be
aborted by wakeup events while syncing file systems.

The new function, to be called by the suspend process in order to
sync file systems, uses a dedicated ordered workqueue to run
ksys_sync_helper() in parallel with the calling process.  Next, it
waits for the completion of the filesystem sync and periodically
checks if any system wakeup events are pending, in which case it will
return an error.

If that happens while the filesystem sync is still in progress, the sync
will continue, possibly after pm_sleep_fs_sync() has returned. If that
function is called again before the sync is complete, a new work item to
run ksys_sync_helper() again will be queued (and waited for) to increase
the likelihood of writing all of the dirty pages in memory back to
persistent storage.
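A rough sketch of that flow (the function name is from this patch; the
internals below are an assumption, not the actual implementation):

  static DECLARE_COMPLETION(fs_sync_done);
  static struct workqueue_struct *fs_sync_wq;    /* ordered workqueue */

  static void fs_sync_work_fn(struct work_struct *work)
  {
      ksys_sync_helper();
      complete_all(&fs_sync_done);
  }
  static DECLARE_WORK(fs_sync_work, fs_sync_work_fn);

  int pm_sleep_fs_sync(void)
  {
      reinit_completion(&fs_sync_done);
      queue_work(fs_sync_wq, &fs_sync_work);

      for (;;) {
          /* Wait in slices so pending wakeup events can abort suspend. */
          if (wait_for_completion_timeout(&fs_sync_done, HZ / 100))
              return 0;
          if (pm_wakeup_pending())
              return -EBUSY;    /* the sync keeps running in the wq */
      }
  }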

Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Samuel Wu <wusamuel@google.com>
Co-developed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Subject and changelog rewrite, tags adjustment ]
Link: https://patch.msgid.link/20251119171426.4086783-2-wusamuel@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 22:29:40 +01:00
Rafael J. Wysocki a857b530b3 Merge back material related to system sleep for 6.19 2025-11-20 22:28:23 +01:00
Kaushlendra Kumar 1b541e10ee cpufreq: ACPI: Replace udelay() with usleep_range()
Replace udelay() with usleep_range() in check_freqs() to allow
CPU scheduling during frequency polling.
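That is, roughly (delay bounds illustrative):

  /* Before: busy-waits, blocking the CPU. */
  udelay(10);

  /* After: may sleep, letting the scheduler run other work. */
  usleep_range(10, 20);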

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251119031109.134583-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 21:50:08 +01:00
Srinivas Pandruvada 8538e7ee09 docs: driver-api/thermal/intel_dptf: Add new workload type hint
Add documentation for longer term classification of workload type for
power or performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251118223620.554798-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 21:32:52 +01:00
Ard Biesheuvel a3e6907128 x86/boot: Drop unused sev_enable() fallback
The misc.h header is not included by the EFI stub, which is the only
C caller of sev_enable(). This means the fallback for cases where
CONFIG_AMD_MEM_ENCRYPT is not set is never used, so it can be dropped.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/20250909080631.2867579-6-ardb+git@google.com
2025-11-20 21:12:48 +01:00
Gabriele Monaco 7dec062cfc timers/migration: Exclude isolated cpus from hierarchy
The timer migration mechanism allows active CPUs to pull timers from
idle ones to improve the overall idle time. This is however undesired
when CPU intensive workloads run on isolated cores, as the algorithm
would move the timers from housekeeping to isolated cores, negatively
affecting the isolation.

Exclude isolated cores from the timer migration algorithm by extending the
concept of unavailable cores, currently used for offline ones, to isolated
ones:
* A core is unavailable if isolated or offline;
* A core is available if non-isolated and online;

A core is considered unavailable as isolated if it belongs to:
* the isolcpus (domain) list
* an isolated cpuset
Except if it is:
* in the nohz_full list (already idle for the hierarchy)
* the nohz timekeeper core (must be available to handle global timers)

CPUs are added to the hierarchy during late boot, excluding isolated ones;
the hierarchy is also adapted when the cpuset isolation changes.

Due to how the timer migration algorithm works, any CPU that is part of the
hierarchy can have its global timers pulled by remote CPUs and has to pull
remote timers itself; skipping only the pulling of remote timers would break
the logic.
For this reason, prevent isolated CPUs from pulling remote global
timers, but also the other way around: any global timer started on an
isolated CPU will run there. This does not break the concept of
isolation (global timers don't come from outside the CPU) and, if
considered inappropriate, can usually be mitigated with other isolation
techniques (e.g. IRQ pinning).
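The availability rule above, condensed into a sketch (the helpers are
existing kernel APIs; the function itself is illustrative and omits the
timekeeper special case):

  static bool tmigr_cpu_should_be_available(int cpu)
  {
      if (!cpu_online(cpu))
          return false;
      if (tick_nohz_full_cpu(cpu))
          return true;    /* already idle as far as the hierarchy cares */
      /* Domain-isolated CPUs (isolcpus= or isolated cpusets) stay out. */
      return housekeeping_cpu(cpu, HK_TYPE_DOMAIN);
  }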

This effect was noticed on a 128-core machine running oslat on the
isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
and the CPUs with the lowest count in each timer migration hierarchy
(here 1 and 65) appear always active and continuously pull global
timers from the housekeeping CPUs. This ends up moving driver work
(e.g. delayed work) to isolated CPUs and causes latency spikes:

before the change:

 # oslat -c 1-31,33-63,65-95,97-127 -D 62s
 ...
  Maximum:     1203 10 3 4 ... 5 (us)

after the change:

 # oslat -c 1-31,33-63,65-95,97-127 -D 62s
 ...
  Maximum:      10 4 3 4 3 ... 5 (us)

The same behaviour was observed on a machine with as few as 20 cores /
40 threads, with isolcpus set to 1-9,11-39, using rtla-osnoise-top.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: John B. Wyatt IV <jwyatt@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251120145653.296659-8-gmonaco@redhat.com
2025-11-20 20:17:32 +01:00
Yury Norov b56651007f cpumask: Add initialiser to use cleanup helpers
Now we can simplify code that allocates cpumasks for local needs.

Automatic variables have to be initialised at declaration, or at least
before any possibility for the logic to return, so that the compiler
doesn't try to call the associated destructor function on a random stack
value.

Because cpumask_var_t, depending on the CPUMASK_OFFSTACK config option, is
either a pointer or an array, a macro is needed for initialisation.

So define a CPUMASK_VAR_NULL macro, which initialises the struct cpumask
pointer with NULL when CPUMASK_OFFSTACK is enabled, and is effectively a
no-op when CPUMASK_OFFSTACK is disabled (the initialisation is optimised
out with -O2).
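Illustrative usage with the cleanup helpers (a sketch; the exact guard
spelling in the tree may differ):

  static int count_housekeeping_cpus(void)
  {
      cpumask_var_t tmp __free(free_cpumask_var) = CPUMASK_VAR_NULL;

      if (!zalloc_cpumask_var(&tmp, GFP_KERNEL))
          return -ENOMEM;

      cpumask_and(tmp, cpu_online_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
      return cpumask_weight(tmp);    /* tmp is freed on every return path */
  }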

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251120145653.296659-7-gmonaco@redhat.com
2025-11-20 20:17:32 +01:00
Gabriele Monaco 185bccc797 sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
Currently the user can set up isolcpus and nohz_full in such a way that
no housekeeping CPU is left (i.e. no CPU that is neither domain isolated
nor nohz full). This can be a problem for other subsystems (e.g. the
timer wheel migration).

Prevent this configuration by invalidating the last setting in case the
union of isolcpus (domain) and nohz_full covers all CPUs.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251120145653.296659-6-gmonaco@redhat.com
2025-11-20 20:17:31 +01:00
Gabriele Monaco 22f8e41680 cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
update_unbound_workqueue_cpumask() updates unbound workqueue settings
when there's a change in isolated CPUs, but it can also serve other
subsystems that require updates when isolated CPUs change.

Generalise the name to update_isolation_cpumasks() to prepare for
functions unrelated to workqueues being called in that spot.

[longman: Change the function name to update_isolation_cpumasks()]

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251120145653.296659-5-gmonaco@redhat.com
2025-11-20 20:17:31 +01:00
Gabriele Monaco 4c2374ed86 timers/migration: Use scoped_guard on available flag set/clear
Clean up tmigr_clear_cpu_available() and tmigr_set_cpu_available() to
prepare for easier checks on the available flag.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120145653.296659-4-gmonaco@redhat.com
2025-11-20 20:17:31 +01:00
Gabriele Monaco a048ca5f00 timers/migration: Add mask for CPUs available in the hierarchy
Keep track of the CPUs available for timer migration in a cpumask. This
prepares the ground to generalise the concept of unavailable CPUs.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120145653.296659-3-gmonaco@redhat.com
2025-11-20 20:17:31 +01:00
Gabriele Monaco 8312cab5ff timers/migration: Rename 'online' bit to 'available'
The timer migration hierarchy excludes offline CPUs via the
tmigr_is_not_available() function, which essentially checks the
online bit for the CPU.

Rename the online bit to 'available', along with all references in function
names and tracepoints, to generalise the concept of available CPUs.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120145653.296659-2-gmonaco@redhat.com
2025-11-20 20:17:31 +01:00
Cai Xinchen f20810157f arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
Commit e7bafbf717 ("arm64: mm: Add top-level dispatcher for
internal mem_encrypt API") added ARCH_HAS_MEM_ENCRYPT. Then commit
42be24a417 ("arm64: Enable memory encrypt for Realms") added a
duplicate config entry. Just remove it.

Fixes: 42be24a417 ("arm64: Enable memory encrypt for Realms")
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-20 17:57:23 +00:00
Yang Shi a06494adb7 arm64: mm: use untagged address to calculate page index
Nathan Chancellor reported the below bug:

[    0.149929] BUG: KASAN: invalid-access in change_memory_common+0x258/0x2d0
[    0.151006] Read of size 8 at addr f96680000268a000 by task swapper/0/1
[    0.152031]
[    0.152274] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc1-00012-g37cb0aab9068 #1 PREEMPT
[    0.152288] Hardware name: linux,dummy-virt (DT)
[    0.152292] Call trace:
[    0.152295]  show_stack+0x18/0x30 (C)
[    0.152309]  dump_stack_lvl+0x60/0x80
[    0.152320]  print_report+0x480/0x498
[    0.152331]  kasan_report+0xac/0xf0
[    0.152343]  kasan_check_range+0x90/0xb0
[    0.152353]  __hwasan_load8_noabort+0x20/0x34
[    0.152364]  change_memory_common+0x258/0x2d0
[    0.152375]  set_memory_ro+0x18/0x24
[    0.152386]  bpf_prog_pack_alloc+0x200/0x2e8
[    0.152397]  bpf_jit_binary_pack_alloc+0x78/0x188
[    0.152409]  bpf_int_jit_compile+0xa4c/0xc74
[    0.152420]  bpf_prog_select_runtime+0x1c0/0x2bc
[    0.152430]  bpf_prepare_filter+0x5a4/0x7c0
[    0.152443]  bpf_prog_create+0xa4/0x100
[    0.152454]  ptp_classifier_init+0x80/0xd0
[    0.152465]  sock_init+0x12c/0x178
[    0.152474]  do_one_initcall+0xa0/0x260
[    0.152484]  kernel_init_freeable+0x2d8/0x358
[    0.152495]  kernel_init+0x20/0x140
[    0.152510]  ret_from_fork+0x10/0x20

This is because the KASAN-tagged address was used when calculating the page
index. The untagged address should be used instead.

Fixes: 37cb0aab90 ("arm64: mm: make linear mapping permission update more robust for patial range")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-20 17:57:19 +00:00
Rafael J. Wysocki d834e68a0e cpuidle: governors: teo: Simplify intercepts-based state lookup
Simplify the loop looking up a candidate idle state in the case when an
intercept is likely to occur by adding a search for the state index limit
if the tick is stopped before it.

First, call tick_nohz_tick_stopped() just once and if it returns true,
look for the shallowest state index below the current candidate one with
target residency at least equal to the tick period length.

Next, simply look for a state that is not shallower than the one found
in the previous step and satisfies the intercepts majority condition (if
there are no such states, the shallowest state that is not shallower
than the one found in the previous step becomes the new candidate).

Since teo_state_ok() has no callers any more after the above changes,
drop it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Changelog clarification and code comment edit ]
Link: https://patch.msgid.link/2418792.ElGaqSPkdT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 16:32:48 +01:00
Geert Uytterhoeven a6eb177102 thermal/drivers/rcar_gen3: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
Convert the Renesas R-Car Gen3 thermal driver from SIMPLE_DEV_PM_OPS()
to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr().  This lets us drop the
__maybe_unused annotation from its resume callback, and reduces kernel
size in case CONFIG_PM or CONFIG_PM_SLEEP is disabled.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/813ad36fdc8561cf1c396230436e8ff3ff903a1f.1763117455.git.geert+renesas@glider.be
2025-11-20 15:33:45 +01:00
Geert Uytterhoeven 186b5c2726 thermal/drivers/rcar: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
Convert the Renesas R-Car thermal driver from SIMPLE_DEV_PM_OPS() to
DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr().  This lets us drop the
check for CONFIG_PM_SLEEP, and reduces kernel size in case CONFIG_PM or
CONFIG_PM_SLEEP is disabled, while increasing build coverage.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/ee03ec71d10fd589e7458fa1b0ada3d3c19dbb54.1763117351.git.geert+renesas@glider.be
2025-11-20 15:32:14 +01:00
Rafael J. Wysocki 50db438231 cpuidle: governors: teo: Fix tick_intercepts handling in teo_update()
The condition deciding whether or not to increase cpu_data->tick_intercepts
in teo_update() is reversed, so fix it.

Fixes: d619b5cc67 ("cpuidle: teo: Simplify counting events used for tick management")
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 0796ddf4a7f0: cpuidle: teo: Use this_cpu_ptr() where possible
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 8f3f01082d7a: cpuidle: governors: teo: Use s64 consistently in teo_update()
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: b54df61c7428: cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 083654ded547: cpuidle: governors: teo: Rework the handling of tick wakeups
Cc: 6.14+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5085160.31r3eYUQgx@rafael.j.wysocki
2025-11-20 14:54:08 +01:00
Rafael J. Wysocki 083654ded5 cpuidle: governors: teo: Rework the handling of tick wakeups
If the wakeup pattern is clearly dominated by tick wakeups, count those
wakeups as hits on the deepest available idle state to increase the
likelihood of stopping the tick, especially on systems where there are
only 2 usable idle states and the tick can only be stopped when the
deeper state is selected.

This change is expected to reduce power on some systems where state 0 is
selected relatively often even though they are almost idle.  Without it,
the governor may end up selecting the shallowest idle state all the time
even if the system is almost completely idle, due to all tick wakeups being
counted as hits on that state and preventing the tick from being stopped
at all.

Fixes: 4b20b07ce7 ("cpuidle: teo: Don't count non-existent intercepts")
Reported-by: Reka Norman <rekanorman@chromium.org>
Closes: https://lore.kernel.org/linux-pm/CAEmPcwsNMNnNXuxgvHTQ93Mx-q3Oz9U57THQsU_qdcCx1m4w5g@mail.gmail.com/
Tested-by: Reka Norman <rekanorman@chromium.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 92ce5c07b7a1: cpuidle: teo: Reorder candidate state index checks
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: ea185406d1ed: cpuidle: teo: Combine candidate state index checks against 0
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: b9a6af26bd83: cpuidle: teo: Drop local variable prev_intercept_idx
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: e24f8a55de50: cpuidle: teo: Clarify two code comments
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: d619b5cc6780: cpuidle: teo: Simplify counting events used for tick management
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 13ed5c4a6d9c: cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: ddcfa7964677: cpuidle: teo: Simplify handling of total events count
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 65e18e654475: cpuidle: teo: Replace time_span_ns with a flag
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 0796ddf4a7f0: cpuidle: teo: Use this_cpu_ptr() where possible
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 8f3f01082d7a: cpuidle: governors: teo: Use s64 consistently in teo_update()
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: b54df61c7428: cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Rebase on commit 0796ddf4a7, changelog update ]
Link: https://patch.msgid.link/6228387.lOV4Wx5bFT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-20 14:49:57 +01:00
Thomas Gleixner 79c11fb3da sched/mmcid: Use cpumask_weighted_or()
Use cpumask_weighted_or() instead of cpumask_or() followed by
cpumask_weight() on the result, which walks the same bitmap twice. This
results in 10-20% fewer cycles, which reduces the runqueue lock hold time.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.511736272@linutronix.de
2025-11-20 12:14:54 +01:00
Thomas Gleixner 437cb3ded2 cpumask: Introduce cpumask_weighted_or()
CID management ORs two cpumasks and then calculates the weight of the
result. That's inefficient, as it has to walk the same bitmap twice. As
this is done with the runqueue lock held, there is a real benefit to
speeding this up. Depending on the system, this results in 10-20% fewer
cycles spent with the runqueue lock held for a 4K cpumask.

Provide cpumask_weighted_or() and the corresponding bitmap functions which
return the weight of the OR result right away.
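Reference semantics, sketched at the bitmap level (the real helper also
masks off bits past nbits in the final word):

  static unsigned int weighted_or(unsigned long *dst,
                                  const unsigned long *src1,
                                  const unsigned long *src2,
                                  unsigned int nbits)
  {
      unsigned int k, w = 0;

      /* Single pass: OR the words and accumulate the popcount. */
      for (k = 0; k < BITS_TO_LONGS(nbits); k++) {
          dst[k] = src1[k] | src2[k];
          w += hweight_long(dst[k]);
      }
      return w;
  }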

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.448263340@linutronix.de
2025-11-20 12:14:54 +01:00
Thomas Gleixner 0d032a43eb sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
mm_update_cpus_allowed() is not required to be invoked for affinity changes
due to migrate_disable() and migrate_enable().

migrate_disable() restricts the task temporarily to a CPU on which the task
was already allowed to run, so nothing changes. migrate_enable() restores
the actual task affinity mask.

If that mask changed between migrate_disable() and migrate_enable() then
that change was already accounted for.

Move the invocation to the proper place to avoid that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.385208276@linutronix.de
2025-11-20 12:14:54 +01:00
Thomas Gleixner b08ef5fc8f sched/mmcid: Move scheduler code out of global header
This is only used in the scheduler core code, so there is no point to have
it in a global header.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.321259077@linutronix.de
2025-11-20 12:14:53 +01:00
Thomas Gleixner 925b7847bb sched: Fixup whitespace damage
With whitespace checks enabled in the editor this makes eyes bleed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.258651925@linutronix.de
2025-11-20 12:14:53 +01:00
Thomas Gleixner be4463fa2c sched/mmcid: Cacheline align MM CID storage
Both the per CPU storage and the data in mm_struct are heavily used in
context switch. As they can end up next to other frequently modified data,
they are subject to false sharing.

Make them cache line aligned.
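For illustration (field names assumed):

  struct mm_cid_pcpu {
      int cid;
  } ____cacheline_aligned_in_smp;    /* no false sharing with neighbours */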

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.194111661@linutronix.de
2025-11-20 12:14:53 +01:00
Thomas Gleixner 8cea569ca7 sched/mmcid: Use proper data structures
Having a lot of CID-specific members in struct task_struct and struct
mm_struct does not really make the code easier to read.

Encapsulate the CID-specific parts in data structures and keep them
separate from the structures they are embedded in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de
2025-11-20 12:14:52 +01:00
Thomas Gleixner 77d7dc8bef sched/mmcid: Revert the complex CID management
The CID management is a complex beast which affects both scheduling and
task migration. The compaction mechanism forces random tasks of a process
into task work on exit to user space, causing latency spikes.

Revert back to the initial simple bitmap allocation mechanics, which are
known to have scalability issues, as that allows replacement functionality
to be built up gradually in a reviewable way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.068197830@linutronix.de
2025-11-20 12:14:52 +01:00
Qiuxu Zhuo d4839582bc EDAC/skx_common: Prepare for skx_set_hi_lo()
The upcoming imh_edac driver for Intel Diamond Rapids servers cannot
use skx_get_hi_lo() in skx_common to retrieve the TOHM (Top of High
Memory) and TOLM (Top of Low Memory) parameters. Instead, it obtains
these parameters within its own EDAC driver. To accommodate this,
prepare skx_set_hi_lo() to allow the driver to notify skx_common of
these parameters.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-4-qiuxu.zhuo@intel.com
2025-11-19 12:11:40 -08:00
Qiuxu Zhuo 9529e69773 EDAC/skx_common: Prepare for skx_get_edac_list()
The Intel EDAC library 'skx_common' maintains the Intel server EDAC device
list for {skx, i10nm}_edac drivers, which use skx_get_all_bus_mappings()
to build and retrieve the EDAC device list.

However, the upcoming Intel EDAC driver, imh_edac, for Diamond Rapids
servers is designed for memory controllers that are MMIO-based devices
rather than PCI devices. Consequently, it can't use
skx_get_all_bus_mappings() due to the absence of a PCI bus. To accommodate
this, prepare skx_get_edac_list() to enable the upcoming imh_edac driver
to obtain the EDAC device list from the skx_common library and build the
EDAC device list independently.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-3-qiuxu.zhuo@intel.com
2025-11-19 12:11:40 -08:00
Qiuxu Zhuo b3d70059cb EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev
Memory controllers in the new Intel server CPUs, such as Diamond Rapids,
are presented as MMIO-based devices rather than PCI devices.
Modify skx_register_mci() to be independent of 'pci_dev' and use a generic
'dev' of 'struct device' to prepare for support of such MMIO-based memory
controllers.

Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-2-qiuxu.zhuo@intel.com
2025-11-19 12:11:40 -08:00
Ben Horgan ce1e1421f8 MAINTAINERS: new entry for MPAM Driver
Create a maintainer entry for the new MPAM Driver. Add myself and
James Morse as maintainers. James created the driver and I have
taken up the later versions of his series.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:24 +00:00
James Morse 2557e0eafe arm_mpam: Add kunit tests for props_mismatch()
When features are mismatched between MSCs, the way features are combined
into the class determines whether resctrl can support this SoC.

Add some tests to illustrate the sort of thing that is expected to
work, and the sort that must be removed.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:24 +00:00
James Morse e3565d1fd4 arm_mpam: Add kunit test for bitmap reset
The bitmap reset code has been a source of bugs. Add a unit test.

This currently has to be built in, as the rest of the driver is
built-in.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:24 +00:00
James Morse 201d96ca4c arm_mpam: Add helper to reset saved mbwu state
resctrl expects to reset the bandwidth counters when the filesystem
is mounted.

To allow this, add a helper that clears the saved mbwu state. Instead
of cross-calling to each CPU that can access the component's MSC to
write to the counter, set a flag that causes the state to be zeroed on
the next read. This is easily done by forcing a configuration update.

Signed-off-by: James Morse <james.morse@arm.com>
Cc: Peter Newman <peternewman@google.com>
Reviewed-by: Fenghua Yu <fenghuay@nvdia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:24 +00:00
Rohit Mathew 9e5afb7c32 arm_mpam: Use long MBWU counters if supported
Now that the larger counter sizes are probed, make use of them.

Callers of mpam_msmon_read() may not know (or care!) about the different
counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
driver pick the counter to use.

Only 32bit accesses to the MSC are required to be supported by the
spec, but these registers are 64bits. The lower half may overflow
into the higher half between two 32bit reads. To avoid this, use
a helper that reads the top half multiple times to check for overflow.
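The idea behind that helper, sketched (register pointers are
placeholders):

  static u64 mpam_read_64bit_counter(void __iomem *reg_lo, void __iomem *reg_hi)
  {
      u32 hi_before, hi_after, lo;

      /* Re-read the high half until it is stable across the low read,
       * so a carry between the two 32-bit accesses cannot be missed. */
      do {
          hi_before = readl(reg_hi);
          lo        = readl(reg_lo);
          hi_after  = readl(reg_hi);
      } while (hi_before != hi_after);

      return ((u64)hi_after << 32) | lo;
  }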

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit, added explicit counter selection ]
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Peter Newman <peternewman@google.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:24 +00:00
Rohit Mathew fdc29a141d arm_mpam: Probe for long/lwd mbwu counters
MPAM v0.1, and versions above v1.0, support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters.

Probe these feature bits.

The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
monitors are supported. Instead of muddling this with the counter size,
add an explicit 31-bit counter feature.

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[ morse: Added 31bit counter feature to simplify later logic ]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:23 +00:00
Ben Horgan b353637932 arm_mpam: Consider overflow in bandwidth counter state
Use the overflow status bit to track overflow on each bandwidth counter
read and add the counter size to the correction when overflow is detected.

This assumes that only a single overflow has occurred since the last read
of the counter. Overflow interrupts, on hardware that supports them, could
be used to remove this limitation.
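The correction, sketched (struct and field names assumed):

  struct mbwu_overflow_state {
      u64 correction;    /* accumulated across overflows */
  };

  static u64 mbwu_corrected_value(struct mbwu_overflow_state *s,
                                  u64 raw, bool overflowed, u64 counter_max)
  {
      /* Assumes at most one wrap since the previous read. */
      if (overflowed)
          s->correction += counter_max + 1;
      return s->correction + raw;
  }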

Cc: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:23 +00:00
James Morse 41e8a14950 arm_mpam: Track bandwidth counter state for power management
Bandwidth counters need to run continuously to correctly reflect the
bandwidth.

Save the counter state when the hardware is reset due to CPU hotplug.
Add struct mbwu_state to track the bandwidth counter. Support for
tracking overflow with the same structure will be added in a
subsequent commit.

Cc: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:23 +00:00
James Morse 823e7c3712 arm_mpam: Add mpam_msmon_read() to read monitor value
Reading a monitor involves configuring what you want to monitor and
reading the value. Components made up of multiple MSCs may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.

Add mpam_msmon_read() to hide all this. If (one of) the MSCs returns
'not ready', then wait the full timeout value before trying again.
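Roughly (helper and field names assumed):

  static int msmon_read_retry(struct mpam_msc *msc, u32 mon, u64 *val)
  {
      int err = __msmon_read(msc, mon, val);    /* -EBUSY on 'not ready' */

      if (err == -EBUSY) {
          /* Wait the firmware-provided maximum not-ready time in full. */
          usleep_range(msc->max_nrdy_usec, msc->max_nrdy_usec + 10);
          err = __msmon_read(msc, mon, val);
      }
      return err;
  }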

CC: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:23 +00:00
James Morse c891bae664 arm_mpam: Add helpers to allocate monitors
MPAM's MSCs support a number of monitors, each of which supports
bandwidth counters or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:22 +00:00
James Morse 880df85d86 arm_mpam: Probe and reset the rest of the features
MPAM supports more features than are going to be exposed to resctrl.
For PARTIDs other than 0, the reset values of these controls aren't
known.

Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.

PARTID narrowing allows an MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce partid_max to make it usable. This allows us
to map a PARTID to itself.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:22 +00:00
James Morse 09b89d2a72 arm_mpam: Allow configuration to be applied and restored during cpu online
When CPUs come online the MSC's original configuration should be restored.

Add struct mpam_config to hold the configuration. For each component, this
has a bitmap of features that have been changed from the reset values. The
mpam_config is also used on RIS reset where all bits are set to ensure all
features are reset.

Once the maximum partid is known, allocate a configuration array for each
component, and reprogram each RIS configuration from this.
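Roughly, the record described above (names and sizes assumed):

  struct mpam_config {
      /* Bitmap of features changed from their reset values. */
      DECLARE_BITMAP(features, MPAM_FEATURE_LAST);
      u32 cpbm;    /* example control value: cache portion bitmap */
  };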

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Cc: Peter Newman <peternewman@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:22 +00:00
James Morse 3796f75aa7 arm_mpam: Use a static key to indicate when mpam is enabled
Once all the MSCs have been probed, the system-wide usable number of
PARTIDs is known and the configuration arrays can be allocated.

After this point, checking whether all the MSCs have been probed is
pointless, and the cpuhp callbacks should restore the configuration
instead of just resetting the MSC.

Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:22 +00:00
James Morse 49aa621c4d arm_mpam: Register and enable IRQs
Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. an out-of-range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.

Only the irq handler accesses the MPAMF_ESR register, so no locking is
needed. The work to disable MPAM after an error needs to happen in process
context, as it takes a mutex. It also unregisters the interrupts, meaning
it can't be done from the threaded part of a threaded interrupt.
Instead, mpam_disable() gets scheduled.

Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.

Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.

CC: Rohit Mathew <rohit.mathew@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:22 +00:00
James Morse 3bd04fe7d8 arm_mpam: Extend reset logic to allow devices to be reset any time
cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives, it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.

Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:22 +00:00
James Morse 475228d15d arm_mpam: Add a helper to touch an MSC from any CPU
Resetting RIS entries from the cpuhp callback is easy, as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.

Add a helper that schedules the provided function if necessary.

Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:21 +00:00
James Morse f188a36ca2 arm_mpam: Reset MSC controls from cpuhp callbacks
When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.

Any in-use partid must have its configuration restored or reset.
In-use partids may be held in caches and evicted later.

MSCs are also reset when CPUs are taken offline, to cover cases where
firmware doesn't reset the MSC over a reboot using UEFI, or kexec
where there is no firmware involvement.

If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.

To reset, write the maximum values for all discovered controls.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:21 +00:00
James Morse c10ca83a77 arm_mpam: Merge supported features during mpam_enable() into mpam_class
To decide whether to expose an MPAM class as a resctrl resource, we
need to know its overall supported features and properties.

Once we've probed all the resources, we can walk the tree and produce
overall values by merging the bitmaps. This eliminates features that
are only supported by some of the MSCs that make up a component or
class.

If bitmap properties are mismatched within a component, we cannot
support the mismatched feature.

Care has to be taken, as a vMSC may hold mismatched RIS.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:21 +00:00
James Morse 8c90dc68a5 arm_mpam: Probe the hardware features resctrl supports
Expand the probing support with the control and monitor types
we can use with resctrl.

CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:21 +00:00
James Morse d02beb06ca arm_mpam: Add helpers for managing the locking around the mon_sel registers
The MSC MON_SEL register needs to be accessed from hardirq for the overflow
interrupt, and when taking an IPI to access these registers on platforms
where MSC are not accessible from every CPU. This makes an irqsave
spinlock the obvious lock to protect these registers. On systems with SCMI
or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
The SCMI or PCC platforms can't support an overflow interrupt, and
can't access the registers from hardirq context.

Clearly these two can't exist for one MSC at the same time.

Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
only support 'real' MMIO platforms.

In the future this lock will be split in two allowing SCMI/PCC platforms
to take a mutex. Because there are contexts where the SCMI/PCC platforms
can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
this now, so that all the error handling on these paths is present. This
allows the relevant paths to fail if they are needed on a platform where
this isn't possible, instead of having to make explicit checks of the
interface type.
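A sketch of the fallible helper (structure layout and interface enum
assumed; spinlock-only MMIO case for now):

  static bool mpam_mon_sel_lock(struct mpam_msc *msc, unsigned long *flags)
  {
      if (msc->iface != MPAM_IFACE_MMIO)
          return false;    /* SCMI/PCC: no access from this context yet */

      spin_lock_irqsave(&msc->mon_sel_lock, *flags);
      return true;
  }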

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:21 +00:00
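
As a sketch, the fallible lock helpers described above could take the
following shape (names and fields are illustrative, not the driver's
actual code):

	#include <linux/spinlock.h>

	struct demo_msc {
		spinlock_t	mon_sel_lock;
		unsigned long	mon_sel_flags;
	};

	/*
	 * Returns false on platforms where MON_SEL cannot be reached from
	 * the current context; today's MMIO-only support always succeeds.
	 */
	static inline bool demo_mon_sel_lock(struct demo_msc *msc)
	{
		spin_lock_irqsave(&msc->mon_sel_lock, msc->mon_sel_flags);
		return true;
	}

	static inline void demo_mon_sel_unlock(struct demo_msc *msc)
	{
		spin_unlock_irqrestore(&msc->mon_sel_lock, msc->mon_sel_flags);
	}
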
James Morse bd221f9f82 arm_mpam: Probe hardware to find the supported partid/pmg values
CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may also have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
The limits from requestors (e.g. CPUs) also need to be taken into account.

While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.

This adds the low level MSC write accessors.

While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:20 +00:00
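
The narrowing described above amounts to keeping a running minimum; a
sketch under assumed names (the real mpam_register_requestor() may
differ):

	#include <linux/kernel.h>
	#include <linux/minmax.h>

	static u16 demo_partid_max = U16_MAX;
	static u8 demo_pmg_max = U8_MAX;

	/*
	 * Called for each probed RIS and each registered requestor (e.g.
	 * CPUs): only the smallest PARTID/PMG space supported everywhere
	 * is usable system-wide.
	 */
	static void demo_account_limits(u16 partid_max, u8 pmg_max)
	{
		demo_partid_max = min(demo_partid_max, partid_max);
		demo_pmg_max = min(demo_pmg_max, pmg_max);
	}
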
James Morse 8f8d0ac1da arm_mpam: Add cpuhp callbacks to probe MSC hardware
Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.

Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled: as each CPU's online call is made, walk all the
MSCs and probe those we can reach that haven't already been probed.

This adds the low-level MSC register read accessors.

Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced.  The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.

The architecture's context-switch code will be enabled by a static key.
This can be set by mpam_enable(), but must be done from process context,
not from a cpuhp callback, because both take the cpuhp lock.
Whenever a new MSC has been probed, the mpam_enable() work is scheduled
to test if all the MSCs have been probed. If probing fails, mpam_disable()
is scheduled to unregister the cpuhp callbacks and free memory.

CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:20 +00:00
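
A sketch of the probe-time cpuhp registration described above (callback
body and names are illustrative):

	#include <linux/cpuhotplug.h>
	#include <linux/init.h>

	static int demo_cpu_online(unsigned int cpu)
	{
		/*
		 * Probe any not-yet-probed MSC whose cpu-affinity set
		 * contains this CPU, then schedule the mpam_enable() work
		 * to check whether every MSC has now been probed.
		 */
		return 0;
	}

	static int __init demo_cpuhp_register(void)
	{
		/* Dynamic state: returns the state number or -errno. */
		return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
					 "mpam/demo:online",
					 demo_cpu_online, NULL);
	}
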
James Morse aa64b9e110 arm_mpam: Add MPAM MSC register layout definitions
Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.

Add the definitions for these registers as offset within the page(s).

Link: https://developer.arm.com/documentation/ihi0099/aa/
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:20 +00:00
James Morse 01fb4b8224 arm_mpam: Add the class and component structures for firmware described ris
An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.

To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: a class is a set of components, and the
components are visible to user-space because there are likely to be
multiple instances of the L2 cache (e.g. one per cluster or package).

Add support for creating and destroying structures to allow a hierarchy
of resources to be created.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:20 +00:00
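
The hierarchy being created looks roughly like this (fields are
illustrative, not the actual structures):

	#include <linux/list.h>
	#include <linux/types.h>

	struct demo_class {			/* e.g. 'L2' */
		struct list_head	components;
		u8			level;	/* cache level, if a cache */
	};

	struct demo_component {		/* one instance, e.g. per package */
		struct list_head	class_list; /* on demo_class::components */
		u32			comp_id;
		struct list_head	vmsc;	/* MSC/RIS backing this instance */
	};
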
James Morse f04046f257 arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.

Start with driver probe/remove and mapping the MSC.

Cc: Carl Worth <carl@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:20 +00:00
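
A minimal sketch of such probe boilerplate (assumed names; as described
above, all real hardware access is deferred):

	#include <linux/module.h>
	#include <linux/platform_device.h>

	static int demo_msc_probe(struct platform_device *pdev)
	{
		void __iomem *base = devm_platform_ioremap_resource(pdev, 0);

		if (IS_ERR(base))
			return PTR_ERR(base);

		/*
		 * Don't touch the hardware yet: num_partid/num_pmg aren't
		 * known until every MSC has been discovered.
		 */
		return 0;
	}

	static struct platform_driver demo_msc_driver = {
		.driver	= { .name = "demo-mpam-msc" },
		.probe	= demo_msc_probe,
	};
	module_platform_driver(demo_msc_driver);
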
James Morse 115c5325be ACPI / MPAM: Parse the MPAM table
Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.

This happens in two stages. Platform devices are created first for the
MSC devices. Once the driver probes, it calls acpi_mpam_parse_resources()
to discover the RIS entries the MSC contains.

For now the MPAM hook mpam_ris_create() is stubbed out, but will update
the MPAM driver with optional discovered data about the RIS entries.

CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:20 +00:00
Ben Horgan 96f4a4d53e ACPI: Define acpi_put_table cleanup handler and acpi_get_table_pointer() helper
Define a cleanup helper for use with __free to release the acpi table when
the pointer goes out of scope. Also, introduce the helper
acpi_get_table_pointer() to simplify a commonly used pattern involving
acpi_get_table().

These are first used in a subsequent commit.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:19 +00:00
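
The resulting pattern looks roughly like this (sketch; the exact helper
spelling upstream may differ):

	#include <linux/acpi.h>
	#include <linux/cleanup.h>

	DEFINE_FREE(demo_acpi_table, struct acpi_table_header *,
		    if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))

	static int demo_parse_mpam(void)
	{
		struct acpi_table_header *table __free(demo_acpi_table) = NULL;

		if (ACPI_FAILURE(acpi_get_table(ACPI_SIG_MPAM, 0, &table)))
			return -ENOENT;

		/* ... use the table; acpi_put_table() runs on every exit ... */
		return 0;
	}
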
Ben Horgan f5915600cc platform: Define platform_device_put cleanup handler
Define a cleanup helper for use with __free to destroy platform devices
automatically when the pointer goes out of scope. This is only intended to
be used in error cases and so should be used with return_ptr() or
no_free_ptr() directly to avoid the automatic destruction on success.

A first use of this is introduced in a subsequent commit.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:19 +00:00
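
Sketched usage of such a handler on the error path (helper name
assumed):

	#include <linux/cleanup.h>
	#include <linux/err.h>
	#include <linux/platform_device.h>

	DEFINE_FREE(demo_pdev_put, struct platform_device *,
		    if (_T) platform_device_put(_T))

	static struct platform_device *demo_add_device(void)
	{
		struct platform_device *pdev __free(demo_pdev_put) =
				platform_device_alloc("demo", 0);

		if (!pdev)
			return ERR_PTR(-ENOMEM);
		if (platform_device_add(pdev))
			return ERR_PTR(-ENODEV); /* put runs automatically */

		return_ptr(pdev);	/* success: suppress the automatic put */
	}
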
James Morse d8bf01d809 arm64: kconfig: Add Kconfig entry for MPAM
The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 tree is the most natural home for the Kconfig
option.

This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register properties
of CPUs with the MPAM driver.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
CC: Dave Martin <dave.martin@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:19 +00:00
James Morse a39a723a6f ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
MPAM identifies CPUs by the cache_id in the PPTT cache structure.

The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.

Add a helper to pull this information out of the PPTT.

CC: Rohit Mathew <Rohit.Mathew@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:14 +00:00
James Morse 41a7bb39fe ACPI / PPTT: Find cache level by cache-id
The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.

Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.

Add a helper that walks every possible cache until it finds the one
identified by cache-id, then returns the level.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:09 +00:00
Ben Horgan cfc085af83 ACPI / PPTT: Add acpi_pptt_cache_v1_full to use pptt cache as one structure
In actbl2.h, acpi_pptt_cache describes the fields in the original
Cache Type Structure. In PPTT table version 3 a new field, cache_id, was
added at the end. It is described in acpi_pptt_cache_v1, but rather than
including all v1 fields that struct contains just this one.

In lieu of this being fixed in acpica, introduce acpi_pptt_cache_v1_full to
contain all the fields of the Cache Type Structure. Update the existing
code to use this new struct. This simplifies the code and removes a
non-standard use of ACPI_ADD_PTR.

Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:34:01 +00:00
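
The combined structure is essentially the original Cache Type Structure
with cache_id appended (sketch following the existing actbl2.h field
names):

	#include <linux/acpi.h>

	struct demo_pptt_cache_v1_full {
		struct acpi_subtable_header header;
		u16 reserved;
		u32 flags;
		u32 next_level_of_cache;
		u32 size;
		u32 number_of_sets;
		u8 associativity;
		u8 attributes;
		u16 line_size;
		u32 cache_id;		/* new in PPTT table version 3 */
	};
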
James Morse eeec7845e9 ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter.  The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.

Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.

Get rid of the levels parameter, which has no remaining purpose.

Fix acpi_get_cache_info() to match.

Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:33:56 +00:00
James Morse 796e29b857 ACPI / PPTT: Add a helper to fill a cpumask from a processor container
The ACPI MPAM table uses the UID of a processor container specified in
the PPTT to indicate the subset of CPUs and cache topology that can
access each MPAM System Component (MSC).

This information is not directly useful to the kernel. The equivalent
cpumask is needed instead.

Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.

CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 18:33:36 +00:00
Huang Ying cb1fa2e999 arm64, tlbflush: don't TLBI broadcast if page reused in write fault
A multi-thread customer workload with a large memory footprint uses
fork()/exec() to run some external programs every tens of seconds.  When
running the workload on an arm64 server machine, it's observed that
quite some CPU cycles are spent in the TLB flushing functions; running
the workload on an x86_64 server machine shows no such overhead.  This
causes the performance on arm64 to be much worse than that on x86_64.

During the workload running, after fork()/exec() write-protects all
pages in the parent process, memory writing in the parent process
will cause a write protection fault.  Then the page fault handler
will make the PTE/PDE writable if the page can be reused, which is
almost always true in the workload.  On arm64, to avoid the write
protection fault on other CPUs, the page fault handler flushes the TLB
globally with TLBI broadcast after changing the PTE/PDE.  However, this
isn't always necessary.  Firstly, it's safe to leave some stale
read-only TLB entries as long as they will eventually be flushed.
Secondly, it's quite possible that the original read-only PTE/PDEs
aren't cached in remote TLB at all if the memory footprint is large.
In fact, on x86_64, the page fault handler doesn't flush the remote
TLB in this situation, which benefits the performance a lot.

To improve the performance on arm64, make the write protection fault
handler flush the TLB locally instead of globally via TLBI broadcast
after making the PTE/PDE writable.  If there are stale read-only TLB
entries in the remote CPUs, the page fault handler on these CPUs will
regard the page fault as spurious and flush the stale TLB entries.

To test the patchset, make usemem.c from vm-scalability
(https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
support calling fork()/exec() periodically.  To mimic the behavior of
the customer workload, run usemem with 4 threads, access 100GB memory,
and call fork()/exec() every 40 seconds.  Test results show that with
the patchset the score of usemem improves ~40.6%.  The cycles% of TLB
flush functions reduces from ~50.5% to ~0.3% in perf profile.

Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 16:01:48 +00:00
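
Conceptually, the change to the write-protect fault path is the
following (stand-in names, not the actual arm64 code):

	#include <linux/mm.h>

	/* Stand-in for a CPU-local (non-broadcast) TLBI of one page. */
	static void demo_local_flush_tlb_page(struct vm_area_struct *vma,
					      unsigned long addr)
	{
	}

	static void demo_wp_fault_reuse(struct vm_area_struct *vma,
					unsigned long addr, pte_t *ptep,
					pte_t entry)
	{
		/* entry is the now-writable PTE for the reused page */
		set_pte_at(vma->vm_mm, addr, ptep, entry);

		/*
		 * Was: flush_tlb_page(vma, addr), a TLBI broadcast to all
		 * CPUs. Now flush only this CPU; a remote CPU holding a
		 * stale read-only entry takes a spurious fault and
		 * flushes it locally.
		 */
		demo_local_flush_tlb_page(vma, addr);
	}
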
Huang Ying 79301c7d60 mm: add spurious fault fixing support for huge pmd
The page faults may be spurious because of the racy access to the page
table.  For example, a non-populated virtual page is accessed on 2
CPUs simultaneously, thus the page faults are triggered on both CPUs.
However, it's possible that one CPU (say CPU A) cannot find the reason
for the page fault if the other CPU (say CPU B) has changed the page
table before the PTE is checked on CPU A.  Most of the time, the
spurious page faults can be ignored safely.  However, if the page
fault is for the write access, it's possible that a stale read-only
TLB entry exists in the local CPU and needs to be flushed on some
architectures.  This is called the spurious page fault fixing.

In the current kernel, there is spurious fault fixing support for pte,
but not for huge pmd because no architectures need it. But in the
next patch in the series, we will change the write protection fault
handling logic on arm64, so that some stale huge pmd entries may
remain in the TLB. These entries need to be flushed via the huge pmd
spurious fault fixing mechanism.

Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19 16:01:48 +00:00
Reinette Chatre 5a88a6e92b fs/resctrl: Consider sparse masks when initializing new group's allocation
A new resource group is intended to be created with sane defaults. For a cache
resource this means all cache portions the new group could possibly allocate
into. This includes unused cache portions and shareable cache portions used by
other groups and hardware.

New resource group creation does not take sparse masks into account. After
determining the bitmask reflecting the new group's possible allocations the
bitmask is forced to be contiguous even if the system supports sparse masks.
For example, a new group could by default allocate into a large portion of
cache represented by 0xff0f, but it is instead created with a mask of 0xf.

Do not force a contiguous allocation range if the system supports sparse masks.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/abbbb008bc09d982d715e79d3b885c10f92c64e0.1763426240.git.reinette.chatre@intel.com
2025-11-18 21:10:56 +01:00
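
A worked example of the behaviour described above: assuming the
candidate mask's lowest bit is set, clipping it to its lowest contiguous
run of bits turns 0xff0f into 0xf.

	#include <stdio.h>

	static unsigned int clip_to_contiguous(unsigned int mask)
	{
		unsigned int out = 0, bit = 1;

		while (mask & bit) {	/* grow upwards from bit 0 */
			out |= bit;
			bit <<= 1;
		}
		return out;
	}

	int main(void)
	{
		/* prints: candidate 0xff0f -> clipped 0xf */
		printf("candidate 0x%x -> clipped 0x%x\n", 0xff0f,
		       clip_to_contiguous(0xff0f));
		return 0;
	}
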
Sohil Mehta d5cb957439 x86/cpu: Enable LASS during CPU initialization
Linear Address Space Separation (LASS) mitigates a class of side-channel
attacks that rely on speculative access across the user/kernel boundary.
Enable LASS along with similar security features if the platform
supports it.

While at it, remove the comment above the SMAP/SMEP/UMIP/LASS setup
instead of updating it, as the whole sequence is quite self-explanatory.

Some EFI runtime and boot services may rely on 1:1 mappings in the lower
half during early boot and even after SetVirtualAddressMap(). To avoid
tripping LASS, the initial CR4 programming would need to be delayed
until EFI has completely finished entering virtual mode (including
efi_free_boot_services()). Also, LASS would need to be temporarily
disabled while switching to efi_mm to avoid potential faults on stray
runtime accesses.

Similarly, legacy vsyscall page accesses are flagged by LASS resulting
in a #GP (instead of a #PF). Without LASS, the #PF handler emulates the
accesses and returns the appropriate values. Equivalent emulation
support is required in the #GP handler with LASS enabled. In case of
vsyscall XONLY (execute only) mode, the faulting address is readily
available in the RIP which would make it easier to reuse the #PF
emulation logic.

For now, keep it simple and disable LASS if either of those is compiled
in. Though not ideal, this makes it easier to start testing LASS support
in some environments. In future, LASS support can easily be expanded to
support EFI and legacy vsyscalls.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-9-sohil.mehta%40intel.com
2025-11-18 10:38:27 -08:00
Sohil Mehta c9129cf0f0 selftests/x86: Update the negative vsyscall tests to expect a #GP
Some of the vsyscall selftests expect a #PF when vsyscalls are disabled.
However, with LASS enabled, an invalid access results in a SIGSEGV due
to a #GP instead of a #PF. One such negative test fails because it is
expecting X86_PF_INSTR to be set.

Update the failing test to expect either a #GP or a #PF. Also, update
the printed messages to show the trap number (denoting the type of
fault) instead of assuming a #PF.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-8-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
Alexander Shishkin 42fea0a3a7 x86/traps: Communicate a LASS violation in #GP message
A LASS violation typically results in a #GP. With LASS active, any
invalid access to user memory (including the first page frame) would be
reported as a #GP, instead of a #PF.

Unfortunately, the #GP error messages provide limited information about
the cause of the fault. This could be confusing for kernel developers
and users who are accustomed to the friendly #PF messages.

To make the transition easier, enhance the #GP Oops message to include a
hint about LASS violations. Also, add a special hint for kernel NULL
pointer dereferences to match with the existing #PF message.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-7-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
Sohil Mehta 731d43750c x86/kexec: Disable LASS during relocate kernel
The relocate kernel mechanism uses an identity mapping to copy the new
kernel, which leads to a LASS violation when executing from a low
address.

LASS must be disabled after the original CR4 value is saved because
kexec paths that preserve context need to restore CR4.LASS. But,
disabling it along with CET during identity_mapped() is too late. So,
disable LASS immediately after saving CR4, along with PGE, and before
jumping to the identity-mapped page.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-6-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
Sohil Mehta b3a7e973ab x86/alternatives: Disable LASS when patching kernel code
For patching, the kernel initializes a temporary mm area in the lower
half of the address range. LASS blocks these accesses because its
enforcement relies on bit 63 of the virtual address as opposed to SMAP
which depends on the _PAGE_BIT_USER bit in the page table. Disable LASS
enforcement by toggling the RFLAGS.AC bit during patching to avoid
triggering a #GP fault.

Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on
platforms that need it. Name the wrappers as lass_stac()/_clac() instead
of lass_disable()/_enable() because they only control the kernel data
access enforcement. The entire LASS mechanism (including instruction
fetch enforcement) is controlled by the CR4.LASS bit.

Describe the usage of the new helpers in comparison to the ones used for
SMAP. Also, add comments to explain when the existing stac()/clac()
should be used. While at it, move the duplicated "barrier" comment to
the same block.

The text poking functions use standard memcpy()/memset() while patching
kernel code. However, objtool complains about calling such dynamic
functions within an AC=1 region. See warning #9, regarding function
calls with UACCESS enabled, in tools/objtool/Documentation/objtool.txt.

To pacify objtool, one option is to add memcpy() and memset() to the
list of allowed-functions. However, that would provide a blanket
exemption for all usages of memcpy() and memset(). Instead, replace the
standard calls in the text poking functions with their unoptimized,
always-inlined versions. Considering that patching is usually small,
there is no performance impact expected.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-5-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
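
The helpers presumably reduce to an alternative keyed off the LASS
feature bit; a sketch (X86_FEATURE_LASS comes from this series, and the
committed code may differ):

	#include <asm/alternative.h>
	#include <asm/cpufeatures.h>

	static __always_inline void lass_stac(void)
	{
		/* No-op unless the CPU has LASS; "stac" sets RFLAGS.AC. */
		alternative("", "stac", X86_FEATURE_LASS);
	}

	static __always_inline void lass_clac(void)
	{
		alternative("", "clac", X86_FEATURE_LASS);
	}
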
Peter Zijlstra (Intel) d9a96cc18b x86/asm: Introduce inline memcpy and memset
Provide inline memcpy and memset functions that can be used instead of
the GCC builtins when necessary. The immediate use case is for the text
poking functions to avoid the standard memcpy()/memset() calls because
objtool complains about such dynamic calls within an AC=1 region. See
tools/objtool/Documentation/objtool.txt, warning #9, regarding function
calls with UACCESS enabled.

Some user copy functions such as copy_user_generic() and __clear_user()
have similar rep_{movs,stos} usages. But, those are highly specialized
and hard to combine or reuse for other things. Define these new helpers
for all other usages that need a completely unoptimized, strictly inline
version of memcpy() or memset().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-4-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
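
For reference, a strictly-inline byte copy of this sort can be written
as follows (sketch; the committed helpers may differ in detail):

	#include <linux/compiler.h>
	#include <linux/types.h>

	static __always_inline void *demo_inline_memcpy(void *to,
							const void *from,
							size_t len)
	{
		void *ret = to;

		/*
		 * A single "rep movsb": no out-of-line call for objtool's
		 * UACCESS check to trip over, and nothing to optimize.
		 */
		asm volatile("rep movsb"
			     : "+D" (to), "+S" (from), "+c" (len)
			     : : "memory");
		return ret;
	}
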
Sohil Mehta e39c5387ad x86/cpu: Add an LASS dependency on SMAP
With LASS enabled, any kernel data access to userspace typically results
in a #GP, or a #SS in some stack-related cases. When the kernel needs to
access user memory, it can suspend LASS enforcement by toggling the
RFLAGS.AC bit. Most of these cases are already covered by the
stac()/clac() pairs used to avoid SMAP violations.

Even though LASS could potentially be enabled independently, it would be
very painful without SMAP and the related stac()/clac() calls. There is
no reason to support such a configuration because all future hardware
with LASS is expected to have SMAP as well. Also, the STAC/CLAC
instructions are architected to:
	#UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.

So, make LASS depend on SMAP to conveniently reuse the existing AC bit
toggling already in place.

Note: Additional STAC/CLAC would still be needed for accesses such as
text poking which are not flagged by SMAP. This is because such mappings
are in the lower half but do not have the _PAGE_USER bit set which SMAP
uses for enforcement.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-3-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
Sohil Mehta 7baadd463e x86/cpufeatures: Enumerate the LASS feature bits
Linear Address Space Separation (LASS) is a security feature that
mitigates a class of side-channel attacks relying on speculative access
across the user/kernel boundary.

Privilege mode based access protection already exists today with paging
and features such as SMEP and SMAP. However, to enforce these
protections, the processor must traverse the paging structures in
memory. An attacker can use timing information resulting from this
traversal to determine details about the paging structures, and to
determine the layout of the kernel memory.

LASS provides the same mode-based protections as paging but without
traversing the paging structures. Because the protections are enforced
prior to page-walks, an attacker will not be able to derive paging-based
timing information from the various caching structures such as the TLBs,
mid-level caches, page walker, data caches, etc.

LASS enforcement relies on the kernel implementation to divide the
64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space

Any data access or code execution across address spaces typically
results in a #GP fault, with an #SS generated in some rare cases. The
LASS enforcement for kernel data accesses is dependent on CR4.SMAP being
set. The enforcement can be disabled by toggling the RFLAGS.AC bit
similar to SMAP.

Define the CPU feature bits to enumerate LASS. Also, disable the feature
at compile time on 32-bit kernels. Use a direct dependency on X86_32
(instead of !X86_64) to make it easier to combine with similar 32-bit
specific dependencies in the future.

LASS mitigates a class of side-channel speculative attacks, such as
Spectre LAM, described in the paper, "Leaky Address Masking: Exploiting
Unmasked Spectre Gadgets with Noncanonical Address Translation".

Add the "lass" flag to /proc/cpuinfo to indicate that the feature is
supported by hardware and enabled by the kernel. This allows userspace
to determine if the system is secure against such attacks.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-2-sohil.mehta%40intel.com
2025-11-18 10:38:26 -08:00
René Rebe e96190da17 PNP: Fix ISAPNP to generate uevents to auto-load modules
Currently ISAPNP devices do not generate a uevent, so udev cannot
auto-load the driver modules needed for Creative SoundBlaster or Gravis
UltraSound cards to just work.

Signed-off-by: René Rebe <rene@exactco.de>
[ rjw: Subject edits ]
Link: https://patch.msgid.link/20251118.145942.1445519082574147037.rene@exactco.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-18 17:35:36 +01:00
Thorsten Blum cdf5ecc3f6 EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error()
strcpy() has been deprecated¹ because it performs no bounds checking on the
destination buffer, which can lead to buffer overflows. Use the safer
strscpy() instead.

¹ https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://patch.msgid.link/20251118135621.101148-2-thorsten.blum@linux.dev
2025-11-18 16:50:32 +01:00
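
The shape of the fix, with illustrative names:

	#include <linux/string.h>

	static void demo_set_label(char *dst, size_t dst_sz, const char *src)
	{
		/* was: strcpy(dst, src) - no bound on the destination */
		strscpy(dst, src, dst_sz);	/* bounded, NUL-terminated */
	}
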
Chengkaitao 9d3faec60b genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier()
Since irq_set_affinity_notifier() may sleep, it is always called with
interrupts enabled, so raw_spinlock_irqsave() can be replaced with
raw_spinlock_irq().

Signed-off-by: Chengkaitao <chengkaitao@kylinos.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251118012754.61805-1-pilgrimtao@gmail.com
2025-11-18 16:19:40 +01:00
Dan Carpenter 80adaccf0e rseq: Delete duplicate if statement in rseq_virt_userspace_exit()
This if statement is indented weirdly.  It's a duplicate and doesn't
affect runtime (beyond wasting a little time).  Delete it.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/aRxP3YcwscrP1BU_@stanley.mountain
2025-11-18 15:56:55 +01:00
Rafael J. Wysocki b20a374902 cpufreq: intel_pstate: Eliminate some code duplication
To eliminate some code duplication from the intel_pstate driver,
move the core_get_val() function body to a new function called
get_perf_ctl_val() and make both core_get_val() and atom_get_val()
invoke it to carry out the same computation.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2829273.mvXUDI8C0e@rafael.j.wysocki
2025-11-18 15:51:31 +01:00
Kaushlendra Kumar 58075aec92 powercap: intel_rapl: Add support for Nova Lake processors
Add RAPL support for Intel Nova Lake and Nova Lake L processors using
the core defaults configuration.

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits, rebase ]
Link: https://patch.msgid.link/20251028101814.3482508-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-18 15:39:29 +01:00
Christophe Leroy 4322c8f81c lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
Properly use masked_user_read_access_begin() and
masked_user_write_access_begin() instead of masked_user_access_begin() in
order to match user_read_access_end() and user_write_access_end().  This is
important for architectures like PowerPC that enable user reads and user
writes separately.

That means masked_user_read_access_begin() is used when user memory is
exclusively read during the window and masked_user_write_access_begin()
is used when user memory is exclusively written during the window.
masked_user_access_begin() remains and is used when both reads and
writes are performed during the open window. Each of them is expected
to be terminated by the matching user_read_access_end(),
user_write_access_end() and user_access_end().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/cb5e4b0fa49ea9c740570949d5e3544423389757.1763396724.git.christophe.leroy@csgroup.eu
2025-11-18 15:27:35 +01:00
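
A sketch of the pairing convention for a read-only window, using the
interfaces named above (simplified error handling):

	#include <linux/uaccess.h>

	static int demo_get_user_u32(u32 *dst, const u32 __user *src)
	{
		if (can_do_masked_user_access())
			src = masked_user_read_access_begin(src);
		else if (!user_read_access_begin(src, sizeof(*src)))
			return -EFAULT;

		unsafe_get_user(*dst, src, Efault);
		user_read_access_end();		/* matches the _read_ begin */
		return 0;
	Efault:
		user_read_access_end();
		return -EFAULT;
	}
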
Christophe Leroy 1c204914bc scm: Convert put_cmsg() to scoped user access
Replace the open coded implementation with the scoped user access guard.

That also corrects the imbalance between masked_user_access_begin() and
user_write_access_end(), which would affect PowerPC when it gains masked
user access support.

No functional change intended.

[ tglx: Amend change log ]

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/793219313f641eda09a892d06768d2837246bf9f.1763396724.git.christophe.leroy@csgroup.eu
2025-11-18 15:27:34 +01:00
Christophe Leroy 803abedbd5 iov_iter: Add missing speculation barrier to copy_from_user_iter()
The results of "access_ok()" can be mis-speculated.  The result is that
the CPU can end up executing speculatively:

	if (access_ok(from, size))
		// Right here

For the same reason as done in copy_from_user() in commit 74e19ef0ff
("uaccess: Add speculation barrier to copy_from_user()"), add a speculation
barrier to copy_from_user_iter().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/6b73e69cc7168c89df4eab0a216e3ed4cca36b0a.1763396724.git.christophe.leroy@csgroup.eu
2025-11-18 15:27:34 +01:00
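
The pattern being applied, in miniature (sketch, for architectures that
provide barrier_nospec()):

	#include <linux/uaccess.h>
	#include <asm/barrier.h>

	static size_t demo_copy_in(void *to, const void __user *from,
				   size_t size)
	{
		if (!access_ok(from, size))
			return 0;	/* nothing copied */

		barrier_nospec();	/* don't speculate past access_ok() */
		return size - raw_copy_from_user(to, from, size);
	}
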
Christophe Leroy 4db1df7a72 iov_iter: Convert copy_from_user_iter() to masked user access
copy_from_user_iter() lacks a speculation barrier. Adding one would degrade
performance on some architectures like x86, which would be unfortunate as
copy_from_user_iter() is a critical hotpath function.

Convert copy_from_user_iter() to use masked user access on architectures
that support it. This allows the speculation barrier to be added without
impacting performance.

This is similar to what was done for copy_from_user() in commit
0fc810ae3a ("x86/uaccess: Avoid barrier_nospec() in 64-bit
copy_from_user()")

[ tglx: Massage change log ]

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/58e4b07d469ca68a2b9477fe2c1ccc8a44cef131.1763396724.git.christophe.leroy@csgroup.eu
2025-11-18 15:27:34 +01:00
Mark Brown a0245b42f8 kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
On a system which support SME but not SVE we can now disable streaming mode
via ptrace by writing FPSIMD formatted data through NT_ARM_SVE with a VL of
0. Extend fp-ptrace to cover rather than skip these cases: relax the check
for SVE writes of FPSIMD format data so it does not skip when SME is
supported, and accept 0 as the VL when performing the ptrace write.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-17 20:11:54 +00:00
Mark Brown eb9df6d69a kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
In order to allow exiting streaming mode on systems with SME but not SVE,
we allow writes of FPSIMD format data via NT_ARM_SVE even when SVE is not
supported. Add a test case that covers this to sve-ptrace.

We do not support reads.

Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-17 20:11:54 +00:00
Mark Brown 472800cd5e arm64/sme: Support disabling streaming mode via ptrace on SME only systems
Currently it is not possible to disable streaming mode via ptrace on SME
only systems, the interface for doing this is to write via NT_ARM_SVE but
such writes will be rejected on a system without SVE support. Enable this
functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data
via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such
writes currently error since we require that a vector length is specified
which should minimise the risk that existing software is relying on current
behaviour.

Reads are not supported since I am not aware of any use case for this and
there is some risk that an existing userspace application may be confused if
it reads NT_ARM_SVE on a system without SVE. Existing kernels will return
FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not
stored, for example if the task has not used SVE. Returning a vector length
of 0 would create a risk that software would try to do things like allocate
space for register state with zero sizes, while returning a vector length of
128 bits would look like SVE is supported. It seems safer to just not make
the changes to add read support.

It remains possible for userspace to detect a SME only system via the ptrace
interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will succeed while
reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in
non-streaming mode is available via REGSET_FPR.

sve_set_common() already avoids allocating SVE storage when doing a FPSIMD
formatted write, and allocating SME storage when doing a NT_ARM_SVE write,
so we change the function to validate the new case and skip setting a
vector length for it.

The aim is to make a minimally invasive change, no operation that would
previously have succeeded will be affected, and we use a previously
defined interface in new circumstances rather than define completely new
ABI.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: David Spickett <david.spickett@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-17 20:11:54 +00:00
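
From userspace, the new ABI can be exercised roughly like this (sketch
for arm64; the payload layout is simplified):

	#include <elf.h>
	#include <string.h>
	#include <sys/ptrace.h>
	#include <sys/types.h>
	#include <sys/uio.h>
	#include <asm/ptrace.h>

	static long demo_exit_streaming_mode(pid_t pid)
	{
		char buf[sizeof(struct user_sve_header) +
			 sizeof(struct user_fpsimd_state)];
		struct user_sve_header *sve = (void *)buf;
		struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

		memset(buf, 0, sizeof(buf));
		sve->size = sizeof(buf);
		sve->vl = 0;			/* now accepted on SME-only systems */
		sve->flags = SVE_PT_REGS_FPSIMD; /* FPSIMD-format payload */

		return ptrace(PTRACE_SETREGSET, pid, NT_ARM_SVE, &iov);
	}
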
Sunday Adelodun 46fc75a29b PM: hibernate: Clean up kernel-doc comment style usage
Several static functions in kernel/power/swap.c were described using the
kernel-doc comment style (/** ... */) even though they are not exported
or referenced by generated documentation. This led to kernel-doc warnings
and stylistic inconsistencies.

Convert these unnecessary kernel-doc blocks to regular C comments,
remove comment blocks that are no longer useful, relocate comments to
more appropriate positions where needed, and fix a few "Return:"
descriptions that were either missing or incorrectly formatted.

No functional changes.

Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
[ rjw: Subject adjustment, changelog edits, comment edits ]
Link: https://patch.msgid.link/20251114220438.52448-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-17 20:16:56 +01:00
Rafael J. Wysocki 37d6d92fe0 Merge back earlier material related to system sleep for 6.19 2025-11-17 16:55:55 +01:00
Peter Oberparleiter 2a2153a2ba s390/debug: Update description of resize operation
With commit 1204777867 ("s390/debug: keep debug data on resize")
the behavior of a debug area resize operation was changed. Update the
associated documentation to reflect this change.

Fixes: 1204777867 ("s390/debug: keep debug data on resize")
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:55:16 +01:00
Heiko Carstens 0d79affa31 Merge branch 'compat-removal'
Heiko Carstens says:

====================

Remove s390 compat support to allow for code simplification and especially
reduced test effort. To the best of our knowledge there aren't any 31 bit
binaries out in the world anymore that would matter for newer kernels or
newer distributions.

Distributions have not provided compat packages for quite some time, or
even have CONFIG_COMPAT disabled.

Instead of adding deprecation warnings to the config option, or adding kernel
messages, just remove the code. Deprecation warnings haven't proven to be
useful. If it turns out there is still a reason to keep the compat support
this series can be reverted at any time in the future.

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:11:48 +01:00
Heiko Carstens 4ac286c4a8 s390/syscalls: Switch to generic system call table generation
The s390 syscall.tbl format differs slightly from most others, and
therefore requires an s390 specific system call table generation
script.

With compat support gone use the opportunity to switch to generic
system call table generation. The abi for all 64 bit system calls is
now common, since there is no need to specify if system call entry
points are only for 64 bit anymore.

Furthermore create the system call table in C instead of assembler
code in order to get type checking for all system call functions
contained within the table.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:39 +01:00
Heiko Carstens f4e1f1b137 s390/syscalls: Remove system call table pointer from thread_struct
With compat support gone there is only one system call table
left. Therefore remove the sys_call_table pointer from
thread_struct and use the sys_call_table directly.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:39 +01:00
Heiko Carstens 3db5cf9354 s390/uapi: Remove 31 bit support from uapi header files
Since the kernel does not support running 31 bit / compat binaries
anymore, also remove the corresponding 31 bit support from uapi header
files.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:38 +01:00
Heiko Carstens 8e0b986c59 s390: Remove compat support
There shouldn't be any 31 bit code around anymore that matters.
Remove the compat layer support required to run 31 bit code.

Reason for removal is code simplification and reduced test effort.

Note that this comes without any deprecation warnings added to config
options, or kernel messages, since most likely those would be ignored
anyway.

If it turns out there is still a reason to keep the compat layer this
can be reverted at any time in the future.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:38 +01:00
Heiko Carstens 169ebcbb90 tools: Remove s390 compat support
Remove s390 compat support from everything within tools, since s390 compat
support will be removed from the kernel.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO
Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:38 +01:00
Heiko Carstens 7afb095df3 s390/syscalls: Add pt_regs parameter to SYSCALL_DEFINE0() syscall wrapper
All system call wrappers should match the sys_call_ptr_t type. This is not
the case for system calls without parameters. Add the missing pt_regs
parameter there too.

Note: this is currently not a problem, since the parameter is unused.
However it prevents creating a correctly typed system call table in
C. With the current assembler implementation this works only because of
the missing type checking.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:38 +01:00
Heiko Carstens b2da5f6400 s390/kvm: Use psw32_t instead of psw_compat_t
kvm_s390_handle_lpsw() makes use of the psw_compat_t type even though
the code has nothing to do with CONFIG_COMPAT, for which the type is
supposed to be used. Use psw32_t instead.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:37 +01:00
Heiko Carstens 8c633c78c2 s390/ptrace: Rename psw_t32 to psw32_t
Use a standard "_t" suffix for psw_t32 and rename it to psw32_t.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17 11:10:37 +01:00
Sean Christopherson f2f22721ac x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute
Use the exact enum name when documenting "enum sgx_attribute" to fix a
warning if the file is fed into kernel-doc processing:

  WARNING: ./arch/x86/include/asm/sgx.h:139 expecting prototype for enum
           sgx_attributes. Prototype was for enum sgx_attribute instead

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-6-seanjc%40google.com
2025-11-14 15:30:32 -08:00
Sean Christopherson 55bf13b612 x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h
Drop an asterisk from a file-level copyright comment so that the comment
isn't interpreted as a kernel-doc comment.

E.g. if arch/x86/include/asm/sgx.h is fed into kernel-doc processing:

  WARNING: ./arch/x86/include/asm/sgx.h:2 This comment starts with '/**',
  but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-5-seanjc%40google.com
2025-11-14 15:30:28 -08:00
Sean Christopherson 905885fdb1 x86/sgx: Document structs and enums with '@', not '%'
Use '@' to document structure members and enum values in kernel-doc markup,
as per Documentation/doc-guide/kernel-doc.rst and flagged by make htmldocs.

  WARNING: arch/x86/include/uapi/asm/sgx.h:17 Enum value 'SGX_PAGE_MEASURE'
           not described in enum 'sgx_page_flags'

Opportunistically add a missing ':' for SGX_CHILD_PRESENT.

Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-4-seanjc%40google.com
2025-11-14 15:30:26 -08:00
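
For reference, kernel-doc markup for an enum then looks like this
(illustrative):

	/**
	 * enum demo_page_flags - flags for a demo page
	 * @DEMO_PAGE_MEASURE: '@' introduces a member description;
	 *                     '%' is only for constants mentioned in prose
	 */
	enum demo_page_flags {
		DEMO_PAGE_MEASURE	= 0x01,
	};
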
Sean Christopherson 243ea511fe x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler
Add kernel-doc markup for the register parameters passed by the vDSO blob
to the user handler to suppress build warnings, e.g.

  WARNING: arch/x86/include/uapi/asm/sgx.h:157 function parameter 'r8' not
           described in 'sgx_enclave_user_handler_t'

Call out that except for RSP, the registers are undefined on asynchronous
exits as far as the vDSO ABI is concerned.  E.g. the vDSO's exception
handler clobbers RDX, RDI, and RSI, and the kernel doesn't guarantee that
R8 or R9 will be zero (the synthetic value loaded by the CPU).

Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-3-seanjc%40google.com
2025-11-14 15:30:22 -08:00
Sean Christopherson 75801ca620 x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run"
Add a missing ':' for the description of sgx_enclave_run.reserved so that
documentation for the member is correctly generated:

  WARNING: arch/x86/include/uapi/asm/sgx.h:184 struct member 'reserved' not
  described in 'sgx_enclave_run'

Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-2-seanjc%40google.com
2025-11-14 15:30:13 -08:00
Rafael J. Wysocki 07f42f8290 PCI/sysfs: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
Use new PM_RUNTIME_ACQUIRE() and PM_RUNTIME_ACQUIRE_ERR() wrapper macros
to make the code look more straightforward.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
[ rjw: Typo fix in the changelog ]
Link: https://patch.msgid.link/3932581.kQq0lBPeGt@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 21:25:15 +01:00
Rafael J. Wysocki 70dcad3400 ACPI: TAD: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
Use new PM_RUNTIME_ACQUIRE() and PM_RUNTIME_ACQUIRE_ERR() wrapper macros
to make the code look more straightforward.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
[ rjw: Typo fix in the changelog ]
Link: https://patch.msgid.link/2040585.PYKUYFuaPT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 21:25:15 +01:00
Rafael J. Wysocki ef8057b07c PM: runtime: Wrapper macros for ACQUIRE()/ACQUIRE_ERR()
Add wrapper macros for ACQUIRE()/ACQUIRE_ERR() and runtime PM
usage counter guards introduced recently: pm_runtime_active_try,
pm_runtime_active_auto_try, pm_runtime_active_try_enabled, and
pm_runtime_active_auto_try_enabled.

The new macros should be more straightforward to use.

For example, they can be used for rewriting a piece of code like below:

        ACQUIRE(pm_runtime_active_try, pm)(dev);
        if ((ret = ACQUIRE_ERR(pm_runtime_active_try, &pm)))
                return ret;

in the following way:

        PM_RUNTIME_ACQUIRE(dev, pm);
        if ((ret = PM_RUNTIME_ACQUIRE_ERR(&pm)))
                return ret;

If the original code does not care about the specific error code
returned when attempting to resume the device:

        ACQUIRE(pm_runtime_active_try, pm)(dev);
        if (ACQUIRE_ERR(pm_runtime_active_try, &pm))
                return -ENXIO;

it may be changed like this:

        PM_RUNTIME_ACQUIRE(dev, pm);
        if (PM_RUNTIME_ACQUIRE_ERR(&pm))
                return -ENXIO;
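
Given the rewrites above, the wrappers presumably reduce to something
like this (a sketch, not necessarily the exact definitions):

        #define PM_RUNTIME_ACQUIRE(dev, pm) \
                ACQUIRE(pm_runtime_active_try, pm)(dev)

        #define PM_RUNTIME_ACQUIRE_ERR(pm) \
                ACQUIRE_ERR(pm_runtime_active_try, pm)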

Link: https://lore.kernel.org/linux-pm/5068916.31r3eYUQgx@rafael.j.wysocki/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/3400866.aeNJFYEL58@rafael.j.wysocki
2025-11-14 21:24:54 +01:00
Thomas Weißschuh 308bc2e338 selftests/timers/nanosleep: Add tests for return of remaining time
If interrupted by a signal, clock_nanosleep() returns the remaining time
into the structure pointed to by the rmtp parameter. So far this
functionality was not tested by the timer selftests.

Extend the nanosleep selftest to cover this feature.
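
For context, the interface under test behaves roughly like this (a
userspace sketch, independent of the selftest code):

        struct timespec req = { .tv_sec = 5 };
        struct timespec rem = { 0 };
        int err;

        /* clock_nanosleep() returns the error number directly. */
        err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem);
        if (err == EINTR) {
                /* rem now holds the remaining, unslept time */
        }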

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251106-nanosleep-rtmp-selftest-v1-1-f9212fb295fe@linutronix.de
2025-11-14 20:34:50 +01:00
Wake Liu 05d89fe7e4 selftests/timers: Clean up kernel version check in posix_timers
Several tests in the posix_timers selftest which test timer behavior
related to SIG_IGN fail on kernels older than 6.13. This is due to
a refactoring of signal handling in commit caf77435dd ("signal:
Handle ignored signals in do_sigaction(action != SIG_IGN)").

A previous attempt to fix this by adding a kernel version check to each
of the nine affected tests was suboptimal, as it resulted in emitting
the same skip message nine times.

Following the suggestion from Thomas Gleixner, this is refactored to
perform a single version check in main(). To satisfy the kselftest
framework's requirement for the test count to match the declared plan,
the plan is now conditionally set to 10 (for older kernels) or 19.

While setting the plan conditionally may seem complex, it is the
better approach to avoid the alternatives: either running tests on
unsupported kernels that are known to fail, or emitting a noisy series
of nine identical skip messages. A single informational message is now
printed instead when the tests are skipped.
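
The resulting structure looks roughly like this (a sketch; the
version-check helper name is hypothetical):

        ksft_print_header();
        if (kernel_is_older_than(6, 13)) {      /* hypothetical helper */
                ksft_set_plan(10);
                ksft_print_msg("Kernel < 6.13, skipping SIG_IGN tests\n");
        } else {
                ksft_set_plan(19);
        }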

Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250807085042.1690931-1-wakel@google.com/
Link: https://patch.msgid.link/20251103114502.584940-1-wakel@google.com
2025-11-14 20:34:50 +01:00
Jianyun Gao 4518767be9 time: Fix a few typos in time[r] related code comments
Signed-off-by: Jianyun Gao <jianyungao89@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250927093411.1509275-1-jianyungao89@gmail.com
2025-11-14 20:34:50 +01:00
Borislav Petkov (AMD) e67997021f x86/bugs: Get rid of the forward declarations
Get rid of the forward declarations of the mitigation functions by
moving their single caller below them.

No functional changes.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20251105200447.GBaQut3w4dLilZrX-z@fat_crate.local
2025-11-14 20:32:21 +01:00
Sunday Adelodun e54dd0474c time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
Several functions in kernel/time/tick-oneshot.c are missing parameter and
return value descriptions in their kernel-doc comments. This causes
warnings during doc generation.

Update the kernel-doc blocks to include detailed @param and Return:
descriptions for better clarity and to fix kernel-doc warnings.  No
functional code changes are made.

Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251106113938.34693-3-adelodunolaoluwa@yahoo.com
2025-11-14 20:17:44 +01:00
Srinivas Pandruvada 3402bc010d Documentation: thermal: Document thermal throttling on Intel platforms
Add documentation for Intel thermal throttling reporting events.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Subject adjustment, file name change, minor edits ]
Link: https://patch.msgid.link/20251113212104.221632-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:22:45 +01:00
Riwen Lu a10ad1b104 PM: suspend: Make pm_test delay interruptible by wakeup events
Modify the suspend_test() function to allow the test delay to be
interrupted by wakeup events.

This improves the responsiveness of the system during suspend testing
when wakeup events occur, allowing the suspend process to proceed
without waiting for the full test delay to complete when wakeup events
are detected.

Additionally, using msleep() instead of mdelay() avoids potential soft
lockup "CPU stuck" issues when long test delays are configured.

Co-developed-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251113012638.1362013-1-luriwen@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:09:16 +01:00
Mario Limonciello (AMD) 7b9725b3d1 usb: sl811-hcd: Add PM_EVENT_POWEROFF into suspend callbacks
When the PM core uses hibernation callbacks for shutdown, drivers
will receive PM_EVENT_POWEROFF and should handle it the same as if
PM_EVENT_HIBERNATE had been used.

Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
[ rjw: Changelog adjustment ]
Link: https://patch.msgid.link/20251112224025.2051702-4-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:05:53 +01:00
Mario Limonciello (AMD) 988dd0bd91 scsi: Add PM_EVENT_POWEROFF into suspend callbacks
If the PM core uses hibernation callbacks for powering off the
system, drivers will receive PM_EVENT_POWEROFF and should handle
it the same as they previously handled PM_EVENT_HIBERNATE.

Support this case in the scsi driver.  No functional changes.
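
The pattern in the suspend callbacks is roughly (sketch):

        switch (msg.event) {
        case PM_EVENT_HIBERNATE:
        case PM_EVENT_POWEROFF: /* new: handle exactly like hibernate */
                /* prepare the device for complete power removal */
                break;
        default:
                break;
        }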

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251112224025.2051702-3-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:05:53 +01:00
Mario Limonciello (AMD) 0ca04993da PM: Introduce new PMSG_POWEROFF event
PMSG_POWEROFF will be used by the PM core to allow differentiating between
a hibernation and a shutdown sequence when reusing callbacks for common code.

Hibernation is started by writing the hibernation method to use (such as
'platform', 'shutdown', or 'reboot') into /sys/power/disk and writing 'disk' to
/sys/power/state.

Shutdown is initiated with the reboot() syscall with arguments on whether
to halt the system or power it off.

Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251112224025.2051702-2-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 17:05:53 +01:00
Rafael J. Wysocki bdfacf441b Merge back earlier runtime PM changes for 6.19 2025-11-14 16:56:40 +01:00
Thomas Weißschuh 4702f4eceb hrtimer: Store time as ktime_t in restart block
The hrtimer core uses ktime_t to represent times; use that also for the
restart block. CPU timers internally use nanoseconds instead of ktime_t
but use the same restart block, so use the correct accessors for those.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251110-restart-block-expiration-v1-3-5d39cc93df4f@linutronix.de
2025-11-14 16:31:19 +01:00
Rafael J. Wysocki b54df61c74 cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
If a given governor metric falls below a certain value (8 for
DECAY_SHIFT equal to 3), it will not decay any more due to the
simplistic decay implementation.  This may in some cases lead to
subtle inconsistencies in the governor behavior, so change the
decay implementation to take it into account and set the metric
at hand to 0 in that case.
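
In other words (an illustrative sketch, with DECAY_SHIFT equal to 3 as
above):

        if (metric >= (1U << DECAY_SHIFT))
                metric -= metric >> DECAY_SHIFT;
        else
                metric = 0;     /* the shift-based decay would stall here */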

Suggested-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2819353.mvXUDI8C0e@rafael.j.wysocki
2025-11-14 15:20:01 +01:00
Rafael J. Wysocki 8f3f01082d cpuidle: governors: teo: Use s64 consistently in teo_update()
Two local variables in teo_update() are defined as u64, but their
values are then compared with s64 values, so it is more consistent
to use s64 as their data type.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3026616.e9J7NaK4W3@rafael.j.wysocki
2025-11-14 15:20:01 +01:00
Rafael J. Wysocki 17673f64a0 cpuidle: governors: teo: Drop redundant function parameter
The last no_poll parameter of teo_find_shallower_state() is always
false, so drop it.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2253109.irdbgypaU6@rafael.j.wysocki
2025-11-14 15:20:01 +01:00
Rafael J. Wysocki a03b201180 cpuidle: governors: teo: Drop misguided target residency check
When the target residency of the current candidate idle state is
greater than the expected time till the closest timer (the sleep
length), it does not matter whether or not the tick has already been
stopped or if it is going to be stopped.  The closest timer will
trigger anyway at its due time, so if an idle state with target
residency above the sleep length is selected, energy will be wasted
and there may be excess latency.

Of course, if the closest timer were canceled before it could trigger,
a deeper idle state would be more suitable, but this is not expected
to happen (generally speaking, hrtimers are not expected to be
canceled as a rule).

Accordingly, the teo_state_ok() check done in that case causes energy to
be wasted more often than it allows any energy to be saved (if it allows
any energy to be saved at all), so drop it and let the governor use the
teo_find_shallower_state() return value as the new candidate idle state
index.

Fixes: 21d28cd2fa ("cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5955081.DvuYhMxLoT@rafael.j.wysocki
2025-11-14 15:19:39 +01:00
Heiko Carstens 52a1f73d17 s390/fault: Print unmodified PSW address on protection exception
In case of a kernel crash caused by a protection exception, print the
unmodified PSW address as reported by the CPU. The protection exception
handler modifies the PSW address in order to keep fault handling easy;
however, that leads to misleading call traces.

Therefore restore the original PSW address before printing it.

Before this change the output in case of a protection exception looks like
this:

 Oops: 0004 ilc:2 [#1]SMP
 Krnl PSW : 0704c00180000000 000003ffe0b40d78 (sysrq_handle_crash+0x28/0x40)
            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
...
 Krnl Code: 000003ffe0b40d66: e3e0f0980024        stg     %r14,152(%r15)
            000003ffe0b40d6c: c010fffffff2        larl    %r1,000003ffe0b40d50
           #000003ffe0b40d72: c0200046b6bc        larl    %r2,000003ffe1417aea
           >000003ffe0b40d78: 92021000            mvi     0(%r1),2
            000003ffe0b40d7c: c0e5ffae03d6        brasl   %r14,000003ffe0101528

With this change it looks like this:

 Oops: 0004 ilc:2 [#1]SMP
 Krnl PSW : 0704c00180000000 000003ffe0b40dfc (sysrq_handle_crash+0x2c/0x40)
            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
...
 Krnl Code: 000003ffe0b40dec: c010fffffff2        larl    %r1,000003ffe0b40dd0
            000003ffe0b40df2: c0200046b67c        larl    %r2,000003ffe1417aea
           *000003ffe0b40df8: 92021000            mvi     0(%r1),2
           >000003ffe0b40dfc: c0e5ffae03b6        brasl   %r14,000003ffe0101568
            000003ffe0b40e02: 0707                bcr     0,%r7

Note that with this change the PSW address points to the instruction behind
the instruction which caused the exception, as is expected for
protection exceptions.

This also replaces the '#' marker in the disassembly with '*', which makes
it possible to distinguish between the new and old behavior.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:28 +01:00
Heiko Carstens a603a00399 s390/uprobes: Use __forward_psw() instead of private implementation
With adjust_psw_addr() the uprobes code contains more or less a private
__forward_psw() implementation. Switch it to use __forward_psw(), and
remove adjust_psw_addr().

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:28 +01:00
Heiko Carstens 37450e0994 s390/processor: Add __forward_psw() helper
Similar to __rewind_psw(), add the counterpart __forward_psw(). This
helps to make code more readable when a PSW address has to be forwarded,
since it is more natural to write

addr = __forward_psw(psw, ilen);

instead of

addr = __rewind_psw(psw, -ilen);

This also renames the ilc parameter of __rewind_psw() to ilen, since
the parameter reflects an instruction length, not an instruction
length code. Also change the type of ilen from unsigned long to long
to reflect that lengths can be negative or positive.
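
Given the identity above, the new helper presumably boils down to
(sketch):

        static inline unsigned long __forward_psw(psw_t psw, long ilen)
        {
                return __rewind_psw(psw, -ilen);
        }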

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:28 +01:00
Aleksei Nikiforov 14e4e4175b s390/fpu: Fix false-positive kmsan report in fpu_vstl()
A false-positive kmsan report is detected when running the ping command.

The inline assembly instruction 'vstl' can write a varying number of bytes
depending on the value of the 'index' argument. If 'index' > 0, 'vstl'
writes at least 2 bytes.

clang generates the kmsan write helper call depending on the inline
assembly constraints. Constraints are evaluated at compile time, but the
value of the 'index' argument is known only at runtime.

clang currently generates a call to __msan_instrument_asm_store with a
size of 1 byte. Manually call the kmsan function to indicate the correct
number of bytes written, which fixes the false-positive report.
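
The shape of the fix, as a sketch assuming the generic
kmsan_unpoison_memory() helper ('vstl' stores bytes 0 through 'index';
the fpu_vstl() argument order is illustrative only):

        fpu_vstl(v1, index, dest);
        /* Tell KMSAN how many bytes the inline asm actually wrote. */
        kmsan_unpoison_memory(dest, index + 1);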

This change fixes the following kmsan reports:

[   36.563119] =====================================================
[   36.563594] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[   36.563852]  virtqueue_add+0x35c6/0x7c70
[   36.564016]  virtqueue_add_outbuf+0xa0/0xb0
[   36.564266]  start_xmit+0x288c/0x4a20
[   36.564460]  dev_hard_start_xmit+0x302/0x900
[   36.564649]  sch_direct_xmit+0x340/0xea0
[   36.564894]  __dev_queue_xmit+0x2e94/0x59b0
[   36.565058]  neigh_resolve_output+0x936/0xb40
[   36.565278]  __neigh_update+0x2f66/0x3a60
[   36.565499]  neigh_update+0x52/0x60
[   36.565683]  arp_process+0x1588/0x2de0
[   36.565916]  NF_HOOK+0x1da/0x240
[   36.566087]  arp_rcv+0x3e4/0x6e0
[   36.566306]  __netif_receive_skb_list_core+0x1374/0x15a0
[   36.566527]  netif_receive_skb_list_internal+0x1116/0x17d0
[   36.566710]  napi_complete_done+0x376/0x740
[   36.566918]  virtnet_poll+0x1bae/0x2910
[   36.567130]  __napi_poll+0xf4/0x830
[   36.567294]  net_rx_action+0x97c/0x1ed0
[   36.567556]  handle_softirqs+0x306/0xe10
[   36.567731]  irq_exit_rcu+0x14c/0x2e0
[   36.567910]  do_io_irq+0xd4/0x120
[   36.568139]  io_int_handler+0xc2/0xe8
[   36.568299]  arch_cpu_idle+0xb0/0xc0
[   36.568540]  arch_cpu_idle+0x76/0xc0
[   36.568726]  default_idle_call+0x40/0x70
[   36.568953]  do_idle+0x1d6/0x390
[   36.569486]  cpu_startup_entry+0x9a/0xb0
[   36.569745]  rest_init+0x1ea/0x290
[   36.570029]  start_kernel+0x95e/0xb90
[   36.570348]  startup_continue+0x2e/0x40
[   36.570703]
[   36.570798] Uninit was created at:
[   36.571002]  kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[   36.571261]  kmalloc_reserve+0x12a/0x470
[   36.571553]  __alloc_skb+0x310/0x860
[   36.571844]  __ip_append_data+0x483e/0x6a30
[   36.572170]  ip_append_data+0x11c/0x1e0
[   36.572477]  raw_sendmsg+0x1c8c/0x2180
[   36.572818]  inet_sendmsg+0xe6/0x190
[   36.573142]  __sys_sendto+0x55e/0x8e0
[   36.573392]  __s390x_sys_socketcall+0x19ae/0x2ba0
[   36.573571]  __do_syscall+0x12e/0x240
[   36.573823]  system_call+0x6e/0x90
[   36.573976]
[   36.574017] Byte 35 of 98 is uninitialized
[   36.574082] Memory access of size 98 starts at 0000000007aa0012
[   36.574218]
[   36.574325] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G    B            N  6.17.0-dirty #16 NONE
[   36.574541] Tainted: [B]=BAD_PAGE, [N]=TEST
[   36.574617] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[   36.574755] =====================================================

[   63.532541] =====================================================
[   63.533639] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
[   63.533989]  virtqueue_add+0x35c6/0x7c70
[   63.534940]  virtqueue_add_outbuf+0xa0/0xb0
[   63.535861]  start_xmit+0x288c/0x4a20
[   63.536708]  dev_hard_start_xmit+0x302/0x900
[   63.537020]  sch_direct_xmit+0x340/0xea0
[   63.537997]  __dev_queue_xmit+0x2e94/0x59b0
[   63.538819]  neigh_resolve_output+0x936/0xb40
[   63.539793]  ip_finish_output2+0x1ee2/0x2200
[   63.540784]  __ip_finish_output+0x272/0x7a0
[   63.541765]  ip_finish_output+0x4e/0x5e0
[   63.542791]  ip_output+0x166/0x410
[   63.543771]  ip_push_pending_frames+0x1a2/0x470
[   63.544753]  raw_sendmsg+0x1f06/0x2180
[   63.545033]  inet_sendmsg+0xe6/0x190
[   63.546006]  __sys_sendto+0x55e/0x8e0
[   63.546859]  __s390x_sys_socketcall+0x19ae/0x2ba0
[   63.547730]  __do_syscall+0x12e/0x240
[   63.548019]  system_call+0x6e/0x90
[   63.548989]
[   63.549779] Uninit was created at:
[   63.550691]  kmem_cache_alloc_node_noprof+0x9e8/0x10e0
[   63.550975]  kmalloc_reserve+0x12a/0x470
[   63.551969]  __alloc_skb+0x310/0x860
[   63.552949]  __ip_append_data+0x483e/0x6a30
[   63.553902]  ip_append_data+0x11c/0x1e0
[   63.554912]  raw_sendmsg+0x1c8c/0x2180
[   63.556719]  inet_sendmsg+0xe6/0x190
[   63.557534]  __sys_sendto+0x55e/0x8e0
[   63.557875]  __s390x_sys_socketcall+0x19ae/0x2ba0
[   63.558869]  __do_syscall+0x12e/0x240
[   63.559832]  system_call+0x6e/0x90
[   63.560780]
[   63.560972] Byte 35 of 98 is uninitialized
[   63.561741] Memory access of size 98 starts at 0000000005704312
[   63.561950]
[   63.562824] CPU: 3 UID: 0 PID: 192 Comm: ping Tainted: G    B            N  6.17.0-dirty #16 NONE
[   63.563868] Tainted: [B]=BAD_PAGE, [N]=TEST
[   63.564751] Hardware name: IBM 3931 A01 703 (KVM/Linux)
[   63.564986] =====================================================

Fixes: dcd3e1de9d ("s390/checksum: provide csum_partial_copy_nocheck()")
Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:27 +01:00
Thomas Richter d17901e8e8 s390/pai: Calculate size of reserved PAI extension control block area
The PAI extension 1 control block area is 512 bytes in total.
It currently contains three address pointers which refer to counter
memory blocks, followed by a reserved area.
Calculate the size of the reserved area instead of hardcoding it. This
makes the code more readable and maintainable.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Jan Polensky <japo@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:27 +01:00
Heiko Carstens b60d126c8e s390/mm: Let dump_fault_info() print additional information
Let dump_fault_info() print additional information to make debugging
easier:

Print "FSI" if the access-exception-fetch/store-indication facility is
installed. If it is installed the TEID may also indicate if an exception
happened because of a fetch or a store operation.

Print "SOP", "ESOP-1", or "ESOP-2" depending on the type of the installed
Suppression-on-Protection facility. This also gives additional information
about the validity and meaning of the TEID bits.

The output is changed from something like:

Failing address: 0000000000000000 TEID: 0000000000000803

to

Failing address: 0000000000000000 TEID: 0000000000000803 ESOP-2 FSI

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:27 +01:00
Heiko Carstens 76502abca2 s390/mm: Change comment and die() message if teid.b61 is zero
The comments in do_protection() give the impression that a TEID, where bit
61 is zero, indicates a low address protection exception. This is not
necessarily true, and it depends on the type of Suppression-on-Protection
facility of the machine (see Princples of Operation) what this means.

Rework the comments and the die() message to reflect this. This may also
help to avoid confusion.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:27 +01:00
Heiko Carstens 02310adcc6 s390/mm: Remove unused flush_tlb()
flush_tlb() exists for historic reasons and was never used. Remove it.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:34:27 +01:00
Heiko Carstens f518d469fe Merge branch 'pai-pmu-merge'
Thomas Richter says:

====================

The PAI PMUs pai_crypto and pai_ext both operate on memory-mapped
counters supported by z16 and follow-on machines.
These memory mapped counters have a lot in common, like:
 - validation, installing and removing events
 - starting and stopping events
 - retrieving counter values
 - collecting sample data.

However both PMU drivers have slightly different parameters,
for example:
 - different mapped memory size
 - different number of supported counters
 - different counter numbers and names
 - different bits in the CR0 register
 - different anchor address in lowcore

Due to these different parameters, two independent
PMU drivers have been developed. However, both drivers
have much in common, and most of the PMU callback
functions look very similar and are sometimes identical.

This patch set combines the two independent PMU device drivers
perf_pai_crypto.c and perf_pai_ext.c into one device driver.
The new device driver operates on a table which contains
the different parameters and uses common functions for
event operations.

The result is one PAI PMU driver which supports both PMUs.
It is also extensible to support new PAI PMUs.

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:50 +01:00
Thomas Richter 492578d3a2 s390/pai: Rename perf_pai_crypto.c to perf_pai.c
Rename perf_pai_crypto.c to perf_pai.c. The new perf_pai.c
contains both PAI device drivers:
 - pai_crypto for PAI crypto counter set
 - pai_ext for PAI NNPA counter set
The rename reflects this common driver supporting both PMUs.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:08 +01:00
Thomas Richter 8b65b0ba35 s390/pai_crypto: Merge pai_ext PMU into pai_crypto
Combine PAI cryptography and PAI extension (NNPA) PMUs in one driver.
Remove file perf_pai_ext.c and registration of PMU "pai_ext"
from perf_pai_crypto.c.

Includes:
- Shared alloc/free and sched_task handling
- NNPA events with exclude_kernel enforced, exclude_user rejected
- Setup CR0 bits for both PMUs

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:07 +01:00
Thomas Richter 3abb6b1675 s390/pai_crypto: Introduce PAI crypto specific event delete function
Introduce PAI crypto specific event delete function to handle
additional actions to be done at event removal.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:07 +01:00
Thomas Richter 35a27bad07 s390/pai_crypto: Make pai_root per-PMU and unify naming
Prepare the common PAI PMU driver to handle multiple PMUs.

Convert pai_root into an array indexed by PAI_PMU_IDX(event)
so that per-CPU state becomes per-PMU. Adjust all call sites
accordingly. Rename KMSG_COMPONENT and the s390dbf buffer from
"pai_crypto" to "pai" for consistent naming.

No functional change intended beyond log identifiers.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:07 +01:00
Thomas Richter f124735413 s390/pai_crypto: Rename paicrypt_copy() to pai_copy()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_copy() to pai_copy() to indicate its common usage.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:07 +01:00
Thomas Richter 42e6a0f6d2 s390/pai_crypto: Add common pai_del() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a commonly usable function pai_del() to delete the event on a CPU.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:07 +01:00
Thomas Richter ac03223f07 s390/pai_crypto: Add common pai_stop() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.

Add a commonly usable function pai_stop() to stop the event on a CPU.
Call this common pai_stop() from paicrypt_del().

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:07 +01:00
Thomas Richter a65a4d7e80 s390/pai_crypto: Add common pai_add() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.

Add a commonly usable function pai_add() to add the event on a CPU.
Call this common pai_add() from paicrypt_add().

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:06 +01:00
Thomas Richter 6fe66b2157 s390/pai_crypto: Add common pai_start() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.

Add a commonly usable function pai_start() to start the event on a CPU.
The function expects a PAI PMU specific read function as its second
parameter to read out the start value for an event.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:06 +01:00
Thomas Richter 8f6116fd49 s390/pai_crypto: Add common pai_read() function
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.

Add a commonly usable function pai_read() to read counter values.
The function expects a PAI PMU specific read function as its second
parameter.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:06 +01:00
Thomas Richter 74466e87e7 s390/pai_crypto: Unify sample push logic and update context handling
Unify naming and logic for PAI PMU drivers to support both PMUs
pai_crypto and pai_ext.

Rename paicrypt_push_sample() to pai_push_sample() to reflect
its common usage.  Add detailed comments about invocation context
and scheduler callbacks.  Use struct pai_pmu to determine area_size
instead of PAGE_SIZE for counter backup.
Remove obsolete variable paicrypt_cnt.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:06 +01:00
Thomas Richter 0f1c0d754a s390/pai_crypto: Rename paicrypt_have_samples() to pai_have_samples()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_have_samples() to pai_have_samples() to reflect
its common usage. No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:06 +01:00
Thomas Richter 360e180d8b s390/pai_crypto: Rename paicrypt_getctr() to pai_getctr()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.

Rename paicrypt_getctr() to pai_getctr() to reflect its common
purpose. pai_getctr() now uses the pai_pmu table to extract PAI PMU
characteristics such as kernel_offset inside the counter area page.
Also rename paicrypt_have_sample() to pai_have_sample().

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:06 +01:00
Thomas Richter 42cd0c8242 s390/pai_crypto: Rename paicrypt_getdata() to pai_getdata()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_getdata() to pai_getdata(). Use the PAI PMU
characteristics in the pai_pmu table to determine the number
of counters to be extracted.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:05 +01:00
Thomas Richter 65b9831bd3 s390/pai_crypto: Rename some functions for common usage
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename functions
 - paicrypt_free() -> pai_free()
 - paicrypt_destroy_event() -> pai_destroy_event()
 - paicrypt_destroy_event_cpu() -> pai_destroy_event_cpu()
to reflect their future common usage.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:05 +01:00
Thomas Richter 413957980a s390/pai_crypto: Introduce generic event init using pai_pmu[]
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.

Rework PAI crypto event initialization. Add a common
function for event initialization. It uses the PAI characteristics
stored in the pai_pmu table instead of hardcoded values.
Enlarge pai_event_valid() to check all event validation aspects.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:05 +01:00
Thomas Richter a3f8423622 s390/pai_crypto: Add PAI crypto characteristics table for parameters
Create and add a PMU characteristics table to store the parameters
of the PAI crypto PMU. This table contains PMU details such as
 - number of available counters
 - names of these counters to export to /sysfs
 - size of the memory mapped counter area
 - base number of the first counter
 - etc.

Also define a PMU-specific initialization function to be called when
a PAI PMU feature is supported. At device driver initialization,
test these features and, if available, use the qpaci instruction to
retrieve the number of available counters. Also export these counter
names to /sysfs and register this PMU.
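
A hypothetical shape for such a table entry (all field names
illustrative):

        struct pai_pmu {
                const char *name;       /* PMU name exported via /sysfs */
                unsigned int num_cntrs; /* number of available counters */
                unsigned int base;      /* base number of the first counter */
                unsigned int area_size; /* size of the mapped counter area */
        };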

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:05 +01:00
Thomas Richter 387c7b5f04 s390/pai_crypto: Rename paicrypt_root_alloc() and paicrypt_root_free()
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename functions paicrypt_root_alloc() and paicrypt_root_free()
to pai_root_alloc() and pai_root_free().
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:05 +01:00
Thomas Richter 3f082c2e47 s390/pai_crypto: Rename structure paicrypt_root
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_root to pai_root.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:04 +01:00
Thomas Richter a626e0d46a s390/pai_crypto: Rename structure paicrypt_map to pai_map
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_map to pai_map.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:04 +01:00
Thomas Richter 2706ea193a s390/pai_crypto: Rename structure paicrypt_mapptr to pai_mapptr
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_mapptr to pai_mapptr.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:04 +01:00
Thomas Richter c124208b74 s390/pai_crypto: Rename member paicrypt_map::page
Rename the member page in struct paicrypt_map to area. This rename
creates consistent naming across both PMU drivers, pai_crypto and
pai_ext. No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:04 +01:00
Thomas Richter abc524caa1 s390/pai_crypto: Rename variable cfm_dbg
The global variable cfm_dbg points to the s390dbf debug buffer.
Rename it to paidbg to better reflect its purpose.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14 11:30:04 +01:00
Sascha Bischoff a04fbfb8a1 arm64/sysreg: Add ICH_VMCR_EL2
Add the ICH_VMCR_EL2 register, which is required for the upcoming
GICv5 KVM support. This register has two different field encodings,
based on whether it is used for GICv3- or GICv5-based VMs. The
GICv5-specific field encodings are generated with a FEAT_GCIE prefix.

This register is already described in the GICv3 KVM code
directly. This will be ported across to use the generated encodings as
part of an upcoming change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Sascha Bischoff a0b130eedd arm64/sysreg: Move generation of RES0/RES1/UNKN to function
The RESx and UNKN define generation happens in two places
(EndSysreg and EndSysregFields) using nearly identical code.
Split this out into a function and call that instead, rather
than keeping the duplicated code.

There are no changes to the generated sysregs as part of this change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Sascha Bischoff fe2ef46995 arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
Some system register field encodings change based on, for example,
the in-use architecture features or the context in which they are
accessed. In order to support these different field encodings,
introduce the Prefix descriptor (Prefix, EndPrefix) for describing
such sysregs.

The Prefix descriptor can be used in the following way:

        Sysreg  EXAMPLE 0    1    2    3    4
        Prefix  FEAT_A
        Field   63:0    Foo
        EndPrefix
        Prefix  FEAT_B
        Field   63:1    Bar
        Res0    0
        EndPrefix
        Field   63:0    Baz
        EndSysreg

This will generate a single set of system register encodings (REG_,
SYS_, ...), and then generate three sets of field definitions for the
system register called EXAMPLE. The first set is prefixed by FEAT_A,
e.g. FEAT_A_EXAMPLE_Foo. The second set is prefixed by FEAT_B,
e.g. FEAT_B_EXAMPLE_Bar. The third set is not given a prefix at all,
e.g. EXAMPLE_Baz. For each set, a corresponding set of defines for
Res0, Res1, and Unkn is generated.
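
For the EXAMPLE register above, the generated names would look roughly
like this (illustrative, not actual generator output):

        #define FEAT_A_EXAMPLE_Foo      GENMASK(63, 0)
        #define FEAT_B_EXAMPLE_Bar      GENMASK(63, 1)
        #define FEAT_B_EXAMPLE_RES0     (UL(1) << 0)
        #define EXAMPLE_Baz             GENMASK(63, 0)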

The intent for the final prefix-less fields is to describe default or
legacy field encodings. This ensures that prefixed encodings can be
added to already-present sysregs without affecting existing legacy
code. Prefixed fields must be defined before those without a prefix,
and this is checked by the generator. This ensures consistent ordering
within the sysreg definitions.

The Prefix descriptor can be used within Sysreg or SysregFields
blocks. Field, Res0, Res1, Unkn, Rax, SignedEnum, Enum can all be used
within a Prefix block. Fields and Mapping cannot. Fields that vary
with features must be described as part of a SysregFields block,
instead. Mappings, which are just a code comment, make little sense in
this context, and have hence not been included.

There are no changes to the generated system register definitions as
part of this change.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Sascha Bischoff 0aab5772a5 arm64/sysreg: Fix checks for incomplete sysreg definitions
The checks for incomplete sysreg definitions were checking if the
next_bit was greater than 0, which is incorrect and missed occasions
where bit 0 hasn't been defined for a sysreg. The reason is that
next_bit is -1 when all bits have been processed (LSB - 1).

Change the checks to use >= 0, instead. Also, set next_bit in Mapping
to -1 instead of 0 to match these new checks.

There are no changes to the generated sysreg definitions as part of
this change, and conveniently no existing sysregs lack a definition
for bit 0.

Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:09:46 +00:00
Yang Shi 37cb0aab90 arm64: mm: make linear mapping permission update more robust for partial range
The commit fcf8dda8cc ("arm64: pageattr: Explicitly bail out when changing
permissions for vmalloc_huge mappings") made permission update for
partial ranges more robust. But the linear mapping permission update
still assumes the whole range is updated, iterating from the first page
all the way to the last page of the area.

Make it more robust by updating the linear mapping permission starting
from the page mapped at the start address, and update the number of
pages accordingly.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 18:07:21 +00:00
Dev Jain c320dbb7c8 arm64/mm: Elide TLB flush in certain pte protection transitions
Currently arm64 does an unconditional TLB flush in mprotect(). This is not
required for some cases, for example, when changing from PROT_NONE to
PROT_READ | PROT_WRITE (a real use case: glibc malloc does this to emulate
growing into the non-main heaps), and unsetting uffd-wp in a range.

Therefore, implement pte_needs_flush() for arm64, which is already
implemented by some other arches as well.
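
The core idea, as a conceptual sketch (not the actual arm64
implementation):

        static inline bool pte_needs_flush(pte_t oldpte, pte_t newpte)
        {
                /*
                 * An invalid old PTE (e.g. PROT_NONE) could not have been
                 * cached in the TLB, so granting access needs no flush.
                 */
                if (!pte_valid(oldpte))
                        return false;
                return true;
        }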

Running a userspace program changing permissions back and forth between
PROT_NONE and PROT_READ | PROT_WRITE, and measuring the average time taken
for the none->rw transition, I get a reduction from 3.2 microseconds to
2.85 microseconds, giving a 12.3% improvement.

Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 17:50:25 +00:00
Linu Cherian 1b214452b6 arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
With the BUG_ON in pgd_pgtable_alloc_init_mm moved up to a higher layer,
the gfp flags are the only difference between try_pgd_pgtable_alloc_init_mm
and pgd_pgtable_alloc_init_mm. Hence rename the "try" version
to pgd_pgtable_alloc_init_mm_gfp.

Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Linu Cherian <linu.cherian@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 16:00:19 +00:00
Chaitanya S Prakash bfc184cb1b arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
arch_add_memory() is used to hotplug memory into a system, but as part
of its implementation it calls __create_pgd_mapping(), which uses
pgtable_alloc() in order to build intermediate page tables. As this path
was initially only used during early boot, pgtable_alloc() is designed to
BUG_ON() on failure. However, in the event that memory hotplug is
attempted when the system's memory is extremely tight and the allocation
fails, it would lead to panicking the system, which is not desirable.
Hence update __create_pgd_mapping() and all its callers to be non-void
and propagate -ENOMEM on allocation failure to allow the system to fail
gracefully.

During early boot, however, an allocation failure should still make the
system panic, hence create a wrapper around __create_pgd_mapping()
called early_create_pgd_mapping() which panics if the return value is
nonzero. All the init calls are updated to use this wrapper rather than
the modified __create_pgd_mapping() to preserve the original behavior.
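
A sketch of the wrapper (the argument list is assumed to mirror
__create_pgd_mapping()):

        static void __init
        early_create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
                                 unsigned long virt, phys_addr_t size,
                                 pgprot_t prot,
                                 phys_addr_t (*pgtable_alloc)(int), int flags)
        {
                if (__create_pgd_mapping(pgdir, phys, virt, size, prot,
                                         pgtable_alloc, flags))
                        panic("Failed to create page tables\n");
        }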

Fixes: 4ab2150615 ("arm64: Add memory hotplug support")
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Signed-off-by: Linu Cherian <linu.cherian@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 16:00:19 +00:00
Anshuman Khandual b0a3f0e894 arm64/sysreg: Replace TCR_EL1 field macros
This just replaces all used TCR_EL1 field macros with tools sysreg variant
based fields and subsequently drops them from the header (pgtable-hwdef.h),
although while retaining the ones used for KVM (represented via the sysreg
tools format).

Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13 15:58:30 +00:00
Xianwei Zhao fc584d871c irqchip/meson-gpio: Add support for Amlogic S6 S7 and S7D SoCs
The Amlogic S6/S7/S7D SoCs support GPIO interrupt lines:

    S6 IRQ Number:
    - 99:98    2 pins on bank CC
    - 97       1 pin  on bank TESTN
    - 96:81   16 pins on bank A
    - 80:65   16 pins on bank Z
    - 64:45   20 pins on bank X
    - 44:37    8 pins on bank H offs H1
    - 36:32    5 pins on bank F
    - 31:25    7 pins on bank D
    - 24:22    3 pins on bank E
    - 21:14    8 pins on bank C
    - 13:0    14 pins on bank B

    S7 IRQ Number:
    - 83:82    2 pins on bank CC
    - 81       1 pin  on bank TESTN
    - 80:68   13 pins on bank Z
    - 67:48   20 pins on bank X
    - 47:36   12 pins on bank H
    - 35:24   12 pins on bank D
    - 23:22    2 pins on bank E
    - 21:14    8 pins on bank C
    - 13:0    14 pins on bank B

    S7D IRQ Number:
    - 83:82    2 pins on bank CC
    - 81:75    7 pins on bank DV
    - 74       1 pin  on bank TESTN
    - 73:61   13 pins on bank Z
    - 60:41   20 pins on bank X
    - 40:29   12 pins on bank H
    - 28:24    5 pins on bank D
    - 23:22    2 pins on bank E
    - 21:14    8 pins on bank C
    - 13:0    14 pins on bank B

Add the required compatibles and interrupt count initializers.

Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-2-b4d1fe4781c1@amlogic.com
2025-11-13 14:04:16 +01:00
Xianwei Zhao e4ca152008 dt-bindings: interrupt-controller: Add support for Amlogic S6 S7 and S7D SoCs
Update the device tree binding document for GPIO interrupt controller of
Amlogic S6 S7 and S7D SoCs.

Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-1-b4d1fe4781c1@amlogic.com
2025-11-13 14:04:16 +01:00
Sean Christopherson 6276c67f2b x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible
Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(),
and use the helper macro to export symbols for KVM throughout x86 if and
only if KVM will build one or more modules, and only for those modules.

To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be
built (because no vendor modules are selected), let arch code #define
EXPORT_SYMBOL_FOR_KVM to suppress/override the exports.

Note, the set of symbols to restrict to KVM was generated by manual search
and audit; any "misses" are due to human error, not some grand plan.
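
Conceptually, the arch override might look like this (a sketch; the
config gates and the targeted-export macro are assumptions):

        #if IS_MODULE(CONFIG_KVM_INTEL) || IS_MODULE(CONFIG_KVM_AMD)
        #define EXPORT_SYMBOL_FOR_KVM(sym) \
                EXPORT_SYMBOL_GPL_FOR_MODULES(sym, "kvm,kvm-intel,kvm-amd")
        #else
        #define EXPORT_SYMBOL_FOR_KVM(sym)
        #endif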

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com
2025-11-12 15:29:38 -08:00
Sean Christopherson e6f2d5866c x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs"
Don't export "ptdump_walk_pgd_level_debugfs" as its sole user is
arch/x86/mm/debug_pagetables.c, which can't be built as a module.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-4-seanjc%40google.com
2025-11-12 15:24:42 -08:00
Sean Christopherson 9c26c91e10 x86/mtrr: Drop unnecessary export of "mtrr_state"
Don't export "mtrr_state" as usage is limited to arch/x86/kernel/cpu/mtrr
(and nothing outside of that directory even includes the local mtrr.h).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-3-seanjc%40google.com
2025-11-12 15:24:42 -08:00
Sean Christopherson ed02882460 x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base"
Don't export x86_spec_ctrl_base as it's used only in bugs.c and process.c,
neither of which can be built into a module.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-2-seanjc%40google.com
2025-11-12 15:24:42 -08:00
Haotian Zhang 593ee49222 ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
acpi_fwnode_graph_parse_endpoint() calls fwnode_get_parent() to obtain the
parent fwnode but returns without calling fwnode_handle_put() on it. This
potentially leads to a fwnode refcount leak and prevents the parent node
from being released properly.

Call fwnode_handle_put() on the parent fwnode before returning to prevent
the leak from occurring.
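
The fix follows the usual fwnode refcount pattern (a sketch; the parse
step is a hypothetical stand-in):

        struct fwnode_handle *parent = fwnode_get_parent(fwnode);
        int ret;

        ret = parse_endpoint_data(parent, endpoint);
        fwnode_handle_put(parent);      /* drop the reference on all paths */
        return ret;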

Fixes: 3b27d00e7b ("device property: Move fwnode graph ops to firmware specific locations")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251111075000.1828-1-vulab@iscas.ac.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:23:56 +01:00
Srinivas Pandruvada 172880f7c9 ACPI: DPTF: Support Nova Lake
Add Nova Lake ACPI IDs for DPTF.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251111004552.137984-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:14:21 +01:00
Srinivas Pandruvada dbd911a07f thermal: intel: int340x: Add DLVR support for Nova Lake
Add support for DLVR (Digital Linear Voltage Regulator) for Nova Lake.

There are no new sysfs attributes or difference in operations compared
to prior generations.

MMIO offset and bit positions are changed. Also no mapping is required
as units are already in MHz.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251111004552.137984-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:14:21 +01:00
Srinivas Pandruvada af1b80b941 thermal: int340x: processor_thermal: Add Nova Lake processor thermal device
Add PCI IDs for Nova Lake processor thermal device.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251111004552.137984-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:14:21 +01:00
Kaushlendra Kumar 347d92a795 thermal: intel: int340x: Replace sprintf() with sysfs_emit()
Replace sprintf() calls with sysfs_emit() in sysfs "show" functions to
follow current kernel coding standards.

sysfs_emit() is the preferred method for formatting sysfs output as it
provides better bounds checking and is more secure.
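
The transformation is mechanical (sketch):

        /* before */
        return sprintf(buf, "%d\n", value);

        /* after: bounded to PAGE_SIZE and meant for sysfs "show" output */
        return sysfs_emit(buf, "%d\n", value);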

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject adjustments, changelog edits ]
Link: https://patch.msgid.link/20251030053410.311656-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:13:13 +01:00
Kaushlendra Kumar 9019907816 thermal: intel: int340x: Use symbolic constant for UUID comparison
Replace sizeof() with a symbolic constant for UUID matching to maintain
existing ABI behavior while improving code clarity. The current behavior
of comparing only the first 7 characters is sufficient to distinguish
all UUIDs and changing to full string comparison would alter the kernel
ABI, potentially breaking existing userspace applications.

Use a defined constant to make the truncated comparison explicit and
maintainable.
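
Roughly (the constant name is hypothetical):

        #define UUID_MATCH_LEN  7       /* first 7 chars distinguish all UUIDs */

        if (!strncmp(str, uuid_str, UUID_MATCH_LEN))    /* was sizeof() based */
                return i;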

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject adjustments ]
Link: https://patch.msgid.link/20251030035955.62171-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:13:13 +01:00
Christian Loehle 0796ddf4a7 cpuidle: teo: Use this_cpu_ptr() where possible
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, #16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, #32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, #16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, #32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251110120819.714560-1-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:02:07 +01:00
Rafael J. Wysocki 76934e495c cpuidle: Add sanity check for exit latency and target residency
Make __cpuidle_driver_init() fail if the exit latency of one of the
driver's idle states is greater than its target residency, which would
break cpuidle assumptions.
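
That is, something along these lines (sketch):

        for (i = 0; i < drv->state_count; i++) {
                struct cpuidle_state *s = &drv->states[i];

                if (s->exit_latency_ns > s->target_residency_ns)
                        return -EINVAL; /* residency must cover exit latency */
        }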

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Changelog fix ]
Link: https://patch.msgid.link/12779486.O9o76ZdvQC@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 21:00:00 +01:00
Rafael J. Wysocki 9cf02802d6 PM: wakeup: Update after recent wakeup source removal ordering change
After a recent change, wakeup_source_activate() will warn that the given
wakeup source is "unregistered" after its timer has been shut down
in wakeup_source_remove() which may be somewhat confusing, so change
the warning message to say that the wakeup source is "unusable".

Accordingly, rename wakeup_source_not_registered() to
wakeup_source_not_usable() and update the comment in it
to also mention the removal of the wakeup source.

Also restore the comment in wakeup_source_remove() regarding the warning
in wakeup_source_activate() that may trigger after shutting down the
wakeup source timer.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12788103.O9o76ZdvQC@rafael.j.wysocki
2025-11-12 20:56:25 +01:00
Rafael J. Wysocki 62c95ea763 cpufreq: intel_pstate: Use mutex guard for driver locking
Use guard(mutex)(&intel_pstate_driver_lock), or the scoped variant of
it, wherever intel_pstate_driver_lock needs to be held.

This allows some local variables and goto statements to be dropped as
they are not necessary any more.
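
For illustration, the scope-based pattern from <linux/cleanup.h> looks
roughly like this (the function body is a sketch, not the actual driver
code):

  static ssize_t show_status(struct kobject *kobj,
                             struct kobj_attribute *attr, char *buf)
  {
          /*
           * The mutex is released automatically when the variable
           * created by guard() goes out of scope, on every return path,
           * so no unlock labels or gotos are needed.
           */
          guard(mutex)(&intel_pstate_driver_lock);

          if (!intel_pstate_driver)
                  return -EAGAIN;

          return sysfs_emit(buf, "active\n");
  }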

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://patch.msgid.link/2807232.mvXUDI8C0e@rafael.j.wysocki
2025-11-12 20:54:57 +01:00
Rafael J. Wysocki 25ca66300a amd-pstate content for 6.19 (11/10/25)
* optimizations for parameter array handling
 * fix for mode changes with offline CPUs
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmkSy/cTHHN1cGVybTFA
 a2VybmVsLm9yZwAKCRAtGSymJHcCdqSlD/4nv4PXetfH2Sul/ivGX9tWbGEMBxtD
 iVOGPmeVVDKEkUyptBbIMbN7elWz5BN6VF0EM3FGD2IZcwYDhPRAENoZeotl9isS
 QRgl8427M3UgZ3FbX7QU+SCZRbU6wmuTSxR9mlXtmuVpJ+QQT4BzYXFh0Jcxg9Mp
 6XrL5WFZtzSPrW5RnN/NUm26A+qRwIRZuRTAhxVV9K9Vd6ZlEq7/B+hTrlIz1GW4
 XqSlGnoOQTTTpIlUXlQTtYsn9bquqb+YoPpvEbme7LCWWOxIr9GrETrQ4Yj2ljW6
 cnHvFM1ky2Ld6+yfOMwAdS9aiSgcBtWD41UTxHecVVJyOVNXrw+yOxIpT9K8poWy
 2/iy8qlBz2z5q9VC2ZOM2GNDGlfoiSdKlZpFV8A3O9nb/ZTTKgwN782aycSHMNGC
 5Mj95tgaP00/TU6SIGHyaTg934vn9wnumzVb+cXjLNR35vQU8zfX24ER/Fo4lLSq
 ntTDo1byjCAKi78Ghp0Zlh3wqcWN1cKguqTZ04IRWB5I/D+NoYQvDldm8LGrUkpW
 YS11TxOMupuHFV5Jp7hraJeZDQB7/3wha15SdNUaOaMm+Bb++9zZj7tNAmW0sEz9
 291REgM5Iba1CPK7YGE/4/xtq0nVRzlK4PB2PDJxcTZfbqteiJfeR+9788vWVRIk
 QJIwYJ4sprrlVA==
 =zQeO
 -----END PGP SIGNATURE-----

Merge tag 'amd-pstate-v6.19-2025-11-10' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux

Pull amd-pstate content for 6.19 (11/10/25) from Mario Limonciello:

"* optimizations for parameter array handling
 * fix for mode changes with offline CPUs"

* tag 'amd-pstate-v6.19-2025-11-10' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
  cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUs
  cpufreq/amd-pstate: Add static asserts for EPP indices
  cpufreq/amd-pstate: Fix some whitespace issues
  cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status()
  cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULL
  cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_string
  cpufreq/amd-pstate: Use sysfs_match_string() for epp
2025-11-12 20:51:53 +01:00
Rafael J. Wysocki 377e38859c Merge back cpufreq material for 6.19 2025-11-12 20:50:58 +01:00
Bo Liu 337f7e3a4b arm64: Fix double word in comments
Remove the repeated word "the" in comments.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12 17:07:59 +00:00
mrigendrachaubey 96ac403ea2 arm64: Fix typos and spelling errors in comments
This patch corrects several minor typographical and spelling errors
in comments across multiple arm64 source files.

No functional changes.

Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12 17:06:21 +00:00
Ryan Chen 7083e14225 dt-bindings: interrupt-controller: aspeed,ast2700: Correct #interrupt-cells and interrupts count
Update the AST2700 interrupt controller binding to match the actual
hardware and the irq-aspeed-intc driver behavior.

 - Interrupts:

    First-level INTC banks request multiple interrupt lines to the root
    GIC, with a maximum of 10 per bank. Second-level INTC banks request
    only one interrupt line to their parent INTC-IC. Therefore, set the
    interrupts property to allow a minimum of 1 and a maximum of 10
    entries.

 - #interrupt-cells:

    Set '#interrupt-cells' to <1> since the aspeed intc driver does not
    support specifying a trigger type; only the interrupt index is used.

Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251030060155.2342604-2-ryan_chen@aspeedtech.com
2025-11-11 22:20:45 +01:00
Junhui Liu 47a4ebbf91 irqchip/aclint-sswi: Add Nuclei UX900 support
Reuse the generic ACLINT SSWI probe for Nuclei UX900 since it is
compliant with the ACLINT specification.

Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-9-5478db4f664a@pigmoral.tech
2025-11-11 22:17:22 +01:00
Junhui Liu a1c3a7d7ee dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT SSWI
Add SSWI support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with a
TIMER unit compliant with the ACLINT specification.

Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-6-5478db4f664a@pigmoral.tech
2025-11-11 22:17:21 +01:00
Junhui Liu 579951da64 dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT MSWI
Add MSWI support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with a
TIMER unit compliant with the ACLINT specification.

Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-5-5478db4f664a@pigmoral.tech
2025-11-11 22:17:21 +01:00
Junhui Liu b90ac5fe32 dt-bindings: interrupt-controller: Add Anlogic DR1V90 PLIC
Add PLIC support for Anlogic DR1V90.

Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-4-5478db4f664a@pigmoral.tech
2025-11-11 22:17:21 +01:00
Krzysztof Kozlowski 45cc441de7 irqchip/irq-bcm7038-l1: Remove unused reg_mask_status()
reg_mask_status() is not referenced anywhere, leading to a W=1 warning:

  irq-bcm7038-l1.c:85:28: error: unused function 'reg_mask_status' [-Werror,-Wunused-function]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251106155200.337399-2-krzysztof.kozlowski@linaro.org
2025-11-11 22:11:17 +01:00
Charles Mirabile a045359e72 irqchip/sifive-plic: Fix call to __plic_toggle() in M-Mode code path
The code path for M-Mode Linux that disables interrupts for other contexts
was missed when refactoring __plic_toggle().

Since the new version caches updates to the state for the primary context,
its use in this code path is no longer desirable even if it could be made
correct.

Replace the calls to __plic_toggle() with a loop that simply disables all
of the interrupts in groups of 32 with a direct MMIO write.

Fixes: 14ff9e54dd ("irqchip/sifive-plic: Cache the interrupt enable state")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251103161813.2437427-1-cmirabil@redhat.com
Closes: https://lore.kernel.org/oe-kbuild-all/202510271316.AQM7gCCy-lkp@intel.com/
2025-11-11 22:11:16 +01:00
Li Qiang df717b9564 arm64: add unlikely hint to MTE async fault check in el0_svc_common
Add unlikely() hint to the _TIF_MTE_ASYNC_FAULT flag check in
el0_svc_common() since asynchronous MTE faults are expected to be
rare occurrences during normal system call execution.

This optimization helps the compiler to improve instruction caching
and branch prediction for the common case where no asynchronous
MTE faults are pending, while maintaining correct behavior for
the exceptional case where such faults need to be handled prior
to system call execution.
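
As a sketch of the pattern (surrounding logic abbreviated):

  /*
   * Asynchronous MTE faults are rare, so have the compiler lay out the
   * no-fault case as the straight-line fast path.
   */
  if (unlikely(flags & _TIF_MTE_ASYNC_FAULT)) {
          clear_thread_flag(TIF_MTE_ASYNC_FAULT);
          send_sig_fault(SIGSEGV, SEGV_MTEAERR, NULL, current);
  }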

Signed-off-by: Li Qiang <liqiang01@kylinos.cn>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:49:19 +00:00
Thomas Huth 287d163322 arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize now
on the __ASSEMBLER__ macro that is provided by the compilers.
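
The guard this affects typically looks like the following illustrative
header snippet:

  /*
   * Was "#ifndef __ASSEMBLY__", which relies on kbuild passing
   * -D__ASSEMBLY__ for assembly files. __ASSEMBLER__ is predefined by
   * GCC and Clang whenever assembly is compiled, so it also works for
   * headers consumed outside of kbuild.
   */
  #ifndef __ASSEMBLER__
  static inline int example_helper(void) { return 0; }
  #endif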

This is a mostly mechanical patch (done with a simple "sed -i"
statement), except for the following files where comments with
mis-spelled macros were tweaked manually:

 arch/arm64/include/asm/stacktrace/frame.h
 arch/arm64/include/asm/kvm_ptrauth.h
 arch/arm64/include/asm/debug-monitors.h
 arch/arm64/include/asm/esr.h
 arch/arm64/include/asm/scs.h
 arch/arm64/include/asm/memory.h

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:35:59 +00:00
Thomas Huth 639f08fc20 arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
__ASSEMBLY__ is only defined by the Makefile of the kernel, so
this is not really useful for uapi headers (unless the userspace
Makefile defines it, too). Let's switch to __ASSEMBLER__ which
gets set automatically by the compiler when compiling assembly
code.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:35:58 +00:00
Osama Abdelkader 420cab0155 arm64: acpi: add newline to deferred APEI warning
Add the missing newline to the pr_warn_ratelimited() message in
apei_claim_sea().

Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:28:37 +00:00
Linus Walleij 555827a064 arm64: entry: Clean out some indirection
The conversion to generic IRQ entry left some functions
in the EL1 (kernel) IRQ entry path very shallow, so drop
the __inner_functions() where appropriate, saving some
time and stack.

This is not a fix but an optimization.

Drop stale comments about irqentry_enter/exit() while we
are at it.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:28:05 +00:00
Anshuman Khandual e2e21a9757 arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
Although the comment clearly states the PGD table's alignment requirement
(when PA_BITS = 52), the subsequent BUILD_BUG_ON() tests a size comparison
against 64 bytes instead. So change it into an actual alignment test.
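
The gist of the change, with illustrative expressions on both sides (the
real macro arguments may differ):

  /*
   * Before: a pure size comparison against 64 bytes, e.g.:
   *	BUILD_BUG_ON(PGD_SIZE != 64);
   * After: an actual alignment test, matching what the comment asks for:
   */
  BUILD_BUG_ON(!IS_ALIGNED(PGD_SIZE, 64));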

Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 19:13:03 +00:00
Ard Biesheuvel a5baf582f4 arm64/efi: Call EFI runtime services without disabling preemption
The only remaining reason why EFI runtime services are invoked with
preemption disabled is the fact that the mm is swapped out behind the
back of the context switching code.

The kernel no longer disables preemption in kernel_neon_begin().
Furthermore, the EFI spec is being clarified to explicitly state that
only baseline FP/SIMD is permitted in EFI runtime service
implementations, and so the existing kernel mode NEON context switching
code is sufficient to preserve and restore the execution context of an
in-progress EFI runtime service call.

Most EFI calls are made from the efi_rts_wq, which is serviced by a
kthread. As kthreads never return to user space, they usually don't have
an mm, and so we can use the existing infrastructure to swap in the
efi_mm while the EFI call is in progress. This is visible to the
scheduler, which will therefore reactivate the selected mm when
switching out the kthread and back in again.

Given that the EFI spec explicitly permits runtime services to be called
with interrupts enabled, firmware code is already required to tolerate
interruptions. So rather than disable preemption, disable only migration
so that EFI runtime services are less likely to cause scheduling delays.
To avoid potential issues where runtime services are interrupted while
polling the secure firmware for async completions, keep migration
disabled so that a runtime service invocation does not resume on a
different CPU from the one it was started on.

Note, though, that the firmware executes at the same privilege level as
the kernel, and is therefore able to disable interrupts altogether.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Ard Biesheuvel 6b9c98e657 arm64/efi: Move uaccess en/disable out of efi_set_pgd()
efi_set_pgd() will no longer be called when invoking EFI runtime
services via the efi_rts_wq work queue, but the uaccess en/disable are
still needed when using PAN emulation using TTBR0 switching. So move
these into the callers.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Ard Biesheuvel 1068cb52e8 arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
Since commit

  5894cf571e ("acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers")

all EFI runtime calls on arm64 are routed via the EFI runtime wrappers,
which are serialized using the efi_runtime_lock semaphore.

This means the efi_rt_lock spinlock in the arm64 arch wrapper code has
become redundant, and can be dropped. For robustness, replace it with an
assert that the EFI runtime lock is in fact held by 'current'.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Ard Biesheuvel 7137a203b2 arm64/fpsimd: Permit kernel mode NEON with IRQs off
Currently, may_use_simd() will return false when called from a context
where IRQs are disabled. One notable case where this happens is when
calling the ResetSystem() EFI runtime service from the reboot/poweroff
code path. For this case alone, there is a substantial amount of FP/SIMD
support code to handle the corner case where a EFI runtime service is
invoked with IRQs disabled.

The only reason kernel mode SIMD is not allowed when IRQs are disabled
is that re-enabling softirqs in this case produces a noisy diagnostic
when lockdep is enabled. The warning is valid, in the sense that
delivering pending softirqs over the back of the call to
local_bh_enable() is problematic when IRQs are disabled.

While the API lacks a facility to simply mask and unmask softirqs
without triggering their delivery, disabling softirqs is not needed to
begin with when IRQs are disabled, given that softirqs are only ever
taken asynchronously over the back of a hard IRQ.

So dis/enable softirq processing conditionally, based on whether IRQs
are enabled, and relax the check in may_use_simd().

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Ard Biesheuvel 1d038e8018 arm64/fpsimd: Don't warn when EFI execution context is preemptible
Kernel mode FP/SIMD no longer requires preemption to be disabled, so
only warn on uses of FP/SIMD from preemptible context if the fallback
path is taken for cases where kernel mode NEON would not be allowed
otherwise.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Ard Biesheuvel a286050120 efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
The EFI runtime wrappers use a file local semaphore to serialize access
to the EFI runtime services. This means that any calls to the arch
wrappers around the runtime services will also be serialized, removing
the need for redundant locking.

For robustness, add a facility that allows those arch wrappers to assert
that the semaphore was taken by the current task.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Ard Biesheuvel 40374d308e efi: Add missing static initializer for efi_mm::cpus_allowed_lock
Initialize the cpus_allowed_lock struct member of efi_mm.

Cc: stable@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11 18:59:22 +00:00
Borislav Petkov (AMD) b2c1dd6c6f x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled()
Drop one redundant definition, while at it.

There should be no functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251031122122.GKaQSpwhLvkinKKbjG@fat_crate.local
2025-11-11 16:42:31 +01:00
Gautham R. Shenoy bb31fef0d0 cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUs
amd_pstate_change_mode_without_dvr_change() calls cppc_set_auto_sel()
for all the present CPUs.

However, this call path eventually calls cppc_set_reg_val() which
accesses the per-cpu cpc_desc_ptr object. This object is initialized
only for online CPUs via acpi_soft_cpu_online() -->
__acpi_processor_start() --> acpi_cppc_processor_probe().

Hence, restrict calling cppc_set_auto_sel() to only the online CPUs.

Fixes: 3ca7bc818d ("cpufreq: amd-pstate: Add guided mode control support via sysfs")
Suggested-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Mario Limonciello (AMD) 077f23573d cpufreq/amd-pstate: Add static asserts for EPP indices
In case a new index is introduced, add a static assert to make sure
that the strings and values are updated together.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Mario Limonciello (AMD) e9d62ca86a cpufreq/amd-pstate: Fix some whitespace issues
Add whitespace around the equals and remove leading space.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Mario Limonciello (AMD) 92d6146a40 cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status()
get_mode_idx_from_str() already checks the upper boundary for a given
string.  Drop the extra check in amd_pstate_update_status() and pass
along the return code if there is a failure.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Mario Limonciello (AMD) baf106f3a7 cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULL
amd_pstate_get_mode_string() is only used by amd-pstate-ut.  Set the
failure path to use AMD_PSTATE_UNDEFINED ("undefined") to avoid showing
"(null)" as a string when running the test suite.

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Mario Limonciello (AMD) 06791bc017 cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_string
None of the users actually look for the NULL value.  To avoid the risk
of a regression when introducing a new value but forgetting to add a
string, add a static assert to verify that AMD_PSTATE_MAX matches the
array size.
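
A minimal sketch of such an assert (array contents abbreviated; the mode
names follow the driver's enum):

  static const char * const amd_pstate_mode_string[] = {
          [AMD_PSTATE_UNDEFINED]	= "undefined",
          [AMD_PSTATE_DISABLE]	= "disable",
          [AMD_PSTATE_PASSIVE]	= "passive",
          [AMD_PSTATE_ACTIVE]	= "active",
          [AMD_PSTATE_GUIDED]	= "guided",
  };
  /* Trips at build time if an enum value is added without a string. */
  static_assert(ARRAY_SIZE(amd_pstate_mode_string) == AMD_PSTATE_MAX);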

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Mario Limonciello (AMD) 7e17f48667 cpufreq/amd-pstate: Use sysfs_match_string() for epp
Rather than scanning the buffer and manually matching the string,
use the sysfs macros.
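
For illustration, the helper collapses an open-coded scan into something
like the following (assuming the driver's energy_perf_strings[] table):

  ret = sysfs_match_string(energy_perf_strings, buf);
  if (ret < 0)
          return ret;	/* no match: propagate the error */
  /* ret is now the index of the matching EPP string. */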

Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10 23:35:20 -06:00
Ma Ke f18e71cd6c EDAC/ie31200: Fix error handling in ie31200_register_mci
ie31200_register_mci() calls device_initialize() for priv->dev
unconditionally. However, in the error path, put_device() is not
called, leading to an imbalance. Similarly, in the unload path,
put_device() is missing.

Although edac_mc_free() eventually frees the memory, it does not
release the device initialized by device_initialize(). For code
readability and proper pairing of device_initialize()/put_device(),
add put_device() calls in both error and unload paths.

Found by code review.
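
A sketch of the pairing being restored (labels and surrounding code are
illustrative):

  device_initialize(&priv->dev);	/* takes the initial reference */

  rc = device_add(&priv->dev);
  if (rc)
          goto err_put;

  return 0;

  err_put:
          put_device(&priv->dev);	/* balances device_initialize() */
          return rc;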

Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://patch.msgid.link/20251106084735.35017-1-make24@iscas.ac.cn
2025-11-10 17:06:10 -08:00
Marek Vasut b1c4c05bb0 thermal/drivers/rcar_gen3: Document R-Car Gen4 and RZ/G2 support in driver comment
The R-Car Gen3 thermal driver supports both R-Car Gen3 and Gen4 SoCs
as well as RZ/G2. Update the driver comment. No functional change.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Link: https://patch.msgid.link/20251110143029.10940-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-11-10 15:56:49 +01:00
Manaf Meethalavalappu Pallikunhi e1304efc19 dt-bindings: thermal: qcom-tsens: document the Kaanapali Temperature Sensor
Document the Temperature Sensor (TSENS) on the Kaanapali Platform.

Signed-off-by: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251021-b4-knp-tsens-v2-1-7b662e2e71b4@oss.qualcomm.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-11-10 13:01:40 +01:00
Ovidiu Panait 30183a67a8 dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2H TSU
The Renesas RZ/V2H SoC includes a Thermal Sensor Unit (TSU) block designed
to measure the junction temperature. The device provides real-time
temperature measurements for thermal management, utilizing two dedicated
channels for temperature sensing.

The Renesas RZ/V2H SoC uses the same TSU IP found on the RZ/G3E SoC,
the only difference being that it has two channels instead of one.

Add new compatible string "renesas,r9a09g057-tsu" for RZ/V2H and use
"renesas,r9a09g047-tsu" as a fallback compatible to indicate hardware
compatibility with the RZ/G3E implementation.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251020143107.13974-3-ovidiu.panait.rb@renesas.com
2025-11-10 12:56:17 +01:00
Uros Bizjak fd4e025526 x86/percpu: Use BIT_WORD() and BIT_MASK() macros
Use BIT_WORD() and BIT_MASK() macros from <linux/bits.h>
in <arch/x86/include/asm/percpu.h> instead of open-coding them.

No functional change intended.
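
The two macros are defined in <linux/bits.h> essentially as:

  #define BIT_WORD(nr)	((nr) / BITS_PER_LONG)
  #define BIT_MASK(nr)	(1UL << ((nr) % BITS_PER_LONG))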

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20250907184915.78041-1-ubizjak@gmail.com
2025-11-10 11:55:54 +01:00
Marco Crivellari 47c303ba6e cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:

commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
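
In practice the conversion is a one-flag change; illustratively (the
workqueue name and other arguments are placeholders, not the driver's
actual call):

  /* Explicitly per-CPU; previously implied by not passing WQ_UNBOUND. */
  wq = alloc_workqueue("tegra194_cpufreq", WQ_PERCPU, 1);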

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ Viresh: Fixed Subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-11-10 16:18:48 +05:30
Christian Marangi 58f5d39d5e cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM
On some IPQ806x SoCs SMEM might not be initialized by SBL. This is the
case for some Google devices (the OnHub family) that can't make use of
SMEM to detect the SoC ID (and socinfo can't be used either as it does
depend on SMEM presence).

To handle this specific case, check if the SMEM is not initialized (by
checking if qcom_smem_get_soc_id() returns -ENODEV) and fall back to
OF machine compatible checking to identify the SoC variant.
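
Roughly, the fallback looks like this (error handling trimmed; the
compatible string and SoC ID constant are illustrative):

  ret = qcom_smem_get_soc_id(&msm_id);
  if (ret == -ENODEV) {
          /*
           * SMEM was not initialized by SBL: fall back to matching the
           * machine compatible to identify the SoC variant.
           */
          if (of_machine_is_compatible("qcom,ipq8064"))
                  msm_id = QCOM_ID_IPQ8064;
  }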

Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-11-10 16:16:52 +05:30
Kaushlendra Kumar 352899fd91 PM: wakeup: Delete timer before removing wakeup source from list
Replace timer_delete_sync() with timer_shutdown_sync() and move
it before list_del_rcu() in wakeup_source_remove() to improve the
cleanup ordering and code clarity.

This ensures that the timer is stopped before removing the wakeup
source from the events list, providing a more logical cleanup
sequence.

While the current ordering is functionally correct, stopping the
timer first makes the cleanup flow more intuitive and follows the
general pattern of disabling active components before removing data
structures.
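
The reordered cleanup, in sketch form (fields as in struct wakeup_source):

  /* Stop the timer and forbid rearming it first... */
  timer_shutdown_sync(&ws->timer);

  /* ...and only then unlink the source from the events list. */
  list_del_rcu(&ws->entry);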

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251027044127.2456365-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-08 12:17:28 +01:00
Kaushlendra Kumar 2f58be82fc ACPI: DPTF: Use ACPI_FREE() for ACPI buffer deallocation
Replace kfree() with ACPI_FREE() in pch_fivr_read() to follow ACPICA
memory management conventions.

While functionally equivalent in Linux (ACPI_FREE() is implemented
as kfree()), using ACPI_FREE() maintains consistency with ACPICA
coding standards for deallocating ACPI buffer objects.
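
Illustratively (the method name and surrounding code are a sketch, not the
actual pch_fivr_read() body):

  struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
  acpi_status status;

  status = acpi_evaluate_object(handle, "GFFS", NULL, &buffer);
  if (ACPI_FAILURE(status))
          return -EIO;

  /*
   * ACPICA convention: buffers allocated by ACPICA are released with
   * ACPI_FREE(), which is kfree() under Linux.
   */
  ACPI_FREE(buffer.pointer);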

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251028051554.2862049-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-07 21:24:54 +01:00
Anshuman Khandual fc1abd4093 arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
These TCR_EL1 helpers don't have any other callers. Drop these redundant
indirections completely, thus making this code more compact and readable.
No functional change.

Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07 20:01:34 +00:00
Mark Rutland a7717cad61 kselftest/arm64: Align zt-test register dumps
The zt-test output is awkward to read, as the 'Expected' value isn't
dumped on its own line and isn't aligned with the 'Got' value beneath.
For example:

  Mismatch: PID=5281, iteration=3270249   Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
          Got      [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
          SVCR: 2

Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get
output that can be read more easily:

  Mismatch: PID=5281, iteration=3270249
          Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
          Got      [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
          SVCR: 2

Admittedly this isn't all that important when the 'Got' value is all
zeroes, but otherwise this would be a major help for identifying which
portion of the 'Got' value is not as expected.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kselftest@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07 20:00:49 +00:00
Omar Sandoval bf6b3fed18 arm64: remove unused ARCH_PFN_OFFSET
This is only relevant to the FLATMEM memory model, which isn't an option
since commit 782276b4d0 ("arm64: Force SPARSEMEM_VMEMMAP as the only
memory management model").

Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07 19:59:37 +00:00
Ryo Takakura d3b570eba7 arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
For those architectures with HAVE_SOFTIRQ_ON_OWN_STACK use
their dedicated softirq stack when !PREEMPT_RT. This condition
is ensured by SOFTIRQ_ON_OWN_STACK.

Let arm64 use SOFTIRQ_ON_OWN_STACK as well to select its
usage of the stack.

Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07 19:55:52 +00:00
Dawei Li 4002068508 arm64: Remove assertion on CONFIG_VMAP_STACK
CONFIG_VMAP_STACK is selected by the arm64 arch unconditionally since commit
ef6861b8e6 ("arm64: Mandate VMAP_STACK").

Remove the redundant assertion and headers.

Signed-off-by: Dawei Li <dawei.li@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07 19:46:39 +00:00
Slawomir Rosek 966c9e65ba ACPI: DPTF: Remove int340x thermal scan handler
Using the IS_ENABLED() macro in the int340x_thermal_handler_attach()
forces the kernel to be recompiled when thermal drivers are enabled
or disabled, which is a significant limitation of its modularity.

The IS_ENABLED() macro is particularly problematic for the Android
Generic Kernel Image (GKI) project which uses unified core kernel
while SoC/board support is moved to loadable vendor modules.

The Intel Dynamic Platform and Thermal Framework (DPTF) requires
thermal drivers to be loaded at runtime, thus ACPI bus scan handler
is not needed and acpi_default_enumeration() may create all platform
devices, regardless of the actual setting of CONFIG_INT340X_THERMAL.

Signed-off-by: Slawomir Rosek <srosek@google.com>
Link: https://patch.msgid.link/20251103162516.2606158-3-srosek@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-07 20:45:37 +01:00
Slawomir Rosek 13a96342d5 thermal: intel: Select INT340X_THERMAL from INTEL_SOC_DTS_THERMAL
The IRQ used by the Intel SoC DTS thermal device for critical
overheating notification is listed in _CRS of device INT3401 which
therefore needs to be enumerated for Intel SoC DTS thermal to work.

The enumeration happens by binding the int3401_thermal driver to the
INT3401 platform device. Thus CONFIG_INT340X_THERMAL is in fact
necessary for enumerating it, so checking CONFIG_INTEL_SOC_DTS_THERMAL
in int340x_thermal_handler_attach() is pointless and INT340X_THERMAL
may as well be selected by INTEL_SOC_DTS_THERMAL.

Signed-off-by: Slawomir Rosek <srosek@google.com>
[ rjw: New subject ]
Link: https://patch.msgid.link/20251103162516.2606158-2-srosek@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-07 20:45:37 +01:00
Huisong Li 77ca1612b8 ACPI: processor: idle: Drop redundant C-state count checks
acpi_processor_setup_cstates() and acpi_processor_setup_cpuidle_cx()
are called after successfully obtaining power information. Among other
things, these setup functions check the C-state count against zero.

However, that check is done by acpi_processor_get_power_info_cst()
which will cause acpi_processor_get_power_info() to fail if it does
not pass, so the checks in the two functions mentioned above are
redundant.

Drop those redundant checks.

No intentional functional impact.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
[ rjw: Subject and changelog rewrite ]
Link: https://patch.msgid.link/20251105093647.3557248-1-lihuisong@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-07 18:47:58 +01:00
Mario Limonciello (AMD) 39ce15a48f Documentation: power: Correct a mistaken configuration option
Somehow CONFIG_PSTORE_FIRMWARE ended up in this document when I intended
it to be CONFIG_CHROMEOS_PSTORE.  Correct the configuration option and
make it clear that not all options are required.

Fixes: b1f02f005a ("Documentation: power: Add document on debugging shutdown hangs")
Reported-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
[ rjw: Fixes: tag ]
Link: https://patch.msgid.link/20251106142524.3841343-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-07 16:19:14 +01:00
Yicong Yang 7ab06ea41a arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
The arm_pmu driver uses topology_core_has_smt() for retrieving
the SMT implementation, which depends on CONFIG_GENERIC_ARCH_TOPOLOGY.
The config is optional on arm platforms, so provide a
!CONFIG_GENERIC_ARCH_TOPOLOGY stub for topology_core_has_smt().

Fixes: c3d78c34ad ("perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511041757.vuCGOmFc-lkp@intel.com/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Yicong Yang <yangyccccc@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 13:45:02 +00:00
Robin Murphy 2d7a824807 perf/arm-ni: Fix and optimise register offset calculation
LKP points out an operator precedence oversight in the new NoC S3
support that, annoyingly, my local W=1 build didn't flag. In fixing
that, we can also take the similarly-missed opportunity to cache the
version check itself at event_init time.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511041749.ok8zDP6u-lkp@intel.com/
Fixes: 8fa08f8835 ("perf/arm-ni: Add NoC S3 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07 13:42:46 +00:00
Marco Crivellari 24e3848a2e RAS/CEC: Replace use of system_wq with system_percpu_wq
Switch to using system_percpu_wq because system_wq is going away as part of
a workqueue restructuring.

Currently if a user enqueues a work item using schedule_delayed_work() the
used workqueue is "system_wq" (per-cpu workqueue) while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use of
WORK_CPU_UNBOUND again.

This lack of consistency cannot be addressed without refactoring the API.
For more details see those commits and the Link tag below.

  128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
  930c2ea566 ("workqueue: Add new WQ_PERCPU flag")

  [ bp: Massage commit message. ]

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de
2025-11-07 13:48:28 +01:00
Sumanth Korikkar c1287d67c3 s390/sclp_mem: Consider global memory_hotplug.memmap_on_memory setting
When the global kernel command line parameter
memory_hotplug.memmap_on_memory is set to false, the per-memory-block
memmap_on_memory setting can still be set to true. However, when
configuring a memory block, add_memory_resource() would configure it
without memmap_on_memory.

i.e.
Even if the MHP_MEMMAP_ON_MEMORY flag is set,
mhp_supports_memmap_on_memory() returns false unless the kernel command
line parameter "memory_hotplug.memmap_on_memory" is enabled. When both
the flag and the cmdline parameter are set, the memory block can be
configured with or without memmap_on_memory support.

To ensure consistent behavior, permit configuring per-memory-block
memmap_on_memory only when the memory_hotplug.memmap_on_memory kernel
command line parameter is enabled.

This is similar to commit 73954d379e ("dax: add a sysfs knob to
control memmap_on_memory behavior")

Fixes: ff18dcb19a ("s390/sclp: Add support for dynamic (de)configuration of memory")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:18:23 +01:00
Mete Durlu 8840cc4520 s390/hiperdispatch: Decrease steal time threshold
Higher steal time thresholds favor low utilization scenarios, which is not
the common case for s390. Set steal time threshold to a lower value to
prioritize vertical high and medium CPUs sooner and allow high utilization
scenarios to benefit from it.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:17:28 +01:00
Thorsten Blum eb3a9b405b s390/smp: Mark pcpu_delegate() and smp_call_ipl_cpu() as __noreturn
pcpu_delegate() never returns to its caller. If the target CPU is the
current CPU, it calls __pcpu_delegate(), whose delegate function is not
supposed to return. In any case, even if __pcpu_delegate() unexpectedly
returns, pcpu_delegate() sends SIGP_STOP to the current CPU and waits
in an infinite loop. Annotate pcpu_delegate() with the __noreturn
attribute to improve compiler optimizations.

Also annotate smp_call_ipl_cpu() accordingly since it always calls
pcpu_delegate().
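
Sketched (argument lists abbreviated, so purely illustrative):

  static void __noreturn pcpu_delegate(struct pcpu *pcpu, ...);
  void __noreturn smp_call_ipl_cpu(void (*func)(void *), void *data);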

[hca: Merge two patches from Thorsten Blum]

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:17:28 +01:00
Thorsten Blum f07ebfa5e4 s390/nmi: Annotate s390_handle_damage() with __noreturn
s390_handle_damage() ends by calling the non-returning function
disabled_wait() and therefore also never returns. Annotate it with the
__noreturn compiler attribute to improve compiler optimizations.

Remove the unreachable infinite while loop.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:17:28 +01:00
Bo Liu 858063c1ae s390: Fix double word in comments
Remove the repeated word "the" in comments.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:17:27 +01:00
Heiko Carstens 547e9feb0e Merge branch 'dat-enhancement-1'
Heiko Carstens says:

====================

Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.

Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows changing the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.

Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires changing valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.

The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this to
the architecture level set shouldn't cause any problems.

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:14:00 +01:00
Heiko Carstens 68807a894f s390/mm: Replace the CSP instruction with CSPG
The CSPG instruction is part of the Dat-Enhancement facility 1, which
is always available. Given that it can be used everywhere where also
the CSP instruction can be used, replace CSP with CSPG everywhere.

This allows to remove the csp() inline assembly. Also remove the
unused gmap_pmdp_csp() function.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:12:31 +01:00
Heiko Carstens 220d8e10d6 s390/mm: Remove cpu_has_idte()
Remove cpu_has_idte(). The IDTE instruction is part of the
Dat-Enhancement facility 1, which is always available.
Therefore remove the helper and now superfluous code.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:12:31 +01:00
Heiko Carstens 73c4b5d728 s390: Add Dat-Enhancement facility 1 to architecture level set
Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.

Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows changing the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.

Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires changing valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.

The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this
to the architecture level set shouldn't cause any problems.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06 14:12:30 +01:00
Avadhut Naik 8616025ae6 EDAC: Remove the legacy EDAC sysfs interface
Commit

  1997471069 ("edac: add a new per-dimm API and make the old per-virtual-rank API obsolete")

introduced a new per-DIMM sysfs interface for EDAC making the old
per-virtual-rank sysfs interface obsolete.

Since this new sysfs interface was introduced more than a decade ago, remove
the obsolete legacy interface.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com
2025-11-06 13:21:29 +01:00
Avadhut Naik 6a85796915 EDAC/amd64: Remove NUM_CONTROLLERS macro
Currently, the NUM_CONTROLLERS macro is used to limit the number of memory
controllers (UMCs) available per node. The number of UMCs available per node,
however, is already cached by the max_mcs variable of struct amd64_pvt.

Allocate the relevant data structures dynamically using the variable instead
of static allocation through the macro.

The max_mcs variable is used for legacy systems too. These systems have a max
of 2 controllers. Since the default value of max_mcs, set in per_family_init(),
is 2, these legacy systems are also covered.
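
The allocation then becomes dynamic, along the lines of (illustrative;
field names follow struct amd64_pvt as described above):

  pvt->umc = kcalloc(pvt->max_mcs, sizeof(*pvt->umc), GFP_KERNEL);
  if (!pvt->umc)
          return -ENOMEM;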

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com
2025-11-06 12:51:33 +01:00
Avadhut Naik e9abd990ae EDAC/amd64: Generate ctl_name string at runtime
Currently, the ctl_name string is statically assigned based on the family and
model of the SOC when the amd64_edac module is loaded.

This, however, is not really needed, as the string can be generated and
assigned at runtime through scnprintf().

Remove all static assignments and generate the string at runtime. Also,
clean up the switch cases which became defunct and consolidate identical cases.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com
2025-11-06 12:35:59 +01:00
Yazen Ghannam 56f17be67a x86/mce/amd: Define threshold restart function for banks
Prepare for CMCI storm support by moving the common bank/block iterator code
to a helper function.

Include a parameter to switch the interrupt enable. This will be used by the
CMCI storm handling function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05 22:38:31 +01:00
Yazen Ghannam 3206b41604 x86/mce/amd: Remove redundant reset_block()
Many of the checks in reset_block() are done again in the block reset
function. So drop the redundant checks.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05 22:34:53 +01:00
Yazen Ghannam 4efaec6e16 x86/mce/amd: Support SMCA Corrected Error Interrupt
AMD systems optionally support MCA thresholding which provides the ability for
hardware to send an interrupt when a set error threshold is reached. This
feature counts errors of all severities, but it is commonly used to report
correctable errors with an interrupt rather than polling.

Scalable MCA systems allow the platform to take control of this feature. In
this case, the OS will not see the feature configuration and control bits in
the MCA_MISC* registers. The OS will not receive the MCA thresholding
interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems. This
will be used in the same configuration where the platform controls MCA
thresholding. However, the platform will now be able to send the MCA
thresholding interrupt to the OS.

Check for, and enable, this feature during per-CPU SMCA init.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05 22:10:23 +01:00
Zuo An 059835bbfa tools/power/cpupower: Support building libcpupower statically
The cpupower Makefile built and installed libcpupower as a shared
library (libcpupower.so) without passing `STATIC=true`, but did not
build a static version of the library even with `STATIC=true`. (Only the
programs were static). Thus, out-of-tree programs using libcpupower
were unable to link statically against the library without having access
to intermediate object files produced during the build.

This fixes that situation by ensuring that libcpupower.a is built and
installed when `STATIC=true` is specified.

Link: https://lore.kernel.org/r/x7geegquiks3zndiavw2arihdc2rk7e2dx3lk7yxkewqii6zpg@tzjijqxyzwmu
Signed-off-by: Zuo An <zuoan.penguin@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-11-05 09:56:01 -07:00
Usama Arif 8436112341 efi/libstub: Fix page table access in 5-level to 4-level paging transition
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3 and
applying PAGE_MASK. This approach has several issues:

- __native_read_cr3() returns the raw CR3 register value, which on x86_64
  includes not just the physical address but also flags. Bits above the
  physical address width of the system (i.e. above __PHYSICAL_MASK_SHIFT) are
  also not masked.

- The pgd value is masked with PAGE_MASK, which doesn't take into account the
  higher bits such as _PAGE_BIT_NOPTISHADOW.

Replace this with proper accessor functions:

- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out
  of CR3 (like SME or LAM bits). All remaining bits are real address bits or
  reserved and must be 0.

- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags
  above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above
  the max physical address are reserved and must be 0.
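
Put together, the corrected access pattern is roughly (mirroring the
description above; variable names are illustrative):

  /* CR3_ADDR_MASK strips metadata such as SME/LAM bits from CR3. */
  pgd_t *pgd = (pgd_t *)native_read_cr3_pa();

  /*
   * PTE_PFN_MASK also clears flags above bit 51, notably
   * _PAGE_BIT_NOPTISHADOW, which PAGE_MASK would leave in place.
   */
  unsigned long p4d_page = pgd_val(*pgd) & PTE_PFN_MASK;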

Fixes: cb1c9e02b0 ("x86/efistub: Perform 4/5 level paging switch from the stub")
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://patch.msgid.link/20251103141002.2280812-3-usamaarif642@gmail.com
2025-11-05 17:31:32 +01:00
Usama Arif eb22663125 x86/boot: Fix page table access in 5-level to 4-level paging transition
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3 and
applying PAGE_MASK. This approach has several issues:

- __native_read_cr3() returns the raw CR3 register value, which on x86_64
  includes not just the physical address but also flags. Bits above the
  physical address width of the system (i.e. above __PHYSICAL_MASK_SHIFT) are
  also not masked.

- The PGD entry is masked with PAGE_MASK, which doesn't take into account the
  higher bits such as _PAGE_BIT_NOPTISHADOW.

Replace this with proper accessor functions:

- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out
  of CR3 (like SME or LAM bits). All remaining bits are real address bits or
  reserved and must be 0.

- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags
  above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above
  the max physical address are reserved and must be 0.

Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/a482fd68-ce54-472d-8df1-33d6ac9f6bb5@intel.com
2025-11-05 17:19:11 +01:00
Yazen Ghannam 134b1eabe6 x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
Scalable MCA systems have a per-CPU register that gives the APIC LVT offset
for the thresholding and deferred error interrupts.

Currently, this register is read once to set up the deferred error interrupt
and then read again for each thresholding block. Furthermore, the APIC LVT
registers are configured each time, but they only need to be configured once
per-CPU.

Move the APIC LVT setup to the early part of CPU init, so that the registers
are set up once. Also, this ensures that the kernel is ready to service the
interrupts before the individual error sources (each MCA bank) are enabled.

Apply this change only to SMCA systems to avoid breaking any legacy behavior.
The deferred error interrupt is technically advertised by the SUCCOR feature.
However, this was first made available on SMCA systems.  Therefore, only set
up the deferred error interrupt on SMCA systems and simplify the code.

Guidance from hardware designers is that the LVT offsets provided from the
platform should be used. The kernel should not try to enforce specific values.
However, the kernel should check that an LVT offset is not reused for multiple
sources.

Therefore, remove the extra checking and value enforcement from the MCE code.
The "reuse/conflict" case is already handled in setup_APIC_eilvt().

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05 16:51:27 +01:00
Yazen Ghannam 7cb735d7c0 x86/mce: Unify AMD DFR handler with MCA Polling
AMD systems optionally support a deferred error interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar to
how other MCA interrupts are handled.

Deferred errors do not require any special handling related to the interrupt,
e.g. resetting or rearming the interrupt, etc.

However, Scalable MCA systems include a pair of registers, MCA_DESTAT and
MCA_DEADDR, that should be checked for valid errors. This check should be done
whenever MCA registers are polled. Currently, the deferred error interrupt
does this check, but the MCA polling function does not.

Call the MCA polling function when handling the deferred error interrupt. This
keeps all "polling" cases in a common function.

Add an SMCA status check helper. This will do the same status check and
register clearing that the interrupt handler has done. And it extends the
common polling flow to find AMD deferred errors.

Clear the MCA_DESTAT register at the end of the handler rather than the
beginning. This maintains the procedure that the 'status' register must be
cleared as the final step.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05 16:41:32 +01:00
Yazen Ghannam 34da4a5d68 x86/mce: Unify AMD THR handler with MCA Polling
AMD systems optionally support an MCA thresholding interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar to
how the Intel Corrected Machine Check interrupt (CMCI) is handled.

AMD MCA thresholding is managed using the MCA_MISC registers within an MCA
bank. The OS will need to modify the hardware error count field in order to
reset the threshold limit and rearm the interrupt. Management of the MCA_MISC
register should be done as a follow up to the basic MCA polling flow. It
should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA thresholding
interrupt to the OS even when the OS does not manage the feature, i.e.
MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA thresholding
interrupt. This will allow the OS to find any valid errors whether or not the
MCA thresholding feature is OS-managed. Also, this allows the common MCA
polling options and kernel parameters to apply to AMD systems.

Add a callback to the MCA polling function to check and reset any threshold
blocks that have reached their threshold limit.
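
The resulting interrupt handler shape, sketched with mainline names (the
callback name is illustrative):

static void amd_threshold_interrupt(void)
{
	/* Common polling flow finds and logs any valid errors */
	machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));

	/* Follow-up: reset and rearm blocks that hit their limit */
	reset_thr_blocks();	/* illustrative name */
}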

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05 13:41:18 +01:00
Marc Herbert 41f4767000 x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
While restricting access,

  a7e1f67ed2 ("x86/msr: Filter MSR writes")

also added a warning and started tainting the kernel.

But the warning message never mentioned tainting. Moreover, this uses the
"CPU_OUT_OF_SPEC" flag which is not clearly related to MSRs: that flag is
overloaded by several, fairly different situations, including some much
scarier ones.

So, without an expert around (thank you Dave Hansen), it would have been
practically impossible to root cause the tainting from just the log file at
hand. So it would be prudent to explicitly mention in the logs when the
tainting happens so that debugging crashes can be made easier.

Fix this by simply appending the CPU_OUT_OF_SPEC flag to the warning message.
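
A sketch of the shape of the fix; the exact format string in the patch may
differ:

	pr_warn_ratelimited("Write to unrecognized MSR 0x%x by %s (pid: %d), CPU_OUT_OF_SPEC taint\n",
			    reg, current->comm, current->pid);
	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);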

This readability issue happened when staring at logs involving the Intel
Memory Latency Checker (among many other things going on in that log). The MLC
disables hardware prefetch.

  [ bp: Massage and extend commit message. ]

Signed-off-by: Marc Herbert <marc.herbert@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251101-tainted-msr-v1-1-e00658ba04d4@linux.intel.com
2025-11-05 13:14:42 +01:00
Borislav Petkov (AMD) 47955b58cf x86/cpufeatures: Correct LKGS feature flag description
Quotation marks in cpufeatures.h comments are special and when the
comment begins with a quoted string, that string lands in /proc/cpuinfo,
turning it into a user-visible one.

The LKGS comment doesn't begin with a quoted string but just in case
drop the quoted "kernel" in there to avoid confusion. And while at it,
simply change the description to say what the LKGS instruction does for
more clarity.

No functional changes.

Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251015103548.10194-1-bp@kernel.org
2025-11-04 23:09:34 +01:00
Peter Zijlstra 1fe4002cf7 x86/ptrace: Always inline trivial accessors
A KASAN build bloats these single load/store helpers such that
the compiler fails to inline them:

  vmlinux.o: error: objtool: irqentry_exit+0x5e8: call to instruction_pointer_set() with UACCESS enabled

Make sure the compiler isn't allowed to do anything stupid.
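
A sketch of the pattern, assuming the x86 pt_regs layout; the full set of
annotated accessors is in the patch:

static __always_inline void instruction_pointer_set(struct pt_regs *regs,
						    unsigned long val)
{
	regs->ip = val;	/* single store, must never become a real call */
}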

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net
2025-11-04 08:36:20 +01:00
Peter Zijlstra 323d93f043 cleanup: Always inline everything
KASAN bloat caused cleanup helper functions to not get inlined:

  vmlinux.o: error: objtool: irqentry_exit+0x323: call to class_user_rw_access_destructor() with UACCESS enabled

Force inline all the cleanup helpers like they already are on normal
builds.
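
In <linux/cleanup.h> terms the pattern is (sketch; the actual patch covers
all the macro generated helpers):

-static inline void class_##_name##_destructor(class_##_name##_t *p)
+static __always_inline void class_##_name##_destructor(class_##_name##_t *p)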

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net
2025-11-04 08:35:58 +01:00
Thomas Gleixner 32034df66b rseq: Switch to TIF_RSEQ if supported
TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is suboptimal especially
with the RSEQ fast path depending on it, but not really handling it.

Define a separate TIF_RSEQ in the generic TIF space and enable the full
separation of fast and slow path for architectures which utilize that.

That avoids the hassle with invocations of resume_user_mode_work() from
hypervisors, which clear TIF_NOTIFY_RESUME. The re-evaluation which is
therefore required at the end of vcpu_run() becomes a NOOP on architectures
which utilize the generic TIF space and have a separate TIF_RSEQ.

The hypervisor TIF handling does not include the separate TIF_RSEQ as there
is no point in doing so. The guest neither knows nor cares about the VMM
host application's RSEQ state. That state is only relevant when the ioctl()
returns to user space.

The fastpath implementation still utilizes TIF_NOTIFY_RESUME for failure
handling, but this only happens within exit_to_user_mode_loop(), so
arguably the hypervisor ioctl() code is long done by then.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.903622031@linutronix.de
2025-11-04 08:35:37 +01:00
Thomas Gleixner 7a5201ea19 rseq: Split up rseq_exit_to_user_mode()
Separate the interrupt and syscall exit handling. Syscall exit does not
require clearing the user_irq bit as it can't be set. On interrupt exit it
can be set when the interrupt did not result in a scheduling event and
therefore the return path did not invoke the TIF work handling, which would
have cleared it.

The debug check for the event state is also not really required even when
debug mode is enabled via the static key. Debug mode largely aids user
space by enabling a large number of validation checks, which cause a
segfault when a malformed critical section is detected. In production mode
the critical section handling takes the content mostly as is and lets user
space keep the pieces when it screwed up.

On kernel changes in that area the state check is useful, but that can be
done when lockdep is enabled, which is anyway a required test scenario for
fundamental changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.842785700@linutronix.de
2025-11-04 08:35:30 +01:00
Thomas Gleixner 70fe25a3bc entry: Split up exit_to_user_mode_prepare()
exit_to_user_mode_prepare() is used for both interrupts and syscalls, but
there is extra rseq work, which is only required for in the interrupt exit
case.

Split up the function and provide wrappers for syscalls and interrupts,
which allows separating the rseq exit work in the next step.
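
A sketch of the split's shape; the wrapper and helper names here are
illustrative, not necessarily those used in the patch:

static __always_inline void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
{
	__exit_to_user_mode_prepare(regs);	/* common TIF work */
}

static __always_inline void irqentry_exit_to_user_mode_prepare(struct pt_regs *regs)
{
	__exit_to_user_mode_prepare(regs);	/* common TIF work */
	rseq_irqentry_exit_to_user_mode();	/* rseq work, interrupt exit only */
}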

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.782234789@linutronix.de
2025-11-04 08:35:17 +01:00
Thomas Gleixner 3db6b38dfe rseq: Switch to fast path processing on exit to user
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. In case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.

This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.

This results in the following improvements:

  Kernel build	       Before		  After		      Reduction

  exit to user         80692981		  80514451
  signal checks:          32581		       121	       99%
  slowpath runs:        1201408   1.49%	       198 0.00%      100%
  fastpath runs:			    675941 0.84%       N/A
  id updates:           1233989   1.53%	     50541 0.06%       96%
  cs checks:            1125366   1.39%	         0 0.00%      100%
    cs cleared:         1125366      100%	 0            100%
    cs fixup:                 0        0%	 0

  RSEQ selftests      Before		  After		      Reduction

  exit to user:       386281778		  387373750
  signal checks:       35661203		          0           100%
  slowpath runs:      140542396 36.38%	        100  0.00%    100%
  fastpath runs:			    9509789  2.51%     N/A
  id updates:         176203599 45.62%	    9087994  2.35%     95%
  cs checks:          175587856 45.46%	    4728394  1.22%     98%
    cs cleared:       172359544   98.16%    1319307   27.90%   99%
    cs fixup:           3228312    1.84%    3409087   72.10%

The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to
user invocations; they are relative to the actual 'cs check' invocations.

While some of this could have been avoided in the original code, like the
obvious clearing of CS when it's already clear, the main problem of going
through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
notify handler is invoked more than once before going out to user
space. Doing this once when everything has stabilized is the only solution
to avoid this.

The initial attempt to completely decouple it from the TIF work turned out
to be suboptimal for workloads which do a lot of quick and short system
calls. Even if the fast path decision is only 4 instructions (including a
conditional branch), this adds up quickly and becomes measurable when the
rate for actually having to handle rseq is in the low single digit
percentage range of user/kernel transitions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de
2025-11-04 08:34:39 +01:00
Thomas Gleixner 05b44aef70 rseq: Implement fast path for exit to user
Implement the actual logic for handling RSEQ updates in a fast path after
handling the TIF work and at the point where the task is actually returning
to user space.

This is the right point to do that because at this point the CPU and the MM
CID are stable and can no longer change due to yet another reschedule.
That happens when the task is handling it via TIF_NOTIFY_RESUME in
resume_user_mode_work(), which is invoked from the exit to user mode work
loop.

The function is invoked after the TIF work is handled and runs with
interrupts disabled, which means it cannot resolve page faults. It
therefore disables page faults and in case the access to the user space
memory faults, it:

  - notes the fail in the event struct
  - raises TIF_NOTIFY_RESUME
  - returns false to the caller

The caller has to go back to the TIF work, which runs with interrupts
enabled and therefore can resolve the page faults. This happens mostly on
fork() when the memory is marked COW.

If the user memory inspection finds invalid data, the function returns
false as well and sets the fatal flag in the event struct along with
TIF_NOTIFY_RESUME. The slow path notify handler has to evaluate that flag
and terminate the task with SIGSEGV as documented.

The initial decision to invoke any of this is based on one flag in the
event struct: @sched_switch. The decision is in pseudo ASM:

      load	tsk::event::sched_switch
      jnz	inspect_user_space
      mov	$0, tsk::event::events
      ...
      leave

So for the common case where the task was not scheduled out, this really
boils down to three instructions before going out if the compiler is not
completely stupid (and yes, some of them are).

If the condition is true, then it checks whether the CPU ID or MM CID has
changed. If so, the CPU/MM IDs have to be updated and are thereby
cached for the next round. The update unconditionally retrieves the user
space critical section address to spare another user*begin/end() pair.  If
that's not zero and tsk::event::user_irq is set, then the critical section
is analyzed and acted upon. If either zero or the entry came via syscall
the critical section analysis is skipped.

If the comparison is false then the critical section has to be analyzed,
because in that case the event flag can only be true when the entry from
user space was via interrupt.

This is provided without the actual hookup to let reviewers focus on the
implementation details. The hookup happens in the next step.

Note: As with quite some other optimizations this depends on the generic
entry infrastructure and is not enabled to be sucked into random
architecture implementations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.638929615@linutronix.de
2025-11-04 08:34:18 +01:00
Thomas Gleixner 39a167560a rseq: Optimize event setting
After removing the various condition bits earlier it turns out that one
extra piece of information is needed to avoid setting event::sched_switch and
TIF_NOTIFY_RESUME unconditionally on every context switch.

The update of the RSEQ user space memory is only required, when either

  the task was interrupted in user space and schedules

or

  the CPU or MM CID changes in schedule() independent of the entry mode

Right now only the interrupt from user information is available.

Add an event flag, which is set when the CPU or MM CID or both change.

Evaluate this event in the scheduler to decide whether the sched_switch
event and the TIF bit need to be set.

It's an extra conditional in context_switch(), but the downside of
unconditionally handling RSEQ after a context switch to user is way more
significant. The utilized boolean logic minimizes this to a single
conditional branch.
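
In pseudo C, with the field names assumed from this series, the scheduler
side decision collapses to:

	/* u8 booleans composed with '|' keep this a single conditional branch */
	if (t->rseq_event.user_irq | t->rseq_event.ids_changed) {
		t->rseq_event.sched_switch = true;
		set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
	}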

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.578058898@linutronix.de
2025-11-04 08:34:03 +01:00
Thomas Gleixner e2d4f42271 rseq: Rework the TIF_NOTIFY handler
Replace the whole logic with a new implementation, which is shared with
signal delivery and the upcoming exit fast path.

Contrary to the original implementation, this ignores invocations from
KVM/IO-uring, which invoke resume_user_mode_work() with the @regs argument
set to NULL.

The original implementation updated the CPU/Node/MM CID fields, but that
was just a side effect, which was addressing the problem that this
invocation cleared TIF_NOTIFY_RESUME, which in turn could cause an update
on return to user space to be lost.

This problem has been addressed differently, so that it's no longer
required to do that update before entering the guest.

That might be considered a user visible change when the host thread's TLS
memory is mapped into the guest, but as this was never intentionally
supported, this abuse of kernel internal implementation details is not
considered an ABI break.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.517640811@linutronix.de
2025-11-04 08:33:54 +01:00
Thomas Gleixner 9f6ffd4ceb rseq: Separate the signal delivery path
Completely separate the signal delivery path from the notify handler as
they have different semantics versus the event handling.

The signal delivery only needs to ensure that the interrupted user context
was not in a critical section or the section is aborted before it switches
to the signal frame context. The signal frame context does not have the
original instruction pointer anymore, so that can't be handled on exit to
user space.

No point in updating the CPU/CID ids as they might change again before the
task returns to user space for real.

The fast path optimization, which checks for the 'entry from user via
interrupt' condition is only available for architectures which use the
generic entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.455429038@linutronix.de
2025-11-04 08:33:47 +01:00
Thomas Gleixner 0f085b4188 rseq: Provide and use rseq_set_ids()
Provide a new and straightforward implementation to set the IDs (CPU ID,
Node ID and MM CID), which can be later inlined into the fast path.

It does all operations in one scoped_user_rw_access() section and retrieves
also the critical section member (rseq::cs_rseq) from user space to avoid
another user..begin/end() pair. This is in preparation for optimizing the
fast path to avoid extra work when not required.

On rseq registration set the CPU ID fields to RSEQ_CPU_ID_UNINITIALIZED and
node and MM CID to zero. That's the same as the kernel internal reset
values. That makes the debug validation in the exit code work correctly on
the first exit to user space.

Use it to replace the whole related zoo in rseq.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.393972266@linutronix.de
2025-11-04 08:33:33 +01:00
Thomas Gleixner eaa9088d56 rseq: Use static branch for syscall exit debug when GENERIC_IRQ_ENTRY=y
Make the syscall exit debug mechanism available via the static branch on
architectures which utilize the generic entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.333440475@linutronix.de
2025-11-04 08:33:27 +01:00
Thomas Gleixner c1cbad8f99 rseq: Make exit debugging static branch based
Disconnect it from the config switch and use the static debug branch. This
is a temporary measure for validating the rework. In the end this check
needs to be hidden behind lockdep as it has nothing to do with the other
debug infrastructure, which mainly aids user space debugging by enabling a
zoo of checks which terminate misbehaving tasks instead of letting them
keep the hard-to-diagnose pieces.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.272660745@linutronix.de
2025-11-04 08:33:20 +01:00
Thomas Gleixner f7ee1964ac rseq: Replace the original debug implementation
Just utilize the new infrastructure and put the original one to rest.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.212510692@linutronix.de
2025-11-04 08:33:12 +01:00
Thomas Gleixner abc850e761 rseq: Provide and use rseq_update_user_cs()
Provide a straightforward implementation to check for and, if necessary,
clear or fix up critical sections in user space.

The non-debug version performs only minimal sanity checks and aims for
efficiency.

There are two attack vectors, which are checked for:

  1) An abort IP which is in the kernel address space. That would cause at
     least x86 to return to kernel space via IRET.

  2) A rogue critical section descriptor with an abort IP pointing to some
     arbitrary address, which is not preceded by the RSEQ signature.
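
A standalone model of these two checks (TASK_SIZE and the signature value
are stand-ins; 0x53053053 is the x86 selftests signature):

#include <stdbool.h>
#include <stdint.h>

#define TASK_SIZE_MODEL	0x7ffffffff000ULL	/* stand-in user space limit */
#define RSEQ_SIG_MODEL	0x53053053U		/* x86 selftests RSEQ_SIG    */

static bool abort_ip_valid(uint64_t abort_ip, uint32_t sig_before_abort_ip)
{
	/* 1) The abort IP must be a user space address */
	if (abort_ip >= TASK_SIZE_MODEL)
		return false;
	/* 2) The 32bit RSEQ signature must precede the abort IP */
	return sig_before_abort_ip == RSEQ_SIG_MODEL;
}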

If the section descriptors are invalid then the resulting misbehaviour of
the user space application is not the kernel's problem.

The kernel provides a run-time switchable debug slow path, which implements
the full zoo of checks including termination of the task when one of the
gazillion conditions is not met.

Replace the zoo in rseq.c with it and invoke it from the TIF_NOTIFY_RESUME
handler. Move the remainders into the CONFIG_DEBUG_RSEQ section, which will
be replaced and removed in a subsequent step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.151465632@linutronix.de
2025-11-04 08:32:57 +01:00
Thomas Gleixner 9c37cb6e80 rseq: Provide static branch for runtime debugging
Config based debug is rarely turned on and is not available easily when
things go wrong.

Provide a static branch to allow permanent integration of debug mechanisms
along with the usual toggles in Kconfig, command line and debugfs.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de
2025-11-04 08:32:49 +01:00
Thomas Gleixner 5412910487 rseq: Expose lightweight statistics in debugfs
Analyzing the call frequency without actually using tracing is helpful for
analysis of this infrastructure. The overhead is minimal as it just
increments a per CPU counter associated with each operation.

The debugfs readout provides a racy sum of all counters.
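
A sketch of the counting scheme with the regular per CPU primitives; the
actual counter and function names in the patch differ:

DEFINE_PER_CPU(unsigned long, rseq_stat_slowpath);	/* illustrative name */

static inline void rseq_stat_inc_slowpath(void)
{
	raw_cpu_inc(rseq_stat_slowpath);
}

/* debugfs read side: racy, unsynchronized sum over all CPUs */
static unsigned long rseq_stat_sum_slowpath(void)
{
	unsigned long sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += per_cpu(rseq_stat_slowpath, cpu);
	return sum;
}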

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de
2025-11-04 08:32:41 +01:00
Thomas Gleixner dab344753e rseq: Provide tracepoint wrappers for inline code
Provide tracepoint wrappers for the upcoming RSEQ exit to user space inline
fast path, so that the header can be safely included by code which defines
actual trace points.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.967114316@linutronix.de
2025-11-04 08:32:35 +01:00
Thomas Gleixner 2fc0e4b412 rseq: Record interrupt from user space
For RSEQ the only relevant reason to inspect and, if necessary, fix up (abort)
user space critical sections is when user space was interrupted and the
task was scheduled out.

If the user to kernel entry was from a syscall no fixup is required. If
user space invokes a syscall from a critical section it can keep the
pieces as documented.

This is only supported on architectures which utilize the generic entry
code. If your architecture does not use it, bad luck.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.905067101@linutronix.de
2025-11-04 08:32:23 +01:00
Thomas Gleixner 4b7de6df20 rseq: Cache CPU ID and MM CID values
In preparation for rewriting RSEQ exit to user space handling provide
storage to cache the CPU ID and MM CID values which were written to user
space. That prepares for a quick check, which avoids the update when
nothing changed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.841964081@linutronix.de
2025-11-04 08:32:14 +01:00
Thomas Gleixner 4fc9225d19 sched: Move MM CID related functions to sched.h
There is nothing mm specific in that and including mm.h can cause header
recursion hell.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.778457951@linutronix.de
2025-11-04 08:32:04 +01:00
Thomas Gleixner 7702a9c285 entry: Inline irqentry_enter/exit_from/to_user_mode()
There is no point in having this as a function which just inlines
enter_from_user_mode(). The function call overhead is larger than the
function itself.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.715309918@linutronix.de
2025-11-04 08:31:47 +01:00
Thomas Gleixner 54a5ab5624 entry: Remove syscall_enter_from_user_mode_prepare()
Open code the only user in the x86 syscall code and reduce the zoo of
functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.652839989@linutronix.de
2025-11-04 08:31:37 +01:00
Thomas Gleixner 5204be1679 entry: Clean up header
Clean up the include ordering, kernel-doc and other trivialities before
making further changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.590338411@linutronix.de
2025-11-04 08:31:14 +01:00
Thomas Gleixner faba9d250e rseq: Introduce struct rseq_data
In preparation for a major rewrite of this code, provide a data structure
for rseq management.

Put all the rseq related data into it (except for the debug part), which
allows simplifying fork/execve by using memset() and memcpy() instead of
adding new fields to initialize over and over.

Create a storage struct for event management as well and put the
sched_switch event and an indicator for RSEQ on a task into it as a
start. That uses a union, which allows masking and clearing the whole lot
efficiently.

The indicators are explicitly not a bit field. Bit fields generate abysmal
code.

The boolean members are defined as u8 as that actually guarantees that it
fits. There seem to be strange architecture ABIs which need more than 8
bits for a boolean.

The has_rseq member is redundant vs. task::rseq, but it turns out that
boolean operations and quick checks on the union generate better code than
fiddling with separate entities and data types.

This struct will be extended over time to carry more information.
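
A minimal standalone model of the layout idea (member names assumed from
this changelog, not the full kernel struct):

#include <stdint.h>
#include <string.h>

struct rseq_event {
	union {
		uint32_t	all;		/* mask/clear everything at once */
		struct {
			uint8_t	sched_switch;	/* task was scheduled out        */
			uint8_t	has_rseq;	/* task has a registered rseq    */
		};
	};
};

/* fork/execve style (re)initialization */
static inline void rseq_event_reset(struct rseq_event *ev)
{
	memset(ev, 0, sizeof(*ev));
}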

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.527086690@linutronix.de
2025-11-04 08:30:50 +01:00
Thomas Gleixner 566d8015f7 rseq: Avoid CPU/MM CID updates when no event pending
There is no need to update these values unconditionally if there is no
event pending.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.462964916@linutronix.de
2025-11-04 08:30:43 +01:00
Thomas Gleixner 83409986f4 rseq, virt: Retrigger RSEQ after vcpu_run()
Hypervisors invoke resume_user_mode_work() before entering the guest, which
clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user
space context available to them, so the rseq notify handler skips
inspecting the critical section, but updates the CPU/MM CID values
unconditionally so that a possibly pending rseq event is not lost on the
way to user space.

This is a pointless exercise as the task might be rescheduled before
actually returning to user space and it creates unnecessary work in the
vcpu_run() loops.

It's way more efficient to ignore that invocation based on @regs == NULL
and let the hypervisors re-raise TIF_NOTIFY_RESUME after returning from the
vcpu_run() loop before returning from the ioctl().

This ensures that a pending RSEQ update is not lost and the IDs are updated
before returning to user space.

Once the RSEQ handling is decoupled from TIF_NOTIFY_RESUME, this turns into
a NOOP.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251027084306.399495855@linutronix.de
2025-11-04 08:30:23 +01:00
Thomas Gleixner d923739e2e rseq: Simplify the event notification
Since commit 0190e4198e ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_*
flags") the bits in task::rseq_event_mask are meaningless and just extra
work in terms of setting them individually.

Aside from that, the only relevant point where an event has to be raised is
context switch. Neither the CPU nor MM CID can change without going through
a context switch.

Collapse them all into a single boolean which simplifies the code a lot and
remove the pointless invocations which have been sprinkled all over the
place for no value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.336978188@linutronix.de
2025-11-04 08:30:09 +01:00
Thomas Gleixner 067b3b41b4 rseq: Simplify registration
There is no point to read the critical section element in the newly
registered user space RSEQ struct first in order to clear it.

Just clear it and be done with it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.274661227@linutronix.de
2025-11-04 08:30:05 +01:00
Thomas Gleixner 41b43a6ba3 rseq: Remove the ksig argument from rseq_handle_notify_resume()
There is no point for this being visible in the resume_to_user_mode()
handling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.211520245@linutronix.de
2025-11-04 08:30:01 +01:00
Thomas Gleixner 77f19e4d4f rseq: Move algorithm comment to top
Move the comment which documents the RSEQ algorithm to the top of the file,
so it does not create horrible diffs later when the actual implementation
is fed into the mincer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.149519580@linutronix.de
2025-11-04 08:29:52 +01:00
Thomas Gleixner fdc0f39d28 rseq: Condense the inline stubs
Scrolling over tons of pointless

	{
	}

lines to find the actual code is annoying at best.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.085971048@linutronix.de
2025-11-04 08:29:08 +01:00
Thomas Gleixner 3ca59da7aa rseq: Avoid pointless evaluation in __rseq_notify_resume()
The RSEQ critical section mechanism only clears the event mask when a
critical section is registered; otherwise it is stale and collects
bits.

That means once a critical section is installed the first invocation of
that code when TIF_NOTIFY_RESUME is set will abort the critical section,
even when the TIF bit was not raised by the rseq preempt/migrate/signal
helpers.

This also has a performance implication because TIF_NOTIFY_RESUME is a
multiplexing TIF bit, which is utilized by quite a lot of infrastructure. That
means every invocation of __rseq_notify_resume() goes unconditionally
through the heavy lifting of user space access and consistency checks even
if there is no reason to do so.

Keeping the stale event mask around when exiting to user space also
prevents it from being utilized by the upcoming time slice extension
mechanism.

Avoid this by reading and clearing the event mask before doing the user
space critical section access with interrupts or preemption disabled, which
ensures that the read and clear operation is CPU local atomic versus
scheduling and the membarrier IPI.

This is correct as after re-enabling interrupts/preemption any relevant
event will set the bit again and raise TIF_NOTIFY_RESUME, which makes the
user space exit code take another round of TIF bit clearing.

If the event mask was non-zero, invoke the slow path. On debug kernels the
slow path is invoked unconditionally and the result of the event mask
evaluation is handed in.

Add an exit path check after the TIF bit loop, which validates on debug
kernels that the event mask is zero before exiting to user space.

While at it, reword the convoluted comment about why the pt_regs pointer can be
NULL under certain circumstances.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.022571576@linutronix.de
2025-11-04 08:28:38 +01:00
Thomas Gleixner 3ce17e6909 select: Convert to scoped user access
Replace the open coded implementation with the scoped user access guard.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.862419776@linutronix.de
2025-11-04 08:28:34 +01:00
Thomas Gleixner e02718c986 x86/futex: Convert to scoped user access
Replace the open coded implementation with the scoped user access
guards.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251027083745.799714344@linutronix.de
2025-11-04 08:28:29 +01:00
Thomas Gleixner e4e28fd698 futex: Convert to get/put_user_inline()
Replace the open coded implementation with the new get/put_user_inline()
helpers. This might be replaced by a regular get/put_user(), but that needs
a proper performance evaluation.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251027083745.736737934@linutronix.de
2025-11-04 08:28:23 +01:00
Thomas Gleixner b2cfc0cd68 uaccess: Provide put/get_user_inline()
Provide convenience wrappers around scoped user access similar to
put/get_user(), which reduce the usage sites to:

	if (!get_user_inline(val, ptr))
		return -EFAULT;

Should only be used if there is a demonstrable performance benefit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.609031602@linutronix.de
2025-11-04 08:28:15 +01:00
Thomas Gleixner e497310b4f uaccess: Provide scoped user access regions
User space access regions are tedious and require similar code patterns all
over the place:

	if (!user_read_access_begin(from, sizeof(*from)))
		return -EFAULT;
	unsafe_get_user(val, from, Efault);
	user_read_access_end();
	return 0;
Efault:
	user_read_access_end();
	return -EFAULT;

This got worse with the recent addition of masked user access, which
optimizes the speculation prevention:

	if (can_do_masked_user_access())
		from = masked_user_read_access_begin((from));
	else if (!user_read_access_begin(from, sizeof(*from)))
		return -EFAULT;
	unsafe_get_user(val, from, Efault);
	user_read_access_end();
	return 0;
Efault:
	user_read_access_end();
	return -EFAULT;

There have been issues with using the wrong user_*_access_end() variant in
the error path and other typical Copy&Pasta problems, e.g. using the wrong
fault label in the user accessor, which ends up using the wrong access end
variant.

These patterns beg for scopes with automatic cleanup. The resulting outcome
is:
	scoped_user_read_access(from, Efault)
		unsafe_get_user(val, from, Efault);
	return 0;
Efault:
	return -EFAULT;

The scope guarantees the proper cleanup for the access mode is invoked both
in the success and the failure (fault) path.

The scoped_user_$MODE_access() macros are implemented as self terminating
nested for() loops. Thanks to Andrew Cooper for pointing me at them. The
scope can therefore be left with 'break', 'goto' and 'return'.  Even
'continue' "works" due to the self termination mechanism. Both GCC and
clang optimize all the convoluted macro maze out and the above results with
clang in:

 b80:	f3 0f 1e fa          	       endbr64
 b84:	48 b8 ef cd ab 89 67 45 23 01  movabs $0x123456789abcdef,%rax
 b8e:	48 39 c7    	               cmp    %rax,%rdi
 b91:	48 0f 47 f8          	       cmova  %rax,%rdi
 b95:	90                   	       nop
 b96:	90                   	       nop
 b97:	90                   	       nop
 b98:	31 c9                	       xor    %ecx,%ecx
 b9a:	8b 07                	       mov    (%rdi),%eax
 b9c:	89 06                	       mov    %eax,(%rsi)
 b9e:	85 c9                	       test   %ecx,%ecx
 ba0:	0f 94 c0             	       sete   %al
 ba3:	90                   	       nop
 ba4:	90                   	       nop
 ba5:	90                   	       nop
 ba6:	c3                   	       ret

Which looks as compact as it gets. The NOPs are placeholder for STAC/CLAC.
GCC emits the fault path separately:

 bf0:	f3 0f 1e fa          	       endbr64
 bf4:	48 b8 ef cd ab 89 67 45 23 01  movabs $0x123456789abcdef,%rax
 bfe:	48 39 c7             	       cmp    %rax,%rdi
 c01:	48 0f 47 f8          	       cmova  %rax,%rdi
 c05:	90                   	       nop
 c06:	90                   	       nop
 c07:	90                   	       nop
 c08:	31 d2                	       xor    %edx,%edx
 c0a:	8b 07                	       mov    (%rdi),%eax
 c0c:	89 06                	       mov    %eax,(%rsi)
 c0e:	85 d2                	       test   %edx,%edx
 c10:	75 09                	       jne    c1b <afoo+0x2b>
 c12:	90                   	       nop
 c13:	90                   	       nop
 c14:	90                   	       nop
 c15:	b8 01 00 00 00       	       mov    $0x1,%eax
 c1a:	c3                   	       ret
 c1b:	90                   	       nop
 c1c:	90                   	       nop
 c1d:	90                   	       nop
 c1e:	31 c0                	       xor    %eax,%eax
 c20:	c3                   	       ret

The fault labels for the scoped*() macros and the fault labels for the
actual user space accessors can be shared and must be placed outside of the
scope.

If masked user access is enabled on an architecture, then the pointer
handed in to scoped_user_$MODE_access() can be modified to point to a
guaranteed faulting user address. This modification is only scope local as
the pointer is aliased inside the scope. When the scope is left the alias
is no longer in effect. IOW the original pointer value is preserved so it
can be used e.g. for fixup or diagnostic purposes in the fault path.
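
The underlying mechanism can be modeled in plain GNU C: a one pass for()
loop whose control variable carries a cleanup attribute, so the access end
function runs on every exit path. This is only a model; the real macros
also handle the masked/non-masked begin variants:

#include <stdbool.h>
#include <stdio.h>

static void access_begin(void) { puts("STAC"); }	/* stand-ins for the  */
static void access_end(void)   { puts("CLAC"); }	/* user_*_access_*()  */

static void scope_cleanup(bool *unused) { (void)unused; access_end(); }

/* Self terminating for() scope: one pass, cleanup on break/goto/return */
#define scoped_access()							\
	for (bool __once __attribute__((cleanup(scope_cleanup))) =	\
		     (access_begin(), true); __once; __once = false)

static int read_val(const int *from, int *to)
{
	scoped_access() {
		if (!from)
			goto efault;	/* leaving the scope still runs CLAC */
		*to = *from;
	}
	return 0;
efault:
	return -14;			/* -EFAULT */
}

int main(void)
{
	int v = 0, in = 7;
	printf("%d %d\n", read_val(&in, &v), v);	/* 0 7 */
	printf("%d\n", read_val(NULL, &v));		/* -14 */
	return 0;
}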

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.546420421@linutronix.de
2025-11-04 08:27:52 +01:00
Thomas Gleixner 2db48d8bf8 arm64: uaccess: Use unsafe wrappers for ASM GOTO
Clang propagates a provided label which is outside of a cleanup scope to
ASM GOTO, despite the fact that __raw_get_mem() has a local label for that
purpose:

  "error: cannot jump from this asm goto statement to one of its possible targets"

Using the unsafe wrapper with the extra local label indirection cures that.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-11-04 08:27:20 +01:00
Mario Limonciello (AMD) b1f02f005a Documentation: power: Add document on debugging shutdown hangs
If the kernel hangs while shutting down, ideally a UART log should
be captured to debug the problem. However, if one isn't available,
users can use the pstore functionality to retrieve logs.

Add a document explaining how this works to make it more accessible
to users.

Tested-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251025004341.2386868-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 20:55:07 +01:00
Bagas Sanjaya 4ab25c9214 Documentation: intel-pstate: Use :ref: directive for internal linking
The intel_pstate docs use the standard reST construct (`Section title`_)
for cross-referencing sections (internal linking) rather than for external
links. However, incorrect cross-references written in that syntax are not
caught (fortunately, docutils 0.22 raises duplicate target warnings, which
were fixed in cb908f8b0a ("Documentation: intel_pstate: fix duplicate
hyperlink target errors")).

Convert the cross-references to use :ref: directive, which doesn't
exhibit this problem.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Changelog tweak ]
Link: https://patch.msgid.link/20251101055614.32270-1-bagasdotme@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 19:20:53 +01:00
Marco Crivellari 2817e6fa84 ACPI: thermal: Add WQ_PERCPU to alloc_workqueue() users
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
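
An illustrative conversion (the queue name is made up):

-	wq = alloc_workqueue("acpi_thermal", 0, 0);
+	wq = alloc_workqueue("acpi_thermal", WQ_PERCPU, 0);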

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251030154739.262582-6-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 18:45:42 +01:00
Marco Crivellari ec4291f524 ACPI: OSL: Add WQ_PERCPU to alloc_workqueue() users
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251030154739.262582-5-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 18:45:42 +01:00
Marco Crivellari 87c21e2406 ACPI: EC: Add WQ_PERCPU to alloc_workqueue() users
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251030154739.262582-4-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 18:45:42 +01:00
Marco Crivellari 6447ece47c ACPI: OSL: replace use of system_wq with system_percpu_wq
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_wq should be the per-cpu workqueue, yet in this name nothing makes
that clear, so replace system_wq with system_percpu_wq.

The old wq (system_wq) will be kept for a few release cycles.
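
The conversion is mechanical; an illustrative call site:

-	queue_work(system_wq, &work);
+	queue_work(system_percpu_wq, &work);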

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251030154739.262582-3-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 18:45:42 +01:00
Marco Crivellari 0327c504e2 ACPI: scan: replace use of system_unbound_wq with system_dfl_wq
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Add system_dfl_wq to encourage its use when unbound work should be used.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251030154739.262582-2-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-03 18:45:42 +01:00
Thomas Gleixner 43cc54d8db s390/uaccess: Use unsafe wrappers for ASM GOTO
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:

bool foo(u32 __user *p, u32 val)
{
	scoped_guard(pagefault)
		unsafe_put_user(val, p, efault);
	return true;
efault:
	return false;
}

It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.

S390 is not affected for unsafe_*_user() as it uses its own local label
already, but __get/put_kernel_nofault() lack that.

Rename them to arch_*_kernel_nofault() which makes the generic uaccess
header wrap it with a local label that makes both compilers emit correct
code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://patch.msgid.link/20251027083745.483079889@linutronix.de
2025-11-03 15:26:10 +01:00
Thomas Gleixner 0988ea18c6 riscv/uaccess: Use unsafe wrappers for ASM GOTO
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:

bool foo(u32 __user *p, u32 val)
{
	scoped_guard(pagefault)
		unsafe_put_user(val, p, efault);
	return true;
efault:
	return false;
}

It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.

Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.419351819@linutronix.de
2025-11-03 15:26:10 +01:00
Thomas Gleixner 5002dd5314 powerpc/uaccess: Use unsafe wrappers for ASM GOTO
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:

bool foo(u32 __user *p, u32 val)
{
	scoped_guard(pagefault)
		unsafe_put_user(val, p, efault);
	return true;
efault:
	return false;
}

It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.

Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.356628509@linutronix.de
2025-11-03 15:26:09 +01:00
Thomas Gleixner 14219398e3 x86/uaccess: Use unsafe wrappers for ASM GOTO
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:

bool foo(u32 __user *p, u32 val)
{
	scoped_guard(pagefault)
		unsafe_put_user(val, p, efault);
	return true;
efault:
	return false;
}

It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.

Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.294359925@linutronix.de
2025-11-03 15:26:09 +01:00
Thomas Gleixner 3eb6660f26 uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:

bool foo(u32 __user *p, u32 val)
{
	scoped_guard(pagefault)
		unsafe_put_user(val, p, efault);
	return true;
efault:
	return false;
}

 e80:	e8 00 00 00 00       	call   e85 <foo+0x5>
 e85:	65 48 8b 05 00 00 00 00 mov    %gs:0x0(%rip),%rax
 e8d:	83 80 04 14 00 00 01 	addl   $0x1,0x1404(%rax)   // pf_disable++
 e94:	89 37                	mov    %esi,(%rdi)
 e96:	83 a8 04 14 00 00 01 	subl   $0x1,0x1404(%rax)   // pf_disable--
 e9d:	b8 01 00 00 00       	mov    $0x1,%eax           // success
 ea2:	e9 00 00 00 00       	jmp    ea7 <foo+0x27>      // ret
 ea7:	31 c0                	xor    %eax,%eax           // fail
 ea9:	e9 00 00 00 00       	jmp    eae <foo+0x2e>      // ret

which is broken as it leaks the pagefault disable counter on failure.

Clang at least fails the build.

Linus suggested to add a local label into the macro scope and let that
jump to the actual caller supplied error label.

	__label__ local_label;                                  \
	arch_unsafe_get_user(x, ptr, local_label);              \
	if (0) {                                                \
	local_label:                                            \
		goto label;                                     \
	}

That works for both GCC and clang.

clang:

 c80:	0f 1f 44 00 00       	   nopl   0x0(%rax,%rax,1)
 c85:	65 48 8b 0c 25 00 00 00 00 mov    %gs:0x0,%rcx
 c8e:	ff 81 04 14 00 00    	   incl   0x1404(%rcx)	   // pf_disable++
 c94:	31 c0                	   xor    %eax,%eax        // set retval to false
 c96:	89 37                      mov    %esi,(%rdi)      // write
 c98:	b0 01                	   mov    $0x1,%al         // set retval to true
 c9a:	ff 89 04 14 00 00    	   decl   0x1404(%rcx)     // pf_disable--
 ca0:	2e e9 00 00 00 00    	   cs jmp ca6 <foo+0x26>   // ret

The exception table entry points correctly to c9a

GCC:

 f70:   e8 00 00 00 00          call   f75 <baz+0x5>
 f75:   65 48 8b 05 00 00 00 00 mov    %gs:0x0(%rip),%rax
 f7d:   83 80 04 14 00 00 01    addl   $0x1,0x1404(%rax)  // pf_disable++
 f84:   8b 17                   mov    (%rdi),%edx
 f86:   89 16                   mov    %edx,(%rsi)
 f88:   83 a8 04 14 00 00 01    subl   $0x1,0x1404(%rax) // pf_disable--
 f8f:   b8 01 00 00 00          mov    $0x1,%eax         // success
 f94:   e9 00 00 00 00          jmp    f99 <baz+0x29>    // ret
 f99:   83 a8 04 14 00 00 01    subl   $0x1,0x1404(%rax) // pf_disable--
 fa0:   31 c0                   xor    %eax,%eax         // fail
 fa2:   e9 00 00 00 00          jmp    fa7 <baz+0x37>    // ret

The exception table entry points correctly to f99

So both compilers optimize out the extra goto and emit correct and
efficient code.

Provide a generic wrapper to do that to avoid modifying all the affected
architecture specific implementation with that workaround.

The only change required for architectures is to rename unsafe_*_user() to
arch_unsafe_*_user(). That's done in subsequent changes.
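
The whole construct can be modeled outside the kernel; here the ASM GOTO
accessor is replaced by a plain conditional jump so the wrapper's control
flow is visible (compiles with GCC and clang, GNU C extensions):

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the arch ASM GOTO accessor: jumps to @label on "fault" */
#define arch_unsafe_get_user(x, ptr, label)		\
do {							\
	if (!(ptr))					\
		goto label;				\
	(x) = *(ptr);					\
} while (0)

/* Generic wrapper with the local label indirection */
#define unsafe_get_user(x, ptr, label)			\
do {							\
	__label__ local_label;				\
	arch_unsafe_get_user(x, ptr, local_label);	\
	if (0) {					\
local_label:						\
		goto label;				\
	}						\
} while (0)

static bool foo(unsigned int *p, unsigned int *val)
{
	unsafe_get_user(*val, p, efault);
	return true;
efault:
	return false;
}

int main(void)
{
	unsigned int v = 0, in = 42;
	printf("%d %u\n", foo(&in, &v), v);	/* 1 42 */
	printf("%d\n", foo(NULL, &v));		/* 0    */
	return 0;
}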

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/877bweujtn.ffs@tglx
2025-11-03 15:26:09 +01:00
Thomas Gleixner 44c5b6768e ARM: uaccess: Implement missing __get_user_asm_dword()
When CONFIG_CPU_SPECTRE=n, get_user() is missing the 8 byte ASM variant
for no real good reason. This prevents using get_user(u64) in generic code.

Implement it as a sequence of two 4-byte reads with LE/BE awareness, and
make the type of the intermediate variable to read into (unsigned long or
unsigned long long) dependent on the target type.

The __long_type() macro and idea were lifted from PowerPC. Thanks to
Christophe for pointing it out.
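
For reference, the PowerPC version of the idea reads as follows (a sketch;
the ARM variant may differ in detail):

	/* Pick an unsigned type at least as wide as the access. */
	#define __long_type(x) \
		__typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))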

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509120155.pFgwfeUD-lkp@intel.com/
Link: https://patch.msgid.link/20251027083745.168468637@linutronix.de
2025-11-03 15:26:09 +01:00
Rob Herring (Arm) 989b40b757 perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
Add CPU PMU compatible strings for Cortex-A320, Cortex-A520AE,
Cortex-A720AE, and C1 Nano/Premium/Pro/Ultra.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 14:21:55 +00:00
Ma Ke 970e1e4180 perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
driver_find_device() calls get_device() to increment the reference
count once a matching device is found. device_release_driver()
releases the driver, but it does not decrease the reference count that
was incremented by driver_find_device(). At the end of the loop, there
is no put_device() to balance the reference count. To avoid reference
count leakage, add put_device() to decrease the reference count.

Found by code review.
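
The balanced pattern, as a sketch (the loop shape is illustrative; only the
driver-core calls are real):

	while ((dev = driver_find_device(drv, NULL, NULL, match))) {
		device_release_driver(dev);	/* detaches, but keeps the ref */
		put_device(dev);		/* balance driver_find_device() */
	}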

Cc: stable@vger.kernel.org
Fixes: bfc653aa89 ("perf: arm_cspmu: Separate Arm and vendor module")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 14:20:03 +00:00
Robin Murphy 8fa08f8835 perf/arm-ni: Add NoC S3 support
NoC S3 and its SI L1 sibling look largely similar to their predecessors,
but add the notion of subfeatures to the discovery process, which we now
use to find the event muxes for each device node. Plus, as ever, more
mildly annoying shuffling around of some of the PMU registers (this time
it's the counters...)

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 14:19:17 +00:00
Besar Wicaksono decc3684c2 perf/arm_cspmu: nvidia: Add pmevfiltr2 support
Support NVIDIA PMUs that utilize the optional event filter2 register.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 13:35:07 +00:00
Besar Wicaksono 82dfd72bfb perf/arm_cspmu: nvidia: Add revision id matching
Distinguish NVIDIA devices by the revision and variant bits
in the PMIIDR register, in addition to the product ID.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 13:35:07 +00:00
Besar Wicaksono 04330be8dc perf/arm_cspmu: Add pmpidr support
The PMIIDR value is composed of the values in the PMPIDR registers.
The PMPIDR registers can therefore be used as an alternative means of
device identification on systems that do not implement PMIIDR.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 13:35:07 +00:00
Besar Wicaksono a2573bc790 perf/arm_cspmu: Add callback to reset filter config
An implementer may need to reset a filter config when
stopping a counter, so add a callback for this.

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 13:35:07 +00:00
Yicong Yang c3d78c34ad perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
one of the SMT siblings is not idle on a multi-threaded implementation.
So don't use it on SMT cores.

Introduce topology_core_has_smt() to detect the SMT implementation and
cache it in arm_pmu::has_smt during allocation.
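
A plausible sketch of such a helper, assuming the sibling masks are already
populated (the actual implementation may differ):

	static inline bool topology_core_has_smt(int cpu)
	{
		return cpumask_weight(topology_sibling_cpumask(cpu)) > 1;
	}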

When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
patch we'll get:
[root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
 Performance counter stats for 'CPU(s) 2-3':

CPU2           2880457316      cycles
CPU3           2880459810      cycles
       1.254688470 seconds time elapsed

With this patch the idle state of CPU3 is observed as expected:
[root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
 Performance counter stats for 'CPU(s) 2-3':

CPU2           2558580492      cycles
CPU3               305749      cycles
       1.113626410 seconds time elapsed

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2025-11-03 13:28:48 +00:00
Lukas Wunner 51d0656959 genirq/manage: Reduce priority of forced secondary interrupt handler
Crystal reports that the PCIe Advanced Error Reporting driver gets stuck
in an infinite loop on PREEMPT_RT:

Both the primary interrupt handler aer_irq() and the secondary handler
aer_isr() are forced into threads with identical priority.
Crystal writes that on the ARM system in question, the primary handler
has to clear an error in the Root Error Status register...

   "before the next error happens, or else the hardware will set the
    Multiple ERR_COR Received bit.  If that bit is set, then aer_isr()
    can't rely on the Error Source Identification register, so it scans
    through all devices looking for errors -- and for some reason, on
    this system, accessing the AER registers (or any Config Space above
    0x400, even though there are capabilities located there) generates
    an Unsupported Request Error (but returns valid data).  Since this
    happens more than once, without aer_irq() preempting, it causes
    another multi error and we get stuck in a loop."

The issue does not show on non-PREEMPT_RT because the primary handler
runs in hardirq context and thus can preempt the threaded secondary
handler, clear the Root Error Status register and prevent the secondary
handler from getting stuck.

Emulate the same behavior on PREEMPT_RT by assigning a lower default
priority to the secondary handler if the primary handler is forced into
a thread.

Reported-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/f6dcdb41be2694886b8dbf4fe7b3ab89e9d5114c.1761569303.git.lukas@wunner.de
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
2025-11-01 21:30:02 +01:00
Frederic Weisbecker ba14500e4b timers/migration: Remove dead code handling idle CPU checking for remote timers
Idle migrators don't walk the whole tree in order to find out if there
are timers to migrate because they recorded the next deadline to be
verified within a single check in tmigr_requires_handle_remote().

Remove the related dead code and data.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-7-frederic@kernel.org
2025-11-01 20:38:25 +01:00
Frederic Weisbecker 93643b90d6 timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-6-frederic@kernel.org
2025-11-01 20:38:25 +01:00
Frederic Weisbecker 3c8eb36e2a timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
The CPU doing the prepare work for a remote target must be online from
the tree point of view and its hierarchy must be active, otherwise
propagating its active state up to the new root branch would be either
incorrect or racy.

Assert those conditions with more sanity checks.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-5-frederic@kernel.org
2025-11-01 20:38:25 +01:00
Frederic Weisbecker 5eb579dfd4 timers/migration: Fix imbalanced NUMA trees
When a CPU from a new node boots, the old root may happen to be
connected to the new root even if their node mismatch, as depicted in
the following scenario:

1) CPU 0 boots and creates the first group for node 0.

   [GRP0:0]
    node 0
      |
    CPU 0

2) CPU 1 from node 1 boots and creates a new top that corresponds to
   node 1, but it also connects the old root from node 0 to the new root
   from node 1 by mistake.

             [GRP1:0]
              node 1
            /        \
           /          \
   [GRP0:0]             [GRP0:1]
    node 0               node 1
      |                    |
    CPU 0                CPU 1

3) This eventually leads to an imbalanced tree where some node 0 CPUs
   migrate node 1 timers (and vice versa) way before reaching the
   crossnode groups, resulting in more frequent remote memory accesses
   than expected.

                      [GRP2:0]
                      NUMA_NO_NODE
                     /             \
             [GRP1:0]              [GRP1:1]
              node 1               node 0
            /        \                |
           /          \             [...]
   [GRP0:0]             [GRP0:1]
    node 0               node 1
      |                    |
    CPU 0...              CPU 1...

A balanced tree should only contain groups having children that belong
to the same node:

                      [GRP2:0]
                      NUMA_NO_NODE
                     /             \
             [GRP1:0]              [GRP1:1]
              node 0               node 1
            /        \             /      \
           /          \           /        \
   [GRP0:0]          [...]      [...]    [GRP0:1]
    node 0                                node 1
      |                                     |
    CPU 0...                              CPU 1...

In order to fix this, the hierarchy must be unfolded up to the crossnode
level as soon as a node mismatch is detected. For example the stage 2
above should lead to this layout:

                      [GRP2:0]
                      NUMA_NO_NODE
                     /             \
             [GRP1:0]              [GRP1:1]
              node 0               node 1
              /                         \
             /                           \
        [GRP0:0]                        [GRP0:1]
        node 0                           node 1
          |                                |
       CPU 0                             CPU 1

This means that not only GRP1:0 must be created but also GRP1:1 and
GRP2:0 in order to prepare a balanced tree for next CPUs to boot.

Fixes: 7ee9887703 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-4-frederic@kernel.org
2025-11-01 20:38:25 +01:00
Frederic Weisbecker fa9620355d timers/migration: Remove locking on group connection
Initializing the tmc's group, the group's number of children and the
group's parent can all be done without locking because:

  1) Reading the group's parent and its group mask is done locklessly.

  2) The connections prepared for a given CPU hierarchy are visible to the
     target CPU once online, thanks to the CPU hotplug enforced memory
     ordering.

  3) In case of a newly created upper level, the new root and its
     connections and initialization are made visible by the CPU which made
     the connections. When that CPU goes idle in the future, the new link
     is published by tmigr_inactive_up() through the atomic RmW on
     ->migr_state.

  4) If CPUs were still walking up the active hierarchy, they could observe
     the new root earlier. In this case the ordering is enforced by an
     early initialization of the group mask and by barriers that maintain
     address dependency as explained in:

     b729cc1ec2 ("timers/migration: Fix another race between hotplug and idle entry/exit")
     de3ced72a7 ("timers/migration: Enforce group initialization visibility to tree walkers")

  5) Timers are propagated by a chain of group locking from the bottom to
     the top. And while doing so, the tree also propagates group links
     and initialization. Therefore remote expiration, which also relies
     on group locking, will observe those links and initialization while
     holding the root lock, before walking the tree remotely and updating
     remote timers. This is especially important for migrators in the
     active hierarchy that may observe the new root early.

Therefore the locking is unnecessary at initialization. If anything, it
just brings confusion. Remove it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-3-frederic@kernel.org
2025-11-01 20:38:25 +01:00
Frederic Weisbecker 6c181b5667 timers/migration: Convert "while" loops to use "for"
Both the "do while" and "while" loops in tmigr_setup_groups() eventually
mimic the behaviour of "for" loops.

Simplify accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-2-frederic@kernel.org
2025-11-01 20:38:24 +01:00
Steve Wahl 4138787408 tick/sched: Limit non-timekeeper CPUs calling jiffies update
On large NUMA systems, while running a test program that saturates the
inter-processor and inter-NUMA links, acquiring the jiffies_lock can be
very expensive.

If the cpu designated to do jiffies updates (tick_do_timer_cpu) gets
delayed and other cpus decide to do the jiffies update themselves, a large
number of them decide to do so at the same time.

The inexpensive check against tick_next_period is far quicker than actually
acquiring the lock, so most of these get in line to obtain the lock.  If
obtaining the lock is slow enough, this spirals into the vast majority of
CPUs continuously being stuck waiting for this lock, just to obtain it and
find out that time has already been updated by another cpu. For example, on
one random entry to kdb by manually-injected NMI, 2912 of 3840 CPUs were
observed to be stuck there.

To avoid this, allow only one non-timekeeper CPU to call
tick_do_update_jiffies64() at any given time, resetting ts->stalled jiffies
only if the jiffies update function is actually called.
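
One way to sketch such a gate (the variable name is hypothetical and the
actual patch may use different primitives):

	static atomic_t in_jiffies_update = ATOMIC_INIT(0);

	/* Let at most one non-timekeeper CPU attempt the update. */
	if (!atomic_cmpxchg(&in_jiffies_update, 0, 1)) {
		tick_do_update_jiffies64(now);
		ts->stalled_jiffies = 0;	/* reset only on an actual call */
		atomic_set(&in_jiffies_update, 0);
	}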

With this change, when manually interrupting the test, at most two CPUs are
observed to invoke tick_do_update_jiffies64(): the timekeeper and one
other.

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20251027183456.343407-1-steve.wahl@hpe.com
2025-11-01 20:25:53 +01:00
Muchun Song 9ea2b810d5 genirq/proc: Fix race in show_irq_affinity()
Reading /proc/irq/N/smp_affinity* races with irq_set_affinity() and
irq_move_masked_irq(), leading to old or torn output for users.

After a user writes a new CPU mask to /proc/irq/N/affinity*, the syscall
returns success, yet a subsequent read of the same file immediately returns
a value different from what was just written.

That's due to a race between show_irq_affinity() and irq_move_masked_irq()
which lets the read observe a transient, inconsistent affinity mask.

Cure it by guarding the read with irq_desc::lock.
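
A sketch of the guarded read (simplified; the real function also deals with
the effective affinity mask):

	struct cpumask mask;
	unsigned long flags;

	raw_spin_lock_irqsave(&desc->lock, flags);
	cpumask_copy(&mask, desc->irq_common_data.affinity);
	raw_spin_unlock_irqrestore(&desc->lock, flags);
	seq_printf(m, "%*pb\n", cpumask_pr_args(&mask));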

[ tglx: Massaged change log ]

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251028090408.76331-1-songmuchun@bytedance.com
2025-10-31 22:30:05 +01:00
Marc Zyngier 68c4c159a0 genirq: Fix percpu_devid irq affinity documentation
Stephen points out that some of the percpu_devid irq affinity
documentation is either missing or not matching the data structures.

Address all the issues in one go.

Fixes: 87b0031f7f ("irqdomain: Add firmware info reporting interface")
Fixes: 258e7d28a3 ("genirq: Add affinity to percpu_devid interrupt requests")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251030143032.2035987-1-maz@kernel.org
2025-10-31 22:25:34 +01:00
Rafael J. Wysocki 1cf9c4f115 Merge back system sleep material for 6.19 2025-10-31 11:33:01 +01:00
Srinivas Pandruvada 39f421f2e3 powercap: intel_rapl: Add support for Wildcat Lake platform
Add Wildcat Lake to the list of supported processors for RAPL.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251023174532.1882008-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30 20:15:02 +01:00
Kuppuswamy Sathyanarayanan 790e826be8 cpufreq: intel_pstate: Add Diamond Rapids OOB mode support
Prevent intel_pstate from loading when Out-of-Band (OOB) P-states mode
is enabled.

The OOB identification mechanism for Diamond Rapids servers is the same
as for prior generation CPUs such as Granite Rapids. Add the Diamond
Rapids CPU model to intel_pstate_cpu_oob_ids[] to ensure correct OOB
handling.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20251022215425.3566218-1-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30 20:13:14 +01:00
Tejun Heo 8e4ec90701 freezer: Clarify that only cgroup1 freezer uses PM freezer
cgroup1 freezer piggybacks on the PM freezer, which inadvertently allowed
userspace to produce uninterruptible tasks at will. To avoid the issue,
cgroup2 freezer switched to a separate job control based mechanism. While
this happened a long time ago, the code and comment haven't been updated,
making it confusing to people who aren't familiar with the history.

Rename cgroup_freezing() to cgroup1_freezing() and update comments on top of
freezing() and frozen() to clarify that cgroup2 freezer isn't covered by the
PM freezer mechanism.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Qu Wenruo <wqu@suse.com>
Link: https://patch.msgid.link/aPZ3q6Hm865NicBC@slm.duckdns.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30 20:10:27 +01:00
Xueqin Luo ea358066de PM: hibernate: add sysfs interface for hibernate_compression_threads
Add a sysfs attribute `/sys/power/hibernate_compression_threads` to
allow runtime configuration of the number of threads used for
compressing and decompressing hibernation images.

The new sysfs interface enables dynamic adjustment at runtime:

    # cat /sys/power/hibernate_compression_threads
    3
    # echo 4 > /sys/power/hibernate_compression_threads

This change provides greater flexibility for debugging and performance
tuning of hibernation without requiring a reboot.

Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
Link: https://patch.msgid.link/c68c62f97fabf32507b8794ad8c16cd22ee656ac.1761046167.git.luoxueqin@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30 20:07:00 +01:00
Xueqin Luo 090bf5a0f4 PM: hibernate: make compression threads configurable
The number of compression/decompression threads has a direct impact on
hibernate image generation and resume latency. Using more threads can
reduce overall resume time, but on systems with fewer CPU cores it may
also introduce contention and reduce efficiency.

Performance was evaluated on an 8-core ARM system, averaged over 10 runs:

    Threads  Hibernate(s)  Resume(s)
    --------------------------------
       3         12.14       18.86
       4         12.28       17.48
       5         11.09       16.77
       6         11.08       16.44

With 5–6 threads, resume latency improves by approximately 12% compared
to the default 3-thread configuration, with negligible impact on
hibernate time.

Introduce a new kernel parameter `hibernate_compression_threads=` that
allows users and integrators to tune the number of
compression/decompression threads at boot. This provides a way to
balance performance and CPU utilization across a wide range of hardware
without recompiling the kernel.
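
For example, to select five threads at boot (parameter name as introduced
above):

    hibernate_compression_threads=5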

Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
Link: https://patch.msgid.link/f24b3ca6416e230a515a154ed4c121d72a7e05a6.1761046167.git.luoxueqin@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30 20:07:00 +01:00
Xueqin Luo e114e2eb7e PM: hibernate: dynamically allocate crc->unc_len/unc for configurable threads
Convert crc->unc_len and crc->unc from fixed-size arrays to dynamically
allocated arrays, sized according to the actual number of threads selected
at runtime. This removes the fixed limit imposed by CMP_THREADS.
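
In sketch form (simplified; nr_threads stands for the runtime thread count
and the cleanup label is illustrative):

	crc->unc = kcalloc(nr_threads, sizeof(*crc->unc), GFP_KERNEL);
	crc->unc_len = kcalloc(nr_threads, sizeof(*crc->unc_len), GFP_KERNEL);
	if (!crc->unc || !crc->unc_len)
		goto out_clean;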

Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
Link: https://patch.msgid.link/b5db63bb95729482d2649b12d3a11cb7547b7fcc.1761046167.git.luoxueqin@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30 20:07:00 +01:00
John Allen 92ad6505a4 x86/sev: Include XSS value in GHCB CPUID request
When a guest issues a CPUID instruction for Fn0000000D_x01, the hypervisor may
be intercepting the CPUID instruction and need to access the guest XSS value.
For SEV-ES, the XSS value is encrypted and needs to be included in the GHCB to
be visible to the hypervisor.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/all/20250924200852.4452-3-john.allen@amd.com/
2025-10-30 17:47:49 +01:00
John Allen 9249bcdea0 x86/boot: Move boot_*msr helpers to asm/shared/msr.h
The boot_{rdmsr,wrmsr}() helpers are *just* the barebones MSR access
functionality, without any tracing or exception handling glue as it is done in
kernel proper.

Move these helpers to asm/shared/msr.h and rename to raw_{rdmsr,wrmsr}() to
indicate what they are.

  [ bp: Correct the reason why those helpers exist. I should've caught that in
    the original patch that added them:
      176db62257 ("x86/boot: Introduce helpers for MSR reads/writes")
    but oh well...
    - fixup include path delimiters to <> ]

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/all/20250924200852.4452-2-john.allen@amd.com
2025-10-30 16:29:53 +01:00
Yu Peng ca8313fd83 x86/microcode: Mark early_parse_cmdline() as __init
Fix section mismatch warning reported by modpost:

  .text:early_parse_cmdline() -> .init.data:boot_command_line

The function early_parse_cmdline() is only called during init and accesses
init data, so mark it __init to match its usage.

  [ bp: This happens only when the toolchain fails to inline the function and
    I haven't been able to reproduce it with any toolchain I'm using. Patch is
    obviously correct regardless. ]

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/all/20251030123757.1410904-1-pengyu@kylinos.cn
2025-10-30 14:33:31 +01:00
Borislav Petkov (AMD) 8d17104506 x86/microcode/AMD: Select which microcode patch to load
All microcode patches up to the proper BIOS Entrysign fix are loaded
only after the sha256 signature carried in the driver has been verified.

Microcode patches released after the Entrysign fix has been applied do not
need that signature verification anymore.

In order to not abandon machines which haven't received the BIOS update
yet, add the capability to select which microcode patch to load.

The corresponding microcode container supplied through firmware-linux
has been modified to carry two patches per CPU type
(family/model/stepping) so that the proper one gets selected.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251027133818.4363-1-bp@kernel.org
2025-10-30 14:29:54 +01:00
Swaraj Gaikwad cb908f8b0a Documentation: intel_pstate: fix duplicate hyperlink target errors
Fix reST warnings in
Documentation/admin-guide/pm/intel_pstate.rst caused by missing explicit
hyperlink labels for section titles.

Before this change, the following errors were printed during
`make htmldocs`:

  Documentation/admin-guide/pm/intel_pstate.rst:401:
    ERROR: Indirect hyperlink target (id="id6") refers to target
    "passive mode", which is a duplicate, and cannot be used as a
    unique reference.
  Documentation/admin-guide/pm/intel_pstate.rst:517:
    ERROR: Indirect hyperlink target (id="id9") refers to target
    "active mode", which is a duplicate, and cannot be used as a
    unique reference.
  Documentation/admin-guide/pm/intel_pstate.rst:611:
    ERROR: Indirect hyperlink target (id="id15") refers to target
    "global attributes", which is a duplicate, and cannot be used as
    a unique reference.
  ERROR: Duplicate target name, cannot be used as a unique reference:
  "passive mode", "active mode", "global attributes".

These errors occurred because the sections "Active Mode",
"Active Mode With HWP", "Passive Mode", and "Global Attributes"
did not define explicit hyperlink labels. As a result, Sphinx
auto-generated duplicate anchors when the same titles appeared
multiple times within the document.

Because of this, the generated HTML documentation contained
broken references such as:

  `active mode <Active Mode_>`_
  `passive mode <Passive Mode_>`_
  `global attributes <Global Attributes_>`_

This patch adds explicit hyperlink labels for the affected sections,
ensuring all references are unique and correctly resolved.

After applying this patch, `make htmldocs` completes without
any warnings, and all hyperlinks in intel_pstate.html render properly.

Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251029134737.42229-1-swarajgaikwad1925@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-29 20:04:50 +01:00
Malaya Kumar Rout 4e48e7baa3 PM: runtime: fix typos in runtime.c comments
Fix several typos in comments:
- "timesptamp" -> "timestamp"
- "involed" -> "involved"
- "nonero" -> "nonzero"

Fix typos in comments to improve code documentation clarity.

Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Link: https://patch.msgid.link/20251026170527.262003-1-mrout@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-29 19:58:58 +01:00
Peng Fan 65df3a9629 PM: EM: Add to em_pd_list only when no failure
When em_create_perf_table() returns failure, pd is freed, so dev->em_pd
is not valid. Accessing dev->em_pd->node then triggers a kernel panic
in em_dev_register_pd_no_update(). So return early if 'ret' is non-zero.
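
In sketch form (the argument list is illustrative):

	ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
	if (ret)
		return ret;	/* pd was freed; dev->em_pd is invalid here */

	list_add(&dev->em_pd->node, &em_pd_list);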

Kernel dump:
cpu cpu0: EM: invalid power: 0
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008
Mem abort info:
pc : em_dev_register_pd_no_update+0xb4/0x79c
lr : em_dev_register_pd_no_update+0x9c/0x79c
Call trace:
 em_dev_register_pd_no_update+0xb4/0x79c (P)
 em_dev_register_perf_domain+0x18/0x58
 scmi_cpufreq_register_em+0x84/0xb8
 cpufreq_online+0x48c/0xb74
 cpufreq_add_dev+0x80/0x98
 subsys_interface_register+0x100/0x11c
 cpufreq_register_driver+0x158/0x278
 scmi_cpufreq_probe+0x1f8/0x2e0
 scmi_dev_probe+0x28/0x3c
 really_probe+0xbc/0x29c
 __driver_probe_device+0x78/0x12c
 driver_probe_device+0x3c/0x15c
 __device_attach_driver+0xb8/0x134
 bus_for_each_drv+0x84/0xe4

Fixes: cbe5aeedec ("PM: EM: Assign a unique ID when creating a performance domain")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251028-fix-energy-v1-1-ab854fd6a97c@nxp.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-29 13:37:00 +01:00
Jie Zhan 1971b18785 cpufreq: CPPC: Don't warn if FIE init fails to read counters
During the CPPC FIE initialization, reading perf counters on offline cpus
should be expected to fail.  Don't warn in this case.

Also, change the error log level to debug since FIE is optional.

Co-developed-by: Bowen Yu <yubowen8@huawei.com>
Signed-off-by: Bowen Yu <yubowen8@huawei.com> # Changing loglevel to debug
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
[ Viresh: Added back the dropped comment. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-28 10:40:47 +05:30
Miaoqian Lin 9600156bb9 cpufreq: nforce2: fix reference count leak in nforce2
There are two reference count leaks in this driver:

1. In nforce2_fsb_read(): pci_get_subsys() increases the reference count
   of the PCI device, but pci_dev_put() is never called to release it,
   thus leaking the reference.

2. In nforce2_detect_chipset(): pci_get_subsys() gets a reference to the
   nforce2_dev which is stored in a global variable, but the reference
   is never released when the module is unloaded.

Fix both by:
- Adding pci_dev_put(nforce2_sub5) in nforce2_fsb_read() after reading
  the configuration.
- Adding pci_dev_put(nforce2_dev) in nforce2_exit() to release the
  global device reference.

Found via static analysis.
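
The module unload path then becomes, in sketch form:

	static void __exit nforce2_exit(void)
	{
		cpufreq_unregister_driver(&nforce2_driver);
		pci_dev_put(nforce2_dev);	/* balance pci_get_subsys() */
	}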

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-28 10:28:13 +05:30
Armin Wolf a5c2fcd82e ACPI: fan: Add support for Microsoft fan extensions
Microsoft has designed a set of extensions for the ACPI fan device
allowing the OS to specify a set of fan speed trip points. The
platform firmware will then notify the ACPI fan device when one
of the trip points is triggered.

Unfortunately, some device manufacturers (like HP) blindly assume
that the OS will use said extensions and thus only update the values
returned by the _FST control method when receiving such a
notification. As a result, the ACPI fan driver is currently unusable
on such machines, always reporting a constant value.

Fix this by adding support for the Microsoft extensions.

During probe and when resuming from suspend, the driver will attempt to
trigger an initial notification that will update the values returned by
_FST.

Said trip points will be updated each time a notification is received
from the platform firmware to ensure that the values returned by the
_FST control method are updated.

Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
Closes: https://github.com/lm-sensors/lm-sensors/issues/506
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
[ rjw: Edits of the new code comments ]
Link: https://patch.msgid.link/20251024183824.5656-4-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-27 20:56:01 +01:00
Armin Wolf 3d4ca76369 ACPI: fan: Add hwmon notification support
The platform firmware can notify the ACPI fan device that the fan
speed has changed. Relay this notification to the hwmon device if
present so that userspace applications can react to it.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251024183824.5656-3-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-27 20:56:01 +01:00
Armin Wolf 0670b9ad4d ACPI: fan: Add basic notification support
The ACPI specification states that the platform firmware can notify
the ACPI fan device that the fan speed has changed and that the _FST
control method should be reevaluated. Add support for this mechanism
to prepare for future changes.

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251024183824.5656-2-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-27 20:56:01 +01:00
Rafael J. Wysocki 58ca21d591 ACPI: TAD: Improve runtime PM using guard macros
Use guard pm_runtime_active_try to simplify runtime PM cleanup and
implement runtime resume error handling in multiple places.

Also use guard pm_runtime_noresume to simplify acpi_tad_remove().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/13881356.uLZWGnKmhe@rafael.j.wysocki
2025-10-27 20:32:13 +01:00
Rafael J. Wysocki f9f5e22b75 ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
It is not necessary to resume the device upfront in acpi_tad_remove()
because both acpi_tad_disable_timer() and acpi_tad_clear_status()
attempt to resume it, but it is better to prevent it from suspending
between these calls by incrementing its runtime PM usage counter.

Accordingly, replace the pm_runtime_get_sync() call in acpi_tad_remove()
with a pm_runtime_get_noresume() one and put the latter right before the
first invocation of acpi_tad_disable_timer().

In addition, use pm_runtime_put_noidle() to drop the device's runtime
PM usage counter after using pm_runtime_get_noresume() to bump it up
to follow a common pattern and use pm_runtime_suspend() for suspending
the device afterward.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5031965.GXAFRqVoOG@rafael.j.wysocki
2025-10-27 20:32:13 +01:00
Siyuan Huang 040beccb03 rust: acpi: replace `core::mem::zeroed` with `pin_init::zeroed`
All types in `bindings` implement `Zeroable` if they can, so use
`pin_init::zeroed` instead of relying on `unsafe` code.

If this ends up not compiling in the future, something in bindgen or on
the C side changed and is most likely incorrect.

Link: https://github.com/Rust-for-Linux/linux/issues/1189
Suggested-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Siyuan Huang <huangsiyuan@kylinos.cn>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Link: https://patch.msgid.link/20251020031204.78917-1-huangsiyuan@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-27 20:27:05 +01:00
Rafael J. Wysocki 86bfd21a0b ACPI: battery: Drop redundant locking
All of the evaluations of objects in the ACPI namespace are carried out
under the namespace lock and interpreter lock in ACPICA, so it is not
necessary to put any additional locks around them for synchronization.

However, the ACPI battery driver does just that, so remove the
redundant locking around ACPI object evaluation from it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2344462.iZASKD2KPV@rafael.j.wysocki
2025-10-27 20:19:52 +01:00
Yazen Ghannam 187d1b27a1 RAS/AMD/ATL: Require PRM support for future systems
Currently, the AMD Address Translation Library will fail to load for new,
unrecognized systems (based on Data Fabric revision). The intention is to
prevent the code from executing on new systems and returning incorrect
results.

Recent AMD systems, however, may provide UEFI PRM handlers for address
translation. This is code provided by the platform through BIOS tables.  These
are the preferred method for translation, and the Linux native code can be
used as a fallback.

Future AMD systems are expected to provide PRM handlers by default, and the
Linux native code will not be used.

Adjust the ATL init code so that new, unrecognized systems will default to
using PRM handlers only.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: "Mario Limonciello (AMD)" <superm1@kernel.org>
Link: https://patch.msgid.link/all/20251017-wip-atl-prm-v2-2-7ab1df4a5fbc@amd.com
2025-10-27 19:56:41 +01:00
Marc Zyngier fa9d277738 perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer
Having removed the use of the cpu_armpmu per-CPU variable from the
interrupt handling, the only user left is the BRBE scheduler hook.

It is easy to drop the use of this variable by following the pointer to the
generic PMU structure, and get the arm_pmu structure from there.

Perform the conversion and kill cpu_armpmu altogether.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-27-maz@kernel.org
2025-10-27 17:16:37 +01:00
Marc Zyngier ebac4649fc irqdomain: Kill of_node_to_fwnode() helper
There are no in-tree users of this helper since b13b41cc3d ("misc:
ti_fpc202: Switch to of_fwnode_handle()"); it has been replaced with
of_fwnode_handle().

Get rid of it.

Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-26-maz@kernel.org
2025-10-27 17:16:37 +01:00
Marc Zyngier ee2d50a9f5 genirq: Kill irq_{g,s}et_percpu_devid_partition()
These two helpers do not have any user anymore, and can be removed,
together with the affinity field kept in the irqdesc structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-25-maz@kernel.org
2025-10-27 17:16:37 +01:00
Marc Zyngier c620438ef2 irqchip: Kill irq-partition-percpu
This code is now completely unused, and nobody will ever miss it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-24-maz@kernel.org
2025-10-27 17:16:36 +01:00
Marc Zyngier 7443813f10 irqchip/apple-aic: Drop support for custom PMU irq partitions
Similarly to what has been done for GICv3, drop the irq partitioning
support from the AIC driver, effectively merging the two per-cpu interrupts
for the PMU.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Sven Peter <sven@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-23-maz@kernel.org
2025-10-27 17:16:36 +01:00
Marc Zyngier 64b9738eaa irqchip/gic-v3: Drop support for custom PPI partitions
The only thing getting in the way of correctly handling PPIs the way they
were intended is the GICv3 hack that deals with PPI partitions.

Remove that code, allowing the common code to kick in.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-22-maz@kernel.org
2025-10-27 17:16:36 +01:00
Marc Zyngier 4cdf4813f5 coresight: trbe: Request specific affinities for per CPU interrupts
Let the TRBE driver request interrupts with an affinity mask matching the
TRBE implementation affinity.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://patch.msgid.link/20251020122944.3074811-21-maz@kernel.org
2025-10-27 17:16:36 +01:00
Marc Zyngier f8112d29ba perf: arm_spe_pmu: Request specific affinities for per CPU interrupts
Let the SPE driver request interrupts with an affinity mask matching the SPE
implementation affinity.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-20-maz@kernel.org
2025-10-27 17:16:36 +01:00
Will Deacon 54b350fa8e perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts
Let the PMU driver request both NMIs and normal interrupts with an affinity mask
matching the PMU affinity.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-19-maz@kernel.org
2025-10-27 17:16:35 +01:00
Marc Zyngier c734af3b2b genirq: Add request_percpu_irq_affinity() helper
While it would be nice to simply make request_percpu_irq() take an affinity
mask, the churn is likely to be on the irritating side given that most
drivers do not give a damn about affinities.

So take the more innocuous path to provide a helper that parallels
request_percpu_irq(), with an affinity as a bonus argument.

Yes, request_percpu_irq_affinity() is a bit of a mouthful.
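
Its shape, as a sketch (the exact parameter order is an assumption):

	int request_percpu_irq_affinity(unsigned int irq, irq_handler_t handler,
					const char *devname,
					const struct cpumask *affinity,
					void __percpu *percpu_dev_id);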

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-18-maz@kernel.org
2025-10-27 17:16:35 +01:00
Marc Zyngier bdf4e2ac29 genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
Interrupt sharing for percpu-devid interrupts is forbidden, and for good
reasons. These are interrupts generated *from* a CPU and handled by itself
(timer, for example). Nobody in their right mind would put two devices on
the same pin (and if they have, they get to keep the pieces...).

But this also prevents more benign cases, where devices are connected
to groups of CPUs, and for which the affinities are not overlapping.
Effectively, the only thing they share is the interrupt number, and
nothing else.

Tweak the definition of IRQF_SHARED applied to percpu_devid interrupts to
allow this particular use case. This results in extra validation at the
point of the interrupt being setup and freed, as well as a tiny bit of
extra complexity for interrupts at handling time (to pick the correct
irqaction).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-17-maz@kernel.org
2025-10-27 17:16:35 +01:00
Marc Zyngier b9c6aa9efc genirq: Update request_percpu_nmi() to take an affinity
Continue spreading the notion of affinity to the per CPU interrupt request
code by updating the call sites that use request_percpu_nmi() (all two of
them) to take an affinity pointer. This pointer is firmly NULL for now.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-16-maz@kernel.org
2025-10-27 17:16:35 +01:00
Marc Zyngier 258e7d28a3 genirq: Add affinity to percpu_devid interrupt requests
Add an affinity field to both the irqaction structure and the interrupt
request primitives. Nothing is making use of it yet, and the only value
used it NULL, which is used as a shorthand for cpu_possible_mask.

This will shortly get used with actual affinities.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-15-maz@kernel.org
2025-10-27 17:16:34 +01:00
Marc Zyngier 9047a39daa genirq: Factor-in percpu irqaction creation
Move the code creating a per-cpu irqaction into its own helper, so that
future changes to this code can be kept localised.

At the same time, fix the documentation which appears to say the wrong
thing when it comes to interrupts being automatically enabled
(percpu_devid interrupts never are).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-14-maz@kernel.org
2025-10-27 17:16:34 +01:00
Marc Zyngier 5c2b2cc472 genirq: Merge irqaction::{dev_id,percpu_dev_id}
When irqaction::percpu_dev_id was introduced, it was hoped that it could be
part of an anonymous union with dev_id, as the two fields are mutually
exclusive.

However, toolchains used at the time were often showing terrible support
for anonymous unions, breaking the build on a number of architectures. It
was therefore decided to keep the two fields separate and address this down
the line.

14 years later, the compiler dark age is over, and there is universal
support for anonymous unions. Get a whole pointer back that can immediately
be spent on something else.
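
The result, in sketch form, within struct irqaction:

	union {
		void		*dev_id;	/* normal interrupts */
		void __percpu	*percpu_dev_id;	/* percpu-devid interrupts */
	};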

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-13-maz@kernel.org
2025-10-27 17:16:34 +01:00
Marc Zyngier 5ff78c8de9 genirq: Kill handle_percpu_devid_fasteoi_nmi()
There is no in-tree user of this flow handler anymore, so simply remove it.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-12-maz@kernel.org
2025-10-27 17:16:34 +01:00
Marc Zyngier 21bbbc50f3 irqchip/gic-v3: Switch high priority PPIs over to handle_percpu_devid_irq()
It so appears that handle_percpu_devid_irq() is extremely similar to
handle_percpu_devid_fasteoi_nmi(), and that the differences do not justify
the horrid machinery in the GICv3 driver to handle the flow handler switch.

Stick with the standard flow handler, even for NMIs.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-11-maz@kernel.org
2025-10-27 17:16:34 +01:00
Marc Zyngier f6c8aced7c perf: arm_spe_pmu: Convert to new interrupt affinity retrieval API
Now that the relevant interrupt controllers are equipped with a callback
returning the affinity of per-CPU interrupts, switch the ARM SPE driver
over to this new method.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-10-maz@kernel.org
2025-10-27 17:16:33 +01:00
Marc Zyngier 663783e001 perf: arm_pmu: Convert to the new interrupt affinity retrieval API
Now that the relevant interrupt controllers are equipped with a callback
returning the affinity of per-CPU interrupts, switch the OF side of the ARM
PMU driver over to this new method.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-9-maz@kernel.org
2025-10-27 17:16:33 +01:00
Marc Zyngier 541454dd20 coresight: trbe: Convert to the new interrupt affinity retrieval API
Now that the relevant interrupt controllers are equipped with a callback
returning the affinity of per-CPU interrupts, switch the TRBE driver over
to this new method.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://patch.msgid.link/20251020122944.3074811-8-maz@kernel.org
2025-10-27 17:16:33 +01:00
Marc Zyngier de575de83c irqchip/apple-aic: Add FW info retrieval support
Plug the new .get_fwspec_info() callback into the Apple AIC driver, using
some of the existing FIQ affinity handling infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Sven Peter <sven@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-7-maz@kernel.org
2025-10-27 17:16:33 +01:00
Marc Zyngier 68905ea65c irqchip/gic-v3: Add FW info retrieval support
Plug the new .get_fwspec_info() callback into the GICv3 core driver, using
some of the existing PPI affinity handling infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-6-maz@kernel.org
2025-10-27 17:16:33 +01:00
Marc Zyngier 0d5daa938c platform: Add firmware-agnostic irq and affinity retrieval interface
Expand platform_get_irq_optional() to also return an affinity if available,
renaming it to platform_get_irq_affinity() in the process.

platform_get_irq_optional() is preserved with its current semantics by
calling into the new helper with a NULL affinity pointer.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-5-maz@kernel.org
2025-10-27 17:16:32 +01:00
Marc Zyngier 5404f5c06d of/irq: Add interrupt affinity reporting interface
Plug the irq_populate_fwspec_info() helper into the OF layer to offer an
interrupt affinity reporting function.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-4-maz@kernel.org
2025-10-27 17:16:32 +01:00
Marc Zyngier 5324fe21ba ACPI: irq: Add interrupt affinity reporting interface
Plug the irq_populate_fwspec_info() helper into the ACPI layer to offer an
interrupt affinity reporting function. This is currently only supported for
the CONFIG_ACPI_GENERIC_GSI configurations, but could later be extended to
legacy architectures if necessary.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-3-maz@kernel.org
2025-10-27 17:16:32 +01:00
Marc Zyngier 87b0031f7f irqdomain: Add firmware info reporting interface
Add an irqdomain callback to report firmware-provided information that is
otherwise not available in a generic way. This is reported using a new data
structure (struct irq_fwspec_info).

This callback is optional and the only information that can be reported
currently is the affinity of an interrupt. However, the containing
structure is designed to be extensible, allowing other potentially relevant
information to be reported in the future.
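
Plausibly, the structure starts out along these lines (a sketch; the field
naming is an assumption):

	struct irq_fwspec_info {
		const struct cpumask	*affinity;	/* only member so far */
	};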

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-2-maz@kernel.org
2025-10-27 17:16:32 +01:00
Yazen Ghannam 83be4bee57 ACPI: PRM: Add acpi_prm_handler_available()
Add a helper function to check if a PRM handler/module is present.

This can be used during init time by code that depends on a particular
handler. If the handler is not present, then the code does not need to
be loaded.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: "Mario Limonciello (AMD)" <superm1@kernel.org>
Acked-by: "Rafael J. Wysocki (Intel)" <rafael@kernel.org>
Link: https://patch.msgid.link/all/20251017-wip-atl-prm-v2-1-7ab1df4a5fbc@amd.com
2025-10-27 15:45:22 +01:00
Aboorva Devarajan 07d8157012 cpuidle: menu: Use residency threshold in polling state override decisions
On virtualized PowerPC (pseries) systems, where only one polling state
(Snooze) and one deep state (CEDE) are available, selecting CEDE when
the predicted idle duration is less than the target residency of CEDE
state can hurt performance. In such cases, the entry/exit overhead of
CEDE outweighs the power savings, leading to unnecessary state
transitions and higher latency.

Menu governor currently contains a special-case rule that prioritizes
the first non-polling state over polling, even when its target residency
is much longer than the predicted idle duration. On PowerPC/pseries,
where the gap between the polling state (Snooze) and the first non-polling
state (CEDE) is large, this behavior causes performance regressions.

Refine that special case by adding an extra requirement: the first
non-polling state can only be chosen if its target residency is below
the defined RESIDENCY_THRESHOLD_NS. If this condition is not satisfied,
polling is allowed instead, avoiding suboptimal non-polling state
entries.

This change is limited to the single special-case rule for the first
non-polling state. The general non-polling state selection logic in the
menu governor remains unchanged.
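
In sketch form, the refined special case (simplified; identifiers follow the
menu governor, with predicted_ns being the predicted idle duration):

	if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
	    drv->states[1].target_residency_ns < RESIDENCY_THRESHOLD_NS &&
	    predicted_ns < drv->states[1].target_residency_ns)
		idx = 1;	/* cheap enough to prefer over polling */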

Performance improvement observed with pgbench on PowerPC (pseries)
system:
+---------------------------+------------+------------+------------+
| Metric                    | Baseline   | Patched    | Change (%) |
+---------------------------+------------+------------+------------+
| Transactions/sec (TPS)    | 495,210    | 536,982    | +8.45%     |
| Avg latency (ms)          | 0.163      | 0.150      | -7.98%     |
+---------------------------+------------+------------+------------+

CPUIdle state usage:
+--------------+--------------+-------------+
| Metric       | Baseline     | Patched     |
+--------------+--------------+-------------+
| Total usage  | 12,735,820   | 13,918,442  |
| Above usage  | 11,401,520   | 1,598,210   |
| Below usage  | 20,145       | 702,395     |
+--------------+--------------+-------------+

Above/Total and Below/Total usage percentages:
+------------------------+-----------+---------+
| Metric                 | Baseline  | Patched |
+------------------------+-----------+---------+
| Above % (Above/Total)  | 89.56%    | 11.49%  |
| Below % (Below/Total)  | 0.16%     | 5.05%   |
| Total cpuidle miss (%) | 89.72%    | 16.54%  |
+------------------------+-----------+---------+

The results indicate that restricting CEDE selection to cases where
its residency matches the predicted idle time reduces mispredictions,
lowers unnecessary state transitions, and improves overall throughput.

Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
[ rjw: Changelog edits, rebase ]
Link: https://patch.msgid.link/20251006013954.17972-1-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-27 14:53:46 +01:00
Borislav Petkov (AMD) 4058386498 - Remove dead code leftovers after a recent mitigations cleanup which fail
a Clang build
 
 - Make sure a Retbleed mitigation message is printed only when necessary
 
 - Correct the last Zen1 microcode revision for which Entrysign sha256 check is
   needed
 
 - Fix a NULL ptr deref when mounting the resctrl fs on a system which supports
   assignable counters but where L3 total and local bandwidth monitoring has
   been disabled at boot
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj+FSYACgkQEsHwGGHe
 VUqh5RAAwTAfMsEs57v6gQqnm/rbNjGXoZuNcT9xhk4jbRC7xCcyJrZVyYA+mWIe
 5rgGOuSThOsOgqJHwVqn4kdym9yUwLradZS8gn5vHFIlDVXDoMRYJuvm8U7PdTug
 UWJv0uw0B393RNb+7yCeEN7Zpe2bvbh25PF66uh/7dQYKmWIaiTVlDhrZ+Ba51IK
 mmJzbVb6zqWrSP3heISZRjfV3rv+/SifUb+wIgWcCzcAb36fFIlUKaEYd/g5249R
 BBcEY5n/eUUKjMJVOki4vDqJyQdPdJCz9yH3qdZaz661Wh9/FVy/rLCQC/O1ruwt
 Ovoi6UJAjleb0OXfi00Hl1LT3v92xH/OwyVCamBAYyaIhTdPaoQS6YADGstt3PTx
 RUc/BG5wHyaOWsG94zVEvqK9MElyjW3DPiBH4E+O7OB348WAfhsbrUDnnaveDSym
 n2LivNnkiaXi8DpPhWL7XsJJjYAy1fi2piDrh952I5oVfhf5iYeNwFjNdtgAft7G
 wNr01qraqdPKfMYHZHdkaqrPH/Qy9DlLuDuTjQqtjGm8lsZK/g+txzQLfeXoDJSe
 RtKtRYlq0bVCOnAuA8MN4xi9H2WaKAZNgavJxywZslmaQuQzh21g7ISwxcAFe07n
 nevcypF1s/dnCUPK8yuKTmFzkwbg7I2OgrmX0RKZdFxY8uzg4Co=
 =EZGc
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.18_rc3' into x86/microcode

Pick up the below urgent upstream change in order to base more work
ontop:

- Correct the last Zen1 microcode revision for which Entrysign sha256 check is
  needed

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-27 14:06:38 +01:00
Charles Mirabile 539d147ef6 irqchip/sifive-plic: Add support for UltraRISC DP1000 PLIC
Add a new compatible for the plic found in UltraRISC DP1000 with a quirk to
work around a known hardware bug with IRQ claiming in the UR-CP100 cores.

When claiming an interrupt on UR-CP100 cores, all other interrupts must be
disabled before the claim register is accessed to prevent incorrect
handling of the interrupt. This is a hardware bug in the CP100 core
implementation, not specific to the DP1000 SoC.

When the PLIC_QUIRK_CP100_CLAIM_REGISTER_ERRATUM flag is present, a
specialized handler (plic_handle_irq_cp100) disables all interrupts except
for the first pending one before reading the claim register, and then
restores the interrupts before further processing of the claimed interrupt
continues.

This implementation leverages the enable_save optimization, which maintains
the current interrupt enable state in memory, avoiding additional register
reads during the workaround.

The driver matches on "ultrarisc,cp100-plic" to apply the quirk to all
SoCs using UR-CP100 cores, regardless of the specific SoC implementation.
This has no impact on other platforms.
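
A hedged sketch of the claim sequence described above (the helper names
are illustrative, not the actual driver functions):

	/* Disable everything except the first pending IRQ, claim, restore. */
	first = find_first_pending(handler);	/* illustrative helper */
	mask_all_but_one(handler, first);	/* based on enable_save */
	hwirq = readl(claim_reg);		/* context claim register */
	restore_enables(handler);		/* replay enable_save */
	generic_handle_domain_irq(handler->priv->irqdomain, hwirq);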

[ tglx: Condensed the code a bit, massaged change log and comments ]

Co-developed-by: Zhang Xincheng <zhangxincheng@ultrarisc.com>
Signed-off-by: Zhang Xincheng <zhangxincheng@ultrarisc.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://patch.msgid.link/20251024083647.475239-5-lzampier@redhat.com
2025-10-27 12:11:56 +01:00
Brendan Jackman 5385dec724 x86/mm: Unify __phys_addr_symbol()
There are two implementations on 64-bit, depending on CONFIG_DEBUG_VIRTUAL,
but they differ only regarding the presence of VIRTUAL_BUG_ON, which is
already ifdef'd on CONFIG_DEBUG_VIRTUAL.

To avoid adding a function call on non-LTO non-DEBUG_VIRTUAL builds, move the
function into the header. (Note the function is already only used on 64-bit).
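
An illustrative unified inline (close to, but not necessarily, the exact
patch):

	static inline unsigned long __phys_addr_symbol(unsigned long x)
	{
		unsigned long y = x - __START_KERNEL_map;

		/* Compiles away when CONFIG_DEBUG_VIRTUAL is not set. */
		VIRTUAL_BUG_ON(y >= KERNEL_IMAGE_SIZE);

		return y + phys_base;
	}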

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/all/20250813-phys-addr-cleanup-v1-1-19e334b1c466@google.com/
2025-10-24 22:13:00 +02:00
Matthew Wilcox (Oracle) 70e0a80a1f treewide: Remove in_irq()
This old alias for in_hardirq() has been marked as deprecated since
2020; remove the stragglers.
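
The conversion is a mechanical rename:

	- if (in_irq())
	+ if (in_hardirq())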

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024180654.1691095-1-willy@infradead.org
2025-10-24 21:39:27 +02:00
Charles Mirabile 14ff9e54dd irqchip/sifive-plic: Cache the interrupt enable state
Optimize the PLIC driver by maintaining the interrupt enable state in the
handler's enable_save array during normal operation rather than only during
suspend/resume. This eliminates the need to read enable registers during
suspend and makes the enable state immediately available for other
purposes.

Let __plic_toggle() update both the hardware registers and the cached
enable_save state atomically within the existing enable_lock protection.

That allows removing the suspend-time enable register read, since
handler::enable_save now always reflects the current state.
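
An illustrative sketch of the combined update (simplified; not the exact
driver code):

	static void __plic_toggle(void __iomem *enable_base, int hwirq,
				  u32 *enable_save, bool enable)
	{
		u32 __iomem *reg = enable_base + (hwirq / 32) * sizeof(u32);
		u32 val = readl(reg);

		if (enable)
			val |= BIT(hwirq % 32);
		else
			val &= ~BIT(hwirq % 32);
		writel(val, reg);
		enable_save[hwirq / 32] = val;	/* cache mirrors hardware */
	}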

[ tglx: Massaged change log ]

Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024083647.475239-4-lzampier@redhat.com
2025-10-24 21:34:32 +02:00
Charles Mirabile 9dfb295a93 dt-bindings: interrupt-controller: Add UltraRISC DP1000 PLIC
Add compatible strings for the PLIC found in UltraRISC DP1000 SoC.

The PLIC is part of the UR-CP100 core and has a hardware bug requiring
a workaround.

Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251024083647.475239-3-lzampier@redhat.com
2025-10-24 21:34:32 +02:00
Lucas Zampieri e95f66dd0e dt-bindings: vendor-prefixes: Add UltraRISC
Add vendor prefix for UltraRISC Technology Co., Ltd.

Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251024083647.475239-2-lzampier@redhat.com
2025-10-24 21:34:31 +02:00
Petr Tesarik e9cc99142a x86/tsx: Get the tsx= command line parameter with early_param()
Use early_param() to get the value of the tsx= command line parameter. It is
an early parameter, because it must be parsed before tsx_init(), which is
called long before kernel_init(), where normal parameters are parsed.

Although cmdline_find_option() from tsx_init() works fine, the option is later
reported as unknown and passed to user space. The latter is not a real issue,
but the former is confusing and makes people wonder if the tsx= parameter had
any effect and double-check for typos unnecessarily.

The behavior changes slightly if "tsx" is given without any argument (which is
invalid syntax). Until now, the kernel logged an error message and disabled
TSX. Now, the kernel still issues a warning (Malformed early option 'tsx'),
but TSX state is unchanged. The new behavior is consistent with other
parameters, e.g. "tsx_async_abort".
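
A sketch of the early_param() conversion (option values follow the
documented tsx= syntax; the exact parser in the patch may differ):

	static int __init tsx_parse_cmdline(char *str)
	{
		if (!str)
			return -EINVAL;	/* bare "tsx": malformed, TSX unchanged */

		if (!strcmp(str, "on"))
			tsx_ctrl_state = TSX_CTRL_ENABLE;
		else if (!strcmp(str, "off"))
			tsx_ctrl_state = TSX_CTRL_DISABLE;
		else if (!strcmp(str, "auto"))
			tsx_ctrl_state = x86_get_tsx_auto_mode();
		return 0;
	}
	early_param("tsx", tsx_parse_cmdline);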

   [ bp: Fixup minor formatting request during review. ]

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/cover.1758906115.git.ptesarik@suse.com
2025-10-24 18:35:17 +02:00
Petr Tesarik f018fca8f9 x86/tsx: Make tsx_ctrl_state static
Move all definitions related to tsx_ctrl_state to tsx.c. They are
never referenced outside this file.

No functional change.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/all/cover.1758906115.git.ptesarik@suse.com
2025-10-24 18:24:42 +02:00
Heiko Carstens 020d5dc578 s390/ap: Don't leak debug feature files if AP instructions are not available
If no AP instructions are available the AP bus module leaks registered
debug feature files. Change function call order to fix this.

Fixes: cccd85bfb7 ("s390/zcrypt: Rework debug feature invocations.")
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-24 15:25:56 +02:00
Jens Remus f25d952ab6 s390/ptrace: Explicitly include <linux/typecheck.h>
The psw_bits() macro makes use of typecheck() without typecheck.h being
included. Add the missing include to avoid potential future compile
problems.

[hca@linux.ibm.com: change commit message]

Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-24 15:25:56 +02:00
Armin Wolf 2e00f7a4bb ACPI: fan: Workaround for 64-bit firmware bug
Some firmware implementations use the "Ones" ASL opcode to produce
an integer with all bits set in order to indicate missing speed or
power readings. This however only works when using 32-bit integers,
as the ACPI spec requires a 32-bit integer (0xFFFFFFFF) to be
returned for missing speed/power readings. With 64-bit integers the
"Ones" opcode produces a 64-bit integer with all bits set, violating
the ACPI spec regarding the placeholder value for missing readings.

Work around such buggy firmware implementations by also checking for
64-bit integers with all bits set when reading _FST.
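
A minimal sketch of the check (illustrative helper, not the literal
patch):

	/* Treat both 32-bit and 64-bit all-ones as "reading unavailable". */
	static bool acpi_fan_value_invalid(u64 val)
	{
		return val == 0xFFFFFFFF || val == ~0ULL;
	}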

Signed-off-by: Armin Wolf <W_Armin@gmx.de>
[ rjw: Typo fix in the changelog ]
Link: https://patch.msgid.link/20251007234149.2769-3-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-24 10:29:52 +02:00
Tamir Duberstein 33ffb0aa8c rust: opp: simplify callers of `to_c_str_array`
Use `Option` combinators to make this a bit less noisy.

Wrap the `dev_pm_opp_set_config` operation in a closure and use type
ascription to leverage the compiler to check for use after free.

Signed-off-by: Tamir Duberstein <tamird@kernel.org>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-23 20:51:17 +05:30
Rafael J. Wysocki cea54f8e34 PM: runtime: docs: Update pm_runtime_allow/forbid() documentation
Drop confusing descriptions of pm_runtime_allow() and pm_runtime_forbid()
from Documentation/power/runtime_pm.rst and update the kerneldoc comments
of these functions to better explain their purpose.

Link: https://lore.kernel.org/linux-pm/08976178-298f-79d9-1d63-cff5a4e56cc3@linux.intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/12780841.O9o76ZdvQC@rafael.j.wysocki
2025-10-23 16:13:33 +02:00
Harald Freudenberger 51d921a613 s390/ap: Expose ap_bindings_complete_count counter via sysfs
The AP bus udev event BINDINGS=complete is sent out the first
time all devices detected by the AP bus scan have been bound to
device drivers. This is the ideal time to, for example, change
the AP bus masks apmask and aqmask to persistently change the
decision about which cards/domains should be available for the
host and which should go into the pool for kvm guests.

However, if exactly this initial udev event is sent out early
in the boot process, a udev rule may not have been established
yet and thus the event will never be recognized. To give some
indication of whether the AP bus bindings complete event has
already happened, the internal ap_bindings_complete_count
counter is exposed via sysfs with this patch.

Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-23 16:11:38 +02:00
Heiko Carstens 07a75d08cf s390/smp: Fix fallback CPU detection
In case SCLP CPU detection does not work a fallback mechanism using SIGP is
in place. Since a cleanup this does not work correctly anymore: new CPUs
are only considered if their type matches the boot CPU.

Before the cleanup the information if a CPU type should be considered was
also part of a structure generated by the fallback mechanism and indicated
that a CPU type should not be considered when adding CPUs.

Since the rework a global SCLP state is used instead. If the global SCLP
state indicates that the CPU type should be considered and the fallback
mechanism is used, there may be a mismatch with CPU types when CPUs are
added. This can lead to a system with only a single CPU even though there
are many more CPUs.

Address this by simply copying the boot CPU type into the data structure
generated by the fallback mechanism.

Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Fixes: d08d94306e ("s390/smp: cleanup core vs. cpu in the SCLP interface")
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-23 16:11:38 +02:00
Gerd Bayer 564ebcae6a s390/pci: Highlight failure to enable PCI function
Emit an error log when a PCI function cannot be enabled for use, despite
being reported as configured to the system.

This brings to attention situations where functions might go missing
without notice. Going unnoticed is less likely when functions are added
to the system through hotplug, but the same error log will be produced
in that case as well.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-23 16:11:37 +02:00
Aaron Kling 85976d3774 cpufreq: tegra186: add OPP support and set bandwidth
Add support to use the OPP table from DT in the Tegra186 cpufreq driver.
Tegra SoCs receive the frequency lookup table (LUT) from BPMP-FW.
Cross-check the OPPs present in DT against the LUT from BPMP-FW
and enable only those DT OPPs which are also present in the LUT.

The OPP table in DT has a CPU frequency to bandwidth mapping where
the bandwidth value is per MC channel. DRAM bandwidth depends on the
number of MC channels, which can vary as per the boot configuration.
This per-channel bandwidth from the OPP table will later be converted
by the MC driver to the final bandwidth value by multiplying it with
the number of channels before being handled in the EMC driver.

If the OPP table is not present in DT, use the LUT from BPMP-FW
directly as the CPU frequency table and skip DRAM frequency
scaling, which is the same as the current behavior.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
[ Viresh: Fix _free() definitions ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-23 12:10:11 +05:30
Hal Feng 6e7970cab5 cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
Add the compatible strings for supporting the generic
cpufreq driver on the StarFive JH7110S SoC.

Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-23 12:10:11 +05:30
Shuhao Fu 2de5cb9606 cpufreq: s5pv210: fix refcount leak
In function `s5pv210_cpu_init`, a possible refcount inconsistency has
been identified, causing a resource leak.

Why it is a bug:
1. For every clk_get(), there should be a matching clk_put() on every
subsequent error handling path.
2. After calling `clk_get(dmc1_clk)`, variable `dmc1_clk` will not be
freed if an error happens.

How it is fixed: for every failing path, an extra goto label is added to
ensure `dmc1_clk` is released regardless.

Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-23 12:10:11 +05:30
Viresh Kumar 173e02d674 OPP: Initialize scope-based pointers inline
Uninitialized pointers with the `__free` attribute can cause undefined
behaviour, as the pointer is passed to the cleanup function automatically
when it goes out of scope.

The OPP core doesn't have any bugs related to this as of now, but it is
better to initialize pointers marked with `__free` attribute at
declaration to simplify the code and ensure proper scope-based cleanup.
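
An illustrative example of the pattern (generic `__free(kfree)` shown;
the OPP code uses its own cleanup helpers):

	/* Before: declaration and assignment are split. */
	char *buf __free(kfree) = NULL;
	buf = kmalloc(len, GFP_KERNEL);

	/* After: initialize at declaration; cleanup scope is obvious. */
	char *buf __free(kfree) = kmalloc(len, GFP_KERNEL);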

Reported-by: Joe Perches <joe@perches.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-23 11:58:05 +05:30
Changwoo Min a1b17c9ac8 PM: EM: Notify an event when the performance domain changes
Send an event to userspace when a performance domain is created or deleted,
or its energy model is updated.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-11-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:38 +02:00
Changwoo Min b95a0c02ad PM: EM: Implement em_notify_pd_created/updated()
Implement two event notifications when a performance domain is created
(EM_CMD_PD_CREATED) and updated (EM_CMD_PD_UPDATED). The message format
of these two event notifications is the same as EM_CMD_GET_PD_TABLE --
containing the performance domain's ID and its energy model table.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-10-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:38 +02:00
Changwoo Min b2b1bbcac7 PM: EM: Implement em_notify_pd_deleted()
Add the event notification infrastructure and implement the event
notification for when a performance domain is deleted (EM_CMD_PD_DELETED).

The event contains the ID of the performance domain (EM_A_PD_TABLE_PD_ID)
so that userspace can identify the changed performance domain for further
processing.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-9-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min f2d2946eaa PM: EM: Implement em_nl_get_pd_table_doit()
When userspace requests EM_CMD_GET_PD_TABLE with the ID of a performance
domain, the kernel reports back the energy model table of the specified
performance domain. The message format of the response is as follows:

EM_A_PD_TABLE_PD_ID (NLA_U32)
EM_A_PD_TABLE_PS (NLA_NESTED)*
    EM_A_PS_PERFORMANCE (NLA_U64)
    EM_A_PS_FREQUENCY (NLA_U64)
    EM_A_PS_POWER (NLA_U64)
    EM_A_PS_COST (NLA_U64)
    EM_A_PS_FLAGS (NLA_U64)

where EM_A_PD_TABLE_PS can be repeated as many times as there are
performance states (struct em_perf_state).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-8-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min d8eef04531 PM: EM: Implement em_nl_get_pds_doit()
When userspace requests EM_CMD_GET_PDS, the kernel responds with
information on all performance domains. The message format of the
response is as follows:

EM_A_PDS_PD (NLA_NESTED)*
    EM_A_PD_PD_ID (NLA_U32)
    EM_A_PD_FLAGS (NLA_U64)
    EM_A_PD_CPUS (NLA_STRING)

where EM_A_PDS_PD can be repeated as many times as there are performance
domains, and EM_A_PD_CPUS is a hexadecimal string representing a CPU
bitmask.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-7-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min 7928339cfe PM: EM: Add an iterator and accessor for the performance domain
Add an iterator function (for_each_em_perf_domain) that iterates all the
performance domains in the global list. A passed callback function (cb) is
called for each performance domain.

Additionally, add a lookup function (em_perf_domain_get_by_id) that
searches for a performance domain by matching the ID in the global list.
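
An illustrative shape of the iterator (simplified; the list and lock
names follow the earlier patch in this series, the `node` field is
assumed):

	int for_each_em_perf_domain(int (*cb)(struct em_perf_domain *pd,
					      void *arg), void *arg)
	{
		struct em_perf_domain *pd;
		int ret = 0;

		mutex_lock(&em_pd_list_mutex);
		list_for_each_entry(pd, &em_pd_list, node) {
			ret = cb(pd, arg);
			if (ret)
				break;
		}
		mutex_unlock(&em_pd_list_mutex);
		return ret;
	}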

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-6-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min e4ed8d26c5 PM: EM: Add a skeleton code for netlink notification
Add boilerplate code for netlink notification to register the new
protocol family. Also, initialize and register the netlink family during
boot. The initialization is called at the postcore level, which is late
enough for the generic netlink core to have been initialized.

Finally, update MAINTAINERS to include new files.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-5-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min bd26631ccd PM: EM: Add em.yaml and autogen files
Add a generic netlink spec in YAML format and autogenerate boilerplate
code using ynl-regen.sh to introduce a generic netlink for the energy
model. It allows a userspace program to read the performance domain and
its energy model. It notifies the userspace program, through a multicast
interface, when a performance domain is created or deleted or its energy
model is updated.

Specifically, it supports two commands:
  - EM_CMD_GET_PDS: Get the list of information for all performance
    domains.
  - EM_CMD_GET_PD_TABLE: Get the energy model table of a performance
    domain.

Also, it supports three notification events:
  - EM_CMD_PD_CREATED: When a performance domain is created.
  - EM_CMD_PD_DELETED: When a performance domain is deleted.
  - EM_CMD_PD_UPDATED: When the energy model table of a performance domain
    is updated.

Finally, update MAINTAINERS to include new files.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-4-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min ee50b8bb6b PM: EM: Expose the ID of a performance domain via debugfs
For ease of debugging, let's expose the assigned ID of a performance
domain through debugfs (e.g., /sys/kernel/debug/energy_model/cpu0/id).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-3-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Changwoo Min cbe5aeedec PM: EM: Assign a unique ID when creating a performance domain
It is necessary to refer to a specific performance domain from
userspace, for example when the energy model of a particular performance
domain is updated.

To this end, assign a unique ID to each performance domain to address it,
and manage them in a global linked list to look up a specific one by
matching ID. IDA is used for ID assignment, and the mutex is used to
protect the global list from concurrent access.

Note that the mutex (em_pd_list_mutex) is not supposed to be held while
holding em_pd_mutex, to avoid an ABBA deadlock.
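
A minimal sketch of the ID assignment (illustrative field and list
names):

	static DEFINE_IDA(em_pd_ida);
	static LIST_HEAD(em_pd_list);
	static DEFINE_MUTEX(em_pd_list_mutex);

	static int em_pd_assign_id(struct em_perf_domain *pd)
	{
		int id = ida_alloc(&em_pd_ida, GFP_KERNEL);

		if (id < 0)
			return id;

		pd->id = id;
		mutex_lock(&em_pd_list_mutex);
		list_add_tail(&pd->node, &em_pd_list);	/* 'node' illustrative */
		mutex_unlock(&em_pd_list_mutex);
		return 0;
	}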

Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-2-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 21:44:37 +02:00
Sakari Ailus b889ed5abf ACPI: property: Rework acpi_graph_get_next_endpoint()
Rework the code obtaining the next endpoint in
acpi_graph_get_next_endpoint(). The resulting code removes unnecessary
conditionals and should be easier to follow.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20251001104320.1272752-4-sakari.ailus@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 16:57:00 +02:00
Sakari Ailus 5d010473cd ACPI: property: Use ACPI functions in acpi_graph_get_next_endpoint() only
Calling fwnode_get_next_child_node() in the ACPI implementation of the
fwnode property API is somewhat problematic, as the latter is used in the
implementation of the former. Instead of using
fwnode_get_next_child_node() in acpi_graph_get_next_endpoint(), call
acpi_get_next_subnode() directly.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251001104320.1272752-3-sakari.ailus@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 16:57:00 +02:00
Sakari Ailus 159e851108 ACPI: property: Make acpi_get_next_subnode() static
acpi_get_next_subnode() is only used in drivers/acpi/property.c. Remove
its prototype from include/linux/acpi.h and make it static.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251001104320.1272752-2-sakari.ailus@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 16:57:00 +02:00
Huisong Li 945661d581 ACPI: processor: idle: Relocate state flags initialization
Since acpi_processor_setup_cstates() is a more logical place for setting
idle state flags than acpi_processor_setup_cpuidle_cx(), move that code
from the latter to the former.

It also allows direct references to acpi_idle_driver in
acpi_processor_setup_cpuidle_cx() to be avoided.

No intentional functional impact.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
[ rjw: Subject and changelog rewrite ]
Link: https://patch.msgid.link/20250929093754.3998136-5-lihuisong@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-22 16:47:40 +02:00
Tamir Duberstein e6fdbe8fea rust: opp: fix broken rustdoc link
Correct the spelling of "CString" to make the link work.

Fixes: ce32e2d47c ("rust: opp: Add abstractions for the configuration options")
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-22 09:26:20 +05:30
Heiko Carstens 5c02c74dd4 Merge branch 'ap-bus-trace-events'
Harald Freudenberger says:

====================

Investigations related to the runtime of crypto requests have revealed a
lack of performance and runtime information for crypto requests. There are
the two zcrypt ioctl trace events covering the entry and exit of an ioctl
with crypto requests, giving the overall runtime within the kernel.
However, there is no way to figure out the time during which a request is
enqueued into the AP bus queue but not yet pushed into the firmware queue.
Then there is no information about the runtime of a request during
processing in the firmware. And finally, some info about pulling the reply
from the firmware and delivering it into user space is missing.

This series aims to provide a way to collect measurements which can be
used to cover this runtime information for each crypto request/reply.

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 11:10:55 +02:00
Harald Freudenberger 9c11918040 s390/ap: Introduce new AP nqap and dqap trace events
Introduce two new AP bus related tracepoint events:
- The tracepoint event s390_ap_nqap fires immediately after a request
  has been pushed into the AP firmware queue with the NQAP AP command.
- The other tracepoint event s390_ap_dqap fires immediately after a
  reply has been pulled out of the AP firmware queue via the DQAP AP
  command.
Both events are triggered unconditionally and may need filtering.
Filtering can be done based on the status value which is part of
the nqap and dqap trace. So for example a
  echo "!(status & 0x00ff0000)" >.../s390_ap_dqap/filter
filters out all trace events which have a response_code != 0
leaving just the successful nqap and dqap invocations.

The idea of these two trace events focuses on performance: measuring
the runtime of a crypto request/reply as close as possible to the
firmware level. In combination with the two zcrypt tracepoints (see
the zcrypt.h trace event definition file) this gives measurement data
about the runtime of a request/reply within the zcrypt and AP bus
layer. However, with the status of these AP commands in hand, other
uses may be possible as well.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 11:09:21 +02:00
Harald Freudenberger 7f124d78d4 s390/ap: Extend struct ap_queue_status with some convenience fields
Sometimes a different view of the AP status word is needed. So here is
a slight rework of struct ap_queue_status to open up the possibility of
having different ways of accessing the AP status bits and fields.

The new struct ap_queue_status

struct ap_queue_status {
	union {
		unsigned int value			: 32;
		struct {
			unsigned int status_bits	: 8;
			unsigned int rc			: 8;
			unsigned int			: 16;
		};
		struct {
			unsigned int queue_empty	: 1;
			unsigned int replies_waiting	: 1;
			unsigned int queue_full		: 1;
			unsigned int			: 3;
			unsigned int async		: 1;
			unsigned int irq_enabled	: 1;
			unsigned int response_code	: 8;
			unsigned int			: 16;
		};
	};
};

comprises the old struct ap_queue_status but extends it so that the
status is also accessible as an unsigned int, as required, for example,
for a simple print or trace of the whole value.

Note that this rework is fully backward compatible to the
existing code exploiting the struct ap_queue_status.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 11:09:21 +02:00
Harald Freudenberger 507cff242a s390/zcrypt: Rework zcrypt request and reply trace event definition
This is a slight rework of the s390_zcrypt_req and s390_zcrypt_rep
trace events:
- the psmid has been added to the s390_zcrypt_rep
- "dev" renamed to "card"
- "domain" renamed to "dom"
The motivation for these changes is to make these traces more
aligned with new upcoming AP bus related trace events.
Additionally, the psmid is needed to match the reply (and thus,
indirectly, the request) to AP bus related trace events, where
the psmid is the only unique identifier of AP messages.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 11:09:21 +02:00
Josephine Pfeiffer 215231deea s390/ptdump: Use seq_puts() in pt_dump_seq_puts() macro
The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of
seq_puts(). This is both a performance issue and conceptually wrong,
as the macro name suggests plain string output (puts) but the
implementation uses formatted output (printf).

The macro is used in dump_pagetables.c:67-68 and 131 to output
constant strings. Using seq_printf() adds unnecessary overhead for
format string parsing.

Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:29:50 +02:00
Heiko Carstens 5e09c0a03e Merge branch 'tape-block-sizes'
Jan Höppner says:

====================

The tape device driver is limited to a block size of 65535 bytes since a
single CCW can only transfer up to 64K-1 bytes (the count field is a
16-bit value). This series introduces data chaining for all read/write
functions to support block sizes larger than 65535 bytes.

The tape device type 3490 (emulated) and 3590/3592 can handle up to
256K. [1]

[1] https://www.ibm.com/docs/en/zos/3.1.0?topic=blksize-system-determined-block-size

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:26:26 +02:00
Jan Höppner 319d3d6653 s390/tape: Add support for bigger block sizes
The tape device type 3590/3592 and emulated 3490 VTS can handle a block
size of up to 256K bytes. Currently the tape device driver is limited to
a block size of 65535 bytes (64K-1). This limitation stems from the
maximum of 65535 bytes of data that can be transferred with one
Channel-Command Word (CCW).

To work around this limitation, data chaining is used, with several CCWs
transferring an entire 256K block of data. A single CCW holds a
maximum of 65535 bytes of data.

Set MAX_BLOCKSIZE to 262144 (= 256K) to allow for data transfers with
larger block sizes. The read_block() and write_block() discipline
functions calculate the number of CCWs required based on the IDAL buffer
array size that was created for a given block size. If there is more
than one CCW required for the data transfer, the new helper function
tape_ccw_dc_idal() is used to build the data chain accordingly.

The Interruption-Response Block (irb) is added to the tape_request
struct so that the tapechar_read/write() functions can analyze what data
was read or written accordingly.
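
Illustrative arithmetic for the chaining: a 256K block needs
DIV_ROUND_UP(262144, 65535) = 5 chained CCWs, each transferring at most
CCW_MAX_BYTE_COUNT bytes:

	nr_ccws = DIV_ROUND_UP(block_size, CCW_MAX_BYTE_COUNT);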

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:55 +02:00
Jan Höppner 574817d6c0 s390/tape: Introduce idal buffer array
The tape device driver uses a single idal_buffer for I/O. While the
buffer itself can be arbitrarily big, the limit for data transfer with a
single Channel-Command Word is 65535 bytes (64K-1), since the count
field specifying the amount of data designated by the CCW is a 16-bit
unsigned value.

Provide functionality that allocates an array of multiple IDAL buffers
with the limitation mentioned above in mind.
A call to idal_buffer_array_alloc() allocates an array with a certain
number of IDAL buffers, determined based on the total size @size. Each
individual buffer is limited to a size of CCW_MAX_BYTE_COUNT
(65535 bytes).

Add helper functions that determine the size (# of elements) and the
total data size covered by the array as well.

Current users of the single IDAL buffer are adapted to use the new
functions with one buffer to allocate.

The single IDAL buffer is removed from the tape_char_data struct.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:55 +02:00
Jan Höppner a5e2ca22c1 s390/tape: Move idal allocation to core functions
Currently tapechar_check_idalbuffer() is part of tape_char.c and is used
to ensure the idal buffer is big enough for the requested I/O, and to
reallocate a new one if required. The same is done in tape_std.c when a
fixed block size is set using the mtsetblk command. This is essentially
duplicate code.

The allocation of the buffer that is required for I/O can be considered
core functionality. Move the idal buffer allocation to tape_core.c,
make it generally available, and reduce code duplication.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:55 +02:00
Jan Höppner e039400f75 s390/tape: Fix return value of ccw helper functions
In contrast to all other helper functions used to build CCW chains,
tape_ccw_cc_idal() and tape_ccw_end_idal() return values using
post-increments, which results in returning the same CCW pointer.

However, the intent of the CCW helper functions is to return the _next_
CCW in the chain, which can then be processed.

There is currently no actual issue, as tape_ccw_cc_idal() is not used
yet and tape_ccw_end_idal() is only used at the end of a chain.

Change both functions' return statements to ccw + 1 and bring them in line
with the other helper functions.
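
In C, `return ccw++;` evaluates to the old value of ccw, so callers got
back the CCW that was just filled in. The fix hands the caller the next
slot:

	/* Before: post-increment returns the unchanged pointer. */
	return ccw++;

	/* After: return the next CCW in the chain. */
	return ccw + 1;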

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:55 +02:00
Jan Höppner 83cff1b124 s390/tape: Remove extra CCW allocation for error recovery
The Read Opposite error recovery code required two extra CCWs to be
allocated in order to transform the request. As this error recovery code
was removed for both 34xx and 3590, the additional allocation isn't
required anymore. Reduce the allocation to two CCWs.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:54 +02:00
Jan Höppner a984d71277 s390/tape: Remove 3590 Read Opposite error recovery
On old native type 3590 tape devices a Read Opposite error recovery
procedure on Error Recovery Action Code (ERA) 26 was issued if a Read
Forward command failed. This recovery procedure was implemented with the
Read Backward command. This is no longer supported.

Remove 3590 ERA 26 and Read Backward related recovery code.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:54 +02:00
Jan Höppner 1b9df1a28f s390/tape: Remove 34xx Read Opposite error recovery
On old native type 3490 tape devices a Read Opposite error recovery
procedure on Error Recovery Action Code (ERA) 26 was issued if a Read
Forward command failed. This recovery procedure was implemented with the
Read Backward command.

As a preparation for a subsequent commit, that adds support for bigger
block sizes, remove the 34xx ERA 26 related recovery code. The recovery
code would need to be adapted to the bigger block sizes, without any
possibility to be tested, as modern Virtual Tape Servers (VTS) do
neither report ERA 26 on a Read Forward command failure nor support the
error recovery procedure anymore.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:54 +02:00
Jan Höppner 39376c77a5 s390/tape: Remove count parameter from read/write_block functions
The count parameter of the read/write_block discipline functions was
never used. Remove it.

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:25:54 +02:00
Heiko Carstens ec9b3b85ea Merge branch 'memory-hotplug'
Sumanth Korikkar says:

====================

Provide a new interface for dynamic configuration and
deconfiguration of hotplug memory on s390, with or without
memmap_on_memory support. It is a follow-up on the discussion with David
when introducing memmap_on_memory support for s390, about supporting
dynamic (de)configuration of memory:
https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/
https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/

The original motivation for introducing memmap_on_memory on s390 was to
avoid using online memory to store struct pages metadata, particularly
for standby memory blocks. This became critical in cases where there was
an imbalance between standby and online memory, potentially leading to
boot failures due to insufficient memory for metadata allocation.

To address this, memmap_on_memory was utilized on s390. However, in its
current form, it adds struct pages metadata at the start of each memory
block at the time of addition (only for standby memory), and this
configuration is static. It cannot be changed at runtime (e.g. when the
user needs contiguous physical memory).

In order to provide more flexibility to the user and overcome the above
limitation, add an option to dynamically configure and deconfigure
hotpluggable memory blocks with/without memmap_on_memory.

With the new interface, s390 will not add all possible hotplug memory in
advance, like before, to make it visible in sysfs for online/offline
actions. Instead, before a memory block can be set online, it has to be
configured via a new interface in /sys/firmware/memory/memoryX/config,
which makes s390 similar to other architectures, i.e. adding hotpluggable
memory is controlled by the user instead of being done at boot time.

s390 kernel sysfs interface to configure/deconfigure memory with
memmap_on_memory (with upcoming lsmem changes):

* Initial memory layout:
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                 SIZE   STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff   2G  online 0-15  yes        no
0x80000000-0xffffffff   2G offline 16-31 no         yes

* Configure memory
echo 1 > /sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE   BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff    2G  online  0-15  yes        no
0x80000000-0x87ffffff  128M offline    16  yes        yes
0x88000000-0xffffffff  1.9G offline 17-31  no         yes

* Deconfigure memory
echo 0 > /sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                 SIZE   STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff   2G  online 0-15  yes        no
0x80000000-0xffffffff   2G offline 16-31 no         yes

* Enable memmap_on_memory and online it.
(Deconfigure first)
echo 0 > /sys/devices/system/memory/memory5/online
echo 0 > /sys/firmware/memory/memory5/config

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE  BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff  640M  online 0-4   yes        no
0x28000000-0x2fffffff  128M offline 5     no         no
0x30000000-0x7fffffff  1.3G  online 6-15  yes        no
0x80000000-0xffffffff    2G offline 16-31 no         yes

(Enable memmap_on_memory and online it)
echo 1 > /sys/firmware/memory/memory5/memmap_on_memory
echo 1 > /sys/firmware/memory/memory5/config
echo 1 > /sys/devices/system/memory/memory5/online

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE   BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff  640M  online  0-4   yes        no
0x28000000-0x2fffffff  128M  online  5     yes        yes
0x30000000-0x7fffffff  1.3G  online  6-15  yes        no
0x80000000-0xffffffff    2G  offline 16-31 no         yes

* Disable memmap_on_memory and online it.
(Deconfigure first)
echo 0 > /sys/devices/system/memory/memory5/online
echo 0 > /sys/firmware/memory/memory5/config

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE  BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff  640M  online 0-4   yes        no
0x28000000-0x2fffffff  128M offline 5     no         yes
0x30000000-0x7fffffff  1.3G  online 6-15  yes        no
0x80000000-0xffffffff    2G offline 16-31 no         yes

(Disable memmap_on_memory and online it)
echo 0 > /sys/firmware/memory/memory5/memmap_on_memory
echo 1 > /sys/firmware/memory/memory5/config
echo 1 > /sys/devices/system/memory/memory5/online

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE   BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff  2G    online  0-15  yes        no
0x80000000-0xffffffff  2G    offline 16-31 no         yes

* Userspace changes:
lsmem/chmem tool is also changed to use the new interface. I will send
it to util-linux soon.

Patch 1 adds support for removal of boot-allocated memory blocks.

Patch 2 provides option to dynamically configure and deconfigure memory
with/without memmap_on_memory.

Patch 3 removes MHP_OFFLINE_INACCESSIBLE from s390. The mhp flag was
used to mark memory as not accessible until the memory hotplug online
phase begins. However, with patch 2, it is no longer essential. Memory
can be brought to an accessible state before adding it, as the memory is
added at runtime now instead of at boot time.

Patch 4 removes the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers.
They are no longer needed. Memory can be brought to an accessible state
before adding it now, with runtime (de)configuration of memory.

Note: The patches apply to the linux-next branch.

v3:
Thanks David
* Avoid goto label in create_standby_sclp_mems().
* Use unsigned long instead of u64.
* Add Acked-by.

v2:
Thanks David
* Rename struct mblock/mblock_arg with struct sclp_mem/sclp_mem_arg.
* Rename all mblocks/mblock references with sclp_mems/sclp_mem -
  structures, functions.
* Rename create_online_mblock() with create_configured_sclp_mem().
* Rename config_mblock_show()/config_mblock_store() with
  config_sclp_mem_show()/config_sclp_mem_store().
* Remove contains_standby_increment() and
  sclp_mem_notifier. sclp mem state change is performed when
  adding/removing memory. sclp memory notifier - no longer needed with
  this patchset.
* Recover sclp mem state when add_memory() fails.
* Refactor and add function init_sclp_mem().
* Use unsigned long instead of unsigned long long.
* Simplify and correct kobj handling. Thanks Heiko.

====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:19:30 +02:00
Heiko Carstens c97689345c s390/con3270: Use scnprintf() instead of sprintf()
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.

This prevents theoretical buffer overflows, but also avoids people
repeatedly spending time figuring out whether the code is actually safe.
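
An illustrative example of the conversion (not a literal hunk from the
patch):

	char buf[16];

	/* Before: no bound on the destination array. */
	sprintf(buf, "%d", value);

	/* After: output is clamped to the array size. */
	scnprintf(buf, sizeof(buf), "%d", value);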

Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:30 +02:00
Heiko Carstens c769941de8 s390/tape: Use scnprintf() instead of sprintf()
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.

This prevents theoretical buffer overflows, but also avoids people
repeatedly spending time figuring out whether the code is actually safe.

Reviewed-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:30 +02:00
Heiko Carstens ffb5d3af5e s390/dcss: Use scnprintf() instead of sprintf()
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.

This prevents theoretical buffer overflows, but also avoids people
repeatedly spending time figuring out whether the code is actually safe.

Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:30 +02:00
Heiko Carstens ba06238bbe s390/cio: Use scnprintf() instead of sprintf()
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.

This prevents theoretical buffer overflows, but also avoids people
repeatedly spending time figuring out whether the code is actually safe.

Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:30 +02:00
Heiko Carstens 6850221116 s390/early: Use scnprintf() instead of sprintf()
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.

This prevents theoretical buffer overflows, but also avoids people
repeatedly spending time figuring out whether the code is actually safe.

Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:30 +02:00
Thomas Richter 4d065f3c80 s390/pai_crypto: Adjust paicrypt_copy() return statement
Adjust the return statement in paicrypt_copy() to the same statement
as in paiext_copy(). Use one common style. No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:29 +02:00
Josephine Pfeiffer 4738e11662 s390/sysinfo: Replace sprintf() with snprintf() for buffer safety
Replace sprintf() with snprintf() when formatting symlink target name
to prevent potential buffer overflow. The link_to buffer is only 10
bytes, and using snprintf() ensures proper bounds checking if the
topology nesting limit value is unexpectedly large.

Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:29 +02:00
Josephine Pfeiffer 5379879a76 s390/extmem: Replace sprintf() with snprintf() for buffer safety
Replace unsafe sprintf() calls with snprintf() in segment_save() to
prevent potential buffer overflows. The function builds command strings
by repeatedly appending to a fixed-size buffer, which could overflow if
segment ranges are numerous or values are large.

Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:29 +02:00
Josephine Pfeiffer dd7d1d34ae s390/cmm: Replace sprintf() with scnprintf() for buffer safety
Replace sprintf() with scnprintf() in cmm_timeout_handler() to prevent
potential buffer overflow. The scnprintf() function ensures we don't
write beyond the buffer size and provides safer string formatting.

Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-21 10:17:20 +02:00
Thorsten Blum ace0471774 cpufreq: Replace deprecated strcpy() in cpufreq_unregister_governor()
strcpy() is deprecated; assign the NUL terminator directly instead.
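
An illustrative shape of the change (the buffer name is hypothetical):

	/* Before: strcpy(policy->last_governor, ""); */
	policy->last_governor[0] = '\0';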

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/20251017153354.82009-2-thorsten.blum@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20 21:25:36 +02:00
Rafael J. Wysocki 5313ec4a21 cpufreq: intel_pstate: Improve printing of debug messages
Some debug messages generated by intel_pstate on a given hybrid system
are only printed for some CPUs, which is confusing, so modify the driver
to print them for all CPUs.  Also change those messages to avoid
printing local variable names in them.

Moreover, some debug messages printed by intel_pstate are quite hard
to understand without looking at the code printing them, so make them
somewhat clearer while at it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8609836.T7Z3S40VBb@rafael.j.wysocki
2025-10-20 21:23:36 +02:00
Rafael J. Wysocki d852b6f67b cpufreq: intel_pstate: hybrid: Adjust energy model rules
Instead of using HWP-to-frequency scaling factors for computing cost
coefficients in the energy model used on hybrid systems, which is
fragile, rely on CPU type information that is easily accessible now and
the information on whether or not L3 cache is present for this purpose.

This also allows the cost coefficients for P-cores to be adjusted so
that they start to be populated somewhat earlier (that is, before
E-cores are loaded up to their full capacity).

In addition to the above, replace an inaccurate comment regarding the
reason why the freq value is added to the cost in hybrid_get_cost().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Link: https://patch.msgid.link/5932894.DvuYhMxLoT@rafael.j.wysocki
2025-10-20 21:22:21 +02:00
Rafael J. Wysocki c17add7349 cpufreq: intel_pstate: Add and use hybrid_has_l3()
Introduce a function for checking whether or not a given CPU has L3
cache, called hybrid_has_l3(), and use it in hybrid_get_cost() for
computing cost coefficients associated with a given perf domain.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/13884343.uLZWGnKmhe@rafael.j.wysocki
2025-10-20 21:20:49 +02:00
Rafael J. Wysocki 528dde6619 cpufreq: intel_pstate: Add and use hybrid_get_cpu_type()
Introduce a function for identifying the type of a given CPU in a
hybrid system, called hybrid_get_cpu_type(), and use it for hybrid
scaling factor determination in hwp_get_cpu_scaling().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1954386.tdWV9SEqCh@rafael.j.wysocki
2025-10-20 21:20:49 +02:00
Zihuan Zhang 6db0f533d3 cpufreq: preserve freq_table_sorted across suspend/hibernate
During S3/S4 suspend and resume, cpufreq policies are not freed or
recreated; the freq_table and policy structure remain intact. However,
set_freq_table_sorted() currently resets policy->freq_table_sorted to
UNSORTED unconditionally, which is unnecessary since the table order
does not change across suspend/resume.

This patch adds a check to skip validation if policy->freq_table_sorted
is already ASCENDING or DESCENDING. This avoids unnecessary traversal
of the frequency table on S3/S4 resume or repeated online events,
reducing overhead while preserving correctness.
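
An illustrative shape of the early return (enum values from the cpufreq
core):

	if (policy->freq_table_sorted != CPUFREQ_TABLE_UNSORTED)
		return 0;	/* already validated; order cannot change */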

Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Link: https://patch.msgid.link/20251011072420.11495-1-zhangzihuan@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20 21:01:35 +02:00
Rafael J. Wysocki d3db87f89c PM: hibernate: Rework message printing in swsusp_save()
The messages printed by swsusp_save() are basically only useful for
debugging, so printing them at the "info" log level every time a
hibernation image is created is not particularly helpful.  Also printing
a message on a failing memory allocation is redundant.

Use pm_deferred_pr_dbg() for printing those messages so they will only
be printed when requested and the "deferred" variant is used because
this code runs in a deeply atomic context (one CPU with interrupts
off, no functional devices).  Also drop the useless message printed
when a memory allocation fails.

While at it, extend one of the messages in question so it is less
cryptic.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Dropped a useless colon at the end of one of the messages ]
Link: https://patch.msgid.link/10750389.nUPlyArG6x@rafael.j.wysocki
2025-10-20 20:43:09 +02:00
Rafael J. Wysocki 32ece31db4 ACPI: PM: s2idle: Only retrieve constraints when needed
The evaluation of LPS0 _DSM Function 1 in lps0_device_attach() may be
useless if pm_debug_messages_on is never set.

For this reason, instead of evaluating it in lps0_device_attach(), do
that in a new .begin() callback for s2idle, acpi_s2idle_begin_lps0(),
only when pm_debug_messages_on is set at that point.

However, never attempt to evaluate LPS0 _DSM Function 1 more than once
to avoid recurring failures.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/3027060.e9J7NaK4W3@rafael.j.wysocki
2025-10-20 20:39:33 +02:00
Rafael J. Wysocki bfc09902de ACPI: PM: s2idle: Staticise LPS0 callback functions
The LPS0 callback functions in x86/s2idle.c can be made static, so do
that and remove their declarations from sleep.h.

While at it, add the _lps0 suffix to their names to indicate that
they are LPS0-specific.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/2254836.irdbgypaU6@rafael.j.wysocki
2025-10-20 20:39:33 +02:00
Rafael J. Wysocki a00f3dea03 ACPI: PM: s2idle: Drop acpi_get_lps0_constraint()
Drop unused function acpi_get_lps0_constraint().

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/5032801.GXAFRqVoOG@rafael.j.wysocki
2025-10-20 20:39:33 +02:00
Christophe JAILLET ac646f4495 genirq/msi: Slightly simplify msi_domain_alloc()
The return value of irq_find_mapping() is only tested, not used for
anything else.

Replace it with irq_resolve_mapping(), which is used internally by
irq_find_mapping() and allows a simple boolean decision.
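
An illustrative shape of the change (irq_resolve_mapping() returns a
pointer, so the test becomes a NULL check):

	/* Before: if (irq_find_mapping(domain, hwirq) > 0) */
	if (irq_resolve_mapping(domain, hwirq))
		return -EEXIST;	/* mapping already exists */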

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/1ce680114cdb8d40b072c54d7f015696a540e5a6.1760863194.git.christophe.jaillet@wanadoo.fr
2025-10-20 20:18:48 +02:00
Sergey Senozhatsky a67818f745 PM: dpm_watchdog: add module param to backtrace all CPUs
Add a dpm_watchdog_all_cpu_backtrace module parameter which
controls whether a backtrace of all CPUs is dumped before the
DPM watchdog panics the system.

This is expected to help understand what might have caused the
device timeout.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20251007063551.3147937-1-senozhatsky@chromium.org
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20 20:07:02 +02:00
Kaushlendra Kumar 5a151c2328 PM: sleep: Introduce CALL_PM_OP() macro to simplify code
Add CALL_PM_OP() macro to eliminate a repetitive code pattern in
power management generic operations.

Replace analogous driver PM callback invocation logic across all
pm_generic_*() functions with a single macro that handles the NULL
pointer checks and function calls.

This reduces code size while maintaining the same functionality and
improving code maintainability.
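
An illustrative shape of such a macro (the definition in the patch may
differ):

	#define CALL_PM_OP(dev, op)					\
	({								\
		const struct dev_pm_ops *pm = (dev)->driver ?		\
				(dev)->driver->pm : NULL;		\
		pm && pm->op ? pm->op(dev) : 0;				\
	})

	/* e.g. in pm_generic_suspend(): return CALL_PM_OP(dev, suspend); */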

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20250919124437.3075016-1-kaushlendra.kumar@intel.com
[ rjw: Subject and changelog edits, adjust white space ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20 19:54:25 +02:00
Malaya Kumar Rout b57100a3d9 PM: console: Fix memory allocation error handling in pm_vt_switch_required()
The pm_vt_switch_required() function fails silently when memory
allocation fails, offering no indication to callers that the operation
was unsuccessful. This behavior prevents drivers from handling allocation
errors correctly or implementing retry mechanisms. By ensuring that
failures are reported back to the caller, drivers can make informed
decisions, improve robustness, and avoid unexpected behavior during
critical power management operations.

Change the function signature to return an integer error code and modify
the implementation to return -ENOMEM when kmalloc() fails. Update both
the function declaration and the inline stub in include/linux/pm.h to
maintain consistency across CONFIG_VT_CONSOLE_SLEEP configurations.

The function now returns:
 - 0 on success (including when updating existing entries)
 - -ENOMEM when memory allocation fails

This change improves error reporting without breaking existing callers,
as the current callers in drivers/video/fbdev/core/fbmem.c already
ignore the return value, making this a backward-compatible improvement.
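
  [ ed: a condensed sketch of the new error path; the list handling in
    kernel/power/console.c is abbreviated ]

    int pm_vt_switch_required(struct device *dev, bool required)
    {
            struct pm_vt_switch *entry;

            /* ... update and return 0 if an entry for dev already exists ... */

            entry = kmalloc(sizeof(*entry), GFP_KERNEL);
            if (!entry)
                    return -ENOMEM; /* previously this failed silently */

            entry->dev = dev;
            entry->required = required;
            /* ... add the entry to the switch list ... */

            return 0;
    }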

Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20251013193028.89570-1-mrout@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-18 14:38:23 +02:00
Johan Hovold a7f25e00c4 irqchip/qcom-irq-combiner: Rename driver structure
The "_probe" suffix of the driver structure name prevents modpost from
warning about section mismatches so replace it to catch any future
issues like the recently fixed probe function being incorrectly marked
as __init.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-17 15:18:18 +02:00
Yazen Ghannam 6553c68bc7 RAS/AMD/ATL: Return error codes from helper functions
Pass up error codes from helper functions rather than discarding them.

Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-17 14:38:42 +02:00
Elena Reshetova 0f2753efc5 x86/sgx: Enable automatic SVN updates for SGX enclaves
== Background ==

ENCLS[EUPDATESVN] is a new SGX instruction [1] which allows enclave
attestation to include information about updated microcode SVN without a
reboot. Before an EUPDATESVN operation can be successful, all SGX memory
(aka EPC) must be marked as "unused" in the SGX hardware metadata
(aka EPCM). This requirement ensures that no compromised enclave can
survive the EUPDATESVN procedure and provides an opportunity to generate
new cryptographic assets.

== Solution ==

Attempt to execute ENCLS[EUPDATESVN] every time the first file descriptor
is obtained via sgx_(vepc_)open(). In the most common case the microcode
SVN is already up-to-date, and the operation succeeds without updating SVN.

Note: while in such cases the underlying crypto assets are regenerated,
this does not affect the enclave-visible keys obtained via the EGETKEY
instruction.

If it fails with any error code other than SGX_INSUFFICIENT_ENTROPY, this
is considered unexpected and the *open() returns an error. This should not
happen in practice.

In contrast, SGX_INSUFFICIENT_ENTROPY can happen due to pressure on the
system's DRNG (RDSEED), and therefore the *open() can be safely retried to
allow normal enclave operation.
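
  [ ed: from user space this amounts to a retry loop; a sketch assuming
    the insufficient-entropy case surfaces as a retryable errno such as
    EAGAIN (the exact mapping is not spelled out here) ]

    #include <errno.h>
    #include <fcntl.h>

    int open_sgx_retrying(void)
    {
            int fd;

            /* retry while the kernel reports a transient entropy shortage */
            do {
                    fd = open("/dev/sgx_enclave", O_RDWR);
            } while (fd < 0 && errno == EAGAIN);

            return fd;
    }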

[1] Runtime Microcode Updates with Intel Software Guard Extensions,
https://cdrdv2.intel.com/v1/dl/getContent/648682

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-16 14:42:09 -07:00
Elena Reshetova 4e75697faa x86/sgx: Implement ENCLS[EUPDATESVN]
All running enclaves and cryptographic assets (such as internal SGX
encryption keys) are assumed to be compromised whenever an SGX-related
microcode update occurs. To mitigate this assumed compromise the new
supervisor SGX instruction ENCLS[EUPDATESVN] can generate fresh
cryptographic assets.

Before executing EUPDATESVN, all SGX memory must be marked as unused. This
requirement ensures that no potentially compromised enclave survives the
update and allows the system to safely regenerate cryptographic assets.

Add the function that performs ENCLS[EUPDATESVN]. However, until the
follow-up patch wires up calling sgx_update_svn() from
sgx_inc_usage_count(), this code is not reachable.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-16 14:42:09 -07:00
Elena Reshetova 7b502832ee x86/sgx: Define error codes for use by ENCLS[EUPDATESVN]
Add error codes for ENCLS[EUPDATESVN] so that the SGX CPUSVN update
process can determine the execution state of EUPDATESVN and notify
userspace.

EUPDATESVN will only be called when it is guaranteed that there are no
active SGX users, so only add the error codes that can legally occur.
E.g., it could also fail with "SGX not ready" when there are SGX users,
but that cannot happen in this implementation.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-16 14:42:09 -07:00
Elena Reshetova 6ffdb49101 x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag
Add a flag indicating whether the ENCLS[EUPDATESVN] SGX instruction is
supported. This will be used by the SGX driver to perform CPU SVN
updates.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-16 14:42:08 -07:00
Elena Reshetova 483fc19e9c x86/sgx: Introduce functions to count the sgx_(vepc_)open()
Currently, when SGX is compromised and the microcode update fix is
applied, the machine needs to be rebooted to invalidate the old SGX
crypto-assets and bring SGX into an updated, safe state. That is not
friendly for the cloud.

To avoid having to reboot, a new ENCLS[EUPDATESVN] is introduced to update
the SGX environment at runtime. This process needs to be done when there
are no SGX users, to make sure no compromised enclave can survive the
update, and to allow the system to regenerate crypto-assets.

For now there is no counter to track the active SGX users of the host
enclaves and virtual EPC. Introduce such a counter mechanism so that
EUPDATESVN can be done only when there are no SGX users.

Define placeholder functions sgx_inc/dec_usage_count() that are used to
increment and decrement such a counter. Also, wire up the call sites for
these functions. Encapsulate the current sgx_(vepc_)open() into
__sgx_(vepc_)open() to make the new sgx_(vepc_)open() easy to read.

The definition of the counter itself and the actual implementation of
sgx_inc/dec_usage_count() functions come next.

Note: The EUPDATESVN, which may fail, will be done in
sgx_inc_usage_count(). Make it return 'int' to make subsequent patches
which implement EUPDATESVN easier to review. For now it always returns
success.
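
  [ ed: a minimal sketch of the wrapper structure described above; the
    function bodies are illustrative ]

    static int sgx_open(struct inode *inode, struct file *file)
    {
            int ret;

            ret = sgx_inc_usage_count();    /* placeholder; later runs EUPDATESVN */
            if (ret)
                    return ret;

            ret = __sgx_open(inode, file);  /* the original open logic */
            if (ret)
                    sgx_dec_usage_count();

            return ret;
    }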

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
2025-10-16 14:42:08 -07:00
Nam Cao dce7450093 PCI/MSI: Delete pci_msi_create_irq_domain()
pci_msi_create_irq_domain() is now unused. Delete it.

Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2025-10-16 21:09:52 +02:00
Samuel Holland 3a16b05384 irqchip/riscv-imsic: Inline imsic_vector_from_local_id()
This function is only called from one place, which is in the interrupt
handling hot path. Inline it to improve code generation and to take
advantage of this_cpu operations. lpriv and imsic->base_domain can never be
NULL because irq_set_chained_handler() is called after they are allocated.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 18:17:28 +02:00
Samuel Holland 79eaabc61d irqchip/riscv-imsic: Embed the vector array in lpriv
Reduce pointer chasing and the number of allocations by using a flexible
array member for the vector array instead of a separate allocation.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 18:17:28 +02:00
Samuel Holland c475c0b713 irqchip/riscv-imsic: Remove redundant irq_data lookups
imsic_irq_set_affinity() already takes the irq_data pointer as a
parameter, so it is pointless to look it up again from the IRQ number.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 18:17:28 +02:00
Johan Hovold dcc31768ff irqchip/ts4800: Drop unused module alias
The driver has never supported anything but OF probing so drop the
unused platform alias.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 18:17:28 +02:00
Johan Hovold b03127a4e7 irqchip/mvebu-pic: Drop unused module alias
The driver has never supported anything but OF probing so drop the
unused platform alias.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
2025-10-16 18:17:28 +02:00
Johan Hovold 867c6aa283 irqchip/meson-gpio: Drop unused module alias
The driver has never supported anything but OF probing so drop the
unused platform alias that was erroneously added by commit a947aa00ed
("irqchip/meson-gpio: Make it possible to build as a module").

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 18:17:27 +02:00
Johan Hovold 1230fbb225 irqchip: Enable compile testing of Broadcom drivers
There seems to be nothing preventing the Broadcom drivers from being
compile tested so enable that for wider build coverage.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-16 18:17:27 +02:00
Johan Hovold 1e3e330c07 irqchip: Pass platform device to platform drivers
The IRQCHIP_PLATFORM_DRIVER macros can be used to convert OF irqchip
drivers to platform drivers but currently reuse the OF init callback
prototype that only takes OF nodes as arguments. This forces drivers to
do reverse lookups of their struct devices during probe if they need
them for things like dev_printk() and device managed resources.

Half of the drivers doing reverse lookups also currently fail to release
the additional reference taken during the lookup, while other drivers
have had the reference leak plugged in various ways (e.g. using
non-intuitive cleanup constructs which still confuse static checkers).

Switch to using a probe callback that takes a platform device as its
first argument to simplify drivers and plug the remaining (mostly
benign) reference leaks.
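
  [ ed: a sketch of the prototype change for an affected driver; the
    driver names are illustrative ]

    /* before: OF init callback, no struct device directly available */
    static int my_irqchip_init(struct device_node *node,
                               struct device_node *parent);

    /* after: platform probe callback with the device at hand */
    static int my_irqchip_probe(struct platform_device *pdev)
    {
            struct device *dev = &pdev->dev; /* no reverse lookup, no leaked ref */

            /* ... dev_err(dev, ...), devm_*() resources, etc. ... */
            return 0;
    }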

Fixes: 32c6c05466 ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller")
Fixes: 70afdab904 ("irqchip: Add IMX MU MSI controller driver")
Fixes: a6199bb514 ("irqchip: Add Qualcomm MPM controller driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>
2025-10-16 18:17:27 +02:00
Randy Dunlap 762a3d1ca2 x86/idtentry: Add missing '*' to kernel-doc lines
Fix kernel-doc warnings by adding the missing '*' to each line.

  Warning: include/asm/idtentry.h:395 bad line:    when raised from kernel mode
  Warning: include/asm/idtentry.h:405 bad line:    when raised from user mode

Since this is in a kernel-doc block, these lines need a leading
" *" on each line to prevent the warnings.

Fixes: a13644f3a5 ("x86/entry/64: Add entry code for #VC handler")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-16 17:45:42 +02:00
Johan Hovold 3540d99c03 irqchip: Drop leftover brackets
Drop some unnecessary brackets in platform_irqchip_probe() mistakenly
left by commit 9322d1915f ("irqchip: Plug a OF node reference leak in
platform_irqchip_probe()").

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-16 11:30:38 +02:00
Johan Hovold 9b685058ca irqchip/qcom-irq-combiner: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the probe callback must not live in init.

Fixes: f20cc9b00c ("irqchip/qcom: Add IRQ combiner driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 11:30:38 +02:00
Johan Hovold f798bdb9aa irqchip/starfive-jh8100: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callback must not live in init.

Fixes: e4e5350361 ("irqchip: Add StarFive external interrupt controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>
2025-10-16 11:30:38 +02:00
Johan Hovold 5b338fbb2b irqchip/renesas-rzg2l: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.

Fixes: d011c022ef ("irqchip/renesas-rzg2l: Add support for RZ/Five SoC")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-10-16 11:30:38 +02:00
Johan Hovold 64acfd8e68 irqchip/imx-mu-msi: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.

Fixes: 70afdab904 ("irqchip: Add IMX MU MSI controller driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-10-16 11:30:38 +02:00
Johan Hovold bbe1775924 irqchip/irq-brcmstb-l2: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.

Fixes: 51d9db5c8f ("irqchip/irq-brcmstb-l2: Switch to IRQCHIP_PLATFORM_DRIVER")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-16 11:30:37 +02:00
Johan Hovold bfc0c5beab irqchip/irq-bcm7120-l2: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.

Fixes: 3ac268d5ed ("irqchip/irq-bcm7120-l2: Switch to IRQCHIP_PLATFORM_DRIVER")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-16 11:30:37 +02:00
Johan Hovold e9db5332ca irqchip/irq-bcm7038-l1: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callback must not live in init.

Fixes: c057c799e3 ("irqchip/irq-bcm7038-l1: Switch to IRQCHIP_PLATFORM_DRIVER")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-16 11:30:37 +02:00
Johan Hovold a8452d1d59 irqchip/bcm2712-mip: Fix section mismatch
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callback must not live in init.

Fixes: 32c6c05466 ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-16 11:30:37 +02:00
Johan Hovold 0435bcc4e5 irqchip/bcm2712-mip: Fix OF node reference imbalance
The init callback must not decrement the reference count of the provided
irqchip OF node.

This should not cause any trouble currently, but if the driver ever
starts probe deferring it could lead to warnings about reference
underflow and saturation.

Fixes: 32c6c05466 ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
2025-10-16 11:30:37 +02:00
Chang S. Bae bffeb2fd0b x86/microcode/intel: Enable staging when available
With staging support implemented, enable it when the CPU reports the
feature.

  [ bp: Sort in the MSR properly. ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15 16:47:50 +02:00
Chang S. Bae 4ab410287b x86/microcode/intel: Support mailbox transfer
The functions for sending microcode data and retrieving the next offset
were previously placeholders, as they need to handle a specific mailbox
format.

While the kernel supports similar mailboxes, none of them are compatible
with this one. Attempts to share code led to unnecessary complexity, so
add a dedicated implementation instead.

  [ bp: Sort the include properly. ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15 16:47:43 +02:00
Chang S. Bae afc3b50954 x86/microcode/intel: Implement staging handler
Previously, per-package staging invocations and their associated state
data were established. The next step is to implement the actual staging
handler according to the specified protocol. Below are key aspects to
note:

  (a)  Each staging process must begin by resetting the staging hardware.

  (b)  The staging hardware processes up to a page-sized chunk of the
       microcode image per iteration, requiring software to submit data
       incrementally.

  (c)  Once a data chunk is processed, the hardware responds with an
       offset in the image for the next chunk.

  (d)  The offset may indicate completion or request retransmission of an
       already transferred chunk. As long as the total transferred data
       remains within the predefined limit (twice the image size),
       retransmissions should be acceptable.

Incorporate them in the handler, while data transmission and mailbox
format handling are implemented separately.
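
  [ ed: a schematic of the loop described in (a)-(d); the helpers and
    the staging_state fields shown here are hypothetical ]

    static int do_stage(struct staging_state *ss)
    {
            staging_hw_reset(ss);                           /* (a) */

            while (!staging_hw_done(ss)) {
                    /* (d): bound total traffic at twice the image size */
                    if (ss->bytes_sent > 2 * ss->image_size)
                            return -EIO;

                    /* (b): submit up to one page of the image */
                    send_data_chunk(ss, ss->offset);

                    /* (c): hardware replies with the next (or repeated) offset */
                    ss->offset = read_next_offset(ss);
            }

            return 0;
    }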

  [ bp: Sort the headers in a reversed name-length order. ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15 16:47:37 +02:00
Chang S. Bae 079b90d4ba x86/microcode/intel: Define staging state struct
Define a staging_state struct to simplify function prototypes by consolidating
relevant data, instead of passing multiple local variables.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15 16:47:31 +02:00
Chang S. Bae 740144bc6b x86/microcode/intel: Establish staging control logic
When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.

Prepare staging with the following steps:

  1.  Ensure the microcode image is 32-bit aligned to match the MMIO
      register size.

  2.  Identify each MMIO interface based on its per-package scope.

  3.  Invoke the staging function for each identified interface, which
      will be implemented separately.
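
  [ ed: schematically, per-package preparation looks like this; the MSR
    constant name, the mailbox size and the helper are assumptions for
    illustration ]

    static int prepare_staging(unsigned int image_size)
    {
            u64 pa;
            void __iomem *mbox;

            /* step 1: the image must match the 32-bit MMIO register size */
            if (!IS_ALIGNED(image_size, sizeof(u32)))
                    return -EINVAL;

            /* step 2: each package advertises its own mailbox address */
            rdmsrl(MSR_IA32_MCU_STAGING_MBOX_ADDR, pa);
            mbox = ioremap(pa, MBOX_REGS_SIZE);
            if (!mbox)
                    return -ENOMEM;

            /* step 3: hand the mapped interface to the staging function */
            return stage_via(mbox);
    }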

  [ bp: Improve error logging. ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
2025-10-15 16:47:20 +02:00
Chang S. Bae 7cdda85ed9 x86/microcode: Introduce staging step to reduce late-loading time
As microcode patch sizes continue to grow, late-loading latency spikes can
lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational solution is
needed to address the issue.

To mitigate the problem, introduce a microcode staging feature. This option
processes most of the microcode update (excluding activation) on
a non-critical path, allowing CPUs to remain operational during the majority
of the update. By offloading work from the critical path, staging can
significantly reduce latency spikes.

Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.

Staging follows an opportunistic model:

  *  If successful, it reduces CPU rendezvous time
  *  If it fails, the process falls back to the legacy path to finish
     the loading, but with potentially higher latency.

Extend struct microcode_ops to incorporate staging properties, which will be
implemented in the vendor code separately.
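
  [ ed: a sketch of the described extension; the field names are
    assumptions ]

    struct microcode_ops {
            /* ... existing callbacks ... */

            bool use_staging;               /* advertised by vendor code */
            void (*stage_microcode)(void);  /* runs before CPU rendezvous */
    };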

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15 16:46:58 +02:00
Chang S. Bae ed44a5625f x86/cpu/topology: Make primary thread mask available with SMP=n
cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even
in UP kernels there is always exactly one CPU, which can reasonably be
treated as the primary thread.

Historically, topology_is_primary_thread() always returned true with
CONFIG_SMP=n. A recent commit:

  4b455f5994 ("cpu/SMT: Provide a default topology_is_primary_thread()")

replaced it with a generic implementation with the note:

  "When disabling SMT, the primary thread of the SMT will remain
   enabled/active. Architectures that have a special primary thread (e.g.
   x86) need to override this function. ..."

For consistency and clarity, make the primary thread mask available
regardless of SMP, similar to cpu_possible_mask and cpu_present_mask.

Move __cpu_primary_thread_mask into common code to prevent build issues.
Let cpu_mark_primary_thread() configure the mask even for UP kernels,
alongside other masks. Then, topology_is_primary_thread() can
consistently reference it.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250320234104.8288-1-chang.seok.bae@intel.com
2025-10-15 16:46:11 +02:00
Sumanth Korikkar 300709fbef mm/memory_hotplug: Remove MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers were introduced
to prepare the transition of memory to and from a physically accessible
state. This enhancement was crucial for implementing the "memmap on memory"
feature for s390.

With the introduction of dynamic (de)configuration of hotpluggable
memory, memory can be brought to an accessible state before add_memory()
and to an inaccessible state before remove_memory(). Hence, there is no
need for the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
anymore.

This basically reverts commit
c5f1e2d189 ("mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers").

Additionally, apply minor adjustments to the function parameters of
move_pfn_range_to_zone() and mhp_supports_memmap_on_memory() to ensure
compatibility with the latest branch.

Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14 14:24:53 +02:00
Sumanth Korikkar ce2071e02d s390/sclp: Remove MHP_OFFLINE_INACCESSIBLE
The mhp_flag MHP_OFFLINE_INACCESSIBLE was used to mark memory as not
accessible until the memory hotplug online phase begins.

Earlier, standby memory blocks were added upfront at boot time, and the
MHP_OFFLINE_INACCESSIBLE flag avoided page_init_poison() on the memmap
during the hotplug addition phase.

However, with dynamic runtime configuration of memory, standby memory can
be brought to an accessible state before performing add_memory(). Hence,
remove MHP_OFFLINE_INACCESSIBLE.

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14 14:24:53 +02:00
Sumanth Korikkar ff18dcb19a s390/sclp: Add support for dynamic (de)configuration of memory
Provide a new interface for dynamic configuration and deconfiguration of
hotplug memory, usable both with and without memmap_on_memory support. It
is a follow-up to the discussion with David when memmap_on_memory support
was introduced for s390, and it adds support for dynamic
(de)configuration of memory:
https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/
https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/

The original motivation for introducing memmap_on_memory on s390 was to
avoid using online memory to store struct pages metadata, particularly
for standby memory blocks. This became critical in cases where there was
an imbalance between standby and online memory, potentially leading to
boot failures due to insufficient memory for metadata allocation.

To address this, memmap_on_memory was utilized on s390. However, in its
current form, it adds the struct pages metadata at the start of each
memory block at the time of addition, and this configuration is static:
it cannot be changed at runtime (e.g. when the user needs contiguous
physical memory).

In order to provide more flexibility to the user and overcome the above
limitation, add an option to dynamically configure and deconfigure
hotpluggable memory blocks with or without memmap_on_memory.

With the new interface, s390 will not add all possible hotplug memory in
advance, like before, to make it visible in sysfs for online/offline
actions. Instead, before a memory block can be set online, it has to be
configured via a new interface in /sys/firmware/memory/memoryX/config,
which makes s390 similar to other architectures: the addition of
hotpluggable memory is controlled by the user instead of happening at
boot time.

The s390 kernel sysfs interface to configure and deconfigure memory is
as follows (considering the upcoming lsmem changes):

* Initial memory layout:
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                 SIZE   STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff   2G  online 0-15  yes        no
0x80000000-0xffffffff   2G offline 16-31 no         yes

* Configure memory
sys="/sys"
echo 1 > $sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE   BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff    2G  online  0-15  yes        no
0x80000000-0x87ffffff  128M offline    16  yes        yes
0x88000000-0xffffffff  1.9G offline 17-31  no         yes

* Deconfigure memory
echo 0 > $sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                 SIZE   STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff   2G  online 0-15  yes        no
0x80000000-0xffffffff   2G offline 16-31 no         yes

* Enable memmap_on_memory and online it
echo 0 > $sys/devices/system/memory/memory5/online
echo 0 > $sys/firmware/memory/memory5/config

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE  BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff  640M  online 0-4   yes        no
0x28000000-0x2fffffff  128M offline 5     no         no
0x30000000-0x7fffffff  1.3G  online 6-15  yes        no
0x80000000-0xffffffff    2G offline 16-31 no         yes

echo 1 > $sys/firmware/memory/memory5/memmap_on_memory
echo 1 > $sys/firmware/memory/memory5/config
echo 1 > $sys/devices/system/memory/memory5/online

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE   BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff  640M  online  0-4   yes        no
0x28000000-0x2fffffff  128M  online  5     yes        yes
0x30000000-0x7fffffff  1.3G  online  6-15  yes        no
0x80000000-0xffffffff    2G  offline 16-31 no         yes

* Disable memmap_on_memory and online it
echo 0 > $sys/devices/system/memory/memory5/online
echo 0 > $sys/firmware/memory/memory5/config

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE  BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff  640M  online 0-4   yes        no
0x28000000-0x2fffffff  128M offline 5     no         yes
0x30000000-0x7fffffff  1.3G  online 6-15  yes        no
0x80000000-0xffffffff    2G offline 16-31 no         yes

echo 0 > $sys/firmware/memory/memory5/memmap_on_memory
echo 1 > $sys/firmware/memory/memory5/config
echo 1 > $sys/devices/system/memory/memory5/online

lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE                  SIZE  STATE   BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff  2G    online  0-15  yes        no
0x80000000-0xffffffff  2G    offline 16-31 no         yes

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14 14:24:53 +02:00
Sumanth Korikkar d5e88d32de s390/mm: Support removal of boot-allocated virtual memory map
On s390, memory blocks are not currently removed via
arch_remove_memory(). With upcoming dynamic memory (de)configuration
support, runtime removal of memory blocks is possible. This internally
involves tearing down identity mapping, virtual memory mappings and
freeing the physical memory backing the struct pages metadata.

During early boot, physical memory used to back the struct pages
metadata in vmemmap is allocated through:

setup_arch()
  -> sparse_init()
    -> sparse_init_nid()
      -> __populate_section_memmap()
        -> vmemmap_alloc_block_buf()
          -> sparse_buffer_alloc()
            -> memblock_alloc()

Here, sparse_init_nid() sets up virtual-to-physical mapping for struct
pages backed by memblock_alloc(). This differs from runtime addition of
hotplug memory which uses the buddy allocator later.

To correctly free identity mappings and vmemmap mappings during
hot-remove, boot-time allocations must be distinguished from runtime
allocations using the PageReserved bit:

* Boot-time memory, such as identity-mapped page tables allocated via
  boot_crst_alloc() and reserved via reserve_pgtables() is marked
  PageReserved in memmap_init_reserved_pages().

* Physical memory backing vmemmap (struct pages from memblock_alloc())
  is also marked PageReserved similarly.

During teardown, the PageReserved bit is checked to distinguish between
boot-time and buddy allocations.

This is similar to commit 645d5ce2f7 ("powerpc/mm/radix: Fix PTE/PMD
fragment count for early page table mappings").

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14 14:24:53 +02:00
Andrew Cooper 4ab13be5ed x86/fred: Fix 64bit identifier in fred_ss
FRED can only be enabled in Long Mode. This bit is the 64bit mode (as
opposed to compatibility mode) identifier, rather than something
hard-wired to 1.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-10-13 14:05:42 -07:00
Kaushlendra Kumar 67434ce57c PM: sleep: Replace snprintf() with scnprintf() in show_trace_dev_match()
Replace snprintf() with scnprintf() in show_trace_dev_match() to simplify
buffer length handling. The scnprintf() function returns the number of
characters actually written (excluding the null terminator), which
eliminates the need for manual length checking and clamping.

This change removes the redundant size check since scnprintf() guarantees
that the return value will never exceed the buffer size, making the code
cleaner and less error-prone.
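
  [ ed: the difference, in schematic form; the format string is only an
    example ]

    /* before: the return value may exceed the buffer and must be clamped */
    len = snprintf(buf, size, "%s\n", dev_name(dev));
    if (len > size)
            len = size;

    /* after: scnprintf() returns the bytes actually written, never more
     * than size - 1, so no clamping is needed */
    len = scnprintf(buf, size, "%s\n", dev_name(dev));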

Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Link: https://patch.msgid.link/20250922055231.3523680-1-kaushlendra.kumar@intel.com
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-13 21:19:12 +02:00
Marco Crivellari c9ff363738 PM: WQ_UNBOUND added to pm_wq workqueue
Currently, if a user enqueues a work item using schedule_delayed_work(),
the workqueue used is "system_wq" (a per-CPU wq), while
queue_delayed_work() uses WORK_CPU_UNBOUND (used when a CPU is not
specified). The same applies to schedule_work(), which uses system_wq,
and queue_work(), which again makes use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.

This change adds the WQ_UNBOUND flag to pm_wq, to make explicit that this
workqueue can be unbound and that it does not benefit from per-CPU work.

Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
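
  [ ed: the change itself is a one-liner; a sketch assuming pm_wq was
    previously allocated with WQ_FREEZABLE only ]

    pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);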

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-13 20:50:09 +02:00
Chen Yu a0a0999507 x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
Clearwater Forest supports SNC mode. Add it to the snc_cpu_ids[] table.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
2025-10-13 16:59:55 +02:00
Borislav Petkov (AMD) ddde4abaa0 x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
That cpuinfo_x86.x86_capability[] element was supposed to mirror CPUID
flags from CPUID_0x80000007_EBX, but to this day that leaf has only three
bits defined in it. So move those bits to scattered.c and free the
capability element for synthetic flags.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-10-13 16:21:25 +02:00
822 changed files with 19311 additions and 12519 deletions


@ -454,3 +454,19 @@ Description:
disables it. Reads from the file return the current value.
The default is "1" if the build-time "SUSPEND_SKIP_SYNC" config
flag is unset, or "0" otherwise.
What: /sys/power/hibernate_compression_threads
Date: October 2025
Contact: <luoxueqin@kylinos.cn>
Description:
Controls the number of threads used for compression
and decompression of hibernation images.
The value can be adjusted at runtime to balance
performance and CPU utilization.
The change takes effect on the next hibernation or
resume operation.
Minimum value: 1
Default value: 3


@ -406,24 +406,8 @@ index of the MC::
|->mc2
....
Under each ``mcX`` directory each ``csrowX`` is again represented by a
``csrowX``, where ``X`` is the csrow index::
.../mc/mc0/
|
|->csrow0
|->csrow2
|->csrow3
....
Notice that there is no csrow1, which indicates that csrow0 is composed
of a single ranked DIMMs. This should also apply in both Channels, in
order to have dual-channel mode be operational. Since both csrow2 and
csrow3 are populated, this indicates a dual ranked set of DIMMs for
channels 0 and 1.
Within each of the ``mcX`` and ``csrowX`` directories are several EDAC
control and attribute files.
Within each ``mcX`` directory are several EDAC control and
attribute files.
``mcX`` directories
-------------------
@ -569,7 +553,7 @@ this ``X`` memory module:
- Unbuffered-DDR
.. [#f5] On some systems, the memory controller doesn't have any logic
to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories.
to identify the memory module. On such systems, the directory is called ``rankX``.
On modern Intel memory controllers, the memory controller identifies the
memory modules directly. On such systems, the directory is called ``dimmX``.
@ -577,126 +561,6 @@ this ``X`` memory module:
symlinks inside the sysfs mapping that are automatically created by
the sysfs subsystem. Currently, they serve no purpose.
``csrowX`` directories
----------------------
When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX``
directories. As this API doesn't work properly for Rambus, FB-DIMMs and
modern Intel Memory Controllers, this is being deprecated in favor of
``dimmX`` directories.
In the ``csrowX`` directories are EDAC control and attribute files for
this ``X`` instance of csrow:
- ``ue_count`` - Total Uncorrectable Errors count attribute file
This attribute file displays the total count of uncorrectable
errors that have occurred on this csrow. If panic_on_ue is set
this counter will not have a chance to increment, since EDAC
will panic the system.
- ``ce_count`` - Total Correctable Errors count attribute file
This attribute file displays the total count of correctable
errors that have occurred on this csrow. This count is very
important to examine. CEs provide early indications that a
DIMM is beginning to fail. This count field should be
monitored for non-zero values and report such information
to the system administrator.
- ``size_mb`` - Total memory managed by this csrow attribute file
This attribute file displays, in count of megabytes, the memory
that this csrow contains.
- ``mem_type`` - Memory Type attribute file
This attribute file will display what type of memory is currently
on this csrow. Normally, either buffered or unbuffered memory.
Examples:
- Registered-DDR
- Unbuffered-DDR
- ``edac_mode`` - EDAC Mode of operation attribute file
This attribute file will display what type of Error detection
and correction is being utilized.
- ``dev_type`` - Device type attribute file
This attribute file will display what type of DRAM device is
being utilized on this DIMM.
Examples:
- x1
- x2
- x4
- x8
- ``ch0_ce_count`` - Channel 0 CE Count attribute file
This attribute file will display the count of CEs on this
DIMM located in channel 0.
- ``ch0_ue_count`` - Channel 0 UE Count attribute file
This attribute file will display the count of UEs on this
DIMM located in channel 0.
- ``ch0_dimm_label`` - Channel 0 DIMM Label control file
This control file allows this DIMM to have a label assigned
to it. With this label in the module, when errors occur
the output can provide the DIMM label in the system log.
This becomes vital for panic events to isolate the
cause of the UE event.
DIMM Labels must be assigned after booting, with information
that correctly identifies the physical slot with its
silk screen label. This information is currently very
motherboard specific and determination of this information
must occur in userland at this time.
- ``ch1_ce_count`` - Channel 1 CE Count attribute file
This attribute file will display the count of CEs on this
DIMM located in channel 1.
- ``ch1_ue_count`` - Channel 1 UE Count attribute file
This attribute file will display the count of UEs on this
DIMM located in channel 0.
- ``ch1_dimm_label`` - Channel 1 DIMM Label control file
This control file allows this DIMM to have a label assigned
to it. With this label in the module, when errors occur
the output can provide the DIMM label in the system log.
This becomes vital for panic events to isolate the
cause of the UE event.
DIMM Labels must be assigned after booting, with information
that correctly identifies the physical slot with its
silk screen label. This information is currently very
motherboard specific and determination of this information
must occur in userland at this time.
System Logging
--------------


@ -1907,6 +1907,16 @@
/sys/power/pm_test). Only available when CONFIG_PM_DEBUG
is set. Default value is 5.
hibernate_compression_threads=
[HIBERNATION]
Set the number of threads used for compressing or decompressing
hibernation images.
Format: <integer>
Default: 3
Minimum: 1
Example: hibernate_compression_threads=4
highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact
size of <nn>. This works even on boxes that have no
highmem otherwise. This also works to reduce highmem
@ -6207,7 +6217,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
mba, smba, bmec, abmc.
mba, smba, bmec, abmc, sdciae.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
@ -6500,6 +6510,10 @@
Memory area to be used by remote processor image,
managed by CMA.
rseq_debug= [KNL] Enable or disable restartable sequence
debug mode. Defaults to CONFIG_RSEQ_DEBUG_DEFAULT_ENABLE.
Format: <bool>
rt_group_sched= [KNL] Enable or disable SCHED_RR/FIFO group scheduling
when CONFIG_RT_GROUP_SCHED=y. Defaults to
!CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.


@ -580,6 +580,15 @@ the given CPU as the upper limit for the exit latency of the idle states that
they are allowed to select for that CPU. They should never select any idle
states with exit latency beyond that limit.
While the above CPU QoS constraints apply to CPU idle time management, user
space may also request a CPU system wakeup latency QoS limit via the
`cpu_wakeup_latency` file. This QoS constraint is respected when selecting a
suitable idle state for the CPUs while entering the system-wide suspend-to-idle
sleep state, and it also applies to regular CPU idle time management.
Note that, from the user space point of view, the `cpu_wakeup_latency` file
works analogously to the `cpu_dma_latency` file; the unit is likewise
microseconds.
Idle States Control Via Kernel Command Line
===========================================


@ -48,8 +48,9 @@ only way to pass early-configuration-time parameters to it is via the kernel
command line. However, its configuration can be adjusted via ``sysfs`` to a
great extent. In some configurations it even is possible to unregister it via
``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
registered (see `below <status_attr_>`_).
registered (see :ref:`below <status_attr>`).
.. _operation_modes:
Operation Modes
===============
@ -62,6 +63,8 @@ a certain performance scaling algorithm. Which of them will be in effect
depends on what kernel command line options are used and on the capabilities of
the processor.
.. _active_mode:
Active Mode
-----------
@ -94,6 +97,8 @@ Which of the P-state selection algorithms is used by default depends on the
Namely, if that option is set, the ``performance`` algorithm will be used by
default, and the other one will be used by default if it is not set.
.. _active_mode_hwp:
Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~
@ -123,7 +128,7 @@ Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.
This will override the EPP/EPB setting coming from the ``sysfs`` interface
(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
(see :ref:`energy_performance_hints` below). Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.
@ -192,6 +197,8 @@ This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.
.. _passive_mode:
Passive Mode
------------
@ -289,12 +296,12 @@ Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
for more information).
used relative to ACPI-based CPU performance scaling (see
:ref:`below <acpi-cpufreq>` for more information).
Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
``no_turbo`` attribute in ``sysfs`` (described :ref:`below <no_turbo_attr>`) should
work as expected in all cases (that is, if set to disable turbo P-states, it
always should prevent ``intel_pstate`` from using them).
@ -307,12 +314,12 @@ pieces of information on it to be known, including:
* The minimum supported P-state.
* The maximum supported `non-turbo P-state <turbo_>`_.
* The maximum supported :ref:`non-turbo P-state <turbo>`.
* Whether or not turbo P-states are supported at all.
* The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
are supported).
* The maximum supported :ref:`one-core turbo P-state <turbo>` (if turbo
P-states are supported).
* The scaling formula to translate the driver's internal representation
of P-states into frequencies and the other way around.
@ -400,10 +407,10 @@ Energy-Aware Scheduling Support
If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
:ref:`CAS` it registers an Energy Model for the processor. This allows the
Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
to operate in the `passive mode <Passive Mode_>`_.
to operate in the :ref:`passive mode <passive_mode>`.
The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
based on abstract cost values and it does not include any real power numbers)
@ -432,6 +439,8 @@ the ``energy_model`` directory in ``debugfs`` (typically mounted on
User Space Interface in ``sysfs``
=================================
.. _global_attributes:
Global Attributes
-----------------
@ -444,8 +453,8 @@ argument is passed to the kernel in the command line.
``max_perf_pct``
Maximum P-state the driver is allowed to set in percent of the
maximum supported performance level (the highest supported `turbo
P-state <turbo_>`_).
maximum supported performance level (the highest supported :ref:`turbo
P-state <turbo>`).
This attribute will not be exposed if the
``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@ -453,8 +462,8 @@ argument is passed to the kernel in the command line.
``min_perf_pct``
Minimum P-state the driver is allowed to set in percent of the
maximum supported performance level (the highest supported `turbo
P-state <turbo_>`_).
maximum supported performance level (the highest supported :ref:`turbo
P-state <turbo>`).
This attribute will not be exposed if the
``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@ -463,18 +472,18 @@ argument is passed to the kernel in the command line.
``num_pstates``
Number of P-states supported by the processor (between 0 and 255
inclusive) including both turbo and non-turbo P-states (see
`Turbo P-states Support`_).
:ref:`turbo`).
This attribute is present only if the value exposed by it is the same
for all of the CPUs in the system.
The value of this attribute is not affected by the ``no_turbo``
setting described `below <no_turbo_attr_>`_.
setting described :ref:`below <no_turbo_attr>`.
This attribute is read-only.
``turbo_pct``
Ratio of the `turbo range <turbo_>`_ size to the size of the entire
Ratio of the :ref:`turbo range <turbo>` size to the size of the entire
range of supported P-states, in percent.
This attribute is present only if the value exposed by it is the same
@ -486,7 +495,7 @@ argument is passed to the kernel in the command line.
``no_turbo``
If set (equal to 1), the driver is not allowed to set any turbo P-states
(see `Turbo P-states Support`_). If unset (equal to 0, which is the
(see :ref:`turbo`). If unset (equal to 0, which is the
default), turbo P-states can be set by the driver.
[Note that ``intel_pstate`` does not support the general ``boost``
attribute (supported by some other scaling drivers) which is replaced
@ -495,11 +504,11 @@ argument is passed to the kernel in the command line.
This attribute does not affect the maximum supported frequency value
supplied to the ``CPUFreq`` core and exposed via the policy interface,
but it affects the maximum possible value of per-policy P-state limits
(see `Interpretation of Policy Attributes`_ below for details).
(see :ref:`policy_attributes_interpretation` below for details).
``hwp_dynamic_boost``
This attribute is only present if ``intel_pstate`` works in the
`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
:ref:`active mode with the HWP feature enabled <active_mode_hwp>` in
the processor. If set (equal to 1), it causes the minimum P-state limit
to be increased dynamically for a short time whenever a task previously
waiting on I/O is selected to run on a given logical CPU (the purpose
@ -514,12 +523,12 @@ argument is passed to the kernel in the command line.
Operation mode of the driver: "active", "passive" or "off".
"active"
The driver is functional and in the `active mode
<Active Mode_>`_.
The driver is functional and in the :ref:`active mode
<active_mode>`.
"passive"
The driver is functional and in the `passive mode
<Passive Mode_>`_.
The driver is functional and in the :ref:`passive mode
<passive_mode>`.
"off"
The driver is not functional (it is not registered as a scaling
@ -547,13 +556,15 @@ argument is passed to the kernel in the command line.
attribute to "1" enables the energy-efficiency optimizations and setting
to "0" disables them.
.. _policy_attributes_interpretation:
Interpretation of Policy Attributes
-----------------------------------
The interpretation of some ``CPUFreq`` policy attributes described in
Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate``
as the current scaling driver and it generally depends on the driver's
`operation mode <Operation Modes_>`_.
:ref:`operation mode <operation_modes>`.
First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
@ -562,9 +573,10 @@ Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.
If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
If the ``no_turbo`` :ref:`global attribute <no_turbo_attr>` is set, the driver
is not allowed to use turbo P-states, so the maximum value of
``scaling_max_freq`` and ``scaling_min_freq`` is limited to the maximum
non-turbo P-state frequency.
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
``scaling_min_freq`` to go down to that value if they were above it before.
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
@ -576,7 +588,7 @@ and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
which also is the value of ``cpuinfo_max_freq`` in either case.
Next, the following policy attributes have special meaning if
``intel_pstate`` works in the `active mode <Active Mode_>`_:
``intel_pstate`` works in the :ref:`active mode <active_mode>`:
``scaling_available_governors``
List of P-state selection algorithms provided by ``intel_pstate``.
@ -597,20 +609,22 @@ processor:
Shows the base frequency of the CPU. Any frequency above this will be
in the turbo frequency range.
The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
The meaning of these attributes in the :ref:`passive mode <passive_mode>` is the
same as for other scaling drivers.
Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver. Namely, it is either
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
`passive mode <Passive Mode_>`_).
"intel_pstate" (in the :ref:`active mode <active_mode>`) or "intel_cpufreq"
(in the :ref:`passive mode <passive_mode>`).
.. _pstate_limits_coordination:
Coordination of P-State Limits
------------------------------
``intel_pstate`` allows P-state limits to be set in two ways: with the help of
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
the ``max_perf_pct`` and ``min_perf_pct`` :ref:`global attributes
<global_attributes>` or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes. The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:
@ -632,17 +646,18 @@ on the following rules, regardless of the current operation mode of the driver:
3. The global and per-policy limits can be set independently.
In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
In the :ref:`active mode with the HWP feature enabled <active_mode_hwp>`, the
resulting effective values are written into hardware registers whenever the
limits change in order to request its internal P-state selection logic to always
set P-states within these limits. Otherwise, the limits are taken into account
by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
every time before setting a new P-state for a CPU.
by scaling governors (in the :ref:`passive mode <passive_mode>`) and by the
driver every time before setting a new P-state for a CPU.
Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.
.. _energy_performance_hints:
Energy vs Performance Hints
---------------------------
@ -702,9 +717,9 @@ output.
On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
`turbo range <turbo_>`_ is represented by one item in it (the topmost one). By
convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
than the frequency of the highest non-turbo P-state listed by it, but the
:ref:`turbo range <turbo>` is represented by one item in it (the topmost one).
By convention, the frequency returned by ``_PSS`` for that item is greater by
1 MHz than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").
@ -730,18 +745,18 @@ benefit from running at turbo frequencies will be given non-turbo P-states
instead.
One more issue related to that may appear on systems supporting the
`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
turbo threshold. Namely, if that is not coordinated with the lists of P-states
returned by ``_PSS`` properly, there may be more than one item corresponding to
a turbo P-state in those lists and there may be a problem with avoiding the
turbo range (if desirable or necessary). Usually, to avoid using turbo
P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
by ``_PSS``, but that is not sufficient when there are other turbo P-states in
the list returned by it.
:ref:`Configurable TDP feature <turbo>` allowing the platform firmware to set
the turbo threshold. Namely, if that is not coordinated with the lists of
P-states returned by ``_PSS`` properly, there may be more than one item
corresponding to a turbo P-state in those lists and there may be a problem with
avoiding the turbo range (if desirable or necessary). Usually, to avoid using
turbo P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state
listed by ``_PSS``, but that is not sufficient when there are other turbo
P-states in the list returned by it.
Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
`passive mode <Passive Mode_>`_, except that the number of P-states it can set
is limited to the ones listed by the ACPI ``_PSS`` objects.
:ref:`passive mode <passive_mode>`, except that the number of P-states it can
set is limited to the ones listed by the ACPI ``_PSS`` objects.
Kernel Command Line Options for ``intel_pstate``
@ -756,11 +771,11 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
processor is supported by it.
``active``
Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
with.
Register ``intel_pstate`` in the :ref:`active mode <active_mode>` to
start with.
``passive``
Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
Register ``intel_pstate`` in the :ref:`passive mode <passive_mode>` to
start with.
``force``
@ -793,12 +808,12 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
and this option has no effect.
``per_cpu_perf_limits``
Use per-logical-CPU P-State limits (see `Coordination of P-state
Limits`_ for details).
Use per-logical-CPU P-State limits (see
:ref:`pstate_limits_coordination` for details).
``no_cas``
Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
default on hybrid systems without SMT.
Do not enable :ref:`capacity-aware scheduling <CAS>` which is enabled
by default on hybrid systems without SMT.
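For example, to make the driver start in the passive mode, it is sufficient to
append the following to the kernel command line (a sketch; where the command
line is edited depends on the bootloader)::
intel_pstate=passive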
Diagnostics and Tuning
======================
@ -810,7 +825,7 @@ There are two static trace events that can be used for ``intel_pstate``
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.
it works in the :ref:`active mode <active_mode>`.
The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::
@ -822,7 +837,7 @@ their output (if the kernel is generally configured to support event tracing)::
gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2
If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
If ``intel_pstate`` works in the :ref:`passive mode <passive_mode>`, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).
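As a sketch (assuming tracefs is mounted at /sys/kernel/tracing), the
``cpu_frequency`` trace event can be enabled and watched as follows::
# cd /sys/kernel/tracing
# echo 1 > events/power/cpu_frequency/enable
# cat trace_pipe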

View File

@ -6,3 +6,4 @@ Thermal Subsystem
:maxdepth: 1
intel_powerclamp
intel_thermal_throttle

View File

@ -0,0 +1,91 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
=======================================
Intel thermal throttle events reporting
=======================================
:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Introduction
------------
Intel processors have built-in automatic and adaptive thermal monitoring
mechanisms that force the processor to reduce its power consumption in order
to operate within predetermined temperature limits.
Refer to the section "THERMAL MONITORING AND PROTECTION" in the "Intel® 64 and
IA-32 Architectures Software Developer's Manual Volume 3 (3A, 3B, 3C, & 3D):
System Programming Guide" for more details.
In general, there are two mechanisms to control the core temperature of the
processor, called Thermal Monitor 1 (TM1) and Thermal Monitor 2 (TM2).
The status of the temperature sensor that triggers the thermal monitor (TM1/TM2)
is indicated through the "thermal status flag" and "thermal status log flag" in
MSR_IA32_THERM_STATUS at the core level and MSR_IA32_PACKAGE_THERM_STATUS at the
package level.
Thermal Status flag, bit 0 — When set, indicates that the processor core
temperature is currently at the trip temperature of the thermal monitor and that
the processor power consumption is being reduced via either TM1 or TM2, depending
on which is enabled. When clear, the flag indicates that the core temperature is
below the thermal monitor trip temperature. This flag is read only.
Thermal Status Log flag, bit 1 — When set, indicates that the thermal sensor has
tripped since the last power-up or reset or since the last time that software
cleared this flag. This flag is a sticky bit; once set it remains set until
cleared by software or until a power-up or reset of the processor. The default
state is clear.
It is possible that when the user reads MSR_IA32_THERM_STATUS or
MSR_IA32_PACKAGE_THERM_STATUS, TM1/TM2 is not active. In this case, the
"Thermal Status flag" will read "0" and the "Thermal Status Log flag" will be
set to show any previous TM1/TM2 activation. But since the log flag needs to
be cleared by software, it can't show the number of occurrences of TM1/TM2
activations. Hence, Linux provides counters of how many times the "Thermal
Status flag" was set, and also reports how long the "Thermal Status flag" was
active in milliseconds. Using these counters, users can check whether
performance was limited because of thermal events. It is recommended to read
from sysfs instead of reading the MSRs directly, as the "Thermal Status Log
flag" is reset by the driver to implement rate control.
Sysfs Interface
---------------
Thermal throttling events are presented for each CPU under
"/sys/devices/system/cpu/cpuX/thermal_throttle/", where "X" is the CPU number.
All these counters are read-only and cannot be reset to 0, so they can
potentially wrap around after reaching the maximum value of a 64-bit unsigned
integer.
``core_throttle_count``
Shows the number of times the "Thermal Status flag" changed from 0 to 1 for
this CPU since OS boot and thermal vector initialization. This is a 64-bit
counter.
``package_throttle_count``
Shows the number of times the "Thermal Status flag" changed from 0 to 1 for
the package containing this CPU since OS boot and thermal vector
initialization. Package status is broadcast to all CPUs; all CPUs in the
package increment this count. This is a 64-bit counter.
``core_throttle_max_time_ms``
Shows the maximum amount of time for which the "Thermal Status flag" has been
set to 1 for this CPU at the core level since OS boot and thermal vector
initialization.
``package_throttle_max_time_ms``
Shows the maximum amount of time for which the "Thermal Status flag" has been
set to 1 for the package containing this CPU since OS boot and thermal vector
initialization.
``core_throttle_total_time_ms``
Shows the cumulative time for which the "Thermal Status flag" has been set
to 1 for this CPU at the core level since OS boot and thermal vector
initialization.
``package_throttle_total_time_ms``
Shows the cumulative time for which the "Thermal Status flag" has been set
to 1 for the package containing this CPU since OS boot and thermal vector
initialization.
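For example, all of the above counters for CPU 0 can be dumped with one
command (a sketch; the set of attributes present depends on the platform)::
# grep . /sys/devices/system/cpu/cpu0/thermal_throttle/*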

View File

@ -391,13 +391,13 @@ Before jumping into the kernel, the following conditions must be met:
- SMCR_EL2.LEN must be initialised to the same value for all CPUs the
kernel will execute on.
- HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
- HFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
- HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
- HFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01.
- HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
- HFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
- HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
- HFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64):

View File

@ -402,6 +402,11 @@ The regset data starts with struct user_sve_header, containing:
streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
if the target was not in streaming mode.
* On systems that do not support SVE it is permitted to use SETREGSET to
write SVE_PT_REGS_FPSIMD formatted data via NT_ARM_SVE, in this case the
vector length should be specified as 0. This allows streaming mode to be
disabled on systems with SME but not SVE.
* If any register data is provided along with SVE_PT_VL_ONEXEC then the
registers data will be interpreted with the current vector length, not
the vector length configured for use on exec.

View File

@ -243,9 +243,8 @@ Examples:
Changing the size of debug areas
------------------------------------
It is possible the change the size of debug areas through piping
the number of pages to the debugfs file "pages". The resize request will
also flush the debug areas.
To resize a debug area, write the desired page count to the "pages" file.
Existing data is preserved if it fits; otherwise, oldest entries are dropped.
Example:

View File

@ -39,6 +39,9 @@ properties:
- amlogic,a4-gpio-ao-intc
- amlogic,a5-gpio-intc
- amlogic,c3-gpio-intc
- amlogic,s6-gpio-intc
- amlogic,s7-gpio-intc
- amlogic,s7d-gpio-intc
- amlogic,t7-gpio-intc
- const: amlogic,meson-gpio-intc

View File

@ -25,13 +25,14 @@ properties:
interrupt-controller: true
'#interrupt-cells':
const: 2
const: 1
description:
The first cell is the IRQ number, the second cell is the trigger
type as defined in interrupt.txt in this directory.
interrupts:
maxItems: 6
minItems: 1
maxItems: 10
description: |
Depends on which of INTC0 or INTC1 is used.
INTC0 and INTC1 are two kinds of interrupt controller with enable and raw
@ -74,13 +75,17 @@ examples:
interrupt-controller@12101b00 {
compatible = "aspeed,ast2700-intc-ic";
reg = <0 0x12101b00 0 0x10>;
#interrupt-cells = <2>;
#interrupt-cells = <1>;
interrupt-controller;
interrupts = <GIC_SPI 192 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 193 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 194 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 195 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 196 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>;
<GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 199 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 200 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 201 IRQ_TYPE_LEVEL_HIGH>;
};
};

View File

@ -58,6 +58,7 @@ properties:
- const: andestech,nceplic100
- items:
- enum:
- anlogic,dr1v90-plic
- canaan,k210-plic
- eswin,eic7700-plic
- sifive,fu540-c000-plic
@ -75,6 +76,9 @@ properties:
- sophgo,sg2044-plic
- thead,th1520-plic
- const: thead,c900-plic
- items:
- const: ultrarisc,dp1000-plic
- const: ultrarisc,cp100-plic
- items:
- const: sifive,plic-1.0.0
- const: riscv,plic0

View File

@ -4,18 +4,23 @@
$id: http://devicetree.org/schemas/interrupt-controller/thead,c900-aclint-mswi.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Sophgo sg2042 CLINT Machine-level Software Interrupt Device
title: ACLINT Machine-level Software Interrupt Device
maintainers:
- Inochi Amaoto <inochiama@outlook.com>
properties:
compatible:
items:
- enum:
- sophgo,sg2042-aclint-mswi
- sophgo,sg2044-aclint-mswi
- const: thead,c900-aclint-mswi
oneOf:
- items:
- enum:
- sophgo,sg2042-aclint-mswi
- sophgo,sg2044-aclint-mswi
- const: thead,c900-aclint-mswi
- items:
- enum:
- anlogic,dr1v90-aclint-mswi
- const: nuclei,ux900-aclint-mswi
reg:
maxItems: 1

View File

@ -30,6 +30,10 @@ properties:
- const: thead,c900-aclint-sswi
- items:
- const: mips,p8700-aclint-sswi
- items:
- enum:
- anlogic,dr1v90-aclint-sswi
- const: nuclei,ux900-aclint-sswi
reg:
maxItems: 1

View File

@ -14,6 +14,7 @@ properties:
oneOf:
- enum:
- fsl,imx8-ddr-pmu
- fsl,imx8dxl-db-pmu
- fsl,imx8m-ddr-pmu
- fsl,imx8mq-ddr-pmu
- fsl,imx8mm-ddr-pmu
@ -28,7 +29,10 @@ properties:
- fsl,imx8mp-ddr-pmu
- const: fsl,imx8m-ddr-pmu
- items:
- const: fsl,imx8dxl-ddr-pmu
- enum:
- fsl,imx8dxl-ddr-pmu
- fsl,imx8qm-ddr-pmu
- fsl,imx8qxp-ddr-pmu
- const: fsl,imx8-ddr-pmu
- items:
- enum:
@ -43,6 +47,14 @@ properties:
interrupts:
maxItems: 1
clocks:
maxItems: 2
clock-names:
items:
- const: ipg
- const: cnt
required:
- compatible
- reg
@ -50,6 +62,21 @@ required:
additionalProperties: false
allOf:
- if:
properties:
compatible:
contains:
const: fsl,imx8dxl-db-pmu
then:
required:
- clocks
- clock-names
else:
properties:
clocks: false
clock-names: false
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>

View File

@ -0,0 +1,87 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/thermal/fsl,imx91-tmu.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: NXP i.MX91 Thermal
maintainers:
- Pengfei Li <pengfei.li_1@nxp.com>
description:
i.MX91 features a new temperature sensor. It includes programmable
temperature threshold comparators for both normal and privileged
accesses and allows a programmable measurement frequency for the
Periodic One-Shot Measurement mode. Additionally, it provides
status registers for indicating the end of measurement and threshold
violation events.
properties:
compatible:
items:
- const: fsl,imx91-tmu
reg:
maxItems: 1
clocks:
maxItems: 1
interrupts:
items:
- description: Comparator 1 irq
- description: Comparator 2 irq
- description: Data ready irq
interrupt-names:
items:
- const: thr1
- const: thr2
- const: ready
nvmem-cells:
items:
- description: Phandle to the trim control 1 provided by ocotp
- description: Phandle to the trim control 2 provided by ocotp
nvmem-cell-names:
items:
- const: trim1
- const: trim2
"#thermal-sensor-cells":
const: 0
required:
- compatible
- reg
- clocks
- interrupts
- interrupt-names
allOf:
- $ref: thermal-sensor.yaml
unevaluatedProperties: false
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>
#include <dt-bindings/clock/imx93-clock.h>
thermal-sensor@44482000 {
compatible = "fsl,imx91-tmu";
reg = <0x44482000 0x1000>;
#thermal-sensor-cells = <0>;
clocks = <&clk IMX93_CLK_TMC_GATE>;
interrupt-parent = <&gic>;
interrupts = <GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "thr1", "thr2", "ready";
nvmem-cells = <&tmu_trim1>, <&tmu_trim2>;
nvmem-cell-names = "trim1", "trim2";
};
...

View File

@ -36,10 +36,15 @@ properties:
- qcom,msm8974-tsens
- const: qcom,tsens-v0_1
- description:
v1 of TSENS without RPM which requires to be explicitly reset
and enabled in the driver.
enum:
- qcom,ipq5018-tsens
- description: v1 of TSENS
items:
- enum:
- qcom,ipq5018-tsens
- qcom,msm8937-tsens
- qcom,msm8956-tsens
- qcom,msm8976-tsens
@ -50,11 +55,13 @@ properties:
items:
- enum:
- qcom,glymur-tsens
- qcom,kaanapali-tsens
- qcom,milos-tsens
- qcom,msm8953-tsens
- qcom,msm8996-tsens
- qcom,msm8998-tsens
- qcom,qcm2290-tsens
- qcom,qcs8300-tsens
- qcom,qcs615-tsens
- qcom,sa8255p-tsens
- qcom,sa8775p-tsens

View File

@ -16,7 +16,11 @@ description:
properties:
compatible:
const: renesas,r9a09g047-tsu
oneOf:
- const: renesas,r9a09g047-tsu # RZ/G3E
- items:
- const: renesas,r9a09g057-tsu # RZ/V2H
- const: renesas,r9a09g047-tsu # RZ/G3E
reg:
maxItems: 1

View File

@ -0,0 +1,47 @@
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/timer/realtek,rtd1625-systimer.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
title: Realtek System Timer
maintainers:
- Hao-Wen Ting <haowen.ting@realtek.com>
description:
The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter
operating at a fixed 1 MHz frequency. Thanks to its compare-match interrupt
capability, the timer natively supports oneshot mode for tick broadcast
functionality.
properties:
compatible:
oneOf:
- const: realtek,rtd1625-systimer
- items:
- const: realtek,rtd1635-systimer
- const: realtek,rtd1625-systimer
reg:
maxItems: 1
interrupts:
maxItems: 1
required:
- compatible
- reg
- interrupts
additionalProperties: false
examples:
- |
#include <dt-bindings/interrupt-controller/arm-gic.h>
timer@89420 {
compatible = "realtek,rtd1635-systimer",
"realtek,rtd1625-systimer";
reg = <0x89420 0x18>;
interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH>;
};

View File

@ -1705,6 +1705,8 @@ patternProperties:
description: Universal Scientific Industrial Co., Ltd.
"^usr,.*":
description: U.S. Robotics Corporation
"^ultrarisc,.*":
description: UltraRISC Technology Co., Ltd.
"^ultratronik,.*":
description: Ultratronik GmbH
"^utoo,.*":

View File

@ -409,3 +409,26 @@ based on the processor generation.
Limit 1 from being exhausted.
4 Unknown: Can't classify.
On processors starting from Panther Lake, additional hints are provided.
The hardware analyzes workload residencies over an extended period to
determine whether the workload classification tends toward idle/battery
life states or sustained/performance states. Based on this long-term
analysis, it classifies:
Power Classification: If the workload exhibits more idle or battery life
residencies, it is classified as "power".
Performance Classification: If the workload exhibits more sustained or
performance residencies, it is classified as "performance".
This approach enables applications to ignore short-term workload
fluctuations and instead respond to longer-term power vs. performance
trends.
Residency thresholds for this classification are CPU generation-specific.
Classification is reported via bit 4 of the workload_type_index:
Bit 4 = 1: Power classification
Bit 4 = 0: Performance classification
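As a sketch, assuming the hint is read from a sysfs attribute named
"workload_type_index" exposed by the processor thermal device (the PCI address
below is an assumption and varies by platform), the long-term classification
can be derived in shell as::
# idx=$(cat /sys/bus/pci/devices/0000:00:04.0/workload_hint/workload_type_index)
# echo $(( (idx >> 4) & 1 ))    # 1: power, 0: performance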

View File

@ -17,17 +17,18 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
flag bits:
=============================================== ================================
RDT (Resource Director Technology) Allocation "rdt_a"
CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation) "mba"
SMBA (Slow Memory Bandwidth Allocation) ""
BMEC (Bandwidth Monitoring Event Configuration) ""
ABMC (Assignable Bandwidth Monitoring Counters) ""
=============================================== ================================
==========================================================  ================================
RDT (Resource Director Technology) Allocation               "rdt_a"
CAT (Cache Allocation Technology)                           "cat_l3", "cat_l2"
CDP (Code and Data Prioritization)                          "cdp_l3", "cdp_l2"
CQM (Cache QoS Monitoring)                                  "cqm_llc", "cqm_occup_llc"
MBM (Memory Bandwidth Monitoring)                           "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation)                           "mba"
SMBA (Slow Memory Bandwidth Allocation)                     ""
BMEC (Bandwidth Monitoring Event Configuration)             ""
ABMC (Assignable Bandwidth Monitoring Counters)             ""
SDCIAE (Smart Data Cache Injection Allocation Enforcement)  ""
==========================================================  ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
resulted in the feature flags becoming hard to parse by humans. Adding a new
@ -72,6 +73,11 @@ The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.
Most of the files in the resource's subdirectory are read-only, and
describe properties of the resource. Resources that support global
configuration options also include writable files that can be used
to modify those settings.
Each subdirectory contains the following files with respect to
allocation:
@ -90,12 +96,19 @@ related to allocation:
must be set when writing a mask.
"shareable_bits":
Bitmask of shareable resource with other executing
entities (e.g. I/O). User can use this when
setting up exclusive cache partitions. Note that
some platforms support devices that have their
own settings for cache use which can over-ride
these bits.
Bitmask of shareable resource with other executing entities
(e.g. I/O). Applies to all instances of this resource. User
can use this when setting up exclusive cache partitions.
Note that some platforms support devices that have their
own settings for cache use which can over-ride these bits.
When "io_alloc" is enabled, a portion of each cache instance can
be configured for shared use between hardware and software.
"bit_usage" should be used to see which portions of each cache
instance is configured for hardware use via "io_alloc" feature
because every cache instance can have its "io_alloc" bitmask
configured independently via "io_alloc_cbm".
"bit_usage":
Annotated capacity bitmasks showing how all
instances of the resource are used. The legend is:
@ -109,16 +122,16 @@ related to allocation:
"H":
Corresponding region is used by hardware only
but available for software use. If a resource
has bits set in "shareable_bits" but not all
of these bits appear in the resource groups'
schematas then the bits appearing in
"shareable_bits" but no resource group will
be marked as "H".
has bits set in "shareable_bits" or "io_alloc_cbm"
but not all of these bits appear in the resource
groups' schemata then the bits appearing in
"shareable_bits" or "io_alloc_cbm" but no
resource group will be marked as "H".
"X":
Corresponding region is available for sharing and
used by hardware and software. These are the
bits that appear in "shareable_bits" as
well as a resource group's allocation.
used by hardware and software. These are the bits
that appear in "shareable_bits" or "io_alloc_cbm"
as well as a resource group's allocation.
"S":
Corresponding region is used by software
and available for sharing.
@ -136,6 +149,77 @@ related to allocation:
"1":
Non-contiguous 1s value in CBM is supported.
"io_alloc":
"io_alloc" enables system software to configure the portion of
the cache allocated for I/O traffic. File may only exist if the
system supports this feature on some of its cache resources.
"disabled":
Resource supports "io_alloc" but the feature is disabled.
Portions of cache used for allocation of I/O traffic cannot
be configured.
"enabled":
Portions of cache used for allocation of I/O traffic
can be configured using "io_alloc_cbm".
"not supported":
Support not available for this resource.
The feature can be modified by writing to the interface, for example:
To enable::
# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
To disable::
# echo 0 > /sys/fs/resctrl/info/L3/io_alloc
The underlying implementation may reduce resources available to
general (CPU) cache allocation; see the architecture-specific notes
below. Depending on usage requirements, the feature can be enabled
or disabled.
On AMD systems, io_alloc feature is supported by the L3 Smart
Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
io_alloc is the highest CLOSID supported by the resource. When
io_alloc is enabled, the highest CLOSID is dedicated to io_alloc and
no longer available for general (CPU) cache allocation. When CDP is
enabled, io_alloc routes I/O traffic using the highest CLOSID allocated
for the instruction cache (CDP_CODE), making this CLOSID no longer
available for general (CPU) cache allocation for both the CDP_CODE
and CDP_DATA resources.
"io_alloc_cbm":
Capacity bitmasks that describe the portions of cache instances to
which I/O traffic from supported I/O devices are routed when "io_alloc"
is enabled.
CBMs are displayed in the following format:
<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
Example::
# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
0=ffff;1=ffff
CBMs can be configured by writing to the interface.
Example::
# echo 1=ff > /sys/fs/resctrl/info/L3/io_alloc_cbm
# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
0=ffff;1=00ff
# echo "0=ff;1=f" > /sys/fs/resctrl/info/L3/io_alloc_cbm
# cat /sys/fs/resctrl/info/L3/io_alloc_cbm
0=00ff;1=000f
When CDP is enabled "io_alloc_cbm" associated with the CDP_DATA and CDP_CODE
resources may reflect the same values. For example, values read from and
written to /sys/fs/resctrl/info/L3DATA/io_alloc_cbm may be reflected by
/sys/fs/resctrl/info/L3CODE/io_alloc_cbm and vice versa.
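Because every cache instance is annotated independently, a quick way to
cross-check the effect of "io_alloc" is to enable it and then read "bit_usage"
(a sketch assuming an L3 resource that supports the feature)::
# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
# cat /sys/fs/resctrl/info/L3/bit_usage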
The memory bandwidth (MB) subdirectory contains the following files
with respect to allocation:

View File

@ -0,0 +1,113 @@
# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
name: em
doc: |
Energy model netlink interface to notify its changes.
protocol: genetlink
uapi-header: linux/energy_model.h
attribute-sets:
-
name: pds
attributes:
-
name: pd
type: nest
nested-attributes: pd
multi-attr: true
-
name: pd
attributes:
-
name: pad
type: pad
-
name: pd-id
type: u32
-
name: flags
type: u64
-
name: cpus
type: string
-
name: pd-table
attributes:
-
name: pd-id
type: u32
-
name: ps
type: nest
nested-attributes: ps
multi-attr: true
-
name: ps
attributes:
-
name: pad
type: pad
-
name: performance
type: u64
-
name: frequency
type: u64
-
name: power
type: u64
-
name: cost
type: u64
-
name: flags
type: u64
operations:
list:
-
name: get-pds
attribute-set: pds
doc: Get the list of information for all performance domains.
do:
reply:
attributes:
- pd
-
name: get-pd-table
attribute-set: pd-table
doc: Get the energy model table of a performance domain.
do:
request:
attributes:
- pd-id
reply:
attributes:
- pd-id
- ps
-
name: pd-created
doc: A performance domain is created.
notify: get-pd-table
mcgrp: event
-
name: pd-updated
doc: A performance domain is updated.
notify: get-pd-table
mcgrp: event
-
name: pd-deleted
doc: A performance domain is deleted.
attribute-set: pd-table
event:
attributes:
- pd-id
mcgrp: event
mcast-groups:
list:
-
name: event
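As a usage sketch, the spec can be exercised with the in-tree ynl CLI (the
path below assumes the tool's location in the kernel source tree)::
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/em.yaml --do get-pds
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/em.yaml \
--do get-pd-table --json '{"pd-id": 0}'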

View File

@ -19,6 +19,7 @@ Power Management
power_supply_class
runtime_pm
s2ram
shutdown-debugging
suspend-and-cpuhotplug
suspend-and-interrupts
swsusp-and-swap-files

View File

@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
From user space:
The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system wakeup
latency QoS.
Only processes can register a PM QoS request. To provide for automatic
@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
parameter requests as follows.
To register the default PM QoS target for the CPU latency QoS, the process must
open /dev/cpu_dma_latency.
open /dev/cpu_dma_latency. To register a CPU system wakeup QoS limit, the
process must open /dev/cpu_wakeup_latency.
As long as the device node is held open, that process has a registered
request on the parameter.
To change the requested target value, the process needs to write an s32 value to
the open device node. Alternatively, it can write a hex string for the value
using the 10 char long format e.g. "0x12345678". This translates to a
cpu_latency_qos_update_request() call.
using the 10 char long format e.g. "0x12345678".
To remove the user mode request for a target value simply close the device
node.
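A minimal shell sketch of this lifecycle, holding the node open on a spare
file descriptor and using the 10 char hex format described above::
# exec 3> /dev/cpu_dma_latency     # open: registers a request
# printf '0x00000032' >&3          # update the request to 50 (0x32)
# ... run the latency-sensitive workload ...
# exec 3>&-                        # close: the request is removed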

View File

@ -480,16 +480,6 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
`bool pm_runtime_status_suspended(struct device *dev);`
- return true if the device's runtime PM status is 'suspended'
`void pm_runtime_allow(struct device *dev);`
- set the power.runtime_auto flag for the device and decrease its usage
counter (used by the /sys/devices/.../power/control interface to
effectively allow the device to be power managed at run time)
`void pm_runtime_forbid(struct device *dev);`
- unset the power.runtime_auto flag for the device and increase its usage
counter (used by the /sys/devices/.../power/control interface to
effectively prevent the device from being power managed at run time)
`void pm_runtime_no_callbacks(struct device *dev);`
- set the power.no_callbacks flag for the device and remove the runtime
PM attributes from /sys/devices/.../power (or prevent them from being

View File

@ -0,0 +1,53 @@
.. SPDX-License-Identifier: GPL-2.0
Debugging Kernel Shutdown Hangs with pstore
+++++++++++++++++++++++++++++++++++++++++++
Overview
========
If the system hangs while shutting down, the kernel logs may need to be
retrieved to debug the issue.
On systems that have a UART available, it is best to configure the kernel to use
this UART for kernel console output.
If a UART isn't available, the ``pstore`` subsystem provides a mechanism to
persist this data across a system reset, allowing it to be retrieved on the next
boot.
Kernel Configuration
====================
To enable ``pstore`` and have it save the kernel ring buffer logs, set the
following kernel configuration options:
* ``CONFIG_PSTORE=y``
* ``CONFIG_PSTORE_CONSOLE=y``
Additionally, enable a backend to store the data. Depending on your platform,
some potential options include:
* ``CONFIG_EFI_VARS_PSTORE=y``
* ``CONFIG_PSTORE_RAM=y``
* ``CONFIG_CHROMEOS_PSTORE=y``
* ``CONFIG_PSTORE_BLK=y``
Kernel Command-line Parameters
==============================
Add these parameters to your kernel command line:
* ``printk.always_kmsg_dump=Y``
* Forces the kernel to dump the entire message buffer to pstore during
shutdown
* ``efi_pstore.pstore_disable=N``
* For EFI-based systems, ensures the EFI backend is active
Userspace Interaction and Log Retrieval
=======================================
On the next boot after a hang, pstore logs will be available in the pstore
filesystem (``/sys/fs/pstore``) and can be retrieved by userspace.
On systemd systems, the ``systemd-pstore`` service will handle the following:
#. Locate pstore data in ``/sys/fs/pstore``
#. Read and save it to ``/var/lib/systemd/pstore``
#. Clear pstore data for the next event
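Without systemd, the same steps can be performed by hand (a sketch; entry
names such as "console-ramoops-0" depend on the backend in use)::
# ls /sys/fs/pstore
# cp /sys/fs/pstore/console-* /var/tmp/    # save the captured logs
# rm /sys/fs/pstore/console-*              # clear the backend for the next event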

View File

@ -9188,6 +9188,9 @@ S: Maintained
F: kernel/power/energy_model.c
F: include/linux/energy_model.h
F: Documentation/power/energy-model.rst
F: Documentation/netlink/specs/em.yaml
F: include/uapi/linux/energy_model.h
F: kernel/power/em_netlink*.*
EPAPR HYPERVISOR BYTE CHANNEL DEVICE DRIVER
M: Laurentiu Tudor <laurentiu.tudor@nxp.com>
@ -17470,6 +17473,16 @@ S: Maintained
F: Documentation/devicetree/bindings/leds/backlight/mps,mp3309c.yaml
F: drivers/video/backlight/mp3309c.c
MPAM DRIVER
M: James Morse <james.morse@arm.com>
M: Ben Horgan <ben.horgan@arm.com>
R: Reinette Chatre <reinette.chatre@intel.com>
R: Fenghua Yu <fenghuay@nvidia.com>
S: Maintained
F: drivers/resctrl/mpam_*
F: drivers/resctrl/test_mpam_*
F: include/linux/arm_mpam.h
MPS MP2869 DRIVER
M: Wensheng Wang <wenswang@yeah.net>
L: linux-hwmon@vger.kernel.org
@ -21681,6 +21694,11 @@ S: Maintained
F: Documentation/devicetree/bindings/spi/realtek,rtl9301-snand.yaml
F: drivers/spi/spi-realtek-rtl-snand.c
REALTEK SYSTIMER DRIVER
M: Hao-Wen Ting <haowen.ting@realtek.com>
S: Maintained
F: drivers/clocksource/timer-realtek.c
REALTEK WIRELESS DRIVER (rtlwifi family)
M: Ping-Ke Shih <pkshih@realtek.com>
L: linux-wireless@vger.kernel.org

View File

@ -283,10 +283,17 @@ extern int __put_user_8(void *, unsigned long long);
__gu_err; \
})
/*
* This is a type: either unsigned long, if the argument fits into
* that type, or otherwise unsigned long long.
*/
#define __long_type(x) \
__typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
#define __get_user_err(x, ptr, err, __t) \
do { \
unsigned long __gu_addr = (unsigned long)(ptr); \
unsigned long __gu_val; \
__long_type(x) __gu_val; \
unsigned int __ua_flags; \
__chk_user_ptr(ptr); \
might_fault(); \
@ -295,6 +302,7 @@ do { \
case 1: __get_user_asm_byte(__gu_val, __gu_addr, err, __t); break; \
case 2: __get_user_asm_half(__gu_val, __gu_addr, err, __t); break; \
case 4: __get_user_asm_word(__gu_val, __gu_addr, err, __t); break; \
case 8: __get_user_asm_dword(__gu_val, __gu_addr, err, __t); break; \
default: (__gu_val) = __get_user_bad(); \
} \
uaccess_restore(__ua_flags); \
@ -353,6 +361,22 @@ do { \
#define __get_user_asm_word(x, addr, err, __t) \
__get_user_asm(x, addr, err, "ldr" __t)
#ifdef __ARMEB__
#define __WORD0_OFFS 4
#define __WORD1_OFFS 0
#else
#define __WORD0_OFFS 0
#define __WORD1_OFFS 4
#endif
#define __get_user_asm_dword(x, addr, err, __t) \
({ \
unsigned long __w0, __w1; \
__get_user_asm(__w0, addr + __WORD0_OFFS, err, "ldr" __t); \
__get_user_asm(__w1, addr + __WORD1_OFFS, err, "ldr" __t); \
(x) = ((u64)__w1 << 32) | (u64) __w0; \
})
#define __put_user_switch(x, ptr, __err, __fn) \
do { \
const __typeof__(*(ptr)) __user *__pu_ptr = (ptr); \

View File

@ -47,7 +47,6 @@ config ARM64
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select ARCH_STACKWALK
select ARCH_HAS_STRICT_KERNEL_RWX
@ -2023,6 +2022,31 @@ config ARM64_TLB_RANGE
ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
range of input addresses.
config ARM64_MPAM
bool "Enable support for MPAM"
select ARM64_MPAM_DRIVER if EXPERT # does nothing yet
select ACPI_MPAM if ACPI
help
Memory System Resource Partitioning and Monitoring (MPAM) is an
optional extension to the Arm architecture that allows each
transaction issued to the memory system to be labelled with a
Partition identifier (PARTID) and Performance Monitoring Group
identifier (PMG).
Memory system components, such as the caches, can be configured with
policies to control how much of various physical resources (such as
memory bandwidth or cache memory) the transactions labelled with each
PARTID can consume. Depending on the capabilities of the hardware,
the PARTID and PMG can also be used as filtering criteria to measure
the memory system resource consumption of different parts of a
workload.
Use of this extension requires CPU support, support in the
Memory System Components (MSC), and a description from firmware
of where the MSCs are in the address space.
MPAM is exposed to user-space via the resctrl pseudo filesystem.
endmenu # "ARMv8.4 architectural features"
menu "ARMv8.5 architectural features"

View File

@ -19,7 +19,7 @@
#error "cpucaps have overflown ARM64_CB_BIT"
#endif
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/stringify.h>
@ -207,7 +207,7 @@ alternative_endif
#define _ALTERNATIVE_CFG(insn1, insn2, cap, cfg, ...) \
alternative_insn insn1, insn2, cap, IS_ENABLED(cfg)
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
/*
* Usage: asm(ALTERNATIVE(oldinstr, newinstr, cpucap));
@ -219,7 +219,7 @@ alternative_endif
#define ALTERNATIVE(oldinstr, newinstr, ...) \
_ALTERNATIVE_CFG(oldinstr, newinstr, __VA_ARGS__, 1)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/types.h>
@ -263,6 +263,6 @@ alternative_has_cap_unlikely(const unsigned long cpucap)
return true;
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_ALTERNATIVE_MACROS_H */

View File

@ -4,7 +4,7 @@
#include <asm/alternative-macros.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/init.h>
#include <linux/types.h>
@ -37,5 +37,5 @@ static inline int apply_alternatives_module(void *start, size_t length)
void alt_cb_patch_nops(struct alt_instr *alt, __le32 *origptr,
__le32 *updptr, int nr_inst);
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_ALTERNATIVE_H */

View File

@ -9,7 +9,7 @@
#include <asm/sysreg.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/irqchip/arm-gic-common.h>
#include <linux/stringify.h>
@ -188,5 +188,5 @@ static inline bool gic_has_relaxed_pmr_sync(void)
return cpus_have_cap(ARM64_HAS_GIC_PRIO_RELAXED_SYNC);
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_ARCH_GICV3_H */

View File

@ -27,7 +27,7 @@
/* Data fields for EX_TYPE_UACCESS_CPY */
#define EX_DATA_UACCESS_WRITE BIT(0)
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#define __ASM_EXTABLE_RAW(insn, fixup, type, data) \
.pushsection __ex_table, "a"; \
@ -77,7 +77,7 @@
__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_UACCESS_CPY, \uaccess_is_write)
.endm
#else /* __ASSEMBLY__ */
#else /* __ASSEMBLER__ */
#include <linux/stringify.h>
@ -132,6 +132,6 @@
EX_DATA_REG(ADDR, addr) \
")")
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_ASM_EXTABLE_H */

View File

@ -5,7 +5,7 @@
* Copyright (C) 1996-2000 Russell King
* Copyright (C) 2012 ARM Ltd.
*/
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#error "Only include this from assembly code"
#endif
@ -325,14 +325,14 @@ alternative_cb_end
* tcr_set_t0sz - update TCR.T0SZ so that we can load the ID map
*/
.macro tcr_set_t0sz, valreg, t0sz
bfi \valreg, \t0sz, #TCR_T0SZ_OFFSET, #TCR_TxSZ_WIDTH
bfi \valreg, \t0sz, #TCR_EL1_T0SZ_SHIFT, #TCR_EL1_T0SZ_WIDTH
.endm
/*
* tcr_set_t1sz - update TCR.T1SZ
*/
.macro tcr_set_t1sz, valreg, t1sz
bfi \valreg, \t1sz, #TCR_T1SZ_OFFSET, #TCR_TxSZ_WIDTH
bfi \valreg, \t1sz, #TCR_EL1_T1SZ_SHIFT, #TCR_EL1_T1SZ_WIDTH
.endm
/*
@ -371,7 +371,7 @@ alternative_endif
* [start, end) with dcache line size explicitly provided.
*
* op: operation passed to dc instruction
* domain: domain used in dsb instruciton
* domain: domain used in dsb instruction
* start: starting virtual address of the region
* end: end virtual address of the region
* linesz: dcache line size
@ -412,7 +412,7 @@ alternative_endif
* [start, end)
*
* op: operation passed to dc instruction
* domain: domain used in dsb instruciton
* domain: domain used in dsb instruction
* start: starting virtual address of the region
* end: end virtual address of the region
* fixup: optional label to branch to on user fault
@ -589,7 +589,7 @@ alternative_endif
.macro offset_ttbr1, ttbr, tmp
#if defined(CONFIG_ARM64_VA_BITS_52) && !defined(CONFIG_ARM64_LPA2)
mrs \tmp, tcr_el1
and \tmp, \tmp, #TCR_T1SZ_MASK
and \tmp, \tmp, #TCR_EL1_T1SZ_MASK
cmp \tmp, #TCR_T1SZ(VA_BITS_MIN)
orr \tmp, \ttbr, #TTBR1_BADDR_4852_OFFSET
csel \ttbr, \tmp, \ttbr, eq

View File

@ -103,17 +103,17 @@ static __always_inline void __lse_atomic_and(int i, atomic_t *v)
return __lse_atomic_andnot(~i, v);
}
#define ATOMIC_FETCH_OP_AND(name, mb, cl...) \
#define ATOMIC_FETCH_OP_AND(name) \
static __always_inline int \
__lse_atomic_fetch_and##name(int i, atomic_t *v) \
{ \
return __lse_atomic_fetch_andnot##name(~i, v); \
}
ATOMIC_FETCH_OP_AND(_relaxed, )
ATOMIC_FETCH_OP_AND(_acquire, a, "memory")
ATOMIC_FETCH_OP_AND(_release, l, "memory")
ATOMIC_FETCH_OP_AND( , al, "memory")
ATOMIC_FETCH_OP_AND(_relaxed)
ATOMIC_FETCH_OP_AND(_acquire)
ATOMIC_FETCH_OP_AND(_release)
ATOMIC_FETCH_OP_AND( )
#undef ATOMIC_FETCH_OP_AND
@ -210,17 +210,17 @@ static __always_inline void __lse_atomic64_and(s64 i, atomic64_t *v)
return __lse_atomic64_andnot(~i, v);
}
#define ATOMIC64_FETCH_OP_AND(name, mb, cl...) \
#define ATOMIC64_FETCH_OP_AND(name) \
static __always_inline long \
__lse_atomic64_fetch_and##name(s64 i, atomic64_t *v) \
{ \
return __lse_atomic64_fetch_andnot##name(~i, v); \
}
ATOMIC64_FETCH_OP_AND(_relaxed, )
ATOMIC64_FETCH_OP_AND(_acquire, a, "memory")
ATOMIC64_FETCH_OP_AND(_release, l, "memory")
ATOMIC64_FETCH_OP_AND( , al, "memory")
ATOMIC64_FETCH_OP_AND(_relaxed)
ATOMIC64_FETCH_OP_AND(_acquire)
ATOMIC64_FETCH_OP_AND(_release)
ATOMIC64_FETCH_OP_AND( )
#undef ATOMIC64_FETCH_OP_AND

View File

@ -7,7 +7,7 @@
#ifndef __ASM_BARRIER_H
#define __ASM_BARRIER_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/kasan-checks.h>
@ -221,6 +221,6 @@ do { \
#include <asm-generic/barrier.h>
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_BARRIER_H */

View File

@ -35,7 +35,7 @@
#define ARCH_DMA_MINALIGN (128)
#define ARCH_KMALLOC_MINALIGN (8)
#if !defined(__ASSEMBLY__) && !defined(BUILD_VDSO)
#if !defined(__ASSEMBLER__) && !defined(BUILD_VDSO)
#include <linux/bitops.h>
#include <linux/kasan-enabled.h>
@ -135,6 +135,6 @@ static inline u32 __attribute_const__ read_cpuid_effective_cachetype(void)
return ctr;
}
#endif /* !defined(__ASSEMBLY__) && !defined(BUILD_VDSO) */
#endif /* !defined(__ASSEMBLER__) && !defined(BUILD_VDSO) */
#endif

View File

@ -5,7 +5,7 @@
#include <asm/cpucap-defs.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/types.h>
/*
* Check whether a cpucap is possible at compiletime.
@ -77,6 +77,6 @@ cpucap_is_possible(const unsigned int cap)
return true;
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_CPUCAPS_H */

View File

@ -19,7 +19,7 @@
#define ARM64_SW_FEATURE_OVERRIDE_HVHE 4
#define ARM64_SW_FEATURE_OVERRIDE_RODATA_OFF 8
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/bug.h>
#include <linux/jump_label.h>
@ -199,7 +199,7 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
* registers (e.g, SCTLR, TCR etc.) or patching the kernel via
* alternatives. The kernel patching is batched and performed at later
* point. The actions are always initiated only after the capability
* is finalised. This is usally denoted by "enabling" the capability.
* is finalised. This is usually denoted by "enabling" the capability.
* The actions are initiated as follows :
* a) Action is triggered on all online CPUs, after the capability is
* finalised, invoked within the stop_machine() context from
@ -251,7 +251,7 @@ extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0;
#define ARM64_CPUCAP_SCOPE_LOCAL_CPU ((u16)BIT(0))
#define ARM64_CPUCAP_SCOPE_SYSTEM ((u16)BIT(1))
/*
* The capabilitiy is detected on the Boot CPU and is used by kernel
* The capability is detected on the Boot CPU and is used by kernel
* during early boot. i.e, the capability should be "detected" and
* "enabled" as early as possibly on all booting CPUs.
*/
@ -1078,6 +1078,6 @@ static inline bool cpu_has_lpa2(void)
#endif
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif

View File

@ -247,9 +247,9 @@
/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
#define MIDR_FUJITSU_ERRATUM_010001 MIDR_FUJITSU_A64FX
#define MIDR_FUJITSU_ERRATUM_010001_MASK (~MIDR_CPU_VAR_REV(1, 0))
#define TCR_CLEAR_FUJITSU_ERRATUM_010001 (TCR_NFD1 | TCR_NFD0)
#define TCR_CLEAR_FUJITSU_ERRATUM_010001 (TCR_EL1_NFD1 | TCR_EL1_NFD0)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/sysreg.h>
@ -328,6 +328,6 @@ static inline u32 __attribute_const__ read_cpuid_cachetype(void)
{
return read_cpuid(CTR_EL0);
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif

View File

@ -4,7 +4,7 @@
#include <linux/compiler.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
struct task_struct;
@ -23,7 +23,7 @@ static __always_inline struct task_struct *get_current(void)
#define current get_current()
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_CURRENT_H */

View File

@ -48,7 +48,7 @@
#define AARCH32_BREAK_THUMB2_LO 0xf7f0
#define AARCH32_BREAK_THUMB2_HI 0xa000
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
struct task_struct;
#define DBG_ARCH_ID_RESERVED 0 /* In case of ptrace ABI updates. */
@ -88,5 +88,5 @@ static inline bool try_step_suspended_breakpoints(struct pt_regs *regs)
bool try_handle_aarch32_break(struct pt_regs *regs);
#endif /* __ASSEMBLY */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_DEBUG_MONITORS_H */

View File

@ -126,21 +126,14 @@ static inline void efi_set_pgd(struct mm_struct *mm)
if (mm != current->active_mm) {
/*
* Update the current thread's saved ttbr0 since it is
* restored as part of a return from exception. Enable
* access to the valid TTBR0_EL1 and invoke the errata
* workaround directly since there is no return from
* exception when invoking the EFI run-time services.
* restored as part of a return from exception.
*/
update_saved_ttbr0(current, mm);
uaccess_ttbr0_enable();
post_ttbr_update_workaround();
} else {
/*
* Defer the switch to the current thread's TTBR0_EL1
* until uaccess_enable(). Restore the current
* thread's saved ttbr0 corresponding to its active_mm
* Restore the current thread's saved ttbr0
* corresponding to its active_mm
*/
uaccess_ttbr0_disable();
update_saved_ttbr0(current, current->active_mm);
}
}

View File

@ -7,7 +7,7 @@
#ifndef __ARM_KVM_INIT_H__
#define __ARM_KVM_INIT_H__
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#error Assembly-only header
#endif
@ -24,7 +24,7 @@
* ID_AA64MMFR4_EL1.E2H0 < 0. On such CPUs HCR_EL2.E2H is RES1, but it
* can reset into an UNKNOWN state and might not read as 1 until it has
* been initialized explicitly.
* Initalize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H
* Initialize HCR_EL2.E2H so that later code can rely upon HCR_EL2.E2H
* indicating whether the CPU is running in E2H mode.
*/
mrs_s x1, SYS_ID_AA64MMFR4_EL1

View File

@ -133,7 +133,7 @@
#define ELF_ET_DYN_BASE (2 * DEFAULT_MAP_WINDOW_64 / 3)
#endif /* CONFIG_ARM64_FORCE_52BIT */
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <uapi/linux/elf.h>
#include <linux/bug.h>
@ -293,6 +293,6 @@ static inline int arch_check_elf(void *ehdr, bool has_interp,
return 0;
}
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif

View File

@ -431,7 +431,7 @@
#define ESR_ELx_IT_GCSPOPCX 6
#define ESR_ELx_IT_GCSPOPX 7
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/types.h>
static inline unsigned long esr_brk_comment(unsigned long esr)
@ -534,6 +534,6 @@ static inline bool esr_iss_is_eretab(unsigned long esr)
}
const char *esr_get_class_string(unsigned long esr);
#endif /* __ASSEMBLY */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_ESR_H */

View File

@ -15,7 +15,7 @@
#ifndef _ASM_ARM64_FIXMAP_H
#define _ASM_ARM64_FIXMAP_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/kernel.h>
#include <linux/math.h>
#include <linux/sizes.h>
@ -117,5 +117,5 @@ extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t pr
#include <asm-generic/fixmap.h>
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* _ASM_ARM64_FIXMAP_H */

View File

@ -12,7 +12,7 @@
#include <asm/sigcontext.h>
#include <asm/sysreg.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/bitmap.h>
#include <linux/build_bug.h>

View File

@ -37,7 +37,7 @@
*/
#define ARCH_FTRACE_SHIFT_STACK_TRACER 1
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/compat.h>
extern void _mcount(unsigned long);
@ -217,9 +217,9 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
*/
return !strcmp(sym + 8, name);
}
#endif /* ifndef __ASSEMBLY__ */
#endif /* ifndef __ASSEMBLER__ */
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,

View File

@ -2,7 +2,7 @@
#ifndef __ASM_GPR_NUM_H
#define __ASM_GPR_NUM_H
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
.irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
.equ .L__gpr_num_x\num, \num
@ -11,7 +11,7 @@
.equ .L__gpr_num_xzr, 31
.equ .L__gpr_num_wzr, 31
#else /* __ASSEMBLY__ */
#else /* __ASSEMBLER__ */
#define __DEFINE_ASM_GPR_NUMS \
" .irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \
@ -21,6 +21,6 @@
" .equ .L__gpr_num_xzr, 31\n" \
" .equ .L__gpr_num_wzr, 31\n"
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_GPR_NUM_H */

View File

@ -46,7 +46,7 @@
#define COMPAT_HWCAP2_SB (1 << 5)
#define COMPAT_HWCAP2_SSBS (1 << 6)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/log2.h>
/*

View File

@ -20,7 +20,7 @@
#define ARM64_IMAGE_FLAG_PAGE_SIZE_64K 3
#define ARM64_IMAGE_FLAG_PHYS_BASE 1
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#define arm64_image_flag_field(flags, field) \
(((flags) >> field##_SHIFT) & field##_MASK)
@ -54,6 +54,6 @@ struct arm64_image_header {
__le32 res5;
};
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_IMAGE_H */

View File

@ -12,7 +12,7 @@
#include <asm/insn-def.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
enum aarch64_insn_hint_cr_op {
AARCH64_INSN_HINT_NOP = 0x0 << 5,
@ -730,6 +730,6 @@ u32 aarch32_insn_mcr_extract_crm(u32 insn);
typedef bool (pstate_check_t)(unsigned long);
extern pstate_check_t * const aarch32_opcode_cond_checks[16];
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_INSN_H */

View File

@ -8,7 +8,7 @@
#ifndef __ASM_JUMP_LABEL_H
#define __ASM_JUMP_LABEL_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/types.h>
#include <asm/insn.h>
@ -58,5 +58,5 @@ static __always_inline bool arch_static_branch_jump(struct static_key * const ke
return true;
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_JUMP_LABEL_H */

View File

@ -2,7 +2,7 @@
#ifndef __ASM_KASAN_H
#define __ASM_KASAN_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/linkage.h>
#include <asm/memory.h>

View File

@ -25,7 +25,7 @@
#define KEXEC_ARCH KEXEC_ARCH_AARCH64
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
/**
* crash_setup_regs() - save registers for the panic kernel
@ -130,6 +130,6 @@ extern int load_other_segments(struct kimage *image,
char *cmdline);
#endif
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif

View File

@ -14,7 +14,7 @@
#include <linux/ptrace.h>
#include <asm/debug-monitors.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
static inline void arch_kgdb_breakpoint(void)
{
@ -36,7 +36,7 @@ static inline int kgdb_single_step_handler(struct pt_regs *regs,
}
#endif
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
/*
* gdb remote protocol (well most versions of it) expects the following

View File

@ -46,7 +46,7 @@
#define __KVM_HOST_SMCCC_FUNC___kvm_hyp_init 0
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/mm.h>
@ -303,7 +303,7 @@ void kvm_compute_final_ctr_el0(struct alt_instr *alt,
void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
u64 elr_phys, u64 par, uintptr_t vcpu, u64 far, u64 hpfar);
#else /* __ASSEMBLY__ */
#else /* __ASSEMBLER__ */
.macro get_host_ctxt reg, tmp
adr_this_cpu \reg, kvm_host_data, \tmp

View File

@ -49,7 +49,7 @@
* mappings, and none of this applies in that case.
*/
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#include <asm/alternative.h>
@ -396,5 +396,5 @@ void kvm_s2_ptdump_create_debugfs(struct kvm *kvm);
static inline void kvm_s2_ptdump_create_debugfs(struct kvm *kvm) {}
#endif /* CONFIG_PTDUMP_STAGE2_DEBUGFS */
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ARM64_KVM_MMU_H__ */

View File

@ -5,7 +5,7 @@
#ifndef __ASM_KVM_MTE_H
#define __ASM_KVM_MTE_H
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#include <asm/sysreg.h>
@ -62,5 +62,5 @@ alternative_else_nop_endif
.endm
#endif /* CONFIG_ARM64_MTE */
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_KVM_MTE_H */

View File

@ -8,7 +8,7 @@
#ifndef __ASM_KVM_PTRAUTH_H
#define __ASM_KVM_PTRAUTH_H
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#include <asm/sysreg.h>
@ -100,7 +100,7 @@ alternative_else_nop_endif
.endm
#endif /* CONFIG_ARM64_PTR_AUTH */
#else /* !__ASSEMBLY */
#else /* !__ASSEMBLER__ */
#define __ptrauth_save_key(ctxt, key) \
do { \
@ -120,5 +120,5 @@ alternative_else_nop_endif
__ptrauth_save_key(ctxt, APGA); \
} while(0)
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_KVM_PTRAUTH_H */

View File

@ -1,7 +1,7 @@
#ifndef __ASM_LINKAGE_H
#define __ASM_LINKAGE_H
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#include <asm/assembler.h>
#endif

View File

@ -207,7 +207,7 @@
*/
#define TRAMP_SWAPPER_OFFSET (2 * PAGE_SIZE)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/bitops.h>
#include <linux/compiler.h>
@ -392,7 +392,6 @@ static inline unsigned long virt_to_pfn(const void *kaddr)
* virt_to_page(x) convert a _valid_ virtual address to struct page *
* virt_addr_valid(x) indicates whether a virtual address is valid
*/
#define ARCH_PFN_OFFSET ((unsigned long)PHYS_PFN_OFFSET)
#if defined(CONFIG_DEBUG_VIRTUAL)
#define page_to_virt(x) ({ \
@ -422,7 +421,7 @@ static inline unsigned long virt_to_pfn(const void *kaddr)
})
void dump_mem_limit(void);
#endif /* !ASSEMBLY */
#endif /* !__ASSEMBLER__ */
/*
* Given that the GIC architecture permits ITS implementations that can only be

View File

@ -12,7 +12,7 @@
#define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
#define TTBR_ASID_MASK (UL(0xffff) << 48)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/refcount.h>
#include <asm/cpufeature.h>
@ -112,5 +112,5 @@ void kpti_install_ng_mappings(void);
static inline void kpti_install_ng_mappings(void) {}
#endif
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif

View File

@ -8,7 +8,7 @@
#ifndef __ASM_MMU_CONTEXT_H
#define __ASM_MMU_CONTEXT_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/compiler.h>
#include <linux/sched.h>
@ -61,11 +61,6 @@ static inline void cpu_switch_mm(pgd_t *pgd, struct mm_struct *mm)
cpu_do_switch_mm(virt_to_phys(pgd),mm);
}
/*
* TCR.T0SZ value to use when the ID map is active.
*/
#define idmap_t0sz TCR_T0SZ(IDMAP_VA_BITS)
/*
* Ensure TCR.T0SZ is set to the provided value.
*/
@ -73,18 +68,15 @@ static inline void __cpu_set_tcr_t0sz(unsigned long t0sz)
{
unsigned long tcr = read_sysreg(tcr_el1);
if ((tcr & TCR_T0SZ_MASK) == t0sz)
if ((tcr & TCR_EL1_T0SZ_MASK) == t0sz)
return;
tcr &= ~TCR_T0SZ_MASK;
tcr &= ~TCR_EL1_T0SZ_MASK;
tcr |= t0sz;
write_sysreg(tcr, tcr_el1);
isb();
}
#define cpu_set_default_tcr_t0sz() __cpu_set_tcr_t0sz(TCR_T0SZ(vabits_actual))
#define cpu_set_idmap_tcr_t0sz() __cpu_set_tcr_t0sz(idmap_t0sz)
/*
* Remove the idmap from TTBR0_EL1 and install the pgd of the active mm.
*
@ -103,7 +95,7 @@ static inline void cpu_uninstall_idmap(void)
cpu_set_reserved_ttbr0();
local_flush_tlb_all();
cpu_set_default_tcr_t0sz();
__cpu_set_tcr_t0sz(TCR_T0SZ(vabits_actual));
if (mm != &init_mm && !system_uses_ttbr0_pan())
cpu_switch_mm(mm->pgd, mm);
@ -113,7 +105,7 @@ static inline void cpu_install_idmap(void)
{
cpu_set_reserved_ttbr0();
local_flush_tlb_all();
cpu_set_idmap_tcr_t0sz();
__cpu_set_tcr_t0sz(TCR_T0SZ(IDMAP_VA_BITS));
cpu_switch_mm(lm_alias(idmap_pg_dir), &init_mm);
}
@ -330,6 +322,6 @@ static inline void deactivate_mm(struct task_struct *tsk,
#include <asm-generic/mmu_context.h>
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* !__ASM_MMU_CONTEXT_H */

View File

@ -9,7 +9,7 @@
#include <asm/cputype.h>
#include <asm/mte-def.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/types.h>
@ -259,6 +259,6 @@ static inline int mte_enable_kernel_store_only(void)
#endif /* CONFIG_ARM64_MTE */
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_MTE_KASAN_H */

View File

@ -8,7 +8,7 @@
#include <asm/compiler.h>
#include <asm/mte-def.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/bitfield.h>
#include <linux/kasan-enabled.h>
@ -282,5 +282,5 @@ static inline void mte_check_tfsr_exit(void)
}
#endif /* CONFIG_KASAN_HW_TAGS */
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_MTE_H */

View File

@ -10,7 +10,7 @@
#include <asm/page-def.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/personality.h> /* for READ_IMPLIES_EXEC */
#include <linux/types.h> /* for gfp_t */
@ -45,7 +45,7 @@ int pfn_is_map_memory(unsigned long pfn);
#include <asm/memory.h>
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#define VM_DATA_DEFAULT_FLAGS (VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)

View File

@ -228,102 +228,53 @@
/*
* TCR flags.
*/
#define TCR_T0SZ_OFFSET 0
#define TCR_T1SZ_OFFSET 16
#define TCR_T0SZ(x) ((UL(64) - (x)) << TCR_T0SZ_OFFSET)
#define TCR_T1SZ(x) ((UL(64) - (x)) << TCR_T1SZ_OFFSET)
#define TCR_TxSZ(x) (TCR_T0SZ(x) | TCR_T1SZ(x))
#define TCR_TxSZ_WIDTH 6
#define TCR_T0SZ_MASK (((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T0SZ_OFFSET)
#define TCR_T1SZ_MASK (((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T1SZ_OFFSET)
#define TCR_T0SZ(x) ((UL(64) - (x)) << TCR_EL1_T0SZ_SHIFT)
#define TCR_T1SZ(x) ((UL(64) - (x)) << TCR_EL1_T1SZ_SHIFT)
#define TCR_EPD0_SHIFT 7
#define TCR_EPD0_MASK (UL(1) << TCR_EPD0_SHIFT)
#define TCR_IRGN0_SHIFT 8
#define TCR_IRGN0_MASK (UL(3) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_NC (UL(0) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_WBWA (UL(1) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_WT (UL(2) << TCR_IRGN0_SHIFT)
#define TCR_IRGN0_WBnWA (UL(3) << TCR_IRGN0_SHIFT)
#define TCR_T0SZ_MASK TCR_EL1_T0SZ_MASK
#define TCR_T1SZ_MASK TCR_EL1_T1SZ_MASK
#define TCR_EPD1_SHIFT 23
#define TCR_EPD1_MASK (UL(1) << TCR_EPD1_SHIFT)
#define TCR_IRGN1_SHIFT 24
#define TCR_IRGN1_MASK (UL(3) << TCR_IRGN1_SHIFT)
#define TCR_IRGN1_NC (UL(0) << TCR_IRGN1_SHIFT)
#define TCR_IRGN1_WBWA (UL(1) << TCR_IRGN1_SHIFT)
#define TCR_IRGN1_WT (UL(2) << TCR_IRGN1_SHIFT)
#define TCR_IRGN1_WBnWA (UL(3) << TCR_IRGN1_SHIFT)
#define TCR_EPD0_MASK TCR_EL1_EPD0_MASK
#define TCR_EPD1_MASK TCR_EL1_EPD1_MASK
#define TCR_IRGN_NC (TCR_IRGN0_NC | TCR_IRGN1_NC)
#define TCR_IRGN_WBWA (TCR_IRGN0_WBWA | TCR_IRGN1_WBWA)
#define TCR_IRGN_WT (TCR_IRGN0_WT | TCR_IRGN1_WT)
#define TCR_IRGN_WBnWA (TCR_IRGN0_WBnWA | TCR_IRGN1_WBnWA)
#define TCR_IRGN_MASK (TCR_IRGN0_MASK | TCR_IRGN1_MASK)
#define TCR_IRGN0_MASK TCR_EL1_IRGN0_MASK
#define TCR_IRGN0_WBWA (TCR_EL1_IRGN0_WBWA << TCR_EL1_IRGN0_SHIFT)
#define TCR_ORGN0_MASK TCR_EL1_ORGN0_MASK
#define TCR_ORGN0_WBWA (TCR_EL1_ORGN0_WBWA << TCR_EL1_ORGN0_SHIFT)
#define TCR_ORGN0_SHIFT 10
#define TCR_ORGN0_MASK (UL(3) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_NC (UL(0) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_WBWA (UL(1) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_WT (UL(2) << TCR_ORGN0_SHIFT)
#define TCR_ORGN0_WBnWA (UL(3) << TCR_ORGN0_SHIFT)
#define TCR_SH0_MASK TCR_EL1_SH0_MASK
#define TCR_SH0_INNER (TCR_EL1_SH0_INNER << TCR_EL1_SH0_SHIFT)
#define TCR_ORGN1_SHIFT 26
#define TCR_ORGN1_MASK (UL(3) << TCR_ORGN1_SHIFT)
#define TCR_ORGN1_NC (UL(0) << TCR_ORGN1_SHIFT)
#define TCR_ORGN1_WBWA (UL(1) << TCR_ORGN1_SHIFT)
#define TCR_ORGN1_WT (UL(2) << TCR_ORGN1_SHIFT)
#define TCR_ORGN1_WBnWA (UL(3) << TCR_ORGN1_SHIFT)
#define TCR_SH1_MASK TCR_EL1_SH1_MASK
#define TCR_ORGN_NC (TCR_ORGN0_NC | TCR_ORGN1_NC)
#define TCR_ORGN_WBWA (TCR_ORGN0_WBWA | TCR_ORGN1_WBWA)
#define TCR_ORGN_WT (TCR_ORGN0_WT | TCR_ORGN1_WT)
#define TCR_ORGN_WBnWA (TCR_ORGN0_WBnWA | TCR_ORGN1_WBnWA)
#define TCR_ORGN_MASK (TCR_ORGN0_MASK | TCR_ORGN1_MASK)
#define TCR_TG0_SHIFT TCR_EL1_TG0_SHIFT
#define TCR_TG0_MASK TCR_EL1_TG0_MASK
#define TCR_TG0_4K (TCR_EL1_TG0_4K << TCR_EL1_TG0_SHIFT)
#define TCR_TG0_64K (TCR_EL1_TG0_64K << TCR_EL1_TG0_SHIFT)
#define TCR_TG0_16K (TCR_EL1_TG0_16K << TCR_EL1_TG0_SHIFT)
#define TCR_SH0_SHIFT 12
#define TCR_SH0_MASK (UL(3) << TCR_SH0_SHIFT)
#define TCR_SH0_INNER (UL(3) << TCR_SH0_SHIFT)
#define TCR_TG1_SHIFT TCR_EL1_TG1_SHIFT
#define TCR_TG1_MASK TCR_EL1_TG1_MASK
#define TCR_TG1_16K (TCR_EL1_TG1_16K << TCR_EL1_TG1_SHIFT)
#define TCR_TG1_4K (TCR_EL1_TG1_4K << TCR_EL1_TG1_SHIFT)
#define TCR_TG1_64K (TCR_EL1_TG1_64K << TCR_EL1_TG1_SHIFT)
#define TCR_SH1_SHIFT 28
#define TCR_SH1_MASK (UL(3) << TCR_SH1_SHIFT)
#define TCR_SH1_INNER (UL(3) << TCR_SH1_SHIFT)
#define TCR_SHARED (TCR_SH0_INNER | TCR_SH1_INNER)
#define TCR_TG0_SHIFT 14
#define TCR_TG0_MASK (UL(3) << TCR_TG0_SHIFT)
#define TCR_TG0_4K (UL(0) << TCR_TG0_SHIFT)
#define TCR_TG0_64K (UL(1) << TCR_TG0_SHIFT)
#define TCR_TG0_16K (UL(2) << TCR_TG0_SHIFT)
#define TCR_TG1_SHIFT 30
#define TCR_TG1_MASK (UL(3) << TCR_TG1_SHIFT)
#define TCR_TG1_16K (UL(1) << TCR_TG1_SHIFT)
#define TCR_TG1_4K (UL(2) << TCR_TG1_SHIFT)
#define TCR_TG1_64K (UL(3) << TCR_TG1_SHIFT)
#define TCR_IPS_SHIFT 32
#define TCR_IPS_MASK (UL(7) << TCR_IPS_SHIFT)
#define TCR_A1 (UL(1) << 22)
#define TCR_ASID16 (UL(1) << 36)
#define TCR_TBI0 (UL(1) << 37)
#define TCR_TBI1 (UL(1) << 38)
#define TCR_HA (UL(1) << 39)
#define TCR_HD (UL(1) << 40)
#define TCR_HPD0_SHIFT 41
#define TCR_HPD0 (UL(1) << TCR_HPD0_SHIFT)
#define TCR_HPD1_SHIFT 42
#define TCR_HPD1 (UL(1) << TCR_HPD1_SHIFT)
#define TCR_TBID0 (UL(1) << 51)
#define TCR_TBID1 (UL(1) << 52)
#define TCR_NFD0 (UL(1) << 53)
#define TCR_NFD1 (UL(1) << 54)
#define TCR_E0PD0 (UL(1) << 55)
#define TCR_E0PD1 (UL(1) << 56)
#define TCR_TCMA0 (UL(1) << 57)
#define TCR_TCMA1 (UL(1) << 58)
#define TCR_DS (UL(1) << 59)
#define TCR_IPS_SHIFT TCR_EL1_IPS_SHIFT
#define TCR_IPS_MASK TCR_EL1_IPS_MASK
#define TCR_A1 TCR_EL1_A1
#define TCR_ASID16 TCR_EL1_AS
#define TCR_TBI0 TCR_EL1_TBI0
#define TCR_TBI1 TCR_EL1_TBI1
#define TCR_HA TCR_EL1_HA
#define TCR_HD TCR_EL1_HD
#define TCR_HPD0 TCR_EL1_HPD0
#define TCR_HPD1 TCR_EL1_HPD1
#define TCR_TBID0 TCR_EL1_TBID0
#define TCR_TBID1 TCR_EL1_TBID1
#define TCR_E0PD0 TCR_EL1_E0PD0
#define TCR_E0PD1 TCR_EL1_E0PD1
#define TCR_DS TCR_EL1_DS
/*
* TTBR.

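The legacy TCR_T0SZ()/TCR_T1SZ() helpers keep their arithmetic; only the shift and mask names move to the generated TCR_EL1_* definitions. A minimal standalone sketch of that arithmetic (the shift value matches the removed TCR_T0SZ_OFFSET of 0):

#include <stdio.h>

#define UL(x) ((unsigned long)(x))
#define TCR_EL1_T0SZ_SHIFT 0    /* T0SZ lives in TCR_EL1[5:0] */
#define TCR_T0SZ(x) ((UL(64) - (x)) << TCR_EL1_T0SZ_SHIFT)

int main(void)
{
        /* T0SZ encodes 64 minus the VA width: a 48-bit VA space gives 16. */
        printf("TCR_T0SZ(48) = %lu\n", TCR_T0SZ(48));
        return 0;
}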

@@ -62,7 +62,7 @@
#define _PAGE_READONLY_EXEC (_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN)
#define _PAGE_EXECONLY (_PAGE_DEFAULT | PTE_RDONLY | PTE_NG | PTE_PXN)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/cpufeature.h>
#include <asm/pgtable-types.h>
@@ -84,7 +84,7 @@ extern unsigned long prot_ns_shared;
#else
static inline bool __pure lpa2_is_enabled(void)
{
return read_tcr() & TCR_DS;
return read_tcr() & TCR_EL1_DS;
}
#define PTE_MAYBE_SHARED (lpa2_is_enabled() ? 0 : PTE_SHARED)
@@ -127,7 +127,7 @@ static inline bool __pure lpa2_is_enabled(void)
#define PAGE_READONLY_EXEC __pgprot(_PAGE_READONLY_EXEC)
#define PAGE_EXECONLY __pgprot(_PAGE_EXECONLY)
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#define pte_pi_index(pte) ( \
((pte & BIT(PTE_PI_IDX_3)) >> (PTE_PI_IDX_3 - 3)) | \


@@ -30,7 +30,7 @@
#define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/cmpxchg.h>
#include <asm/fixmap.h>
@@ -130,12 +130,16 @@ static inline void arch_leave_lazy_mmu_mode(void)
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
/*
* Outside of a few very special situations (e.g. hibernation), we always
* use broadcast TLB invalidation instructions, therefore a spurious page
* fault on one CPU which has been handled concurrently by another CPU
* does not need to perform additional invalidation.
* We use local TLB invalidation instruction when reusing page in
* write protection fault handler to avoid TLBI broadcast in the hot
* path. This will cause spurious page faults if stale read-only TLB
* entries exist.
*/
#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
#define flush_tlb_fix_spurious_fault(vma, address, ptep) \
local_flush_tlb_page_nonotify(vma, address)
#define flush_tlb_fix_spurious_fault_pmd(vma, address, pmdp) \
local_flush_tlb_page_nonotify(vma, address)
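To see the intent of the new definition, a hedged caller sketch (the handler shell is hypothetical; only flush_tlb_fix_spurious_fault() is from the patch):

/*
 * Illustrative only: when a write-protection fault turns out to be
 * spurious (another CPU already upgraded the PTE), drop the stale
 * read-only entry with a local, non-broadcast TLBI.
 */
static void handle_spurious_wp_fault(struct vm_area_struct *vma,
                                     unsigned long address, pte_t *ptep)
{
        flush_tlb_fix_spurious_fault(vma, address, ptep);
}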
/*
* ZERO_PAGE is a global shared page that is always zero: used
@@ -433,7 +437,7 @@ bool pgattr_change_is_safe(pteval_t old, pteval_t new);
* 1 0 | 1 0 1
* 1 1 | 0 1 x
*
* When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
* When hardware DBM is not present, the software PTE_DIRTY bit is updated via
* the page fault mechanism. Checking the dirty status of a pte becomes:
*
* PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
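The check itself is mechanical; a standalone illustration (bit positions are assumptions for the demo, not the kernel's authoritative layout):

#include <stdbool.h>
#include <stdio.h>

#define PTE_RDONLY (1UL << 7)   /* assumed positions, for illustration */
#define PTE_WRITE  (1UL << 51)
#define PTE_DIRTY  (1UL << 55)

static bool pte_sw_dirty(unsigned long pte)
{
        /* PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY), per the comment above */
        return (pte & PTE_DIRTY) ||
               ((pte & PTE_WRITE) && !(pte & PTE_RDONLY));
}

int main(void)
{
        printf("%d\n", pte_sw_dirty(PTE_WRITE));              /* 1: hw-dirty */
        printf("%d\n", pte_sw_dirty(PTE_WRITE | PTE_RDONLY)); /* 0: clean */
        return 0;
}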
@@ -599,7 +603,7 @@ static inline int pte_protnone(pte_t pte)
/*
* pte_present_invalid() tells us that the pte is invalid from HW
* perspective but present from SW perspective, so the fields are to be
* interpretted as per the HW layout. The second 2 checks are the unique
* interpreted as per the HW layout. The second 2 checks are the unique
* encoding that we use for PROT_NONE. It is insufficient to only use
* the first check because we share the same encoding scheme with pmds
* which support pmd_mkinvalid(), so can be present-invalid without
@@ -1949,6 +1953,6 @@ static inline void clear_young_dirty_ptes(struct vm_area_struct *vma,
#endif /* CONFIG_ARM64_CONTPTE */
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* __ASM_PGTABLE_H */


@@ -9,7 +9,7 @@
#ifndef __ASM_PROCFNS_H
#define __ASM_PROCFNS_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/page.h>
@@ -21,5 +21,5 @@ extern u64 cpu_do_resume(phys_addr_t ptr, u64 idmap_ttbr);
#include <asm/memory.h>
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_PROCFNS_H */


@@ -25,7 +25,7 @@
#define MTE_CTRL_STORE_ONLY (1UL << 19)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/build_bug.h>
#include <linux/cache.h>
@@ -437,5 +437,5 @@ int set_tsc_mode(unsigned int val);
#define GET_TSC_CTL(adr) get_tsc_mode((adr))
#define SET_TSC_CTL(val) set_tsc_mode((val))
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_PROCESSOR_H */


@@ -94,7 +94,7 @@
*/
#define NO_SYSCALL (-1)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/bug.h>
#include <linux/types.h>
@@ -361,5 +361,5 @@ static inline void procedure_link_pointer_set(struct pt_regs *regs,
extern unsigned long profile_pc(struct pt_regs *regs);
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif


@@ -122,7 +122,7 @@
*/
#define SMC_RSI_ATTESTATION_TOKEN_CONTINUE SMC_RSI_FID(0x195)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
struct realm_config {
union {
@@ -142,7 +142,7 @@ struct realm_config {
*/
} __aligned(0x1000);
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
/*
* Read configuration for the current Realm.


@@ -5,7 +5,7 @@
#ifndef __ASM_RWONCE_H
#define __ASM_RWONCE_H
#if defined(CONFIG_LTO) && !defined(__ASSEMBLY__)
#if defined(CONFIG_LTO) && !defined(__ASSEMBLER__)
#include <linux/compiler_types.h>
#include <asm/alternative-macros.h>
@@ -62,7 +62,7 @@
})
#endif /* !BUILD_VDSO */
#endif /* CONFIG_LTO && !__ASSEMBLY__ */
#endif /* CONFIG_LTO && !__ASSEMBLER__ */
#include <asm-generic/rwonce.h>


@@ -2,7 +2,7 @@
#ifndef _ASM_SCS_H
#define _ASM_SCS_H
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#include <asm/asm-offsets.h>
#include <asm/sysreg.h>
@@ -55,6 +55,6 @@ enum {
int __pi_scs_patch(const u8 eh_frame[], int size, bool skip_dry_run);
#endif /* __ASSEMBLY __ */
#endif /* __ASSEMBLER__ */
#endif /* _ASM_SCS_H */


@@ -9,7 +9,7 @@
#define SDEI_STACK_SIZE IRQ_STACK_SIZE
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/linkage.h>
#include <linux/preempt.h>
@@ -49,5 +49,5 @@ unsigned long do_sdei_event(struct pt_regs *regs,
unsigned long sdei_arch_get_entry_point(int conduit);
#define sdei_arch_get_entry_point(x) sdei_arch_get_entry_point(x)
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_SDEI_H */


@@ -29,7 +29,7 @@ static __must_check inline bool may_use_simd(void)
*/
return !WARN_ON(!system_capabilities_finalized()) &&
system_supports_fpsimd() &&
!in_hardirq() && !irqs_disabled() && !in_nmi();
!in_hardirq() && !in_nmi();
}
#else /* ! CONFIG_KERNEL_MODE_NEON */

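For context, the usual caller pattern around may_use_simd() (the xor helpers are hypothetical; may_use_simd(), kernel_neon_begin() and kernel_neon_end() are the real interfaces):

/* Sketch: use NEON when the context allows it, else fall back to scalar. */
static void do_xor_blocks(u8 *dst, const u8 *src, size_t len)
{
        if (may_use_simd()) {
                kernel_neon_begin();
                xor_blocks_neon(dst, src, len);   /* hypothetical helper */
                kernel_neon_end();
        } else {
                xor_blocks_scalar(dst, src, len); /* hypothetical fallback */
        }
}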

@@ -23,7 +23,7 @@
#define CPU_STUCK_REASON_52_BIT_VA (UL(1) << CPU_STUCK_REASON_SHIFT)
#define CPU_STUCK_REASON_NO_GRAN (UL(2) << CPU_STUCK_REASON_SHIFT)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/threads.h>
#include <linux/cpumask.h>
@@ -155,6 +155,6 @@ bool cpus_are_stuck_in_kernel(void);
extern void crash_smp_send_stop(void);
extern bool smp_crash_stop_failed(void);
#endif /* ifndef __ASSEMBLY__ */
#endif /* ifndef __ASSEMBLER__ */
#endif /* ifndef __ASM_SMP_H */


@@ -12,7 +12,7 @@
#define BP_HARDEN_EL2_SLOTS 4
#define __BP_HARDEN_HYP_VECS_SZ ((BP_HARDEN_EL2_SLOTS - 1) * SZ_2K)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/smp.h>
#include <asm/percpu.h>
@@ -119,5 +119,5 @@ void spectre_bhb_patch_clearbhb(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
void spectre_print_disabled_mitigations(void);
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_SPECTRE_H */


@@ -25,7 +25,7 @@
#define FRAME_META_TYPE_FINAL 1
#define FRAME_META_TYPE_PT_REGS 2
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
/*
* A standard AAPCS64 frame record.
*/
@@ -43,6 +43,6 @@ struct frame_record_meta {
struct frame_record record;
u64 type;
};
#endif /* __ASSEMBLY */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_STACKTRACE_FRAME_H */


@@ -23,7 +23,7 @@ struct cpu_suspend_ctx {
* __cpu_suspend_enter()'s caller, and populated by __cpu_suspend_enter().
* This data must survive until cpu_resume() is called.
*
* This struct desribes the size and the layout of the saved cpu state.
* This struct describes the size and the layout of the saved cpu state.
* The layout of the callee_saved_regs is defined by the implementation
* of __cpu_suspend_enter(), and cpu_resume(). This struct must be passed
* in by the caller as __cpu_suspend_enter()'s stack-frame is gone once it


@@ -52,7 +52,7 @@
#ifndef CONFIG_BROKEN_GAS_INST
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
// The space separator is omitted so that __emit_inst(x) can be parsed as
// either an assembler directive or an assembler macro argument.
#define __emit_inst(x) .inst(x)
@@ -71,11 +71,11 @@
(((x) >> 24) & 0x000000ff))
#endif /* CONFIG_CPU_BIG_ENDIAN */
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
#define __emit_inst(x) .long __INSTR_BSWAP(x)
#else /* __ASSEMBLY__ */
#else /* __ASSEMBLER__ */
#define __emit_inst(x) ".long " __stringify(__INSTR_BSWAP(x)) "\n\t"
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* CONFIG_BROKEN_GAS_INST */
@@ -1129,9 +1129,7 @@
#define gicr_insn(insn) read_sysreg_s(GICV5_OP_GICR_##insn)
#define gic_insn(v, insn) write_sysreg_s(v, GICV5_OP_GIC_##insn)
#define ARM64_FEATURE_FIELD_BITS 4
#ifdef __ASSEMBLY__
#ifdef __ASSEMBLER__
.macro mrs_s, rt, sreg
__emit_inst(0xd5200000|(\sreg)|(.L__gpr_num_\rt))


@@ -7,7 +7,7 @@
#ifndef __ASM_SYSTEM_MISC_H
#define __ASM_SYSTEM_MISC_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/compiler.h>
#include <linux/linkage.h>
@@ -28,6 +28,6 @@ void arm64_notify_die(const char *str, struct pt_regs *regs,
struct mm_struct;
extern void __show_regs(struct pt_regs *);
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_SYSTEM_MISC_H */


@@ -10,7 +10,7 @@
#include <linux/compiler.h>
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
struct task_struct;


@@ -8,7 +8,7 @@
#ifndef __ASM_TLBFLUSH_H
#define __ASM_TLBFLUSH_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/bitfield.h>
#include <linux/mm_types.h>
@@ -249,6 +249,19 @@ static inline unsigned long get_trans_granule(void)
* cannot be easily determined, the value TLBI_TTL_UNKNOWN will
* perform a non-hinted invalidation.
*
* local_flush_tlb_page(vma, addr)
* Local variant of flush_tlb_page(). Stale TLB entries may
* remain in remote CPUs.
*
* local_flush_tlb_page_nonotify(vma, addr)
* Same as local_flush_tlb_page() except MMU notifier will not be
* called.
*
* local_flush_tlb_contpte(vma, addr)
* Invalidate the virtual-address range
* '[addr, addr+CONT_PTE_SIZE)' mapped with contpte on local CPU
* for the user address space corresponding to 'vma->mm'. Stale
* TLB entries may remain in remote CPUs.
*
* Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
* on top of these routines, since that is our interface to the mmu_gather
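A hedged sketch of the difference between the two new page-level variants (the caller shell is hypothetical; the flush calls are the documented interfaces):

/* Illustrative only: the _nonotify variant skips the MMU-notifier hook. */
static void drop_local_entry(struct vm_area_struct *vma, unsigned long uaddr,
                             bool notifier_already_called)
{
        if (notifier_already_called)
                local_flush_tlb_page_nonotify(vma, uaddr);
        else
                local_flush_tlb_page(vma, uaddr); /* also invalidates secondary TLBs */
}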
@@ -282,6 +295,33 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
mmu_notifier_arch_invalidate_secondary_tlbs(mm, 0, -1UL);
}
static inline void __local_flush_tlb_page_nonotify_nosync(struct mm_struct *mm,
unsigned long uaddr)
{
unsigned long addr;
dsb(nshst);
addr = __TLBI_VADDR(uaddr, ASID(mm));
__tlbi(vale1, addr);
__tlbi_user(vale1, addr);
}
static inline void local_flush_tlb_page_nonotify(struct vm_area_struct *vma,
unsigned long uaddr)
{
__local_flush_tlb_page_nonotify_nosync(vma->vm_mm, uaddr);
dsb(nsh);
}
static inline void local_flush_tlb_page(struct vm_area_struct *vma,
unsigned long uaddr)
{
__local_flush_tlb_page_nonotify_nosync(vma->vm_mm, uaddr);
mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, uaddr & PAGE_MASK,
(uaddr & PAGE_MASK) + PAGE_SIZE);
dsb(nsh);
}
static inline void __flush_tlb_page_nosync(struct mm_struct *mm,
unsigned long uaddr)
{
@@ -472,6 +512,22 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
dsb(ish);
}
static inline void local_flush_tlb_contpte(struct vm_area_struct *vma,
unsigned long addr)
{
unsigned long asid;
addr = round_down(addr, CONT_PTE_SIZE);
dsb(nshst);
asid = ASID(vma->vm_mm);
__flush_tlb_range_op(vale1, addr, CONT_PTES, PAGE_SIZE, asid,
3, true, lpa2_is_enabled());
mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, addr,
addr + CONT_PTE_SIZE);
dsb(nsh);
}
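The range arithmetic reduces to rounding down to one contpte block; a standalone sketch (sizes assume a 4K granule with 16 contiguous PTEs, the common configuration):

#include <stdio.h>

#define PAGE_SIZE     4096UL
#define CONT_PTES     16UL                /* assumed: 4K granule */
#define CONT_PTE_SIZE (CONT_PTES * PAGE_SIZE)

int main(void)
{
        unsigned long addr  = 0x412345UL;
        unsigned long start = addr & ~(CONT_PTE_SIZE - 1); /* round_down() */

        /* prints: invalidate [0x410000, 0x420000) */
        printf("invalidate [%#lx, %#lx)\n", start, start + CONT_PTE_SIZE);
        return 0;
}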
static inline void flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
@@ -524,6 +580,33 @@ static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *b
{
__flush_tlb_range_nosync(mm, start, end, PAGE_SIZE, true, 3);
}
static inline bool __pte_flags_need_flush(ptdesc_t oldval, ptdesc_t newval)
{
ptdesc_t diff = oldval ^ newval;
/* invalid to valid transition requires no flush */
if (!(oldval & PTE_VALID))
return false;
/* Transition in the SW bits requires no flush */
diff &= ~PTE_SWBITS_MASK;
return diff;
}
static inline bool pte_needs_flush(pte_t oldpte, pte_t newpte)
{
return __pte_flags_need_flush(pte_val(oldpte), pte_val(newpte));
}
#define pte_needs_flush pte_needs_flush
static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
{
return __pte_flags_need_flush(pmd_val(oldpmd), pmd_val(newpmd));
}
#define huge_pmd_needs_flush huge_pmd_needs_flush
#endif
#endif
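The new pte_needs_flush() decision is a pure bit computation; a standalone sketch with an assumed software-bit mask (the real PTE_VALID and PTE_SWBITS_MASK definitions live elsewhere in the arm64 headers):

#include <stdbool.h>
#include <stdio.h>

#define PTE_VALID       (1UL << 0)
#define PTE_SWBITS_MASK (0xfUL << 55) /* assumed SW-bit field */

static bool flags_need_flush(unsigned long oldval, unsigned long newval)
{
        unsigned long diff = oldval ^ newval;

        if (!(oldval & PTE_VALID))    /* invalid -> valid: no flush */
                return false;
        diff &= ~PTE_SWBITS_MASK;     /* software-only changes: no flush */
        return diff != 0;
}

int main(void)
{
        printf("%d\n", flags_need_flush(PTE_VALID, PTE_VALID | (1UL << 56))); /* 0 */
        printf("%d\n", flags_need_flush(PTE_VALID, PTE_VALID | (1UL << 7)));  /* 1 */
        return 0;
}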


@@ -422,9 +422,9 @@ static __must_check __always_inline bool user_access_begin(const void __user *pt
}
#define user_access_begin(a,b) user_access_begin(a,b)
#define user_access_end() uaccess_ttbr0_disable()
#define unsafe_put_user(x, ptr, label) \
#define arch_unsafe_put_user(x, ptr, label) \
__raw_put_mem("sttr", x, uaccess_mask_ptr(ptr), label, U)
#define unsafe_get_user(x, ptr, label) \
#define arch_unsafe_get_user(x, ptr, label) \
__raw_get_mem("ldtr", x, uaccess_mask_ptr(ptr), label, U)
/*

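The rename to arch_unsafe_{get,put}_user() leaves the generic unsafe_*_user() wrappers working on top; a hedged sketch of the standard caller pattern (the function shell is hypothetical):

/* Illustrative only: batched user reads under one access window. */
static int read_user_pair(const int __user *up, int *a, int *b)
{
        if (!user_access_begin(up, 2 * sizeof(int)))
                return -EFAULT;
        unsafe_get_user(*a, &up[0], efault); /* branches to efault on fault */
        unsafe_get_user(*b, &up[1], efault);
        user_access_end();
        return 0;
efault:
        user_access_end();
        return -EFAULT;
}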

@@ -7,7 +7,7 @@
#define __VDSO_PAGES 4
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <generated/vdso-offsets.h>
@@ -19,6 +19,6 @@
extern char vdso_start[], vdso_end[];
extern char vdso32_start[], vdso32_end[];
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* __ASM_VDSO_H */


@@ -5,7 +5,7 @@
#ifndef __COMPAT_BARRIER_H
#define __COMPAT_BARRIER_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
/*
* Warning: This code is meant to be used from the compat vDSO only.
*/
@@ -31,6 +31,6 @@
#define smp_rmb() aarch32_smp_rmb()
#define smp_wmb() aarch32_smp_wmb()
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* __COMPAT_BARRIER_H */


@@ -5,7 +5,7 @@
#ifndef __ASM_VDSO_COMPAT_GETTIMEOFDAY_H
#define __ASM_VDSO_COMPAT_GETTIMEOFDAY_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/barrier.h>
#include <asm/unistd_compat_32.h>
@@ -161,6 +161,6 @@ static inline bool vdso_clocksource_ok(const struct vdso_clock *vc)
}
#define vdso_clocksource_ok vdso_clocksource_ok
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* __ASM_VDSO_COMPAT_GETTIMEOFDAY_H */


@@ -3,7 +3,7 @@
#ifndef __ASM_VDSO_GETRANDOM_H
#define __ASM_VDSO_GETRANDOM_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/unistd.h>
#include <asm/vdso/vsyscall.h>
@@ -33,6 +33,6 @@ static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, uns
return ret;
}
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* __ASM_VDSO_GETRANDOM_H */


@@ -7,7 +7,7 @@
#ifdef __aarch64__
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/alternative.h>
#include <asm/arch_timer.h>
@@ -96,7 +96,7 @@ static __always_inline const struct vdso_time_data *__arch_get_vdso_u_time_data(
#define __arch_get_vdso_u_time_data __arch_get_vdso_u_time_data
#endif /* IS_ENABLED(CONFIG_CC_IS_GCC) && IS_ENABLED(CONFIG_PAGE_SIZE_64KB) */
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#else /* !__aarch64__ */


@@ -5,13 +5,13 @@
#ifndef __ASM_VDSO_PROCESSOR_H
#define __ASM_VDSO_PROCESSOR_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
static inline void cpu_relax(void)
{
asm volatile("yield" ::: "memory");
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* __ASM_VDSO_PROCESSOR_H */


@@ -2,7 +2,7 @@
#ifndef __ASM_VDSO_VSYSCALL_H
#define __ASM_VDSO_VSYSCALL_H
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <vdso/datapage.h>
@@ -22,6 +22,6 @@ void __arch_update_vdso_clock(struct vdso_clock *vc)
/* The asm-generic header needs to be included after the definitions above */
#include <asm-generic/vdso/vsyscall.h>
#endif /* !__ASSEMBLY__ */
#endif /* !__ASSEMBLER__ */
#endif /* __ASM_VDSO_VSYSCALL_H */


@@ -56,7 +56,7 @@
*/
#define BOOT_CPU_FLAG_E2H BIT_ULL(32)
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <asm/ptrace.h>
#include <asm/sections.h>
@@ -161,6 +161,6 @@ static inline bool is_hyp_nvhe(void)
return is_hyp_mode_available() && !is_kernel_in_hyp_mode();
}
#endif /* __ASSEMBLY__ */
#endif /* __ASSEMBLER__ */
#endif /* ! __ASM__VIRT_H */


@@ -3,9 +3,7 @@
#ifndef __ASM_VMAP_STACK_H
#define __ASM_VMAP_STACK_H
#include <linux/bug.h>
#include <linux/gfp.h>
#include <linux/kconfig.h>
#include <linux/vmalloc.h>
#include <linux/pgtable.h>
#include <asm/memory.h>
@@ -19,8 +17,6 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node)
{
void *p;
BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
__builtin_return_address(0));
return kasan_reset_tag(p);


@@ -31,7 +31,7 @@
#define KVM_SPSR_FIQ 4
#define KVM_NR_SPSR 5
#ifndef __ASSEMBLY__
#ifndef __ASSEMBLER__
#include <linux/psci.h>
#include <linux/types.h>
#include <asm/ptrace.h>

Some files were not shown because too many files have changed in this diff.