Merge tag 'thermal-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add Nova Lake processor support to the Intel thermal drivers and
DPTF code, update thermal control documentation, simplify the ACPI
DPTF code related to thermal control, add QCS8300 compatible to the
tsens thermal DT bindings, add DT bindings for NXP i.MX91 thermal
module and add support for it to the imx91 thermal driver, update a
few other thermal drivers and fix a format string issue in a thermal
utility:
- Add Nova Lake processor thermal device to the int340x
processor_thermal driver, add DLVR support for Nova Lake to it, add
Nova Lake support to the ACPI DPTF code, document thermal
throttling on Intel platforms, and update workload type hint
interface documentation (Srinivas Pandruvada)
- Remove int340x thermal scan handler from the ACPI DPTF code because
it turned out to be unnecessary (Slawomir Rosek)
- Clean up the Intel int340x thermal driver, including a conversion of
its sysfs output to sysfs_emit() (Kaushlendra Kumar); see the sketch
after this list
- Document the RZ/V2H TSU DT bindings (Ovidiu Panait)
- Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
Pallikunhi)
- Document R-Car Gen4 and RZ/G2 support in driver comment (Marek
Vasut)
- Convert to DEFINE_SIMPLE_DEV_PM_OPS() in R-Car [Gen3] (Geert
Uytterhoeven)
- Fix format string bug in thermal-engine (Malaya Kumar Rout)
- Make ipq5018 tsens standalone compatible (George Moussalem)
- Add the QCS8300 compatible for QCom Tsens (Gaurav Kohli)
- Add support for the NXP i.MX91 thermal module, including the DT
bindings (Pengfei Li)"
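The int340x cleanup referenced above includes converting sysfs output from
sprintf() to sysfs_emit(). A minimal sketch of that idiom, with a made-up
attribute and value rather than the actual int340x code:

#include <linux/device.h>
#include <linux/sysfs.h>

/* Illustrative sysfs read callback; "example" and its value are stand-ins. */
static ssize_t example_show(struct device *dev, struct device_attribute *attr,
                            char *buf)
{
        int value = 42; /* a driver-specific reading would go here */

        /* sysfs_emit() bounds the output to PAGE_SIZE, unlike raw sprintf(). */
        return sysfs_emit(buf, "%d\n", value);
}
static DEVICE_ATTR_RO(example);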
* tag 'thermal-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unit
dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal module
dt-bindings: thermal: tsens: Add QCS8300 compatible
dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatible
tools/thermal/thermal-engine: Fix format string bug in thermal-engine
docs: driver-api/thermal/intel_dptf: Add new workload type hint
thermal/drivers/rcar_gen3: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
thermal/drivers/rcar: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
Documentation: thermal: Document thermal throttling on Intel platforms
ACPI: DPTF: Support Nova Lake
thermal: intel: int340x: Add DLVR support for Nova Lake
thermal: int340x: processor_thermal: Add Nova Lake processor thermal device
thermal: intel: int340x: Replace sprintf() with sysfs_emit()
thermal: intel: int340x: Use symbolic constant for UUID comparison
thermal/drivers/rcar_gen3: Document R-Car Gen4 and RZ/G2 support in driver comment
dt-bindings: thermal: qcom-tsens: document the Kaanapali Temperature Sensor
dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2H TSU
ACPI: DPTF: Remove int340x thermal scan handler
thermal: intel: Select INT340X_THERMAL from INTEL_SOC_DTS_THERMAL
Merge tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"There are quite a few interesting things here, including new hardware
support, new features, some bug fixes and documentation updates. In
addition, there is the usual bunch of minor fixes and cleanups all
over.
In the new hardware support category, there are intel_pstate and
intel_rapl driver updates to support new processors, Panther Lake,
Wildcat Lake, Nova Lake, and Diamond Rapids in the OOB mode, OPP and
bandwidth allocation support in the tegra186 cpufreq driver, and
JH7110S SoC support in dt-platdev cpufreq.
The new features are the PM QoS CPU latency limit for suspend-to-idle,
the netlink support for the energy model management, support for
terminating system suspend via a wakeup event during the sync of file
systems, configurable number of hibernation compression threads, the
runtime PM auto-cleanup macros, and the "poweroff" PM event that is
expected to be used during system shutdown.
Bugs are mostly fixed in cpuidle governors, but there are also fixes
elsewhere, like in the amd-pstate cpufreq driver.
Documentation updates include, but are not limited to, a new doc on
debugging shutdown hangs, cross-referencing fixes and cleanups in the
intel_pstate documentation, and updates of comments in the core
hibernation code.
Specifics:
- Introduce and document a QoS limit on CPU exit latency during
wakeup from suspend-to-idle (Ulf Hansson)
- Add support for building libcpupower statically (Zuo An)
- Add support for sending netlink notifications to user space on
energy model updates (Changwoo Min, Peng Fan)
- Minor improvements to the Rust OPP interface (Tamir Duberstein)
- Fixes to scope-based pointers in the OPP library (Viresh Kumar)
- Use residency threshold in polling state override decisions in the
menu cpuidle governor (Aboorva Devarajan)
- Add sanity check for exit latency and target residency in the
cpufreq core (Rafael Wysocki)
- Use this_cpu_ptr() where possible in the teo governor (Christian
Loehle)
- Rework the handling of tick wakeups in the teo cpuidle governor to
increase the likelihood of stopping the scheduler tick in the cases
when tick wakeups can be counted as non-timer ones (Rafael Wysocki)
- Fix a reverse condition in the teo cpuidle governor and drop a
misguided target residency check from it (Rafael Wysocki)
- Clean up multiple minor defects in the teo cpuidle governor (Rafael
Wysocki)
- Update header inclusion to make it follow the Include What You Use
principle (Andy Shevchenko)
- Enable MSR-based RAPL PMU support in the intel_rapl power capping
driver and arrange for using it on the Panther Lake and Wildcat
Lake processors (Kuppuswamy Sathyanarayanan)
- Add support for Nova Lake and Wildcat Lake processors to the
intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
Pandruvada)
- Add OPP and bandwidth support for Tegra186 (Aaron Kling)
- Optimizations for parameter array handling in the amd-pstate
cpufreq driver (Mario Limonciello)
- Fix for mode changes with offline CPUs in the amd-pstate cpufreq
driver (Gautham Shenoy)
- Preserve freq_table_sorted across suspend/hibernate in the cpufreq
core (Zihuan Zhang)
- Adjust energy model rules for Intel hybrid platforms in the
intel_pstate cpufreq driver and improve printing of debug messages
in it (Rafael Wysocki)
- Replace deprecated strcpy() in cpufreq_unregister_governor()
(Thorsten Blum)
- Fix duplicate hyperlink target errors in the intel_pstate cpufreq
driver documentation and use :ref: directive for internal linking
in it (Swaraj Gaikwad, Bagas Sanjaya)
- Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
driver (Kuppuswamy Sathyanarayanan)
- Use mutex guard for driver locking in the intel_pstate driver and
eliminate some code duplication from it (Rafael Wysocki); see the
sketch after this list
- Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
Kumar)
- Minor improvements to various cpufreq drivers (Christian Marangi,
Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)
- Replace snprintf() with scnprintf() in show_trace_dev_match()
(Kaushlendra Kumar)
- Fix memory allocation error handling in pm_vt_switch_required()
(Malaya Kumar Rout)
- Introduce CALL_PM_OP() macro and use it to simplify code in generic
PM operations (Kaushlendra Kumar)
- Add module param to backtrace all CPUs in the device power
management watchdog (Sergey Senozhatsky)
- Rework message printing in swsusp_save() (Rafael Wysocki)
- Make it possible to change the number of hibernation compression
threads (Xueqin Luo)
- Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)
- Add document on debugging shutdown hangs to PM documentation and
correct a mistaken configuration option in it (Mario Limonciello)
- Shut down wakeup source timer before removing the wakeup source
from the list (Kaushlendra Kumar, Rafael Wysocki)
- Introduce new PMSG_POWEROFF event for system shutdown handling with
the help of PM device callbacks (Mario Limonciello)
- Make pm_test delay interruptible by wakeup events (Riwen Lu)
- Clean up kernel-doc comment style usage in the core hibernation
code and remove useless comments from it (Sunday Adelodun, Rafael
Wysocki)
- Add support for handling wakeup events and aborting the suspend
process while it is syncing file systems (Samuel Wu, Rafael
Wysocki)
- Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)
- Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
them in the PCI core and the ACPI TAD driver (Rafael Wysocki)
- Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)
- Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)
- Fix typos in runtime.c comments (Malaya Kumar Rout)
- Move governor.h from devfreq under include/linux/ and rename to
devfreq-governor.h to allow devfreq governor definitions outside of
drivers/devfreq/ (Dmitry Baryshkov)
- Use min() to improve readability in tegra30-devfreq.c (Thorsten
Blum)
- Fix potential use-after-free issue of OPP handling in
hisi_uncore_freq.c (Pengjie Zhang)
- Fix typo in DFSO_DOWNDIFFERENTIAL macro name in
governor_simpleondemand.c in devfreq (Riwen Lu)"
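As referenced in the intel_pstate locking item above, the mutex guard
conversion relies on the kernel's scope-based cleanup helpers. A minimal
sketch of the idiom, with a hypothetical lock and function rather than the
actual intel_pstate code:

#include <linux/mutex.h>

static DEFINE_MUTEX(example_driver_lock);       /* hypothetical driver lock */

static void example_update_limits(void)
{
        /*
         * The guard takes the mutex here and drops it automatically when the
         * scope ends, so every early return no longer needs its own unlock.
         */
        guard(mutex)(&example_driver_lock);

        /* ... update driver state under the lock ... */
}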
* tag 'pm-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (96 commits)
PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
cpuidle: Warn instead of bailing out if target residency check fails
cpuidle: Update header inclusion
Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
sched: idle: Respect the CPU system wakeup QoS limit for s2idle
pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
pmdomain: Respect the CPU system wakeup QoS limit for s2idle
PM: QoS: Introduce a CPU system wakeup QoS limit
cpuidle: governors: teo: Add missing space to the description
PM: hibernate: Extra cleanup of comments in swap handling code
PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
PM / devfreq: hisi: Fix potential UAF in OPP handling
PM / devfreq: Move governor.h to a public header location
powercap: intel_rapl: Enable MSR-based RAPL PMU support
powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
PM: sleep: Add support for wakeup during filesystem sync
cpufreq: ACPI: Replace udelay() with usleep_range()
...
Merge tag 'acpi-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These add Microsoft fan extensions support to the ACPI fan driver, fix
a bug in ACPICA, update other ACPI drivers (processor, time and alarm
device), update ACPI power management code and ACPI device properties
management, and fix an ACPI utility:
- Avoid walking the ACPI namespace in the AML interpreter if the
starting node cannot be determined (Cryolitia PukNgae)
- Use min() instead of min_t() in the ACPI device properties handling
code to avoid discarding significant bits (David Laight); see the
sketch after this list
- Fix potential fwnode refcount leak in
acpi_fwnode_graph_parse_endpoint() that may prevent the parent
fwnode from being released (Haotian Zhang)
- Rework acpi_graph_get_next_endpoint() to use ACPI functions only,
remove unnecessary conditionals from it to make it easier to
follow, and make acpi_get_next_subnode() static (Sakari Ailus)
- Drop unused function acpi_get_lps0_constraint(), make some
Low-Power S0 callback functions for suspend-to-idle static, and
rearrange the code retrieving Low-Power S0 constraints so it only
runs when the constraints are actually used (Rafael Wysocki)
- Drop redundant locking from the ACPI battery driver (Rafael
Wysocki)
- Improve runtime PM in the ACPI time and alarm device (TAD) driver
using guard macros and rearrange code related to runtime PM in
acpi_tad_remove() (Rafael Wysocki)
- Add support for Microsoft fan extensions to the ACPI fan driver
along with notification support and work around a 64-bit firmware
bug in that driver (Armin Wolf)
- Use ACPI_FREE() to free ACPI buffer in the ACPI DPTF code
(Kaushlendra Kumar)
- Fix a memory leak and a resource leak in the ACPI pfrut utility
(Malaya Kumar Rout)
- Replace `core::mem::zeroed` with `pin_init::zeroed` in the ACPI
Rust code (Siyuan Huang)
- Update the ACPI code to use the new style of allocating workqueues
and new global workqueues (Marco Crivellari)
- Fix two spelling mistakes in the ACPI code (Chu Guangqing)
- Fix ISAPNP to generate uevents to auto-load modules (René Rebe)
- Relocate the state flags initialization in the ACPI processor idle
driver and drop redundant C-state count checks from it (Huisong Li)
- Fix map_x2apic_id() in the ACPI processor core driver for
amd-pstate on am4 (René Rebe)"
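The min() vs min_t() item above is about truncation: min_t() casts both
arguments to the requested type before comparing them. A small illustrative
sketch (made-up function, not the ACPI properties code):

#include <linux/minmax.h>
#include <linux/types.h>

static u32 example_chunk(u64 available, u32 wanted)
{
        /*
         * min_t(u32, available, wanted) would truncate 'available' to 32 bits
         * before comparing: available = 0x100000004 becomes 4 and the
         * "minimum" ends up smaller than either real value. min() compares
         * the full-width values instead.
         */
        return min(available, (u64)wanted);
}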
* tag 'acpi-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits)
ACPI: PM: Fix a spelling mistake
ACPI: LPSS: Fix a spelling mistake
ACPI: processor_core: fix map_x2apic_id for amd-pstate on am4
ACPICA: Avoid walking the Namespace if start_node is NULL
ACPI: tools: pfrut: fix memory leak and resource leak in pfrut.c
ACPI: property: use min() instead of min_t()
PNP: Fix ISAPNP to generate uevents to auto-load modules
ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
ACPI: DPTF: Use ACPI_FREE() for ACPI buffer deallocation
ACPI: processor: idle: Drop redundant C-state count checks
ACPI: thermal: Add WQ_PERCPU to alloc_workqueue() users
ACPI: OSL: Add WQ_PERCPU to alloc_workqueue() users
ACPI: EC: Add WQ_PERCPU to alloc_workqueue() users
ACPI: OSL: replace use of system_wq with system_percpu_wq
ACPI: scan: replace use of system_unbound_wq with system_dfl_wq
ACPI: fan: Add support for Microsoft fan extensions
ACPI: fan: Add hwmon notification support
ACPI: fan: Add basic notification support
ACPI: TAD: Improve runtime PM using guard macros
ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
...
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"These are the arm64 updates for 6.19.
The biggest part is the Arm MPAM driver under drivers/resctrl/.
There's a patch touching mm/ to handle spurious faults for huge pmd
(similar to the pte version). The corresponding arm64 part allows us
to avoid the TLB maintenance if a (huge) page is reused after a write
fault. There's EFI refactoring to allow runtime services with
preemption enabled and the rest is the usual perf/PMU updates and
several cleanups/typos.
Summary:
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/resctrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support
for NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory management:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
Documentation/arm64: Fix the typo of register names
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
...
Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration
of hotplug memory, both with and without memmap_on_memory
support. This makes the way memory hotplug is handled on s390 much
more similar to other architectures
- Remove compat support. There shouldn't be any compat user space
around anymore, therefore get rid of a lot of code which also doesn't
need to be tested anymore
- Add stackprotector support. GCC 16 will get new compiler options,
which make it possible to generate the code required for kernel
stackprotector support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This
removes a lot of duplicated code. The new driver is also extendable
and makes it possible to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of
crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area
and the kernel image are within the same 4GB area. This eliminates
the need for weak per-CPU variables. Get rid of
ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
s390/entry: Use lay instead of aghik
s390/vdso: Get rid of -m64 flag handling
s390/vdso: Rename vdso64 to vdso
s390: Rename head64.S to head.S
s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
s390: Add stackprotector support
s390/modules: Simplify module_finalize() slightly
s390: Remove KMSG_COMPONENT macro
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
s390/ap: Restrict driver_override versus apmask and aqmask use
s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
s390/ap: Support driver_override for AP queue devices
s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
s390/debug: Update description of resize operation
s390/syscalls: Switch to generic system call table generation
s390/syscalls: Remove system call table pointer from thread_struct
s390/uapi: Remove 31 bit support from uapi header files
s390: Remove compat support
tools: Remove s390 compat support
...
Merge tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU feature updates from Dave Hansen:
"The biggest thing of note here is Linear Address Space Separation
(LASS). It represents the first time I can think of that the
upper=>kernel/lower=>user address space convention is actually
recognized by the hardware on x86. It ensures that userspace cannot
even get the hardware to _start_ page walks for the kernel address
space. This, of course, is a really nice generic side channel defense.
This is really only a down payment on LASS support. There are still
some details to work out in its interaction with EFI calls and
vsyscall emulation. For now, LASS is disabled if either of those
features is compiled in (which is almost always the case).
There's also one straggler commit in here which converts an
under-utilized AMD CPU feature leaf into a generic Linux-defined leaf
so more features can be packed in there.
Summary:
- Enable Linear Address Space Separation (LASS)
- Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined"
* tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Enable LASS during CPU initialization
selftests/x86: Update the negative vsyscall tests to expect a #GP
x86/traps: Communicate a LASS violation in #GP message
x86/kexec: Disable LASS during relocate kernel
x86/alternatives: Disable LASS when patching kernel code
x86/asm: Introduce inline memcpy and memset
x86/cpu: Add an LASS dependency on SMAP
x86/cpufeatures: Enumerate the LASS feature bits
x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
Merge tag 'x86_entry_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry update from Dave Hansen:
"This one is pretty trivial: fix a badly-named FRED data structure
member"
* tag 'x86_entry_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fred: Fix 64bit identifier in fred_ss
Merge tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Dave Hansen:
"The most significant are some changes to ensure that symbols exported
for KVM are used only by KVM modules themselves, along with some
related cleanups.
In true x86/misc fashion, the other patch is completely unrelated and
just enhances an existing pr_warn() to make it clear to users how they
have tainted their kernel when something is mucking with MSRs.
Summary:
- Make MSR-induced taint easier for users to track down
- Restrict KVM-specific exports to KVM itself"
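A minimal sketch of what such a restricted export can look like, assuming the
EXPORT_SYMBOL_GPL_FOR_MODULES() helper is the mechanism used; the symbol and
module list below are illustrative, not quoted from the actual patches:

#include <linux/export.h>

/* Hypothetical helper that only KVM has any business calling. */
void example_kvm_only_helper(void)
{
}
/*
 * Assumed interface: restrict the export so only the listed modules can use
 * the symbol; the module list here is illustrative.
 */
EXPORT_SYMBOL_GPL_FOR_MODULES(example_kvm_only_helper, "kvm,kvm-intel,kvm-amd");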
* tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible
x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs"
x86/mtrr: Drop unnecessary export of "mtrr_state"
x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base"
x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
Merge tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Dave Hansen:
"The main content here is adding support for the new EUPDATESVN SGX
ISA. Before this, folks who updated microcode had to reboot before
enclaves could attest to the new microcode. The new functionality lets
them do this without a reboot.
The rest are some nice, but relatively mundane comment and kernel-doc
fixups.
Summary:
- Allow security version (SVN) updates so enclaves can attest to new
microcode
- Fix kernel docs typos"
* tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute
x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h
x86/sgx: Document structs and enums with '@', not '%'
x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler
x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run"
x86/sgx: Enable automatic SVN updates for SGX enclaves
x86/sgx: Implement ENCLS[EUPDATESVN]
x86/sgx: Define error codes for use by ENCLS[EUPDATESVN]
x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag
x86/sgx: Introduce functions to count the sgx_(vepc_)open()
Merge tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Borislav Petkov:
- Use the proper accessors when reading CR3 as part of the page level
transitions (5-level to 4-level, the use case being kexec) so that
only the physical address in CR3 is picked up and not flags which are
above the physical mask shift (see the sketch below)
- Clean up and unify __phys_addr_symbol() definitions
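As referenced in the CR3 item above, the raw CR3 value can carry PCID and
flag bits in addition to the page-table base. A small illustrative sketch of
extracting only the physical address (names follow the usual x86 headers;
treat it as an assumption, not the patched boot/EFI-stub code):

#include <asm/processor-flags.h>
#include <asm/special_insns.h>

static unsigned long example_top_level_table_pa(void)
{
        unsigned long cr3 = __read_cr3();

        /*
         * Strip the PCID and flag bits so only the physical address of the
         * top-level page table remains (read_cr3_pa() does the same).
         */
        return cr3 & CR3_ADDR_MASK;
}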
* tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub: Fix page table access in 5-level to 4-level paging transition
x86/boot: Fix page table access in 5-level to 4-level paging transition
x86/mm: Unify __phys_addr_symbol()
Merge tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU mitigation updates from Borislav Petkov:
- Convert the tsx= cmdline parsing to use early_param()
- Cleanup forward declarations gunk in bugs.c
* tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Get rid of the forward declarations
x86/tsx: Get the tsx= command line parameter with early_param()
x86/tsx: Make tsx_ctrl_state static
Merge tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Largely cleanups along with a change to save XSS to the GHCB
(Guest-Host Communication Block) in SEV-ES guests so that the
hypervisor can determine the guest's XSAVES buffer size properly
and thus support shadow stacks in AMD confidential guests
* tag 'x86_sev_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cc: Fix enum spelling to fix kernel-doc warnings
x86/boot: Drop unused sev_enable() fallback
x86/coco/sev: Convert has_cpuflag() to use cpu_feature_enabled()
x86/sev: Include XSS value in GHCB CPUID request
x86/boot: Move boot_*msr helpers to asm/shared/msr.h
Merge tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- The mandatory pile of cleanups the cat drags in every merge window
* tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Clean up whitespace in a20.c
x86/mm: Delete disabled debug code
x86/{boot,mtrr}: Remove unused function declarations
x86/percpu: Use BIT_WORD() and BIT_MASK() macros
x86/cpufeatures: Correct LKGS feature flag description
x86/idtentry: Add missing '*' to kernel-doc lines
Merge tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Add support for AMD's Smart Data Cache Injection feature which allows
for direct insertion of data from I/O devices into the L3 cache, thus
bypassing DRAM and saving its bandwidth; the resctrl side of the
feature allows the size of the L3 used for data injection to be
controlled
- Add Intel Clearwater Forest to the list of CPUs which support
Sub-NUMA clustering
- Other fixes and cleanups
* tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/resctrl: Update bit_usage to reflect io_alloc
fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks
fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID
fs/resctrl: Introduce interface to display io_alloc CBMs
fs/resctrl: Add user interface to enable/disable io_alloc feature
fs/resctrl: Introduce interface to display "io_alloc" support
x86,fs/resctrl: Implement "io_alloc" enable/disable handlers
x86,fs/resctrl: Detect io_alloc feature
x86/resctrl: Add SDCIAE feature in the command line options
x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement
fs/resctrl: Consider sparse masks when initializing new group's allocation
x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
Merge tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
- Add microcode staging support on Intel: it moves the loading of the
microcode blobs to a non-critical path so that microcode loading
latencies are kept to a minimum. Directing the hardware to load the
microcode is the only step done on the critical path. The scheme is
also opportunistic: on failure, the machinery falls back to normal
loading
- Add the capability to the AMD side of the loader to select one of two
per-family/model/stepping patches: one is pre-Entrysign and the other
is post-Entrysign; with the goal of taking care of machines which
haven't updated their BIOS yet - something they should absolutely do
as this is the only proper Entrysign fix
- Other small cleanups and fixlets
* tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Mark early_parse_cmdline() as __init
x86/microcode/AMD: Select which microcode patch to load
x86/microcode/intel: Enable staging when available
x86/microcode/intel: Support mailbox transfer
x86/microcode/intel: Implement staging handler
x86/microcode/intel: Define staging state struct
x86/microcode/intel: Establish staging control logic
x86/microcode: Introduce staging step to reduce late-loading time
x86/cpu/topology: Make primary thread mask available with SMP=n
Merge tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- The second part of the AMD MCA interrupts rework after the
last-minute show-stopper from the last merge window was sorted out.
After this, the AMD MCA deferred errors, thresholding and corrected
errors interrupt handlers use common MCA code and are tightly
integrated into the core MCA code, thereby getting rid of
considerable duplication, all culminating in allowing CMCI error
thresholding storms to be detected on AMD too, using the common
infrastructure
- Add support for two new MCA bank bits on AMD Zen6 which denote
whether the error address logged is a system physical address, which
obviates the need for it to be translated before further error
recovery can be done
* tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Handle AMD threshold interrupt storms
x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems
x86/mce: Add support for physical address valid bit
x86/mce: Save and use APEI corrected threshold limit
x86/mce/amd: Define threshold restart function for banks
x86/mce/amd: Remove redundant reset_block()
x86/mce/amd: Support SMCA Corrected Error Interrupt
x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
x86/mce: Unify AMD DFR handler with MCA Polling
x86/mce: Unify AMD THR handler with MCA Polling
Merge tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- imh_edac: Add a new EDAC driver for Intel Diamond Rapids and future
incarnations of this memory controller architecture
- amd64_edac: Remove the legacy csrow sysfs interface which has been
deprecated and unused (we assume) for at least a decade
- Add the capability to fall back to BIOS-provided address translation
functionality (ACPI PRM) which can be used on systems unsupported by
the current AMD address translation library
- The usual fixes, fixlets, cleanups and improvements all over the
place
* tag 'edac_updates_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS/AMD/ATL: Replace bitwise_xor_bits() with hweight16()
EDAC/igen6: Fix error handling in igen6_edac driver
EDAC/imh: Setup 'imh_test' debugfs testing node
EDAC/{skx_comm,imh}: Detect 2-level memory configuration
EDAC/skx_common: Extend the maximum number of DRAM chip row bits
EDAC/{skx_common,imh}: Add EDAC driver for Intel Diamond Rapids servers
EDAC/skx_common: Prepare for skx_set_hi_lo()
EDAC/skx_common: Prepare for skx_get_edac_list()
EDAC/{skx_common,skx,i10nm}: Make skx_register_mci() independent of pci_dev
EDAC/ghes: Replace deprecated strcpy() in ghes_edac_report_mem_error()
EDAC/ie31200: Fix error handling in ie31200_register_mci
RAS/CEC: Replace use of system_wq with system_percpu_wq
EDAC: Remove the legacy EDAC sysfs interface
EDAC/amd64: Remove NUM_CONTROLLERS macro
EDAC/amd64: Generate ctl_name string at runtime
RAS/AMD/ATL: Require PRM support for future systems
ACPI: PRM: Add acpi_prm_handler_available()
RAS/AMD/ATL: Return error codes from helper functions
by in_hardirq() and marked deprecated in 2020.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmkvDhUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoW+XD/959KAIm2JpcEYUWuNBmlhEyuYWvPLw
ZyOiraLYBNyWmfCO/Yz4Ff8VZSR9gdWQoNfvBb8uxkbSXa0UOEUhCbzWsuoTnqR5
ObTIHCJ9QmPlRiFDvs4Sf5TGmy/4nXh6/PoH3JykNdlD3rZMTxiAz/k6QuO/S2iu
ykA+DNtNL7jDkQHzrWa3rf597BkBN1Z+hUD8zHRt8LYKRfmLYWjCMggjPLMnuqcn
240fnV/FubCLd9f5ZgNxHQMQCQH2qB7GYMk08YwXwCZQqIIXWqbNnhedkkNO3kWq
Sws4TEO6yg9pgTFqkuiDU5QgYEboRY4pDT45KSkdTHHGZl2OAAl3eVIGCto72UEI
Eyzn4k900hZ1iI/Rad5mx3D4XJZEXFgEbXhjph0odn6jVvmSj+Fmg3J67u1niO2a
obzB+xeaIkbGNQIgJFy8+A9SSnZckvuPlXdZdUxS2S95zH7f9+vBY8HWJMuyursa
3AJAKa82mN1i3A9FdSuMTdttQWkDmrwPKVzxvixs1mBu7kB70XaRIKsPjZj7LH6X
CiqP9Kt5FO0hVA7K+nKTeUA5DdjB4HzYzOgMqzFUhExY3hksVsj8rQEO6B0bCp9t
CfITA3BvU7GXxhXZHOq3dABQ21J/ZHgeuK3QdQSnOxSQOv2ElYIdKvYirJy2QdS1
tSM3O3GXb4zWDg==
=6LKf
-----END PGP SIGNATURE-----
Merge tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core irq cleanup from Thomas Gleixner:
"Tree wide cleanup of the remaining users of in_irq() which got
replaced by in_hardirq() and marked deprecated in 2020"
* tag 'core-core-2025-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
treewide: Remove in_irq()
- Prevent a thundering herd problem when the timekeeper CPU is delayed
and a large number of CPUs compete to acquire jiffies_lock to do the
update. Limit it to one CPU with a separate "uncontended" atomic
variable.
- A set of improvements for the timer migration mechanism:
- Support imbalanced NUMA trees correctly
- Support dynamic exclusion of CPUs from the migrator duty to allow the
cpuset/isolation mechanism to exclude them from handling timers of
remote idle CPUs.
- The usual small updates, cleanups and enhancements
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmks7doTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaxrD/40nxx+8cEXsVbVLIkP2PQbd2Y8+7sk
YbNu/Cb7j7Bg7R8YIs4p5GHk+7Yt/hNsW77SmbAzRPUyYYG6L3bUYlBa3yQlvIuo
xRPbzGA+RJies9skIGHbQ8z6ig1zUASRJPcBYiuaVIAuQhCfLNc4Nii9cEWtjZ24
+5gfRwV+vy74ArWwRkwaGejDK1tav+gd62OkFQZC8WtjQ08ozGZ6VBJNg7nYq/gH
FYO1rH2tQ/ZyjlO/x5NF8gFcjYD8iv5PDp8oH35MPx+XTdDccf0G3QB7ug0ffVdV
b4gA6lZTAmpsu/NHb6ByN4i/kf3wf8la/i+EaAh/Ov7NW078gunvVKVA7jStcbBl
ZgG5SRHiKRvQF/WXLGVQAnilRDZwRuS0nmJlqfExa44v23l5o3768RwdRYwQlv8g
X5KSRl0jlVgVtZHgNBlZtgX9+rnQSr9sB5sVGBP2a6a1WhVXQV/2kp0wjdnU0mPw
jLCnSdsHqBlSf9V7O/na823WCnBFb7blrLBXUoSbHBnICqtVFzhE1kBXWw3S7Kqh
CiaWM+S4WfR0HRnUlWMTS8BZ82MgiDnd7nGUXWwXBbdqWmoj/9CoU6SZRjbMBkzi
EY1XvmoYf6eSzdxfydI1hFi0/bbb8K9umHQlrpW3HeN9uXnVc0/+TroVPLuaKUdi
53ClqXjzE+CpJg==
=lQKn
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Prevent a thundering herd problem when the timekeeper CPU is delayed
and a large number of CPUs compete to acquire jiffies_lock to do the
update. Limit it to one CPU with a separate "uncontended" atomic
variable.
- A set of improvements for the timer migration mechanism:
- Support imbalanced NUMA trees correctly
- Support dynamic exclusion of CPUs from the migrator duty to allow
the cpuset/isolation mechanism to exclude them from handling
timers of remote idle CPUs
- The usual small updates, cleanups and enhancements
* tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Exclude isolated cpus from hierarchy
cpumask: Add initialiser to use cleanup helpers
sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
timers/migration: Use scoped_guard on available flag set/clear
timers/migration: Add mask for CPUs available in the hierarchy
timers/migration: Rename 'online' bit to 'available'
selftests/timers/nanosleep: Add tests for return of remaining time
selftests/timers: Clean up kernel version check in posix_timers
time: Fix a few typos in time[r] related code comments
time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
hrtimer: Store time as ktime_t in restart block
timers/migration: Remove dead code handling idle CPU checking for remote timers
timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
timers/migration: Fix imbalanced NUMA trees
timers/migration: Remove locking on group connection
timers/migration: Convert "while" loops to use "for"
tick/sched: Limit non-timekeeper CPUs calling jiffies update
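
As an aside, a minimal user-space sketch of the "limit the update to one
CPU" technique described above: an atomic flag lets only the first
contender take the lock and perform the update while the others back off.
This is illustrative only, not the kernel implementation; every name in
it is made up.

    /*
     * Only the thread that wins the compare-and-swap proceeds to the
     * locked update; everyone else returns instead of piling up on the
     * lock.
     */
    #include <stdatomic.h>
    #include <pthread.h>

    static atomic_int update_claimed;       /* 0 = free, 1 = claimed */
    static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned long fake_jiffies;

    static void maybe_do_update(void)
    {
            int expected = 0;

            if (!atomic_compare_exchange_strong(&update_claimed, &expected, 1))
                    return;

            pthread_mutex_lock(&update_lock);
            fake_jiffies++;                 /* stand-in for the real update */
            pthread_mutex_unlock(&update_lock);

            atomic_store(&update_claimed, 0);
    }
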
- A new driver for the Realtek system timer
- Prevent the unbinding of timers when the drivers do not support that.
- Expand the timer counter readout for the SPRD driver to 64 bit to allow
IoT devices suspend times of more than 36 hours, which is the current
limit of the 32-bit readout
- The usual small cleanups, fixes and enhancements all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksxAATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYobIuD/9HNzi+SDKiWiwuhZEfwrk4IJY1k4uM
yRrxQHt8yODKPq13M1eKNiXro3Tbhq6cLhECdQ6Rsf/g4Q0x+TeAl1M2CfHLMOJ5
+VYNqAx7b63bkZIp1pJk8HJfn4e9itDKnqEgi0M20tIoG3K8fLtZfyIdiuqOsTia
USWOdOqnPtwIOtVvMLPCmjYTh2FFHFxxcrQgoAW+1ACwOq/AkSSSAqKNcjEB7edH
7C9IZpm6rCl+13ywMiHS5UsOYFWz1fOgSmQ1c7KPqx9PquMaJ7oZFAQgb2FF0xXJ
S8DwTMKlwCO2Tq15XjmmCPLlvsGzZgVJkXhDsqyrDAZzOowqjHuT/HTrENLcE3K3
/gS721vahsLWfJp229whKkT11RDgQOO2c/3cplsL2joUyrkDzW4sloYuu00gqWrJ
mR9srdA7F3HeSACPb6rX64Rzg3m63P/zJ20h2uJt/JblIkZd+3kBTELM30GZRQbn
z176KwiRPy0TDbN8pW1I4I1sLtG7zYhaEsASGZM9yH9uKYU1cLej1SmmbLqDs3oO
e0+QyK+A4OzR43LiRltN4X3dJJ59uf+zru12WGjV85WxJsA4rN4/5q/S0xcpWR7b
eQNXn/YZwppdlwxTg+n2RWSTzOFtvNm8nfnepxB2UqffOAa1Ah87AT3rPaUrCULj
NwI9Fy4AY4IvVQ==
=0426
-----END PGP SIGNATURE-----
Merge tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Thomas Gleixner:
"Updates for clocksource and clockevent drivers:
- A new driver for the Realtek system timer
- Prevent the unbinding of timers when the drivers do not support
that
- Expand the timer counter readout for the SPRD driver to 64 bit
to allow IoT devices suspend times of more than 36 hours, which
is the current limit of the 32-bit readout
- The usual small cleanups, fixes and enhancements all over the
place"
* tag 'timers-clocksource-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers: Add Realtek system timer driver
dt-bindings: timer: Add Realtek SYSTIMER
clocksource/drivers/stm32-lp: Drop unused module alias
clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoC
clocksource/drivers/nxp-stm: Prevent driver unbind
clocksource/drivers/nxp-pit: Prevent driver unbind
clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbind
clocksource/drivers/nxp-stm: Fix section mismatches
clocksource/drivers/sh_cmt: Always leave device running after probe
clocksource/drivers/stm: Fix double deregistration on probe failure
clocksource/drivers/ralink: Fix resource leaks in init error path
clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock source is not registered
clocksource/drivers/sprd: Enable register for timer counter from 32 bit to 64 bit
- Remove one variant of PCI/MSI management as all users have been
converted to use per device domains. That reduces the variants to two:
The modern and the real archaic legacy variant, which keeps the usual
suspects in the museum category alive.
- Rework the platform MSI device ID detection mechanism in the ARM GIC
world to address resource leaks, duplicated code and other details. This
requires a corresponding preparatory step in the PCI/iproc driver.
- Trivial core code cleanups
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmkswn0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYpuD/wKT7d6I6AqnJVF/RhiJ+/d6vuX/aFW
g6E7XAkMLKhmxunSNFfPzXsHy2a0oJroYKmDJH4C8GWGo/gXa+QvmDt2491k9rdV
zM+CBodBu3/bXWvTW+o1fbyAvG+p2C3+iSRW/gGqzPdcY8gQiRnNOZS1j7zusMjO
A6pz5SvLSPWQUnVl9PygJBuNX5TFHPnY3AySRpW11CvqB5/8gqGz+O6lT/Q+5hov
GUC57hskbQd1PsYhTNRaUR4z7VMolPHqscp8DYVCWjOMP/r5quC6dlsn91yxuATU
8D7oRiW8xkCaTJplY/rA6r/VxUthZ3EgIxzev3rGaWBdPxHcFfftf2oxyFFAf3lf
3rEdfGBcNgApx+MCcoT5/3mf3KJfn2/bE6bZhwv94+dtbTlHguztyMD3vnGTS73i
zPWQ5ae4M5sqc8kCNMRaBfU8yQEHEKs3gia67vStZyn5R/uUNVKRo67LBPZKVDcJ
2511Ylnm62yG6PtdPGIFHY1i75uPpxXuS7F0BJignzM3iPvVvwLPZLDORr3/pR4q
CmswZTA2obue6+nwz/LUacxzONsZ2Z8pzGY6rrT9sfj0Z4mk6xrfEPfjfmVoMpyk
Dk4B8lIVYwcR7d/Sw+FIwYst8iw+L77Yn7kN8yCbh4lAOxBUUvtS5KAP6uPGe3D1
30Q/DbBVlEvg/g==
=VAtQ
-----END PGP SIGNATURE-----
Merge tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI updates from Thomas Gleixner:
"Updates for [PCI] MSI related code:
- Remove one variant of PCI/MSI management as all users have been
converted to use per device domains. That reduces the variants to
two:
The modern and the real archaic legacy variant, which keeps the
usual suspects in the museum category alive.
- Rework the platform MSI device ID detection mechanism in the ARM
GIC world to address resource leaks, duplicated code and other
details. This requires a corresponding preparatory step in the
PCI/iproc driver.
- Trivial core code cleanups"
* tag 'irq-msi-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-its: Rework platform MSI deviceID detection
PCI: iproc: Implement MSI controller node detection with of_msi_xlate()
genirq/msi: Slightly simplify msi_domain_alloc()
PCI/MSI: Delete pci_msi_create_irq_domain()
- Support for a couple of new ARM64 and RISCV SoC variants and their
magic interrupt controllers which either can reuse existing code or
require quirks due to a botched hardware implementation.
- More section mismatch fixes.
- The usual cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmkswMYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZvSEACZdCx7vO2XX7oef7DxQ6EKFA/NQvd0
xJGFTBlrxIucp26yUxWKkuDVdhu8WYe13zJG6+LVl9IxH3IIBa2duQ4HhIyqxuz6
z74IDjBlOcKAHu2xLFJmBIS4vGTd6UPOg1KvSrIFd9oiuMXikphbnFgyrFGAFiSQ
J1gP7mKZATUH08mTXK5k1pmBIbMjEHpyyTdBEJKoVgiN/MB/qsq95dy0Oxal+C13
1cOKBaFreTMdX+77U5RucBcGaLHW4SdoaAVaqc/UXw2c2TAezbt/gPYexRpkdVaG
2tuYTWIfCUuHbjUoOOYwI+ILnuiBMzjxlIUx3uSvcvtUVO4YuMDR4JOWVsevtfgI
uUV+4OPq9kBI6PNqAyo16NhDdZ9rmjg3q14F9oyidQfR5gRbsZPPDmtCB/M2jbE1
n3LlsHUJt0UYo8ZqCPrGhiw9hkGXv4wsEl10FKkyoNrQ0Y4SCUrdzGdr6vwhAAub
yxMe1+BrFQT23R9l+qVrUZmDmpV9tlFNr6rPwtucrQX3PMWEfAeCc6a/vjY3eqJl
sZ4pGyFEx0cwfKzHu1/SmNpnjSNdyc7niiN8HAQ7AnxzRW13fDdGQuuVGsKxHyJc
Tke9wJsyUO4MxpSQDI+cmpsF8OeJDHuRDKMBdLFxlLPhABECdLUO0qKq9l0Ry/Ji
uDkc3WvM14zKpw==
=kdyt
-----END PGP SIGNATURE-----
Merge tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq driver updates from Thomas Gleixner:
"Boring updates for interrupt drivers:
- Support for a couple of new ARM64 and RISCV SoC variants and their
magic interrupt controllers which either can reuse existing code or
require quirks due to a botched hardware implementation
- More section mismatch fixes
- The usual cleanups and fixes all over the place"
* tag 'irq-drivers-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
irqchip/meson-gpio: Add support for Amlogic S6 S7 and S7D SoCs
dt-bindings: interrupt-controller: Add support for Amlogic S6 S7 and S7D SoCs
dt-bindings: interrupt-controller: aspeed,ast2700: Correct #interrupt-cells and interrupts count
irqchip/aclint-sswi: Add Nuclei UX900 support
dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT SSWI
dt-bindings: interrupt-controller: Add Anlogic DR1V90 ACLINT MSWI
dt-bindings: interrupt-controller: Add Anlogic DR1V90 PLIC
irqchip/irq-bcm7038-l1: Remove unused reg_mask_status()
irqchip/sifive-plic: Fix call to __plic_toggle() in M-Mode code path
irqchip/sifive-plic: Add support for UltraRISC DP1000 PLIC
irqchip/sifive-plic: Cache the interrupt enable state
dt-bindings: interrupt-controller: Add UltraRISC DP1000 PLIC
dt-bindings: vendor-prefixes: Add UltraRISC
irqchip/qcom-irq-combiner: Rename driver structure
irqchip/riscv-imsic: Inline imsic_vector_from_local_id()
irqchip/riscv-imsic: Embed the vector array in lpriv
irqchip/riscv-imsic: Remove redundant irq_data lookups
irqchip/ts4800: Drop unused module alias
irqchip/mvebu-pic: Drop unused module alias
irqchip/meson-gpio: Drop unused module alias
...
- Rework of the Per Processor Interrupt (PPI) management on ARM[64].
PPI support was built under the assumption that the systems are
homogeneous so that the same CPU local device types are connected to
them. That's unfortunately wishful thinking and created horrible
workarounds.
This rework provides affinity management for PPIs so that they can be
individually configured in the firmware tables and mops up the related
drivers all over the place.
- Prevent CPUSET/isolation changes from arbitrarily affining interrupt
threads to random CPUs, which ignores user or driver settings.
- Plug a harmless race in the interrupt affinity proc interface, which
allows a half-updated mask to be observed
- Adjust the priority of secondary interrupt threads on RT, so that the
combination of primary and secondary thread emulates the hardware
interrupt plus thread scenario. Having them at the same priority can
cause starvation issues in some drivers.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksv3oTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoe5+D/wNnBaX9LRajuLOF+zaYw5WZxkzp6U7
X4AP3cLny8xynI1kM5V8M1ym3Fspk0hiqxNX2LLXrSZzBR+3O4uGCyCceBXeHKo2
vW4auUXG4MB+2sZyudQXaBpNK4A2YBubycTUcRECjkjDkBPAWgN7J+Oz2lXUSUcH
zlitlHNo48hnZQPAJr4PDpi5q9+rChn+8/s+K1d8NlEf9HOXC98qzyMuMq+jHdJE
AQ6tKoHkA5lHjHAUY3AbWptoHo1Wp+p5PSqsrFr6nbKuPlhUqRNEPXX0Z8q7aUTj
NgdkvIHJVJ0C+T40FIWCNzUYOUk4gTQXBSPvptwJSHAmf9ovp+Kg2ltVZBzyL2iI
R0EZSQAQU8iJcRrqjcAYqI36LkmwwVT6RD1zFa98xJT/AjsMpAt/U1pEMDtkoTKe
Lv7ZQ/hloc+4wV4xS4zEtoV/ukdUfA9aEdXsh5hNH/07tvatpKO2LgortsiI+lCK
76vAULcGvbMr5Jr63snjICgstahunpNMRn2HmnGAjmdZf4+g+TDvZR4DI6bswtuO
jp5G6OM30Z9zKheAr1VioV1XAKr6Y4jDKVjfFy/n1k5pDwYaSJopmZxSD35aas4e
VqWizAzc5dAVCYRlzr6S1lrMQ2JJRg0RpIn+sMS8dhf9SK7hs5ilGSOsgX1fgVat
1N3WXvYM8vSW+g==
=zrA1
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq core updates from Thomas Gleixner:
"Updates for the interrupt core and treewide cleanups:
- Rework of the Per Processor Interrupt (PPI) management on ARM[64]
PPI support was built under the assumption that the systems are
homogeneous so that the same CPU local device types are connected to
them. That's unfortunately wishful thinking and created horrible
workarounds.
This rework provides affinity management for PPIs so that they can
be individually configured in the firmware tables and mops up the
related drivers all over the place.
- Prevent CPUSET/isolation changes from arbitrarily affining interrupt
threads to random CPUs, which ignores user or driver settings.
- Plug a harmless race in the interrupt affinity proc interface,
which allows a half-updated mask to be observed
- Adjust the priority of secondary interrupt threads on RT, so that
the combination of primary and secondary thread emulates the
hardware interrupt plus thread scenario. Having them at the same
priority can cause starvation issues in some drivers"
* tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
genirq: Remove cpumask availability check on kthread affinity setting
genirq: Fix interrupt threads affinity vs. cpuset isolated partitions
genirq: Prevent early spurious wake-ups of interrupt threads
genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier()
genirq/manage: Reduce priority of forced secondary interrupt handler
genirq/proc: Fix race in show_irq_affinity()
genirq: Fix percpu_devid irq affinity documentation
perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer
irqdomain: Kill of_node_to_fwnode() helper
genirq: Kill irq_{g,s}et_percpu_devid_partition()
irqchip: Kill irq-partition-percpu
irqchip/apple-aic: Drop support for custom PMU irq partitions
irqchip/gic-v3: Drop support for custom PPI partitions
coresight: trbe: Request specific affinities for per CPU interrupts
perf: arm_spe_pmu: Request specific affinities for per CPU interrupts
perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts
genirq: Add request_percpu_irq_affinity() helper
genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
genirq: Update request_percpu_nmi() to take an affinity
genirq: Add affinity to percpu_devid interrupt requests
...
- Allow pool refill on RT enabled kernels before the scheduler is up
and running to prevent pool exhaustion
- Correct the lockdep override to prevent false positives.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksu/UTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoScuD/9g3OtG29VZ2uNkOhgvGKuAThxZ+Y4d
8YPNT/X9SuSevuwBCc9zXgpc5S4Af40ndRbcsiZ38t/xrInE6+J6qPJ7BbKCTiac
cYvNz4ibMx1qz35BPtym4RnJyZA2EHX2hVIFGCfdh4MkILI7r3OPjemX542epZAW
8MdKu5WZJNa8KYIvUE1UZdDtH1imxU7jdYBkr1ockN66+HMjRKHxcPwrhTCFJeCT
N4DHOQ+hf9NzipHpRppDmqwkzQCOyKrOojXht00rG92QXIzmZRepH93cCFi/nW1d
8aUjHU6myNQa65VkFDM2I2bpzCzlK7HpBU3iNXEkXPLZ8bVrYMP9koK+SXIa+Gj0
icXdJwBe9uOKQOaG6MRSO2hn3fHO0m+PjZGtQFg7EqFCaY0J+8tv9k3WttDDpfMg
hjXjyJ0U9T+/YUuSDBLdPczIJZr8eGh960SF0OTshHGGVOCGJt4dlvoC0NtUxN8x
WQ/he9K/Cyz7U6yr1aNO6hAfqX/+6c0ZhD3OONuC9xgxHUkjPdlEe1ntLbdfn92z
VygbJaguvRdzkAeaAlXNNU5WTNvm3ZeLPqDnnUHUlDW1f7hF0KwCrfZUW0PqdB76
+94ptMeIlCv53zIEKamHuALGp7WtGddGGzaZLH8rUnPxfiff+JiMhXtV0ioMuUNG
jpdlyBMXK+s0PA==
=dUo2
-----END PGP SIGNATURE-----
Merge tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects update from Thomas Gleixner:
"Two small updates for debugobjects:
- Allow pool refill on RT enabled kernels before the scheduler is up
and running to prevent pool exhaustion
- Correct the lockdep override to prevent false positives"
* tag 'core-debugobjects-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP
debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING
The recent enablement of RSEQ in glibc resulted in regressions which are
caused by the related overhead. It turned out that the decision to invoke
the exit to user work was not really a decision. More or less each
context switch caused that. There is a long list of small issues which
add up nicely and result in a 3-4% regression in I/O benchmarks.
The other detail which caused issues due to extra work in context switch
and task migration is the CID (memory context ID) management. It also
requires using a task work to consolidate the CID space, which is
executed in the context of an arbitrary task and results in sporadic
uncontrolled exit latencies.
The rewrite addresses this by:
- Removing deprecated and long unsupported functionality
- Moving the related data into dedicated data structures which are
optimized for fast path processing.
- Caching values so actual decisions can be made
- Replacing the current implementation with an optimized inlined variant.
- Separating fast and slow path for architectures which use the generic
entry code, so that only fault and error handling goes into the
TIF_NOTIFY_RESUME handler.
- Rewriting the CID management so that it becomes mostly invisible in the
context switch path. That moves the work of switching modes into the
fork/exit path, which is a reasonable tradeoff. That work is only
required when a process creates more threads than the cpuset it is
allowed to run on or when enough threads exit after that. An artificial
thread pool benchmark which triggers this did not degrade; it actually
improved significantly.
The main effect in migration heavy scenarios is that runqueue lock held
time and therefore contention goes down significantly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksaRYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoencEADA5he8PAFPmSRRPo6+2G5mHzWe8kIU
5ZViQStWFNAA0qqy8VXryWiJ6qqrO6la9o7K4YOXASUtlkVjquRp1DF7PabqGwuy
zshbRCXNlT51J8uqanN8VrGVjlf+bMdHDbGoI1SLkUTxG8b+kDD5PXUQE1ARelPP
Slbg9u+EMrxj6D5MDTPbuW6TqryJEkPtiNScyOz43emp9ww9+WVxenOcRqU4D+Th
mjWmrGIzkroSf4XReMoD/wg9TPTpUjXnNCwl2viY9JvBpkMfYtU4tJAGK3aNFOWy
zsAN0O9CaFGrUEFne7qUmtwhNLdtnjx5HN5pe7yZd1EhdTuQKq4jPiiQnwwm8w72
c0o6m45FNPmPoSyfaZWCkLjbTEUXonT9JF61iN35JVxim8gBDDJjHFKnLxDmLrH3
X0eESE48ReY2EneDV6Y8RJRo6oG14Fccvc39aTf/2Rw3trpmtt2agvConQzupQIg
DzANw4jhUUzFRrHrMHACNsqKFXh9ratue/S9DM3xxTpGO/bKdeK7jGIgzNf8O34M
J0O6Hvk5jMdcWlIJTx21GoGzoSkkXnR49g/71aCcp+MwdY4x9zFz5SWi8LWQRmkx
xRo6tY27Bma8/SEwMJjPpAUXDTpq6v+j3cPisybL1yGsyt9lh+p8LX7VUtwcoEqe
6ZelC5Kgw/+/kg==
=n5KT
-----END PGP SIGNATURE-----
Merge tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq updates from Thomas Gleixner:
"A large overhaul of the restartable sequences and CID management:
The recent enablement of RSEQ in glibc resulted in regressions which
are caused by the related overhead. It turned out that the decision to
invoke the exit to user work was not really a decision. More or less
each context switch caused that. There is a long list of small issues
which add up nicely and result in a 3-4% regression in I/O
benchmarks.
The other detail which caused issues due to extra work in context
switch and task migration is the CID (memory context ID) management.
It also requires using a task work to consolidate the CID space,
which is executed in the context of an arbitrary task and results in
sporadic uncontrolled exit latencies.
The rewrite addresses this by:
- Removing deprecated and long unsupported functionality
- Moving the related data into dedicated data structures which are
optimized for fast path processing.
- Caching values so actual decisions can be made
- Replacing the current implementation with an optimized inlined
variant.
- Separating fast and slow path for architectures which use the
generic entry code, so that only fault and error handling goes into
the TIF_NOTIFY_RESUME handler.
- Rewriting the CID management so that it becomes mostly invisible in
the context switch path. That moves the work of switching modes
into the fork/exit path, which is a reasonable tradeoff. That work
is only required when a process creates more threads than the
cpuset it is allowed to run on or when enough threads exit after
that. An artificial thread pool benchmark which triggers this did
not degrade; it actually improved significantly.
The main effect in migration heavy scenarios is that runqueue lock
held time and therefore contention goes down significantly"
* tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
sched/mmcid: Switch over to the new mechanism
sched/mmcid: Implement deferred mode change
irqwork: Move data struct to a types header
sched/mmcid: Provide CID ownership mode fixup functions
sched/mmcid: Provide new scheduler CID mechanism
sched/mmcid: Introduce per task/CPU ownership infrastructure
sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
sched/mmcid: Provide precomputed maximal value
sched/mmcid: Move initialization out of line
signal: Move MMCID exit out of sighand lock
sched/mmcid: Convert mm CID mask to a bitmap
cpumask: Cache num_possible_cpus()
sched/mmcid: Use cpumask_weighted_or()
cpumask: Introduce cpumask_weighted_or()
sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
sched/mmcid: Move scheduler code out of global header
sched: Fixup whitespace damage
sched/mmcid: Cacheline align MM CID storage
sched/mmcid: Use proper data structures
sched/mmcid: Revert the complex CID management
...
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in
generic code with [unsafe_]get_user(). All other architectures and ARM
variants provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access helpers
always goes through a local C scope label indirection inside the
helpers. This is required because compilers do not support an ASM
GOTO target leaving an auto cleanup scope. GCC silently fails to emit
the cleanup invocation and CLANG fails the build.
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but if it
has to be done with user controlled pointers, the access is shielded
with a speculation barrier, so that the CPU cannot speculate around the
address range check. Those speculation barriers impact performance quite
significantly. This can be avoided by "masking" the provided pointer so
it is guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This has
to be done without branches so it creates an address dependency for the
access, which the CPU cannot speculate ahead.
This results in repeating and error-prone programming patterns:

    if (can_do_masked_user_access())
            from = masked_user_read_access_begin((from));
    else if (!user_read_access_begin(from, sizeof(*from)))
            return -EFAULT;
    unsafe_get_user(val, from, Efault);
    user_read_access_end();
    return 0;
Efault:
    user_read_access_end();
    return -EFAULT;

which can be replaced with scopes and automatic cleanup:

    scoped_user_read_access(from, Efault)
            unsafe_get_user(val, from, Efault);
    return 0;
Efault:
    return -EFAULT;

- Convert code which implements the above pattern over to
scope_user.*.access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most architectures, but
prevent PowerPC from implementing the masking optimization.
- Add a missing speculation barrier in copy_from_user_iter()
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmksRfITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVhBEACEySjWcyCrD1e0ZFMFAOJZFI2BShav
reotzCzmHYQdpVukDRxc64BgM2vN4yB04xnyMhi2o4hSTiIJhz1NzbKggsQJhVoA
psYz+xEI161HuLZnUBUBuF9RRko/HVsbGqO2JFCuOKor4GCycvjVgupR3EIN9h5T
HZEWGIgaTmN7MBj0QRrJgJkaaSTnPKOwWaNMV/F9pfk27zuB7vuV8WM9P3FaJYG+
JGa9td7VGaBpWavxgMJqfdvXWBCVDDfZ1dunWx8tPTnLxKZZZD6HlfQXhZTr2n1e
rtJpGgfVBx5Uqxn4RrhS0I7QeK1b9rrt3IU7EkFoaa3Z8LU5B7cHlm7KyicyoHhy
SzFFUszssznT/0OhA5fmgPRlqI295HynW2p1L4Xy9hC0EZ2vXJPG5rO6X3x6QwSR
asjRB7x/6JzWQUzE7/nhXd9KcB66wvQxhnjp7GqulF74aPBCtIdXXDD68YEDYkbi
dPC3NRBr0ePbsGVGWbYvYIPWcvo1u814C2io1zKwmVbiN6lCYURgQK861vfAZUP8
oP5D2a6ENgezDKoJo6eJ82inuDu64qZy7OOkU/aO3cbOuWGVyY9CjYD11x85Nr0k
UNabSOfvcmhmobtYUiAgLLrjX1grQUG3F74ZQTw513mwgMObuDAAoS11GPjY6HL6
b99WUJRv8jP66A==
=6no0
-----END PGP SIGNATURE-----
Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scoped user access updates from Thomas Gleixner:
"Scoped user mode access and related changes:
- Implement the missing u64 user access function on ARM when
CONFIG_CPU_SPECTRE=n.
This makes it possible to access a 64bit value in generic code with
[unsafe_]get_user(). All other architectures and ARM variants
provide the relevant accessors already.
- Ensure that ASM GOTO jump label usage in the user mode access
helpers always goes through a local C scope label indirection
inside the helpers.
This is required because compilers do not support an ASM
GOTO target leaving an auto cleanup scope. GCC silently fails to emit
the cleanup invocation and CLANG fails the build.
[ Editor's note: gcc-16 will have fixed the code generation issue
in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm
goto jumps [PR122835]"). But we obviously have to deal with clang
and older versions of gcc, so.. - Linus ]
This provides generic wrapper macros and the conversion of affected
architecture code to use them.
- Scoped user mode access with auto cleanup
Access to user mode memory can be required in hot code paths, but
if it has to be done with user controlled pointers, the access is
shielded with a speculation barrier, so that the CPU cannot
speculate around the address range check. Those speculation
barriers impact performance quite significantly.
This cost can be avoided by "masking" the provided pointer so it is
guaranteed to be in the valid user memory access range and
otherwise to point to a guaranteed unpopulated address space. This
has to be done without branches so it creates an address dependency
for the access, which the CPU cannot speculate ahead.
This results in repeating and error-prone programming patterns:

    if (can_do_masked_user_access())
            from = masked_user_read_access_begin((from));
    else if (!user_read_access_begin(from, sizeof(*from)))
            return -EFAULT;
    unsafe_get_user(val, from, Efault);
    user_read_access_end();
    return 0;
Efault:
    user_read_access_end();
    return -EFAULT;

which can be replaced with scopes and automatic cleanup:

    scoped_user_read_access(from, Efault)
            unsafe_get_user(val, from, Efault);
    return 0;
Efault:
    return -EFAULT;

- Convert code which implements the above pattern over to
scoped_user_.*_access(). This also corrects a couple of imbalanced
masked_*_begin() instances which are harmless on most
architectures, but prevent PowerPC from implementing the masking
optimization.
- Add a missing speculation barrier in copy_from_user_iter()"
* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
scm: Convert put_cmsg() to scoped user access
iov_iter: Add missing speculation barrier to copy_from_user_iter()
iov_iter: Convert copy_from_user_iter() to masked user access
select: Convert to scoped user access
x86/futex: Convert to scoped user access
futex: Convert to get/put_user_inline()
uaccess: Provide put/get_user_inline()
uaccess: Provide scoped user access regions
arm64: uaccess: Use unsafe wrappers for ASM GOTO
s390/uaccess: Use unsafe wrappers for ASM GOTO
riscv/uaccess: Use unsafe wrappers for ASM GOTO
powerpc/uaccess: Use unsafe wrappers for ASM GOTO
x86/uaccess: Use unsafe wrappers for ASM GOTO
uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
ARM: uaccess: Implement missing __get_user_asm_dword()
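
For reference, a minimal kernel-style sketch of a complete helper written
with the scoped form described above. scoped_user_read_access() and
unsafe_get_user() are the interfaces named in this series; the helper
itself is a hypothetical example, not code from the tree.

    /* Hypothetical example: read one u32 from user space. The scope
     * does the access_begin/end bookkeeping and pointer masking, so the
     * Efault label only has to return the error. */
    static int read_u32_from_user(u32 __user *from, u32 *val)
    {
            scoped_user_read_access(from, Efault)
                    unsafe_get_user(*val, from, Efault);
            return 0;
    Efault:
            return -EFAULT;
    }
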
Merge PM QoS updates and a cpupower utility update for 6.19-rc1:
- Introduce and document a QoS limit on CPU exit latency during wakeup
from suspend-to-idle (Ulf Hansson)
- Add support for building libcpupower statically (Zuo An)
* pm-qos:
Documentation: power/cpuidle: Document the CPU system wakeup latency QoS
cpuidle: Respect the CPU system wakeup QoS limit for cpuidle
sched: idle: Respect the CPU system wakeup QoS limit for s2idle
pmdomain: Respect the CPU system wakeup QoS limit for cpuidle
pmdomain: Respect the CPU system wakeup QoS limit for s2idle
PM: QoS: Introduce a CPU system wakeup QoS limit
* pm-tools:
tools/power/cpupower: Support building libcpupower statically
* for-next/set_memory:
: Fix + documentation for the arm64 change_memory_common()
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
* for-next/sysreg:
: arm64 sysreg updates/cleanups
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64/sysreg: Add ICH_VMCR_EL2
arm64/sysreg: Move generation of RES0/RES1/UNKN to function
arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor
arm64/sysreg: Fix checks for incomplete sysreg definitions
arm64/sysreg: Replace TCR_EL1 field macros
* arm64/for-next/perf:
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY
perf/arm-ni: Fix and optimise register offset calculation
perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs
perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister()
perf/arm-ni: Add NoC S3 support
perf/arm_cspmu: nvidia: Add pmevfiltr2 support
perf/arm_cspmu: nvidia: Add revision id matching
perf/arm_cspmu: Add pmpidr support
perf/arm_cspmu: Add callback to reset filter config
perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
* for-next/misc:
: Miscellaneous patches
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
arm64: mm: make linear mapping permission update more robust for patial range
arm64/mm: Elide TLB flush in certain pte protection transitions
arm64/mm: Rename try_pgd_pgtable_alloc_init_mm
arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
arm64: add unlikely hint to MTE async fault check in el0_svc_common
arm64: acpi: add newline to deferred APEI warning
arm64: entry: Clean out some indirection
arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52
arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()
arm64: remove unused ARCH_PFN_OFFSET
arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack
arm64: Remove assertion on CONFIG_VMAP_STACK
* for-next/kselftest:
: arm64 kselftest patches
kselftest/arm64: Align zt-test register dumps
* for-next/efi-preempt:
: arm64: Make EFI calls preemptible
arm64/efi: Call EFI runtime services without disabling preemption
arm64/efi: Move uaccess en/disable out of efi_set_pgd()
arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper
arm64/fpsimd: Permit kernel mode NEON with IRQs off
arm64/fpsimd: Don't warn when EFI execution context is preemptible
efi/runtime-wrappers: Keep track of the efi_runtime_lock owner
efi: Add missing static initializer for efi_mm::cpus_allowed_lock
* for-next/assembler-macro:
: arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
* for-next/typos:
: Random typo/spelling fixes
arm64: Fix double word in comments
arm64: Fix typos and spelling errors in comments
* for-next/sme-ptrace-disable:
: Support disabling streaming mode via ptrace on SME only systems
kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
arm64/sme: Support disabling streaming mode via ptrace on SME only systems
* for-next/local-tlbi-page-reused:
: arm64, mm: avoid TLBI broadcast if page reused in write fault
arm64, tlbflush: don't TLBI broadcast if page reused in write fault
mm: add spurious fault fixing support for huge pmd
* for-next/mpam: (34 commits)
: Basic Arm MPAM driver (more to follow)
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
arm_mpam: Use long MBWU counters if supported
arm_mpam: Probe for long/lwd mbwu counters
arm_mpam: Consider overflow in bandwidth counter state
arm_mpam: Track bandwidth counter state for power management
arm_mpam: Add mpam_msmon_read() to read monitor value
arm_mpam: Add helpers to allocate monitors
arm_mpam: Probe and reset the rest of the features
arm_mpam: Allow configuration to be applied and restored during cpu online
arm_mpam: Use a static key to indicate when mpam is enabled
arm_mpam: Register and enable IRQs
arm_mpam: Extend reset logic to allow devices to be reset any time
arm_mpam: Add a helper to touch an MSC from any CPU
arm_mpam: Reset MSC controls from cpuhp callbacks
arm_mpam: Merge supported features during mpam_enable() into mpam_class
arm_mpam: Probe the hardware features resctrl supports
arm_mpam: Add helpers for managing the locking around the mon_sel registers
...
* for-next/acpi:
: arm64 acpi updates
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
* for-next/documentation:
: arm64 Documentation updates
Documentation/arm64: Fix the typo of register names
Merge energy model management updates and operating performance points
(OPP) library changes for 6.19-rc1:
- Add support for sending netlink notifications to user space on energy
model updates (Changwoo Min, Peng Fan)
- Minor improvements to the Rust OPP interface (Tamir Duberstein)
- Fixes to scope-based pointers in the OPP library (Viresh Kumar)
* pm-em:
PM: EM: Add to em_pd_list only when no failure
PM: EM: Notify an event when the performance domain changes
PM: EM: Implement em_notify_pd_created/updated()
PM: EM: Implement em_notify_pd_deleted()
PM: EM: Implement em_nl_get_pd_table_doit()
PM: EM: Implement em_nl_get_pds_doit()
PM: EM: Add an iterator and accessor for the performance domain
PM: EM: Add a skeleton code for netlink notification
PM: EM: Add em.yaml and autogen files
PM: EM: Expose the ID of a performance domain via debugfs
PM: EM: Assign a unique ID when creating a performance domain
* pm-opp:
rust: opp: simplify callers of `to_c_str_array`
OPP: Initialize scope-based pointers inline
rust: opp: fix broken rustdoc link
Consider the following code path:
(1) vmalloc -> (2) set_vm_flush_reset_perms -> (3) set_memory_ro/set_memory_rox
-> .... (4) use the mapping .... -> (5) vfree -> (6) vm_reset_perms
-> (7) set_area_direct_map.
Or, it may happen that we encounter failure at (3) and directly jump to (5).
In both cases, (7) may fail due to linear map split failure. But, we care
about its success *only* for the region which got successfully changed by
(3). Such a region is guaranteed to be pte-mapped.
The TLDR is that (7) will surely succeed for the regions we care about.
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
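
To make the numbered path above concrete, a reduced kernel-style sketch of
steps (1)-(5), including the early bail-out when (3) fails. This is
illustrative only; the function is made up, while vmalloc(),
set_vm_flush_reset_perms(), set_memory_ro() and vfree() are the calls the
message refers to.

    /* Illustrative sketch only, with error handling cut to the minimum. */
    static void *alloc_ro_buffer(void)
    {
            void *buf = vmalloc(PAGE_SIZE);                 /* (1) */

            if (!buf)
                    return NULL;

            set_vm_flush_reset_perms(buf);                  /* (2) */

            if (set_memory_ro((unsigned long)buf, 1)) {     /* (3) failed */
                    vfree(buf);                             /* jump to (5) */
                    return NULL;
            }

            /* (4) the caller uses the mapping and later vfree()s it (5) */
            return buf;
    }
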
The rodata=on security measure requires that any code path which does
vmalloc -> set_memory_ro/set_memory_rox must protect the linear map alias
too. Therefore, if such a call fails, we must abort set_memory_* and the
caller must take appropriate action; currently we are suppressing the error, and
there is a real chance of such an error arising post commit a166563e7e
("arm64: mm: support large block mapping when rodata=full"). Therefore,
propagate any error to the caller.
Fixes: a166563e7e ("arm64: mm: support large block mapping when rodata=full")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Merge cpuidle and power capping updates for 6.19-rc1:
- Use residency threshold in polling state override decisions in the
menu cpuidle governor (Aboorva Devarajan)
- Add sanity check for exit latency and target residency in the cpuidle
core (Rafael Wysocki)
- Use this_cpu_ptr() where possible in the teo governor (Christian
Loehle)
- Rework the handling of tick wakeups in the teo cpuidle governor to
increase the likelihood of stopping the scheduler tick in the cases
when tick wakeups can be counted as non-timer ones (Rafael Wysocki)
- Fix a reverse condition in the teo cpuidle governor and drop a
misguided target residency check from it (Rafael Wysocki)
- Clean up multiple minor defects in the teo cpuidle governor (Rafael
Wysocki)
- Update header inclusion to make it follow the Include What You Use
principle (Andy Shevchenko)
- Enable MSR-based RAPL PMU support in the intel_rapl power capping
driver and arrange for using it on the Panther Lake and Wildcat Lake
processors (Kuppuswamy Sathyanarayanan)
- Add support for Nova Lake and Wildcat Lake processors to the
intel_rapl power capping driver (Kaushlendra Kumar, Srinivas
Pandruvada)
* pm-cpuidle:
cpuidle: Warn instead of bailing out if target residency check fails
cpuidle: Update header inclusion
cpuidle: governors: teo: Add missing space to the description
cpuidle: governors: teo: Simplify intercepts-based state lookup
cpuidle: governors: teo: Fix tick_intercepts handling in teo_update()
cpuidle: governors: teo: Rework the handling of tick wakeups
cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
cpuidle: governors: teo: Use s64 consistently in teo_update()
cpuidle: governors: teo: Drop redundant function parameter
cpuidle: governors: teo: Drop misguided target residency check
cpuidle: teo: Use this_cpu_ptr() where possible
cpuidle: Add sanity check for exit latency and target residency
cpuidle: menu: Use residency threshold in polling state override decisions
* pm-powercap:
powercap: intel_rapl: Enable MSR-based RAPL PMU support
powercap: intel_rapl: Prepare read_raw() interface for atomic-context callers
powercap: intel_rapl: Add support for Nova Lake processors
powercap: intel_rapl: Add support for Wildcat Lake platform
Merge cpufreq updates for 6.19-rc1:
- Add OPP and bandwidth support for Tegra186 (Aaron Kling)
- Optimizations for parameter array handling in the amd-pstate cpufreq
driver (Mario Limonciello)
- Fix for mode changes with offline CPUs in the amd-pstate cpufreq
driver (Gautham Shenoy)
- Preserve freq_table_sorted across suspend/hibernate in the cpufreq
core (Zihuan Zhang)
- Adjust energy model rules for Intel hybrid platforms in the
intel_pstate cpufreq driver and improve printing of debug messages
in it (Rafael Wysocki)
- Replace deprecated strcpy() in cpufreq_unregister_governor()
(Thorsten Blum)
- Fix duplicate hyperlink target errors in the intel_pstate cpufreq
driver documentation and use :ref: directive for internal linking in
it (Swaraj Gaikwad, Bagas Sanjaya)
- Add Diamond Rapids OOB mode support to the intel_pstate cpufreq
driver (Kuppuswamy Sathyanarayanan)
- Use mutex guard for driver locking in the intel_pstate driver and
eliminate some code duplication from it (Rafael Wysocki)
- Replace udelay() with usleep_range() in ACPI cpufreq (Kaushlendra
Kumar)
- Minor improvements to various cpufreq drivers (Christian Marangi, Hal
Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)
* pm-cpufreq: (27 commits)
cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
cpufreq: ACPI: Replace udelay() with usleep_range()
cpufreq: intel_pstate: Eliminate some code duplication
cpufreq: intel_pstate: Use mutex guard for driver locking
cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUs
cpufreq/amd-pstate: Add static asserts for EPP indices
cpufreq/amd-pstate: Fix some whitespace issues
cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status()
cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULL
cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_string
cpufreq/amd-pstate: Use sysfs_match_string() for epp
cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users
cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM
Documentation: intel-pstate: Use :ref: directive for internal linking
cpufreq: intel_pstate: Add Diamond Rapids OOB mode support
Documentation: intel_pstate: fix duplicate hyperlink target errors
cpufreq: CPPC: Don't warn if FIE init fails to read counters
cpufreq: nforce2: fix reference count leak in nforce2
cpufreq: tegra186: add OPP support and set bandwidth
cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
...
Merge updates related to system suspend and hibernation for 6.19-rc1:
- Replace snprintf() with scnprintf() in show_trace_dev_match()
(Kaushlendra Kumar)
- Fix memory allocation error handling in pm_vt_switch_required()
(Malaya Kumar Rout)
- Introduce CALL_PM_OP() macro and use it to simplify code in
generic PM operations (Kaushlendra Kumar)
- Add module param to backtrace all CPUs in the device power management
watchdog (Sergey Senozhatsky)
- Rework message printing in swsusp_save() (Rafael Wysocki)
- Make it possible to change the number of hibernation compression
threads (Xueqin Luo)
- Clarify that only cgroup1 freezer uses PM freezer (Tejun Heo)
- Add document on debugging shutdown hangs to PM documentation and
correct a mistaken configuration option in it (Mario Limonciello)
- Shut down wakeup source timer before removing the wakeup source from
the list (Kaushlendra Kumar, Rafael Wysocki)
- Introduce new PMSG_POWEROFF event for system shutdown handling with
the help of PM device callbacks (Mario Limonciello)
- Make pm_test delay interruptible by wakeup events (Riwen Lu)
- Clean up kernel-doc comment style usage in the core hibernation
code and remove unuseful comments from it (Sunday Adelodun, Rafael
Wysocki)
- Add support for handling wakeup events and aborting the suspend
process while it is syncing file systems (Samuel Wu, Rafael Wysocki)
* pm-sleep: (21 commits)
PM: hibernate: Extra cleanup of comments in swap handling code
PM: sleep: Call pm_sleep_fs_sync() instead of ksys_sync_helper()
PM: sleep: Add support for wakeup during filesystem sync
PM: hibernate: Clean up kernel-doc comment style usage
PM: suspend: Make pm_test delay interruptible by wakeup events
usb: sl811-hcd: Add PM_EVENT_POWEROFF into suspend callbacks
scsi: Add PM_EVENT_POWEROFF into suspend callbacks
PM: Introduce new PMSG_POWEROFF event
PM: wakeup: Update after recent wakeup source removal ordering change
PM: wakeup: Delete timer before removing wakeup source from list
Documentation: power: Correct a mistaken configuration option
Documentation: power: Add document on debugging shutdown hangs
freezer: Clarify that only cgroup1 freezer uses PM freezer
PM: hibernate: add sysfs interface for hibernate_compression_threads
PM: hibernate: make compression threads configurable
PM: hibernate: dynamically allocate crc->unc_len/unc for configurable threads
PM: hibernate: Rework message printing in swsusp_save()
PM: dpm_watchdog: add module param to backtrace all CPUs
PM: sleep: Introduce CALL_PM_OP() macro to simplify code
PM: console: Fix memory allocation error handling in pm_vt_switch_required()
...
Merge a core power management update and runtime PM framework updates
for 6.19-rc1:
- Add WQ_UNBOUND to pm_wq workqueue (Marco Crivellari)
- Add runtime PM wrapper macros for ACQUIRE()/ACQUIRE_ERR() and use
them in the PCI core and the ACPI TAD driver (Rafael Wysocki)
- Improve runtime PM in the ACPI TAD driver (Rafael Wysocki)
- Update pm_runtime_allow/forbid() documentation (Rafael Wysocki)
- Fix typos in runtime.c comments (Malaya Kumar Rout)
* pm-core:
PM: WQ_UNBOUND added to pm_wq workqueue
* pm-runtime:
PCI/sysfs: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
ACPI: TAD: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR()
PM: runtime: Wrapper macros for ACQUIRE()/ACQUIRE_ERR()
PM: runtime: fix typos in runtime.c comments
ACPI: TAD: Improve runtime PM using guard macros
ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
PM: runtime: docs: Update pm_runtime_allow/forbid() documentation
Merge miscellaneous ACPI support updates and a PNP update for 6.19-rc1:
- Replace `core::mem::zeroed` with `pin_init::zeroed` in the ACPI Rust
code (Siyuan Huang)
- Update the ACPI code to use the new style of allocating workqueues
and new global workqueues (Marco Crivellari)
- Fix two spelling mistakes in the ACPI code (Chu Guangqing)
- Fix ISAPNP to generate uevents to auto-load modules (René Rebe)
* acpi-misc:
ACPI: PM: Fix a spelling mistake
ACPI: LPSS: Fix a spelling mistake
ACPI: thermal: Add WQ_PERCPU to alloc_workqueue() users
ACPI: OSL: Add WQ_PERCPU to alloc_workqueue() users
ACPI: EC: Add WQ_PERCPU to alloc_workqueue() users
ACPI: OSL: replace use of system_wq with system_percpu_wq
ACPI: scan: replace use of system_unbound_wq with system_dfl_wq
rust: acpi: replace `core::mem::zeroed` with `pin_init::zeroed`
* pnp:
PNP: Fix ISAPNP to generate uevents to auto-load modules
Merge updates of the ACPI time and alarm device (TAD) driver, ACPI fan
driver, ACPI DPTF code and an ACPI utility update for 6.19-rc1:
- Improve runtime PM in the ACPI time and alarm device (TAD) driver
using guard macros and rearrange code related to runtime PM in
acpi_tad_remove() (Rafael Wysocki)
- Add support for Microsoft fan extensions to the ACPI fan driver along
with notification support and work around a 64-bit firmware bug in
that driver (Armin Wolf)
- Use ACPI_FREE() to free ACPI buffer in the ACPI DPTF code (Kaushlendra
Kumar)
- Fix a memory leak and a resource leak in the ACPI pfrut utility (Malaya
Kumar Rout)
* acpi-tad:
ACPI: TAD: Improve runtime PM using guard macros
ACPI: TAD: Rearrange runtime PM operations in acpi_tad_remove()
* acpi-fan:
ACPI: fan: Add support for Microsoft fan extensions
ACPI: fan: Add hwmon notification support
ACPI: fan: Add basic notification support
ACPI: fan: Workaround for 64-bit firmware bug
* acpi-dptf:
ACPI: DPTF: Use ACPI_FREE() for ACPI buffer deallocation
* acpi-tools:
ACPI: tools: pfrut: fix memory leak and resource leak in pfrut.c
Merge an ACPICA change, device ACPI properties handling update, ACPI
power management updates, and an ACPI battery driver update for
6.19-rc1:
- Avoid walking the ACPI namespace in the AML interpreter if the
starting node cannot be determined (Cryolitia PukNgae)
- Use min() instead of min_t() in the ACPI device properties handling
code to avoid discarding significant bits (David Laight)
- Fix potential fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
that may prevent the parent fwnode from being released (Haotian Zhang)
- Rework acpi_graph_get_next_endpoint() to use ACPI functions only, remove
unnecessary conditionals from it to make it easier to follow, and make
acpi_get_next_subnode() static (Sakari Ailus)
- Drop unused function acpi_get_lps0_constraint(), make some Low-Power
S0 callback functions for suspend-to-idle static, and rearrange the
code retrieving Low-Power S0 constraints so it only runs when the
constraints are actually used (Rafael Wysocki)
- Drop redundant locking from the ACPI battery driver (Rafael Wysocki)
* acpica:
ACPICA: Avoid walking the Namespace if start_node is NULL
* acpi-property:
ACPI: property: use min() instead of min_t()
ACPI: property: Fix fwnode refcount leak in acpi_fwnode_graph_parse_endpoint()
ACPI: property: Rework acpi_graph_get_next_endpoint()
ACPI: property: Use ACPI functions in acpi_graph_get_next_endpoint() only
ACPI: property: Make acpi_get_next_subnode() static
* acpi-pm:
ACPI: PM: s2idle: Only retrieve constraints when needed
ACPI: PM: s2idle: Staticise LPS0 callback functions
ACPI: PM: s2idle: Drop acpi_get_lps0_constraint()
* acpi-battery:
ACPI: battery: Drop redundant locking
- Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
Pallikunhi)
- Document R-Car Gen4 and RZ/G2 support in driver comment (Marek
Vasut)
- Convert to DEFINE_SIMPLE_DEV_PM_OPS in the R-Car [Gen3] (Geert
Uytterhoeven)
- Fix format string bug in thermal-engine (Malaya Kumar Rout)
- Make ipq5018 tsens standalone compatible (George Moussalem)
- Add the QCS8300 compatible for the QCom Tsens (Gaurav Kohli)
- Add support for the NXP i.MX91 thermal module, including the DT
bindings (Pengfei Li)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmkocvMACgkQqDIjiipP
6E/vOQgAncWodGutUfBwvXePKEiR6vgGJK1MExTpgwhPmbJRttEtQhXcG6/Rk76e
hA4q+/Cags5rU/ydK5Qor8PcAmC/Y98Xs9fepWoygKB1fClWejVftTBSANgHDLnD
Txz12/F9FXxy957mm0NEKgVUkMYt/cipzj0c0y5c4xPVyVFYenVG0YTitCNlmTsI
ouBZgKXCZ1lYnGsRwebNcODnkvMKzjOeN5PetcI5ThSGF7SWcgJb9bKRTKS+kJ+r
1j68xJrjh/viYv7qrpaOCMaYbQcb38yPncPOWHLMAx2AIWVceOcmKWEypA2dT63z
agHB9jkhf/8kUuymB6tWSA4zjACLsw==
=fLGN
-----END PGP SIGNATURE-----
Merge tag 'thermal-v6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal control changes for 6.19-rc1 from Daniel Lezcano:
"- Document the RZ/V2H TSU DT bindings (Ovidiu Panait)
- Document the Kaanapali Temperature Sensor (Manaf Meethalavalappu
Pallikunhi)
- Document R-Car Gen4 and RZ/G2 support in driver comment (Marek Vasut)
- Convert to DEFINE_SIMPLE_DEV_PM_OPS in the R-Car [Gen3] (Geert
Uytterhoeven)
- Fix format string bug in thermal-engine (Malaya Kumar Rout)
- Make ipq5018 tsens standalone compatible (George Moussalem)
- Add the QCS8300 compatible for the QCom Tsens (Gaurav Kohli)
- Add support for the NXP i.MX91 thermal module, including the DT
bindings (Pengfei Li)"
* tag 'thermal-v6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unit
dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal module
dt-bindings: thermal: tsens: Add QCS8300 compatible
dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatible
tools/thermal/thermal-engine: Fix format string bug in thermal-engine
thermal/drivers/rcar_gen3: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
thermal/drivers/rcar: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
thermal/drivers/rcar_gen3: Document R-Car Gen4 and RZ/G2 support in driver comment
dt-bindings: thermal: qcom-tsens: document the Kaanapali Temperature Sensor
dt-bindings: thermal: r9a09g047-tsu: Document RZ/V2H TSU
The define ARM64_FEATURE_FIELD_BITS is now unused and feature id
fields don't always have 4 bits. Remove it.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In test_clidr() if an empty cache level is not found then the TEST_ASSERT
will not fire. Fix this by considering all 7 possible levels when iterating
through the hierarchy. Found by inspection.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARM64_FEATURE_FIELD_BITS is set to 4 but not all ID register fields are 4
bits. See for instance ID_AA64SMFR0_EL1. The last user of this define,
ARM64_FEATURE_FIELD_BITS, is the set_id_regs selftest. Its logic assumes
the fields aren't single bits; assert that's the case and stop using the
define. As there are no more users, ARM64_FEATURE_FIELD_BITS is removed
from the arm64 tools sysreg.h header. A separate commit removes this from
the kernel version of the header.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ATOMIC_FETCH_OP_AND and ATOMIC64_FETCH_OP_AND macros accept 'mb' and
'cl' parameters but never use them in their implementation. These macros
simply delegate to the corresponding andnot functions, which handle the
actual atomic operations and memory barriers.
Signed-off-by: Seongsu Park <sgsu.park@samsung.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
fill_pool_map is used to suppress nesting violations caused by acquiring
a spinlock_t (from within the memory allocator) while holding a
raw_spinlock_t. The annotation used is wrong.
LD_WAIT_SLEEP is for always-sleeping lock types such as mutex_t.
LD_WAIT_CONFIG is for lock types which sleep on PREEMPT_RT while spinning
otherwise, such as spinlock_t.
Use LD_WAIT_CONFIG as the override.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251127153652.291697-3-bigeasy@linutronix.de
The pool of free objects is refilled on several occasions such as object
initialisation. On PREEMPT_RT refilling is limited to preemptible
sections due to sleeping locks used by the memory allocator. The system
boots with disabled interrupts so the pool can not be refilled.
If too many objects are initialized and the pool gets empty then
debugobjects disables itself.
Refilling can also happen early in the boot with disabled interrupts as
long as the scheduler is not operational. If the scheduler can not
preempt a task then a sleeping lock can not be contended.
Additionally, allow refilling the pool if the scheduler is not
operational.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251127153652.291697-2-bigeasy@linutronix.de
Detailed description for this pull request:
- Move governor.h under include/linux/ and rename to devfreq-governor.h
in order to allow devfreq governor definitions outside of drivers/devfreq/.
- Fix potential use-after-free issue of OPP handling on hisi_uncore_freq.c
- Use min() to improve the readability on tegra30-devfreq.c
- Fix typo in DFSO_DOWNDIFFERENTIAL macro name on governor_simpleondemand.c
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEsSpuqBtbWtRe4rLGnM3fLN7rz1MFAmkoZ+wACgkQnM3fLN7r
z1NDrhAAtHlmhFgMQdGO8gfvKHRWQ4qzrJkiNGlU2iwXznCB38bNIGHdXAV5hPaL
p0N+HeW2JgQRD2fjkB3gez5IJXDPo7xu3YJQ6uTuAY/k/AFzoz6HJR97luBZhPH3
38a10cAJM9wbUd/ssrSHQMjBGr4S+teAuT99fBn+VpNzUzNs7XGfcqvm3cwP48QX
4sv77IGZn1noXkVppq2UjBxjL4f4njcTEuOnWyVzEsdgYqDVc34MR1wImUx1ypFy
uzxyOvOWfE0J955wisBltBPdb3WCRH9ZoRdmcCq+KUlgjtRvfK9B0KUccOgyACIq
WTIar61Ct2hY1bLD/P57g7qH/jO/3/7usNjeaHr+OacKie4tAFqj8IdYB7RcwDfu
iqYzTD+C+10kAUaLLlqwYap7pmW+0I7gLK/oUVRy1qNvfkA13v/i2uMdUgXm6flJ
/4cjPDTq1P8Y5ZOpu4FKSho7Lj2e2e1Oz6bspJ3Ny0PZZupN7LlW19h0TwMIX4g+
1PaKOsxeM9Tj6SG8IdxYcvKnfmxeY4KFBCenmURVqhy+xizkBl/ekD1gPyuKF467
XWgppWWU3RQ0P5VXcFxvi8q6a9yOhFeqF5RJGmpJA0LXYeI6riBMYeI4VTgfiOeO
ABii33JZt7dktydlD7RDS6hECfzoGYqR5wFk4K+lmzLP4BPLcZU=
=dEyH
-----END PGP SIGNATURE-----
Merge tag 'devfreq-next-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux
Pull devfreq changes for v6.19 from Chanwoo Choi:
"- Move governor.h under include/linux/ and rename to devfreq-governor.h
in order to allow devfreq governor definitions outside of drivers/devfreq/.
- Fix potential use-after-free issue of OPP handling on hisi_uncore_freq.c
- Use min() to improve the readability on tegra30-devfreq.c
- Fix typo in DFSO_DOWNDIFFERENTIAL macro name on governor_simpleondemand.c"
* tag 'devfreq-next-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro name
PM / devfreq: tegra30: use min to simplify actmon_cpu_to_emc_rate
PM / devfreq: hisi: Fix potential UAF in OPP handling
PM / devfreq: Move governor.h to a public header location
This code doesn't run. Since 2008:
4f9c11dd49 ("x86, 64-bit: adjust mapping of physical pagetables to work with Xen")
the kernel has gained more flexible logging and tracing capabilities;
presumably if anyone wanted to take advantage of this log message they would
have got rid of the "if (0)" so they could use these capabilities.
Since they haven't, just delete it.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251003-x86-init-cleanup-v1-1-f2b7994c2ad6@google.com
On all AMD AM4 systems I have seen, e.g ASUS X470-i, Pro WS X570 Ace
and equivalent Gigabyte, amd-pstate does not initialize when the
x2apic is enabled in the BIOS. Kernel debug messages include:
[ 0.315438] acpi LNXCPU:00: Failed to get CPU physical ID.
[ 0.354756] ACPI CPPC: No CPC descriptor for CPU:0
[ 0.714951] amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled
I tracked this down to map_x2apic_id() checking device_declaration
passed in via the type argument of acpi_get_phys_id() via
map_madt_entry() while map_lapic_id() does not.
It appears these BIOSes use Processor statements for declaring the CPUs
in the ACPI namespace instead of processor device objects (which should
have been used). CPU declarations via Processor statements were
deprecated in ACPI 6.0 that was released 10 years ago. They should not
be used any more in any contemporary platform firmware.
I tried to contact Asus support multiple times, but never received a
reply nor did any BIOS update ever change this.
Fix amd-pstate w/ x2apic on AM4 by allowing map_x2apic_id() to work with
CPUs declared via Processor statements for IDs less than 255, which is
consistent with ACPI 5.0 that still allowed Processor statements to be
used for declaring CPUs.
Fixes: 7237d3de78 ("x86, ACPI: add support for x2apic ACPI extensions")
Signed-off-by: René Rebe <rene@exactco.de>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251126.165513.1373131139292726554.rene@exactco.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message
catalog" from 2008 [1] which never made it upstream.
The macro was added to s390 code to allow for an out-of-tree patch which
used this to generate unique message ids. That out-of-tree patch doesn't
exist anymore.
Remove the macro in order to get rid of a pointless indirection.
[1] https://lwn.net/Articles/292650/
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Introduce support for the i.MX91 thermal monitoring unit, which features a
single sensor for the CPU. The register layout differs from other chips,
necessitating the creation of a dedicated file for this.
This sensor provides a resolution of 1/64°C (6-bit fraction). For actual
accuracy, refer to the datasheet, as it varies depending on the chip grade.
The unit provides an interrupt for end of measurement and threshold
violation, and contains temperature threshold comparators in normal and
secure address space, with direction and threshold programmability.
Datasheet Link: https://www.nxp.com/docs/en/data-sheet/IMX91CEC.pdf
Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251020-imx91tmu-v7-2-48d7d9f25055@nxp.com
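As a rough stand-alone illustration of the stated 1/64 degree C resolution (a hypothetical helper, not taken from the driver), a raw reading with a 6-bit fractional part converts to the millidegree unit used by the thermal core like this:

  /* Assumption: 'raw' is a signed temperature in 1/64 degC units. */
  static int imx91_raw_to_mcelsius_sketch(int raw)
  {
          return (raw * 1000) / 64;       /* 1 LSB = 15.625 mC */
  }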
Add bindings documentation for i.MX91 thermal modules.
Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251020-imx91tmu-v7-1-48d7d9f25055@nxp.com
- Use 64-bits for timer compensation for IoT usage where the suspend
time is much longer than what 32-bits can provide (Enlin Mu)
- Add delay support on sp804 for ARM32 platforms (Stephen Eta Zhou)
- Fix missing resource release on error in the probe path of the
ralink driver (Haotian Zhang)
- Fix double deregistration on probe failure in the NXP STM driver
(Johan Hovold)
- Disable runtime PM for the Renesas SH CMT timer because it is
incompatible with PREEMPT_RT=y (Niklas Söderlund)
- Fix section mismatches in the NXP STM driver (Johan Hovold)
- Prevent unbinding the NXP PIT, STM and MMIO ARM Arch timers as
the code does not support bind/unbind (Johan Hovold)
- Use the clocksource instead of ticks on the RDA8810PL platform
(Enlin Mu)
- Drop the unused module alias for the STM32-LP (Johan Hovold)
- Add Realtek system timer driver (Hao-Wen Ting)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmkm6t4ACgkQqDIjiipP
6E8xqQgAlXnV3vRJmEbjd3ILECvbKMLI2haHV2eA+75P+DvbfriL+ePMHkfkOPI6
CC5UhCSy410cQLO88tzy5+9K8Po2KnHxb+lVS2P6zzcdefL5ZWMZ9Q+CAOwSo1s9
An1A4nUgcTB52mAR+jlz++SF1VV/fMvskMrtiTg8bSIScSc+xi4sEC3GaZR09qSG
RODtzmVsyeoHQ1u6ziRJen8GzpX1q6vUP0eAAr+vXqTUXdCuUL8P20h2mwzxPJWH
mFo53OuKVbTMOoY1Av7euvO1ZZ1tsHsS4NxJfD1qatq+eh1As1dxYodB4dp44qZt
jjnVuj0QrE40VB6EnHAA4kKb6WWpow==
=WXsR
-----END PGP SIGNATURE-----
Merge tag 'timers-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/daniel.lezcano/linux into timers/clocksource
Pull clocksource/event changes from Daniel Lezcano:
- Use 64-bits for timer compensation for IoT usage where the suspend
time is much longer than what 32-bits can provide (Enlin Mu)
- Add delay support on sp804 for ARM32 platforms (Stephen Eta Zhou)
- Fix missing resource release on error in the probe path of the
ralink driver (Haotian Zhang)
- Fix double deregistration on probe failure in the NXP STM driver
(Johan Hovold)
- Disable runtime PM for the Renesas SH CMT timer because it is
incompatible with PREEMPT_RT=y (Niklas Söderlund)
- Fix section mismatches in the NXP STM driver (Johan Hovold)
- Prevent unbinding the NXP PIT, STM and MMIO ARM Arch timers as
the code does not support bind/unbind (Johan Hovold)
- Use the clocksource instead of ticks on the RDA8810PL platform
(Enlin Mu)
- Drop the unused module alias for the STM32-LP (Johan Hovold)
- Add Realtek system timer driver (Hao-Wen Ting)
Link: https://lore.kernel.org/all/9303b790-28d4-4bd9-b01d-28fb05493596@linaro.org
Use the lay instruction instead of aghik. aghik is only available since
z196, therefore compiling the kernel for z10 results in this error:
arch/s390/kernel/entry.S: Assembler messages:
arch/s390/kernel/entry.S:165: Error: Unrecognized opcode: `aghik'
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511261518.nBbQN5h7-lkp@intel.com/
Fixes: f5730d44e0 ("s390: Add stackprotector support")
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add a system timer driver for Realtek SoCs.
This driver registers the 1 MHz global hardware counter on Realtek
platforms as a clock event device. Since this hardware counter starts
counting automatically after SoC power-on, no clock initialization is
required. Because the counter does not stop or get affected by CPU power
down, and it supports oneshot mode, it is typically used as a tick
broadcast timer.
Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251126060110.198330-3-haowen.ting@realtek.com
The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter
operating at a fixed 1MHz frequency. Thanks to its compare match
interrupt capability, the timer natively supports oneshot mode for tick
broadcast functionality.
Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://patch.msgid.link/20251126060110.198330-2-haowen.ting@realtek.com
The driver cannot be built as a module so drop the unused platform
module alias.
Note that platform aliases are not needed for OF probing should it ever
become possible to build the driver as a module.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251111154516.1698-1-johan@kernel.org
The current system log timestamp accuracy is tick based, which cannot
meet the usage requirements; nanosecond resolution is needed.
Therefore, a call to sched_clock_register() needs to be added.
[ dlezcano: Fixed typos ]
Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251107063347.3692-1-enlin.mu@linux.dev
Clockevents cannot be deregistered so suppress the bind attributes to
prevent the driver from being unbound and releasing the underlying
resources after registration.
Even if the driver can currently only be built-in, also switch to
builtin_platform_driver() to prevent it from being unloaded should
modular builds ever be enabled.
Fixes: cec32ac758 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251111153226.579-4-johan@kernel.org
The driver does not support unbinding (e.g. as clockevents cannot be
deregistered) so suppress the bind attributes to prevent the driver from
being unbound and rebound after registration (and disabling the timer
when reprobing fails).
Even if the driver can currently only be built-in, also switch to
builtin_platform_driver() to prevent it from being unloaded should
modular builds ever be enabled.
Fixes: bee33f22d7 ("clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251111153226.579-3-johan@kernel.org
Clockevents cannot be deregistered so suppress the bind attributes to
prevent the driver from being unbound and releasing the underlying
resources after registration.
Fixes: 4891f01527 ("clocksource/drivers/arm_arch_timer: Add standalone MMIO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20251111153226.579-2-johan@kernel.org
Platform drivers can be probed after their init sections have been
discarded (e.g. on probe deferral or manual rebind through sysfs) so the
probe function must not live in init. Device managed resource actions
similarly cannot be discarded.
The "_probe" suffix of the driver structure name prevents modpost from
warning about this so replace it to catch any similar future issues.
Fixes: cec32ac758 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: stable@vger.kernel.org # 6.16
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251017054943.7195-1-johan@kernel.org
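A minimal sketch of the rule being enforced here, with hypothetical names; the probe callback has to live in regular text rather than .init.text so it survives init memory being freed:

  #include <linux/platform_device.h>

  /* Wrong: an __init probe may be called after init memory is discarded
   * (probe deferral, manual rebind through sysfs).
   * static int __init my_timer_probe(struct platform_device *pdev) { ... }
   */

  /* Right: keep the probe (and any devm action callbacks) in normal .text. */
  static int my_timer_probe(struct platform_device *pdev)
  {
          return 0;
  }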
The CMT device can be used as both a clocksource and a clockevent
provider. The driver tries to be smart and power itself on and off, as
well as enabling and disabling its clock when it's not in operation.
This behavior is slightly altered if the CMT is used as an early
platform device in which case the device is left powered on after probe,
but the clock is still enabled and disabled at runtime.
This has worked for a long time, but recent improvements in PREEMPT_RT
and PROVE_LOCKING have highlighted an issue. As the CMT registers itself
as a clockevent provider, clockevents_register_device(), it needs to use
raw spinlocks internally as this is the context of which the clockevent
framework interacts with the CMT driver. However in the context of
holding a raw spinlock the CMT driver can't really manage its power
state or clock with calls to pm_runtime_*() and clk_*() as these calls
end up in other platform drivers using regular spinlocks to control
power and clocks.
This mix of spinlock contexts trips a lockdep warning.
=============================
[ BUG: Invalid wait context ]
6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 Not tainted
-----------------------------
swapper/1/0 is trying to lock:
ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88
ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0
other info that might help us debug this:
ccree e6601000.crypto: ARM ccree device initialized
context-{5:5}
2 locks held by swapper/1/0:
#0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8
#1: ffff0000089a5858 (&ch->lock){....}-{2:2}
usbcore: registered new interface driver usbhid
, at: sh_cmt_start+0x30/0x364
stack backtrace:
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 PREEMPT
Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
Call trace:
show_stack+0x14/0x1c (C)
dump_stack_lvl+0x6c/0x90
dump_stack+0x14/0x1c
__lock_acquire+0x904/0x1584
lock_acquire+0x220/0x34c
_raw_spin_lock_irqsave+0x58/0x80
__pm_runtime_resume+0x38/0x88
sh_cmt_start+0x54/0x364
sh_cmt_clock_event_set_oneshot+0x64/0xb8
clockevents_switch_state+0xfc/0x13c
tick_broadcast_set_event+0x30/0xa4
__tick_broadcast_oneshot_control+0x1e0/0x3a8
tick_broadcast_oneshot_control+0x30/0x40
cpuidle_enter_state+0x40c/0x680
cpuidle_enter+0x30/0x40
do_idle+0x1f4/0x26c
cpu_startup_entry+0x34/0x40
secondary_start_kernel+0x11c/0x13c
__secondary_switched+0x74/0x78
For non-PREEMPT_RT builds this is not really an issue, but for
PREEMPT_RT builds where normal spinlocks can sleep this might be an
issue. Be cautious and always leave the power and clock running after
probe.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se
The purpose of the devm_add_action_or_reset() helper is to call the
action function in case adding an action ever fails so drop the clock
source deregistration from the error path to avoid deregistering twice.
Fixes: cec32ac758 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251017055039.7307-1-johan@kernel.org
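A hedged fragment of the pattern (identifiers hypothetical, not the driver's code): devm_add_action_or_reset() already invokes the action when registration fails, so the caller only propagates the error:

  ret = devm_add_action_or_reset(dev, sketch_unregister_action, cs);
  if (ret)
          return ret;     /* the action already ran; no manual deregistration */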
The ralink_systick_init() function does not release all acquired resources
on its error paths. If irq_of_parse_and_map() or a subsequent call fails,
the previously created I/O memory mapping and IRQ mapping are leaked.
Add goto-based error handling labels to ensure that all allocated
resources are correctly freed.
Fixes: 1f2acc5a8a ("MIPS: ralink: Add support for systick timer found on newer ralink SoC")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251030090710.1603-1-vulab@iscas.ac.cn
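A hedged sketch of the goto-based unwinding described above (variables and labels hypothetical, not the actual driver code):

  base = of_iomap(np, 0);
  if (!base)
          return -ENXIO;

  irq = irq_of_parse_and_map(np, 0);
  if (!irq) {
          ret = -EINVAL;
          goto err_iounmap;
  }

  ret = request_irq(irq, systick_interrupt, 0, "systick", NULL);
  if (ret)
          goto err_dispose;

  return 0;

  err_dispose:
          irq_dispose_mapping(irq);
  err_iounmap:
          iounmap(base);
          return ret;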
Register a valid read_current_timer() function for the
SP804 timer on ARM32.
On ARM32 platforms, when the SP804 timer is selected as the clocksource,
the driver does not register a valid read_current_timer() function.
As a result, features that rely on this API—such as rdseed—consistently
return incorrect values.
To fix this, a delay_timer structure is registered during the SP804
driver's initialization. The read_current_timer() function is implemented
using the existing sp804_read() logic, and the timer frequency is reused
from the already-initialized clocksource.
Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20250525-sp804-fix-read_current_timer-v4-1-87a9201fa4ec@gmail.com
Using 32 bits for suspend compensation, the max compensation time is 36
hours (the working clock is 32 kHz). In some IoT devices, the suspend time
may be long, even exceeding 36 hours. Therefore, a 64 bit timer counter
is needed for counting.
Signed-off-by: Enlin Mu <enlin.mu@unisoc.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://patch.msgid.link/20251106021830.34846-1-enlin.mu@linux.dev
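The limit quoted above can be checked with a quick stand-alone calculation (plain userspace C, assuming a 32768 Hz working clock):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          const double hz = 32768.0;
          double hours32 = (double)UINT32_MAX / hz / 3600.0;
          double years64 = (double)UINT64_MAX / hz / 3600.0 / 24.0 / 365.0;

          printf("32-bit counter wraps after about %.1f hours\n", hours32);   /* ~36.4 */
          printf("64-bit counter wraps after about %.0f years\n", years64);   /* ~17.8 million */
          return 0;
  }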
Correct the spelling error in the DFSO_DOWNDIFFERENTIAL macro
definition and update the corresponding variable assignment.
The macro was previously misspelled as DFSO_DOWNDIFFERENCTIAL.
This change ensures consistent and correct spelling throughout
the simpleondemand governor implementation.
Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251118032339.2799230-1-luriwen@kylinos.cn/
Although commit 0c9992315e ("ACPICA: Avoid walking the ACPI Namespace
if it is not there") fixed the situation when both start_node and
acpi_gbl_root_node are NULL, the Linux kernel mainline still crashes
on Honor Magicbook 14 Pro [1].
That happens due to the access to the parent_node member in
acpi_ns_get_next_node(). The NULL pointer dereference will always
happen, no matter whether or not the start_node is equal to
ACPI_ROOT_OBJECT, so move the check of start_node being NULL
out of the if block.
Unfortunately, all the attempts to contact Honor have failed, they
refused to provide any technical support for Linux.
The bad DSDT table's dump could be found on GitHub [2].
DMI: HONOR FMB-P/FMB-P-PCB, BIOS 1.13 05/08/2025
Link: 1c1b57b9eb
Link: https://gist.github.com/Cryolitia/a860ffc97437dcd2cd988371d5b73ed7 [1]
Link: https://github.com/denis-bb/honor-fmb-p-dsdt [2]
Signed-off-by: Cryolitia PukNgae <cryolitia.pukngae@linux.dev>
Reviewed-by: WangYuli <wangyl5933@chinaunicom.cn>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20251125-acpica-v1-1-99e63b1b25f8@linux.dev
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that all pieces are in place, change the implementations of
sched_mm_cid_fork() and sched_mm_cid_exit() to adhere to the new strict
ownership scheme and switch context_switch() over to use the new
mm_cid_schedin() functionality.
The common case is that there is no mode change required, which makes
fork() and exit() just update the user count and the constraints.
In case that a new user would exceed the CID space limit the fork() context
handles the transition to per CPU mode with mm::mm_cid::mutex held. exit()
handles the transition back to per task mode when the user count drops
below the switch back threshold. fork() might also be forced to handle a
deferred switch back to per task mode, when an affinity change increased the
number of allowed CPUs enough.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.280380631@linutronix.de
When affinity changes cause an increase of the number of CPUs allowed for
tasks which are related to a MM, that might result in a situation where
the ownership mode can go back from per CPU mode to per task mode.
As affinity changes happen with runqueue lock held there is no way to do
the actual mode change and required fixup right there.
Add the infrastructure to defer it to a workqueue. The scheduled work can
race with a fork() or exit(). Whatever happens first takes care of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.216484739@linutronix.de
CIDs are either owned by tasks or by CPUs. The ownership mode depends on
the number of tasks related to a MM and the number of CPUs on which these
tasks are theoretically allowed to run on. Theoretically because that
number is the superset of CPU affinities of all tasks which only grows and
never shrinks.
Switching to per CPU mode happens when the user count becomes greater than
the maximum number of CIDs, which is calculated by:
opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users);
max_cids = min(1.25 * opt_cids, nr_cpu_ids);
The +25% allowance is useful for tight CPU masks in scenarios where only a
few threads are created and destroyed to avoid frequent mode
switches. This allowance shrinks the closer opt_cids gets to
nr_cpu_ids, which is the (unfortunate) hard ABI limit.
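As a stand-alone illustration of the sizing quoted above (integer arithmetic approximating the 1.25 factor; not kernel code):

  #include <stdio.h>

  static unsigned int min_u(unsigned int a, unsigned int b)
  {
          return a < b ? a : b;
  }

  int main(void)
  {
          unsigned int nr_cpu_ids = 256, nr_cpus_allowed = 8, users = 9;
          unsigned int opt_cids = min_u(nr_cpus_allowed, users);
          unsigned int max_cids = min_u(opt_cids + opt_cids / 4, nr_cpu_ids);

          /* opt_cids = 8, max_cids = 10: nine users still fit in per task
           * mode; the switch to per CPU mode happens once the user count
           * exceeds max_cids. */
          printf("opt_cids=%u max_cids=%u\n", opt_cids, max_cids);
          return 0;
  }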
At the point of switching to per CPU mode the new user is not yet visible
in the system, so the task which initiated the fork() runs the fixup
function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either
transfers each tasks owned CID to the CPU the task runs on or drops it into
the CID pool if a task is not on a CPU at that point in time. Tasks which
schedule in before the task walk reaches them do the handover in
mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
guaranteed that no task related to that MM owns a CID anymore.
Switching back to task mode happens when the user count goes below the
threshold which was recorded on the per CPU mode switch:
pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
This threshold is updated when an affinity change increases the number of
allowed CPUs for the MM, which might cause a switch back to per task mode.
If the switch back was initiated by an exiting task, then that task runs the
fixup function. If it was initiated by an affinity change, then it's run
either in the deferred update function in context of a workqueue or by a
task which forks a new one or by a task which exits. Whatever happens
first. mm_cid_fixup_cpus_to_task() walks through the possible CPUs and
either transfers the CPU owned CIDs to a related task which runs on the CPU
or drops it into the pool. Tasks which schedule in on a CPU which the walk
did not cover yet do the handover themselves.
This transition from CPU to per task ownership happens in two phases:
1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
CID and denotes that the CID is only temporarily owned by the
task. When it schedules out the task drops the CID back into the
pool if this bit is set.
2) The initiating context walks the per CPU space and after completion
clears mm:mm_cid.transit. After that point the CIDs are strictly
task owned again.
This two phase transition is required to prevent CID space exhaustion
during the transition as a direct transfer of ownership would fail if
two tasks are scheduled in on the same CPU before the fixup freed per
CPU CIDs.
When mm_cid_fixup_cpus_to_tasks() completes it's guaranteed that no CID
related to that MM is owned by a CPU anymore.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.088189028@linutronix.de
The MM CID management has two fundamental requirements:
1) It has to guarantee that at no given point in time the same CID is
used by concurrent tasks in userspace.
2) The CID space must not exceed the number of possible CPUs in a
system. While most allocators (glibc, tcmalloc, jemalloc) do not
care about that, there seems to be at least some LTTng library
depending on it.
The CID space compaction itself is not a functional correctness
requirement, it is only a useful optimization mechanism to reduce the
memory foot print in unused user space pools.
The optimal CID space is:
min(nr_tasks, nr_cpus_allowed);
Where @nr_tasks is the number of actual user space threads associated to
the mm and @nr_cpus_allowed is the superset of all task affinities. It only
grows, as it would be insane to take a racy snapshot of all task
affinities when the affinity of one task changes just to redo it 2
milliseconds later when the next task changes its affinity.
That means that as long as the number of tasks is lower or equal than the
number of CPUs allowed, each task owns a CID. If the number of tasks
exceeds the number of CPUs allowed it switches to per CPU mode, where the
CPUs own the CIDs and the tasks borrow them as long as they are scheduled
in.
For transition periods CIDs can go beyond the optimal space as long as they
don't go beyond the number of possible CPUs.
The current upstream implementation adds overhead into task migration to
keep the CID with the task. It also has to do the CID space consolidation
work from a task work in the exit to user space path. As that work is
assigned to a random task related to a MM this can inflict unwanted exit
latencies.
Implement the context switch parts of a strict ownership mechanism to
address this.
This removes most of the work from the task which schedules out. Only
during transitioning from per CPU to per task ownership it is required to
drop the CID when leaving the CPU to prevent CID space exhaustion. Other
than that scheduling out is just a single check and branch.
The task which schedules in has to check whether:
1) The ownership mode changed
2) The CID is within the optimal CID space
In stable situations this results in zero work. The only short disruption
is when ownership mode changes or when the associated CID is not in the
optimal CID space. The latter only happens when tasks exit and therefore
the optimal CID space shrinks.
That mechanism is strictly optimized for the common case where no change
happens. The only case where it actually causes a temporary one time spike
is on mode changes when and only when a lot of tasks related to a MM
schedule exactly at the same time and have eventually to compete on
allocating a CID from the bitmap.
In the sysbench test case which triggered the spinlock contention in the
initial CID code, __schedule() drops significantly in perf top on a 128
Core (256 threads) machine when running sysbench with 255 threads, which
fits into the task mode limit of 256 together with the parent thread:
Upstream rseq/perf branch +CID rework
0.42% 0.37% 0.32% [k] __schedule
Increasing the number of threads to 256, which puts the test process into
per CPU mode looks about the same.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172550.023984859@linutronix.de
The MM CID management has two fundamental requirements:
1) It has to guarantee that at no given point in time the same CID is
used by concurrent tasks in userspace.
2) The CID space must not exceed the number of possible CPUs in a
system. While most allocators (glibc, tcmalloc, jemalloc) do not care
about that, there seems to be at least librseq depending on it.
The CID space compaction itself is not a functional correctness
requirement, it is only a useful optimization mechanism to reduce the
memory foot print in unused user space pools.
The optimal CID space is:
min(nr_tasks, nr_cpus_allowed);
Where @nr_tasks is the number of actual user space threads associated to
the mm and @nr_cpus_allowed is the superset of all task affinities. It only
grows, as it would be insane to take a racy snapshot of all task
affinities when the affinity of one task changes just to redo it 2
milliseconds later when the next task changes its affinity.
That means that as long as the number of tasks is lower or equal than the
number of CPUs allowed, each task owns a CID. If the number of tasks
exceeds the number of CPUs allowed it switches to per CPU mode, where the
CPUs own the CIDs and the tasks borrow them as long as they are scheduled
in.
For transition periods CIDs can go beyond the optimal space as long as they
don't go beyond the number of possible CPUs.
The current upstream implementation adds overhead into task migration to
keep the CID with the task. It also has to do the CID space consolidation
work from a task work in the exit to user space path. As that work is
assigned to a random task related to a MM this can inflict unwanted exit
latencies.
This can be done differently by implementing a strict CID ownership
mechanism. Either the CIDs are owned by the tasks or by the CPUs. The
latter provides less locality when tasks are heavily migrating, but there
is no justification to optimize for overcommit scenarios and thereby
penalize everyone else.
Provide the basic infrastructure to implement this:
- Change the UNSET marker to BIT(31) from ~0U
- Add the ONCPU marker as BIT(30)
- Add the TRANSIT marker as BIT(29)
That allows checking ownership trivially and provides a simple check for
UNSET as well. The TRANSIT marker is required to prevent CID space
exhaustion when switching from per CPU to per task mode.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251119172549.960252358@linutronix.de
Prepare for the new CID management scheme which puts the CID ownership
transition into the fork() and exit() slow path by serializing
sched_mm_cid_fork()/exit() with it, so task list and cpu mask walks can be
done in interruptible and preemptible code.
The contention on it is not worse than on other concurrency controls in the
fork()/exit() machinery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.895826703@linutronix.de
Reading mm::mm_users and mm::mm_cid::nr_cpus_allowed every time to compute
the maximal CID value is just wasteful as that value is only changing on
fork(), exit() and eventually when the affinity changes.
So it can be easily precomputed at those points and provided in mm::mm_cid
for consumption in the hot path.
But there is an issue with using mm::mm_users for accounting because that
does not necessarily reflect the number of user space tasks as other kernel
code can take temporary references on the MM which skew the picture.
Solve that by adding a users counter to struct mm_mm_cid, which is modified
by fork() and exit() and used for precomputing under mm_mm_cid::lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.832764634@linutronix.de
It's getting bigger soon, so just move it out of line to the rest of the
code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.769636491@linutronix.de
There is no need anymore to keep this under sighand lock as the current
code and the upcoming replacement are not depending on the exit state of a
task anymore.
That allows using a mutex in the exit path.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.706439391@linutronix.de
This is truly a bitmap and just conveniently uses a cpumask because the
maximum size of the bitmap is nr_cpu_ids.
But that prevents doing searches for a zero bit in a limited range, which
is helpful to provide an efficient mechanism to consolidate the CID space
when the number of users decreases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.642866767@linutronix.de
Reevaluating num_possible_cpus() over and over does not make sense. That
becomes a constant after init as cpu_possible_mask is marked ro_after_init.
Cache the value during initialization and provide that for consumption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20251119172549.578653738@linutronix.de
It turns out that the change in commit 76934e495c ("cpuidle: Add
sanity check for exit latency and target residency") goes too far
because there are systems in the field on which the check introduced
by that commit does not pass.
For this reason, change __cpuidle_driver_init() return type back to void
and make it print a warning when the check mentioned above does not
pass.
Fixes: 76934e495c ("cpuidle: Add sanity check for exit latency and target residency")
Reported-by: Val Packett <val@packett.cool>
Closes: https://lore.kernel.org/linux-pm/20251121010756.6687-1-val@packett.cool/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2808566.mvXUDI8C0e@rafael.j.wysocki
While cleaning up some headers, I got a build error on this file:
drivers/cpuidle/poll_state.c:52:2: error: call to undeclared library function 'snprintf' with type 'int (char *restrict, unsigned long, const char *restrict, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
Update header inclusions to follow IWYU (Include What You Use)
principle.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124205752.1328701-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Let's document how the new CPU system wakeup latency QoS limit can be used
from user space, along with how the constraint is taken into account for
s2idle and cpuidle.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-7-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The CPU system wakeup QoS limit must be respected for the regular cpuidle
state selection. Therefore, let's extend the common governor helper
cpuidle_governor_latency_req(), to take the constraint into account.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-6-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle, let's start to take into account the QoS limit.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-5-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The CPU system wakeup QoS limit must be respected for the regular cpuidle
state selection. Therefore, let's extend the genpd governor for CPUs to
take the constraint into account when it selects a domain idle state for
the corresponding PM domain.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-4-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A CPU system wakeup QoS limit may have been requested by user space. To
avoid breaking this constraint when entering a low power state during
s2idle through genpd, let's extend the corresponding genpd governor for
CPUs. More precisely, during s2idle let the genpd governor select a
suitable domain idle state, by taking into account the QoS limit.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-3-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some platforms support multiple low power states for CPUs that can be used
when entering system-wide suspend. Currently we are always selecting the
deepest possible state for the CPUs, which can break the system wakeup
latency constraint that may be required for a use case.
Let's take the first step towards addressing this problem, by introducing
an interface for user space, that allows us to specify the CPU system
wakeup QoS limit. Subsequent changes will start taking into account the new
QoS limit.
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com>
Tested-by: Kevin Hilman (TI) <khilman@baylibre.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://patch.msgid.link/20251125112650.329269-2-ulf.hansson@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adds support for building libcpupower statically when STATIC=true is
specified during build.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmkk6sIACgkQCwJExA0N
QxyUOxAAvI0losir97aizng6pyo5RM1OEsI10mEhBo80Vbko9d0NOk8kdTiatYgO
X4dBIZ/WZLFZ+t63LDWRx0a8QjBi4UjZjv+ZIUcfvwLME7zZXuJyophfJhs1mgVQ
c/X0aUCYTalziOn+zDlJWPtQVD86dxGBikE8RGZuj+LgQBNF0frB5aWvV/yUTdTj
o6e+vNRQGKKx6TpKdRuS+fk+fBcGIY1y5fVoYGcXA3gqBTJkj2eNQCbpxnUxqlj8
sZgHh24Gb9rPrVQK48C8SIEMQuPcHNJ2C966OI/okOBPMLb54oH0OW9DXv3mZFug
+fn13mIfmTixRI6QH3ZHuZLujtEoNmUnmAhbyng2HJM+ei/U3wskRaOrstnWkRFM
42iYF0DzSeqxF8V3YVt81FUMnXttyooCiiq3H1o2JCmua2tkXf7eK1QdVidc7CAj
k2DVj0Fu0o0TXjV2v4V+HCDBSqOxC9oNRsxcRn0agckyhRjfXvRIuKRjmK/Jgene
8E4ImRW4H2ZQCCrWROhQuzLTy0CVJ7bGB81YQIO/t8ifdycUS9ok7qJtaXJBEk9e
Jn25zgAYXmeu8aYbNFEBVAhcXLimvIbQtHc9B2Qo0RTnenHIxjZbRQhsf6OKp1Fg
YyagHyFUntj5+kyAXzJ77zsVfp9Ehi9whGfOarwciN91UZIxZ+8=
=RS7N
-----END PGP SIGNATURE-----
Merge tag 'linux-cpupower-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull a cpupower utility update for 6.19-rc1 from Shuah Khan:
"Adds support for building libcpupower statically when STATIC=true is
specified during build."
* tag 'linux-cpupower-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
tools/power/cpupower: Support building libcpupower statically
- tegra186: Add OPP / bandwidth support for Tegra186 (Aaron Kling).
- Minor improvements to various cpufreq drivers (Christian Marangi, Hal
Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmklXiwACgkQ0rkcPK6B
Ehw4zw//Qf6p1HXJME/o6AhPA2B+obbqBOxqPN8xw1yvnr3MIn1q7wCEv7M/B1zP
3wdW/Tchg6kgwsNJ0IKduHz5Dhx06xH6PzF9YXblS46nehMtErjqrF2K/BN/1faU
cLrAcs31dlK9Nv7qCxaXvwl6G6bsN52V4qbdbssONbN2IRtILcs2KiU1eb8WUIMv
yjll91trQoQ3eSufKV73FbKzmIwLynYvErwko20WyJODlOO5TgltkRSICNbaDbJh
G95eJnMIeXjOT4rf4BX7mMx45GzZwKWVfF/AZBQNTFBVRw+g7vDiuC8SbEY1VMws
KxkgWlTdJfv5Hh2UEP3XYodhT6WJehiPdO5RAgLjSkfSw5XEbNTnAvAnhhDqWw7J
Tk122tgw/2SRh7bJQoetL4dggKKh9aT9vaRFklrUukmDjzs3GdZKw0xXakc8GjMY
dHXgiOpqU3eGY2YGiU5DZnTxS/vDga/D2+jGcX/4C+vq5t8LJ6MiyUP7vuizPCNq
eZ0DyFzQiuOe3Kgq2cXRE3wMz0/wNU7JmTdMIuTtpjJm5cewE+7QqS0ksA5zQ35r
fj0WrIaqVXOpxoXsOAYB4z8JdgB0P/CH1A2tYfQ5go18sSHUOJsoQhTOT8JMxEx5
4ZsA0WgfzSyALExjjNy6Y06XghpGz4ZUOSeQR6Zzyb88FBZcZ8U=
=Nkvn
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull CPUFreq updates for 6.19 from Viresh Kumar:
"- tegra186: Add OPP / bandwidth support for Tegra186 (Aaron Kling).
- Minor improvements to various cpufreq drivers (Christian Marangi, Hal
Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)."
* tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list
cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users
cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM
cpufreq: CPPC: Don't warn if FIE init fails to read counters
cpufreq: nforce2: fix reference count leak in nforce2
cpufreq: tegra186: add OPP support and set bandwidth
cpufreq: dt-platdev: Add JH7110S SOC to the allowlist
cpufreq: s5pv210: fix refcount leak
The tsens IP found in the IPQ5018 SoC should not use qcom,tsens-v1 as
fallback since it has no RPM and, as such, must deviate from the
standard v1 init routine as this version of tsens needs to be explicitly
reset and enabled in the driver.
So let's make qcom,ipq5018-tsens a standalone compatible in the bindings.
Fixes: 77c6d28192 ("dt-bindings: thermal: qcom-tsens: Add ipq5018 compatible")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20250818-ipq5018-tsens-fix-v1-1-0f08cf09182d@outlook.com
The compiler/assembler flag -m64 is added and removed at two locations.
This pointless exercise is a leftover to keep the 31 and 64 bit vdso
Makefiles as symmetrical as possible. Given that the 31 bit vdso code
does not exist anymore, remove the -m64 flag handling.
Suggested-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Since compat is gone there is only a 64 bit vdso left.
Remove the superfluous "64" suffix everywhere.
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
All the code is 64 bit, therefore remove the superfluous suffix.
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This simplifies the vDSO linker script. Unlike on arm64 and powerpc, the
ELF_DETAILS macro was not used in addition, as that would introduce
an empty .modinfo section.
Note that this rearranges the .comment section to follow after all of
the debug sections.
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Since 0f67b56d84 ("clocksource/drivers/arm_arch_timer_mmio: Switch
over to standalone driver"), acpi_arch_timer_mem_init() is unused.
Remove it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The error message in the daemon() failure path uses %p format specifier
without providing a corresponding pointer argument, resulting in undefined
behavior and printing garbage values.
Replace %p with %m to properly print the errno error message, which is
the intended behavior when daemon() fails.
This fix ensures proper error reporting when daemonization fails.
Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20251124104401.374856-1-mrout@redhat.com
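The class of bug and the fix, as a hedged stand-alone sketch (the actual logging call in thermal-engine may differ):

  #define _DEFAULT_SOURCE         /* for daemon() on glibc */
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          if (daemon(0, 0) < 0) {
                  /* Buggy form: "%p" with no matching argument is undefined behavior. */
                  /* fprintf(stderr, "failed to daemonize: %p\n"); */

                  /* Fixed form: "%m" expands to strerror(errno) in glibc's printf. */
                  fprintf(stderr, "failed to daemonize: %m\n");
                  return 1;
          }
          return 0;
  }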
Make the enum name in kernel-doc match the code to prevent kernel-doc warnings:
Warning: include/linux/cc_platform.h:106 Enum value
'CC_ATTR_GUEST_SEV_SNP' not described in enum 'cc_attr'
Warning: include/linux/cc_platform.h:106 Excess enum value
'%CC_ATTR_SEV_SNP' description in 'cc_attr'
Fixes: f742b90e61 ("x86/mm: Extend cc_attr to include AMD SEV-SNP")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251125022730.3163679-1-rdunlap@infradead.org
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'.
Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long'
and so cannot discard significant bits.
In this case the 'unsigned long' value is small enough that the result
is ok.
Detected by an extra check added to min_t().
Signed-off-by: David Laight <david.laight.linux@gmail.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251119224140.8616-14-david.laight.linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
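A stand-alone demonstration of the hazard, using a userspace stand-in for min_t() (assumes a 64-bit unsigned long):

  #include <stdio.h>

  /* Rough equivalent of the kernel's min_t(unsigned int, a, b). */
  #define min_t_uint(a, b) \
          ((unsigned int)(a) < (unsigned int)(b) ? (unsigned int)(a) : (unsigned int)(b))

  int main(void)
  {
          unsigned long big = 0x100000001UL;      /* low 32 bits are 1 */
          unsigned long small = 7;

          /* The casts discard the high bits of 'big', so the "minimum" is 1;
           * a promotion-based min(), as in the kernel, would yield 7. */
          printf("min_t-style result: %u\n", min_t_uint(big, small));
          return 0;
  }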
Continue recent cleanups of comments in the swap handling code.
Unify the use of white space in the comments, drop some unuseful
comments outside function bodies, and move some other comments into
function bodies.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5943864.DvuYhMxLoT@rafael.j.wysocki
Doing hweight16() and checking whether the lsb is set is functionally
equivalent to what bitwise_xor_bits() does. In addition, it results in better
generated code as before gcc would inline the function 4 times. With hweight16(),
the resulting code boils down to 2 instructions - POPCNT and AND, and all
relevant CPUs support POPCNT.
An alternative would have been to use the __builtin_parity() function provided
by both Clang/GCC, however under some circumstances the compiler can choose not
to inline it but generate a library call which is unsupported in the kernel.
No functional changes.
[ bp: Massage commit message. ]
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251124142517.1708451-1-nik.borisov@suse.com
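A small stand-alone check of the equivalence (the kernel helper is hweight16(); __builtin_popcount() stands in for it here):

  #include <assert.h>
  #include <stdint.h>

  static int parity_by_xor(uint16_t v)
  {
          v ^= v >> 8;
          v ^= v >> 4;
          v ^= v >> 2;
          v ^= v >> 1;
          return v & 1;
  }

  static int parity_by_popcount(uint16_t v)
  {
          return __builtin_popcount(v) & 1;       /* POPCNT + AND */
  }

  int main(void)
  {
          for (uint32_t v = 0; v <= 0xffff; v++)
                  assert(parity_by_xor(v) == parity_by_popcount(v));
          return 0;
  }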
SPE_FEAT_FDS adds the ability to filter on the data source of packets.
Like the other existing filters, enable filtering with PMSFCR_EL1.FDS
when any of the filter bits are set.
Each bit position of the 64 bit filter maps to numerical data sources
0-63 described by bits[0:5] in the data source packet (although the full
range of data source is 16 bits so higher value data sources can't be
filtered on). The filter is an OR of all the filter bits, so for example
clearing filter bits 0 and 3 only includes packets from data sources 0
OR 3.
Invert the filter given by userspace so that the default value of 0 is
equivalent to including all values (no filtering). This allows us to
skip adding a new format bit to enable filtering and still support
excluding all data sources which would have been a filter value of 0 if
not for the inversion.
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Arm FEAT_SPE_FDS adds the ability to filter on the data source of a
packet using another 64-bits of event filtering control. As the existing
perf_event_attr::configN fields are all used up for SPE PMU, an
additional field is needed. Add a new 'config4' field.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: James Clark <james.clark@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
There is a PMU in the DB, which has the same function as the PMU in the DDR
subsystem; the difference is that the DB PMU only supports the cycles,
axid-read and axid-write events.
e.g.
perf stat -a -e imx8_db0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_port=0xPP,axi_channel=0xH/ cmd
perf stat -a -e imx8_db0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_port=0xPP,axi_channel=0xH/ cmd
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
Get and enable optional clks because fsl,imx8dxl-db-pmu has two clocks.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
Move ida_alloc() from helper ddr_perf_init() into ddr_perf_probe() to
clarify why ida_free() must be called in the error path.
Add return value check for ida_alloc().
Rename label 'cpuhp_state_err' to 'idr_free' to make the code clearer,
since two error paths now jump to this label.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
Add compatible strings fsl,imx8qm-ddr-pmu and fsl,imx8qxp-ddr-pmu, which
fall back to fsl,imx8-ddr-pmu, and fsl,imx8dxl-db-pmu (for data bus fabric).
Add clocks, clock-names for fsl,imx8dxl-db-pmu and keep the same
restriction for existing compatible strings.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Will Deacon <will@kernel.org>
Stackprotector support was previously unavailable on s390 because by
default compilers generate code which is not suitable for the kernel:
the canary value is accessed via thread local storage, where the address
of thread local storage is within access registers 0 and 1.
Using those registers also for the kernel would come with a significant
performance impact and more complicated kernel entry/exit code, since
access registers contents would have to be exchanged on every kernel entry
and exit.
With the upcoming gcc 16 release new compiler options will become available
which allow generating code suitable for the kernel. [1]
Compiler option -mstack-protector-guard=global instructs gcc to generate
stackprotector code that refers to a global stackprotector canary value via
symbol __stack_chk_guard. Access to this value is guaranteed to occur via
larl and lgrl instructions.
Furthermore, compiler option -mstack-protector-guard-record generates a
section containing all code addresses that reference the canary value.
To allow for per task canary values the instructions which load the address
of __stack_chk_guard are patched so they access a lowcore field instead: a
per task canary value is available within the task_struct of each task, and
is written to the per-cpu lowcore location on each context switch.
Also add sanity checks and debugging option to be consistent with other
kernel code patching mechanisms.
Full debugging output can be enabled with the following kernel command line
options:
debug_stackprotector
bootdebug
ignore_loglevel
earlyprintk
dyndbg="file stackprotector.c +p"
Example debug output:
stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240
where "<insn address>: <old insn> -> <new insn>".
[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Preinitialize the return value, and break out of the for loop in
module_finalize() in case of an error to get rid of an ifdef.
This makes it easier to add additional code, which may also depend
on config options.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel
message catalog" which never made it upstream.
Remove the macro in order to get rid of a pointless indirection. Replace
all users with the string it defines. In almost all cases this leads to a
simple replacement like this:
- #define KMSG_COMPONENT "appldata"
- #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+ #define pr_fmt(fmt) "appldata: " fmt
Except for some special cases this is just mechanical/scripted work.
Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Since the rework of the kernel virtual address space [1] the module area
and the kernel image are within the same 4GB area. Therefore there is no
need for the weak per cpu workaround for modules anymore. Remove it.
[1] commit c98d2ecae0 ("s390/mm: Uncouple physical vs virtual address spaces")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Harald Freudenberger says:
====================
Support for driver override on AP queues.
Add a new sysfs attribute driver_override to the AP queue's directory.
Writing a string overrides the default driver determination and the drivers are
matched against this string instead. This overrules the driver binding
determined by the apmask/aqmask bitmask fields. With the write to the
attribute a check is done if the queue is in use by an mdev device.
If this is true, the write is aborted and EBUSY is returned.
As there exists some tooling for this kind of driver_override
(see package driverctl) the AP bus behavior for re-binding
should be compatible with this. The steps for a driver_override are:
1) unbind the current driver from the device. For example
echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind
2) set the new driver for this device in the sysfs
driver_override attribute. For example
echo "vfio_ap" > /sys//devices/ap/card17/17.0005/driver_override
3) trigger a bus reprobe of this device. For example
echo "17.0005" > /sys/bus/ap/drivers_probe
With the driverctl package this is more comfortable and
the settings get persisted:
driverctl -b ap set-override 17.0005 vfio_ap
and unset with
driverctl -b ap unset-override 17.0005
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Introduce a restriction for the driver_override feature versus apmask
and aqmask:
- driver_override is only allowed when the apmask and aqmask values
both are default (=0xffff..ffff).
- apmask and aqmask modifications are only allowed when there is no
driver_override on any AP device active.
So in the end the user has to choose to either use
apmask/aqmask to divide the AP devices into host owned and vfio owned,
or use the driver_override feature, but not mix these two approaches.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The mutex ap_perms_mutex was already used not only for protection
of the struct ap_perms ap_perms variable but also for a consistent
update of the AP bus sysfs attributes apmask and aqmask.
So rename this mutex to ap_attr_mutex which better reflects the
current use. This is also a preparation for an upcoming patch which
will use this mutex to lock updates on a new sysfs attribute.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add a new sysfs attribute driver_override to the AP queue's
directory. Writing a string to it overrides the default driver
determination and the drivers are matched against this string
instead. This overrules the driver binding determined by the
apmask/aqmask bitmask fields.
According to the common understanding of how the driver_override
behavior shall work, there is no further checking done, neither on
the string which is given as the override driver nor on whether this
device is currently in use by an mdev device. Another patch may limit this
behavior to refuse a mixed usage of the driver_override and
apmask/aqmask feature.
As there exists some tooling for this kind of driver_override
(see package driverctl) the AP bus behavior for re-binding
should be compatible with this. The steps for a driver_override are:
1) unbind the current driver from the device. For example
echo "17.0005" > /sys/devices/ap/card17/17.0005/driver/unbind
2) set the new driver for this device in the sysfs
driver_override attribute. For example
echo "vfio_ap" > /sys//devices/ap/card17/17.0005/driver_override
3) trigger a bus reprobe of this device. For example
echo "17.0005" > /sys/bus/ap/drivers_probe
With the driverctl package this is more comfortable and
the settings get persisted:
driverctl -b ap set-override 17.0005 vfio_ap
and unset with
driverctl -b ap unset-override 17.0005
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
For the in_use() check of an updated apmask the host's aqmask
was provided to the vfio function. Similarly, on an update of the
aqmask the host's apmask was provided to the vfio in_use()
function. This led to false results on the check for apmask or
aqmask updates. For example with only one APQN, when exactly
this card is tried to be re-assigned back to the host, the
in_use() check did not complain.
The correct behavior is achieved by providing a full aqmask
when an adapter is to be checked and, similarly, a full apmask
when a domain is to be checked for usage.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
- Drop CONFIG_SCTP_COOKIE_HMAC_SHA1=y (removed in commit
2f3dd6ec90 ("sctp: Convert cookie authentication to use
HMAC-SHA256")),
- Drop CONFIG_BATMAN_ADV_NC=y (removed in commit 87b95082db
("batman-adv: remove network coding support")),
- Enable modular build of the SHA-1 secure hash algorithm (no longer
auto-enabled since commit 2f3dd6ec90 ("sctp: Convert cookie
authentication to use HMAC-SHA256")).
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/65e00bcb7b2980278bb087986ee405627aa32d8b.1760360254.git.geert@linux-m68k.org
Some device drivers (and out-of-tree modules) might want to define
device-specific devfreq governors. Rather than restricting all of them to
be a part of drivers/devfreq/ (which is not possible for out-of-tree
drivers anyway), move governor.h to include/linux/devfreq-governor.h and
update all drivers to use it.
The devfreq_cpu_data is only used internally, by the passive governor,
so it is moved to the driver source rather than being a part of the
public interface.
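For a driver-local governor the change boils down to switching the include,
roughly like this sketch (assuming only the header location given above):
  /* before: only possible for code living in drivers/devfreq/ */
  #include "governor.h"
  /* after: available to any (including out-of-tree) driver */
  #include <linux/devfreq-governor.h>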
Reported-by: Robie Basak <robibasa@qti.qualcomm.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251030-governor-public-v2-1-432a11a9975a@oss.qualcomm.com/
The current code retrieving platform devices' MSI devID in the GIC ITS MSI
parent helpers suffers from some minor issues:
- It leaks a struct device_node reference
- It is duplicated between GICv3 and GICv5 for no good reason
- It does not use the OF phandle iterator code that simplifies
the msi-parent property parsing
Consolidate GIC v3 and v5 deviceID retrieval in a function that addresses
the full set of issues in one go by merging GIC v3 and v5 code and
converting the msi-parent parsing loop to the more modern OF phandle
iterator API, fixing the struct device_node reference leak in the process.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20251021124103.198419-6-lpieralisi@kernel.org
The functionality implemented in the iproc driver in order to detect an
OF MSI controller node is now fully implemented in of_msi_xlate().
Replace the current msi-map/msi-parent parsing code with of_msi_xlate().
Since of_msi_xlate() is also a deviceID mapping API, pass in a fictitious
0 as deviceID - the driver only requires detecting the OF MSI controller
node, not the deviceID mapping per se (the of_msi_xlate() return value is
ignored for the same reason).
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251021124103.198419-5-lpieralisi@kernel.org
The "shareable_bits" and "bit_usage" resctrl files associated with cache
resources give insight into how instances of a cache are used.
Update the annotated capacity bitmasks displayed by "bit_usage" to include the
cache portions allocated for I/O via the "io_alloc" feature. "shareable_bits"
is a global bitmask of cache shareable with I/O and thus cannot present the
per-domain I/O allocations possible with the "io_alloc" feature. Revise the
"shareable_bits" documentation to direct users to "bit_usage" for accurate
cache usage information.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/e02a0d424129fd7f3e45822a559b1c614ae4652a.1762995456.git.babu.moger@amd.com
The io_alloc feature in resctrl enables system software to configure the
portion of the cache allocated for I/O traffic. When supported, the
io_alloc_cbm file in resctrl provides access to capacity bitmasks (CBMs)
allocated for I/O devices.
Enable users to modify io_alloc CBMs by writing to the io_alloc_cbm resctrl
file when the io_alloc feature is enabled.
Mirror the CBMs between CDP_CODE and CDP_DATA when CDP is enabled to present
consistent I/O allocation information to user space.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/67609641b03ccfba18a8ee0bf9dbd1f3dcbecda3.1762995456.git.babu.moger@amd.com
parse_cbm() requires resource group mode and CLOSID to validate the capacity
bitmask (CBM). It is passed via struct rdtgroup in struct rdt_parse_data.
The io_alloc feature also uses CBMs to indicate which portions of cache are
allocated for I/O traffic. The CBMs are provided by user space and need to be
validated the same as CBMs provided for general (CPU) cache allocation.
parse_cbm() cannot be used as-is since io_alloc does not have rdtgroup context.
Pass the resource group mode and CLOSID directly to parse_cbm() via struct
rdt_parse_data, instead of through the rdtgroup struct, to facilitate calling
parse_cbm() to verify the CBM of the io_alloc feature.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/f8ec6ab5cf594d906a3fe75f56793d5fbd63f38f.1762995456.git.babu.moger@amd.com
Introduce the "io_alloc_cbm" resctrl file to display the capacity bitmasks
(CBMs) that represent the portions of each cache instance allocated
for I/O traffic on a cache resource that supports the "io_alloc" feature.
io_alloc_cbm resides in the info directory of a cache resource, for example,
/sys/fs/resctrl/info/L3/. Since the resource name is part of the path, it
is not necessary to display the resource name as done in the schemata file.
When CDP is enabled, io_alloc routes traffic using the highest CLOSID
associated with the CDP_CODE resource and that CLOSID becomes unusable for
the CDP_DATA resource. The highest CLOSID of CDP_CODE and CDP_DATA resources
will be kept in sync to ensure consistent user interface. In preparation for
this, access the CBMs for I/O traffic through highest CLOSID of either
CDP_CODE or CDP_DATA resource.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/55a3ff66a70e7ce8239f022e62b334e9d64af604.1762995456.git.babu.moger@amd.com
Failing to allocate the affinity mask of an interrupt descriptor fails the
whole descriptor initialization. It is then guaranteed that the cpumask is
always available whenever the related interrupt objects are alive, such as
the kthread handler.
Therefore remove the superfluous check since it is merely a historical
leftover. Get rid also of the comments above it that are obsolete and
useless.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251121143500.42111-4-frederic@kernel.org
When a cpuset isolated partition is created / updated or destroyed, the
interrupt threads are affined blindly to all the non-isolated CPUs. This
happens without taking into account the interrupt threads initial affinity
that becomes ignored.
For example in a system with 8 CPUs, if an interrupt and its kthread are
initially affine to CPU 5, creating an isolated partition with only CPU 2
inside will eventually end up affining the interrupt kthread to all CPUs
but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for CPU 5.
Besides the blind re-affining, this doesn't take care of the actual low
level interrupt which isn't migrated. As of today the only way to isolate
non managed interrupts, along with their kthreads, is to overwrite their
affinity separately, for example through /proc/irq/
To avoid doing that manually, future development should focus on updating
the interrupt's affinity whenever cpuset isolated partitions are updated.
In the meantime, cpuset shouldn't fiddle with interrupt threads directly.
To prevent that, set the PF_NO_SETAFFINITY flag on them.
This is done through kthread_bind_mask() by affining them initially to all
possible CPUs as at that point the interrupt is not started up which means
the affinity of the hard interrupt is not known. The thread will adjust
that once it reaches the handler, which is guaranteed to happen after the
initial affinity of the hard interrupt is established.
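A minimal sketch of the described binding step (the surrounding irq setup
context is illustrative, not the actual diff):
  /* Bind the freshly created irq thread to all possible CPUs;
   * kthread_bind_mask() also sets PF_NO_SETAFFINITY, so cpuset and
   * user space can no longer re-affine the thread directly. */
  kthread_bind_mask(t, cpu_possible_mask);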
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251121143500.42111-3-frederic@kernel.org
During initialization, the interrupt thread is created before the interrupt
is enabled. The interrupt enablement happens before the actual kthread wake
up point. Once the interrupt is enabled the hardware can raise an interrupt
and once setup_irq() drops the descriptor lock an interrupt wake-up can
happen.
Even when such an interrupt can be considered premature, this is not a
problem in general because at the point where the descriptor lock is
dropped and the wakeup can happen, the data which is used by the thread is
fully initialized.
Though from the perspective of least surprise, the initial wakeup really
should be performed by the setup code and not randomly by a premature
interrupt.
Prevent this by performing a wake-up only if the target is in state
TASK_INTERRUPTIBLE, which the thread uses in wait_for_interrupt().
If the thread is still in state TASK_UNINTERRUPTIBLE, the wake-up is not
lost because after the setup code completed the initial wake-up the thread
will observe the IRQTF_RUNTHREAD and proceed with the handling.
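A hedged sketch of the described guard; the action->thread target is
illustrative, taken from the usual irq thread wake-up path:
  /* Only wake the thread if it already sleeps in wait_for_interrupt();
   * a thread still in TASK_UNINTERRUPTIBLE will notice IRQTF_RUNTHREAD
   * after the setup code performs the initial wake-up. */
  wake_up_state(action->thread, TASK_INTERRUPTIBLE);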
[ tglx: Simplified the changes and extended the changelog. ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251121143500.42111-2-frederic@kernel.org
AMD's SDCIAE forces all SDCI lines to be placed into the L3 cache portions
identified by the highest-supported L3_MASK_n register, where n is the maximum
supported CLOSID.
To support this, when the io_alloc resctrl feature is enabled, reserve the highest
CLOSID exclusively for I/O allocation traffic, making it no longer available for
general CPU cache allocation.
Introduce a user interface to enable/disable the io_alloc feature and encourage users
to enable io_alloc only when running workloads that can benefit from this
functionality. On enable, initialize the io_alloc CLOSID with all usable CBMs
across all the domains.
Since CLOSIDs are managed by resctrl fs, it is least invasive to make "io_alloc
is supported by maximum supported CLOSID" part of the initial resctrl fs
support for io_alloc. Take care to minimally (only in error messages) expose
this use of CLOSID for io_alloc to user space so that this is not required from
other architectures that may support io_alloc differently in the future.
When resctrl is mounted with "-o cdp" to enable code/data prioritization,
there are two L3 resources that can support I/O allocation: L3CODE and
L3DATA. From the resctrl fs perspective the two resources share a CLOSID and
the architecture's available CLOSIDs are halved to support this.
The architecture's underlying CLOSID used by SDCIAE when CDP is enabled is the
CLOSID associated with the CDP_CODE resource, but from resctrl's perspective
there is only one CLOSID for both CDP_CODE and CDP_DATA. CDP_DATA is thus not
usable for general (CPU) cache allocation nor I/O allocation.
Keep the CDP_CODE and CDP_DATA I/O alloc status in sync to avoid any confusion
to user space. That is, enabling io_alloc on CDP_CODE does so on CDP_DATA and
vice-versa, and the I/O allocation CBMs of CDP_CODE and CDP_DATA are kept in sync.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/c7d3037795e653e22b02d8fc73ca80d9b075031c.1762995456.git.babu.moger@amd.com
Introduce the "io_alloc" resctrl file to the "info" area of a cache resource,
for example /sys/fs/resctrl/info/L3/io_alloc. "io_alloc" indicates support for
the "io_alloc" feature that allows direct insertion of data from I/O
devices into the cache.
Restrict exposing support for "io_alloc" to the L3 resource that is the only
resource where this feature can be backed by AMD's L3 Smart Data Cache
Injection Allocation Enforcement (SDCIAE). With that, the "io_alloc" file is
only visible to user space if the L3 resource supports "io_alloc".
Doing so makes the file visible for all cache resources though, for example
also the L2 cache (if it supports cache allocation). As a consequence, add the
capability for the file to report the expected "enabled" and "disabled" values,
as well as "not supported".
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/e8b116a8f424128b227734bb1d433c14af478d90.1762995456.git.babu.moger@amd.com
"io_alloc" is the generic name of the new resctrl feature that enables system
software to configure the portion of cache allocated for I/O traffic. On AMD
systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache
Injection Allocation Enforcement (SDCIAE).
Introduce the architecture-specific functions that resctrl fs should call to
enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state
by setting (to enable) or clearing (to disable) bit 1 of
MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain.
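A hedged sketch of the described MSR update; only the MSR name and the bit
position come from the text above, the helper and the cpumask are illustrative:
  static void sdciae_update_msr(void *enable)
  {
          u64 val;

          rdmsrq(MSR_IA32_L3_QOS_EXT_CFG, val);
          if (enable)
                  val |= BIT_ULL(1);      /* enable SDCIAE */
          else
                  val &= ~BIT_ULL(1);     /* disable SDCIAE */
          wrmsrq(MSR_IA32_L3_QOS_EXT_CFG, val);
  }

  /* run on every logical processor of the cache domain */
  on_each_cpu_mask(domain_cpu_mask, sdciae_update_msr, (void *)(long)enable, 1);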
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com
AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software
to control the portions of L3 cache used for direct insertion of data from I/O
devices into the L3 cache.
Introduce a generic resctrl cache resource property "io_alloc_capable" as the
first part of the new "io_alloc" resctrl feature that will support AMD's
SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if
a portion of the cache can be allocated for I/O traffic.
Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD)
systems that support SDCIAE.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com
Add a kernel command-line parameter to enable or disable the exposure of
the L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE) hardware
feature to resctrl.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/c623edf7cb369ba9da966de47d9f1b666778a40e.1762995456.git.babu.moger@amd.com
Smart Data Cache Injection (SDCI) is a mechanism that enables direct insertion
of data from I/O devices into the L3 cache. By directly caching data from I/O
devices rather than first storing the I/O data in DRAM, SDCI reduces demands on
DRAM bandwidth and reduces latency to the processor consuming the I/O data.
The SDCIAE (SDCI Allocation Enforcement) PQE feature allows system software to
control the portion of the L3 cache used for SDCI.
When enabled, SDCIAE forces all SDCI lines to be placed into the L3 cache
partitions identified by the highest-supported L3_MASK_n register, where n is
the maximum supported CLOSID.
Add CPUID feature bit that can be used to configure SDCIAE.
The SDCIAE feature details are documented in:
AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.4.7 L3 Smart Data Cache
Injection Allocation Enforcement (SDCIAE).
available at https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/83ca10d981c48e86df2c3ad9658bb3ba3544c763.1762995456.git.babu.moger@amd.com
Currently, RAPL PMU support requires adding CPU model entries to
arch/x86/events/rapl.c for each new generation. However, RAPL MSRs are
not architectural and require platform-specific customization, making
arch/x86 an inappropriate location for this functionality.
The powercap subsystem already handles RAPL functionality and is the
natural place to consolidate all RAPL features. The powercap RAPL
driver already includes PMU support for TPMI-based RAPL interfaces,
making it straightforward to extend this support to MSR-based RAPL
interfaces as well.
This consolidation eliminates the need to maintain RAPL support in
multiple subsystems and provides a unified approach for both TPMI and
MSR-based RAPL implementations.
The MSR-based PMU support includes the following updates:
1. Register MSR-based PMU support for the supported platforms
and unregister it when no online CPUs remain in the package.
2. Remove existing checks that restrict RAPL PMU support to TPMI-based
interfaces and extend the logic to allow MSR-based RAPL interfaces.
3. Define a CPU model list to determine which processors should
register RAPL PMU interface through the powercap driver for
MSR-based RAPL, excluding those that support TPMI interface.
This list prevents conflicts with existing arch/x86 PMU code
that already registers RAPL PMU for some processors. Add
Panther Lake & Wildcat Lake to the CPU models list.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251121000539.386069-3-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current read_raw() implementation of the TPMI, MMIO and MSR
interfaces does not distinguish between atomic and non-atomic callers.
rapl_msr_read_raw() uses rdmsrq_safe_on_cpu(), which can sleep and
issue cross CPU calls. When MSR-based RAPL PMU support is enabled, PMU
event handlers can invoke this function from atomic context where
sleeping or rescheduling is not allowed. In atomic context, the caller
is already executing on the target CPU, so a direct rdmsrq() is
sufficient.
To support such usage, introduce an atomic flag to the read_raw()
interface to allow callers to pass the context information. Modify the
common RAPL code to propagate this flag, and set the flag to reflect
the calling contexts.
Utilize the atomic flag in rapl_msr_read_raw() to perform a direct MSR
read with rdmsrq() when running in atomic context, and add a sanity check
to ensure the target CPU matches the current CPU for such use cases.
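A hedged sketch of the resulting behaviour; the function shape and struct
fields are illustrative, only rdmsrq() and rdmsrq_safe_on_cpu() are the
accessors named above:
  static int rapl_msr_read_raw(int cpu, struct reg_action *ra, bool atomic)
  {
          if (atomic) {
                  /* PMU event context: caller already runs on the target CPU */
                  if (WARN_ON_ONCE(cpu != smp_processor_id()))
                          return -EINVAL;
                  rdmsrq(ra->reg.msr, ra->value);
                  return 0;
          }

          /* process context: may sleep and issue a cross-CPU call */
          return rdmsrq_safe_on_cpu(cpu, ra->reg.msr, &ra->value);
  }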
The TPMI and MMIO implementations do not require special atomic
handling, so the flag is ignored in those paths.
This is a preparatory patch for adding MSR-based RAPL PMU support.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251121000539.386069-2-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Extend the logic of handling CMCI storms to AMD threshold interrupts.
Rely on a similar approach to Intel's CMCI to mitigate storms per CPU and
per bank. But, unlike CMCI, do not set thresholds and reduce interrupt rate on
a storm. Rather, disable the interrupt on the corresponding CPU and bank.
Re-enable the interrupts if enough consecutive polls of the bank show no
corrected errors (30, as programmed by Intel).
Turning off the threshold interrupts would be a better solution on AMD systems
as other error severities will still be handled even if the threshold
interrupts are disabled.
[ Tony: Small tweak because mce_handle_storm() isn't a pointer now ]
[ Yazen: Rebase and simplify ]
[ Avadhut: Remove check to not clear bank's bit in mce_poll_banks and fix
checkpatch warnings. ]
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251121190542.2447913-3-avadhut.naik@amd.com
Currently, when a CMCI storm detected on a Machine Check bank subsides, the
bank's corresponding bit in the mce_poll_banks per-CPU variable is cleared
unconditionally by cmci_storm_end().
On AMD SMCA systems, this essentially disables polling on that particular bank
on that CPU. Consequently, any subsequent correctable errors or storms will not
be logged.
Since AMD SMCA systems allow banks to be managed by both polling and
interrupts, the polling banks bitmap for a CPU, i.e., mce_poll_banks, should
not be modified when a storm subsides.
Fixes: 7eae17c4ad ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251121190542.2447913-2-avadhut.naik@amd.com
The igen6_edac driver calls device_initialize() for all memory
controllers in igen6_register_mci(), but misses corresponding
put_device() calls in error paths and during normal shutdown in
igen6_unregister_mcis().
Adding the missing put_device() calls improves code readability and
ensures proper reference counting for the device structure.
Found by code review.
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251105090244.23327-1-make24@iscas.ac.cn
Set up the following debugfs testing node to enable fake memory error
address decoding tests for the imh_edac driver.
/sys/kernel/debug/edac/imh_test/addr
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-8-qiuxu.zhuo@intel.com
Detect 2-level memory configurations and notify the 'skx_common' library
to enable ADXL 2-level memory error decoding.
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-7-qiuxu.zhuo@intel.com
The maximum number of row bits allowed for DRAM chips in the Diamond
Rapids server processor is 19. Extend the current maximum row
bits from 18 to 19.
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-6-qiuxu.zhuo@intel.com
Intel Diamond Rapids CPUs include Integrated Memory and I/O Hubs (IMH).
The memory controllers within the IMHs provide memory stacks to the
processor. Create a new driver for these IMH-based memory controllers
rather than applying additional patches to the existing i10nm_edac.c
for the following reasons:
1) The memory controllers are not presented as PCI devices; instead,
the detection and all their registers have been transitioned to
MMIO-based memory spaces.
2) Validation processes are costly. Modifications to i10nm_edac would
require extensive validation checks against multiple platforms,
including Ice Lake, Sapphire Rapids, Emerald Rapids, Granite Rapids,
Sierra Forest, and Grand Ridge.
3) Future Intel CPUs will likely only need patches on top of this new
EDAC driver. Validation can be limited to Diamond Rapids servers
and future Intel CPU generations.
[Tony: Fix kerneldoc for struct local_reg]
[randconfig: Added dependencies on NFIT and DMI]
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-5-qiuxu.zhuo@intel.com
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new bits in
MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a valid System
Physical Address (SPA) is present in MCA_ADDR.
PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or if it
indicates validity of SPA in MCA_ADDR.
PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid SPA or if
it is implementation specific.
Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
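A hedged sketch of the described check; the helper name is illustrative,
the bit positions are the ones given above:
  static bool smca_addr_valid(u64 mca_config, u64 mca_status)
  {
          /* MCA_CONFIG[11]: PhysAddrV in MCA_STATUS is architecturally defined */
          if (mca_config & BIT_ULL(11))
                  return mca_status & BIT_ULL(54);        /* PhysAddrV */

          /* otherwise fall back to the legacy MCi_STATUS[AddrV] handling */
          return mca_status & MCI_STATUS_ADDRV;
  }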
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251118191731.181269-1-avadhut.naik@amd.com
The MCA threshold limit generally is not something that needs to change during
runtime. It is common for a system administrator to decide on a policy for
their managed systems.
If MCA thresholding is OS-managed, then the threshold limit must be set at
every boot. However, many systems allow the user to set a value in their BIOS.
And this is reported through an APEI HEST entry even if thresholding is not in
FW-First mode.
Use this value, if available, to set the OS-managed threshold limit. Users
can still override it through sysfs if desired for testing or debug.
APEI is parsed after MCE is initialized. So reset the thresholding blocks
later to pick up the threshold limit.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
If CONFIG_OF is not enabled, of_match_node() is set as NULL and
qcom_cpufreq_ipq806x_match_list won't be used, causing a compilation
warning.
Flag qcom_cpufreq_ipq806x_match_list as __maybe_unused to fix the
compilation warning.
While at it, also flag it as __initconst as it's only used in probe context
and can be freed after probe.
This follows the pattern of the usual of_device_id variables.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511202119.6zvvFMup-lkp@intel.com/
Fixes: 58f5d39d5e ("cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
[ Viresh: Drop __initconst ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Replace the direct calls to ksys_sync_helper() with the new
pm_sleep_fs_sync() in suspend and hibernation code paths.
This enables the new mechanism allowing the filesystem sync phase
to be interrupted.
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Samuel Wu <wusamuel@google.com>
Co-developed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Subject and changelog edits, tags adjustment ]
Link: https://patch.msgid.link/20251119171426.4086783-3-wusamuel@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add helper function pm_sleep_fs_sync() and related data structures
as a preparation for allowing system suspend and hibernation to be
aborted by wakeup events while syncing file systems.
The new function, to be called by the suspend process in order to
sync file systems, uses a dedicated ordered workqueue to run
ksys_sync_helper() in parallel with the calling process. Next, it
waits for the completion of the filesystem sync and periodically
checks if any system wakeup events are pending, in which case it will
return an error.
If that happens while the filesystem sync is still in progress, it
will continue, possibly after pm_sleep_fs_sync() has returned, and if
that function is called again before the sync is complete, a new work
item to run ksys_sync_helper() again will be queued (and waited for)
to increase the likelihood of writing all of the dirty pages in memory
back to persistent storage.
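A rough sketch of the described flow under stated assumptions; only
ksys_sync_helper() and pm_wakeup_pending() come from the text, the remaining
names, the poll interval and the completion handling are illustrative
(re-queuing on repeated calls is omitted):
  static struct workqueue_struct *fs_sync_wq;   /* dedicated ordered workqueue */
  static DECLARE_COMPLETION(fs_sync_done);

  static void fs_sync_work_fn(struct work_struct *work)
  {
          ksys_sync_helper();
          complete_all(&fs_sync_done);
  }
  static DECLARE_WORK(fs_sync_work, fs_sync_work_fn);

  int pm_sleep_fs_sync(void)
  {
          queue_work(fs_sync_wq, &fs_sync_work);

          while (!wait_for_completion_timeout(&fs_sync_done, HZ / 10)) {
                  if (pm_wakeup_pending())
                          return -EAGAIN; /* abort suspend; the sync keeps running */
          }
          return 0;
  }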
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Samuel Wu <wusamuel@google.com>
Co-developed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Subject and changelog rewrite, tags adjustment ]
Link: https://patch.msgid.link/20251119171426.4086783-2-wusamuel@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace udelay() with usleep_range() in check_freqs() to allow
CPU scheduling during frequency polling.
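The change amounts to something like the following sketch (the delay values
are illustrative):
  /* before: busy-waits and blocks the CPU for the whole interval */
  udelay(100);

  /* after: lets the scheduler run other work while waiting */
  usleep_range(100, 200);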
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251119031109.134583-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add documentation for longer term classification of workload type for
power or performance.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20251118223620.554798-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The misc.h header is not included by the EFI stub, which is the only
C caller of sev_enable(). This means the fallback for cases where
CONFIG_AMD_MEM_ENCRYPT is not set is never used, so it can be dropped.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/20250909080631.2867579-6-ardb+git@google.com
The timer migration mechanism allows active CPUs to pull timers from
idle ones to improve the overall idle time. This is however undesired
when CPU intensive workloads run on isolated cores, as the algorithm
would move the timers from housekeeping to isolated cores, negatively
affecting the isolation.
Exclude isolated cores from the timer migration algorithm by extending the
concept of unavailable cores, currently used for offline ones, to
isolated ones:
* A core is unavailable if isolated or offline;
* A core is available if non isolated and online;
A core is considered unavailable as isolated if it belongs to:
* the isolcpus (domain) list
* an isolated cpuset
Except if it is:
* in the nohz_full list (already idle for the hierarchy)
* the nohz timekeeper core (must be available to handle global timers)
CPUs are added to the hierarchy during late boot, excluding isolated
ones; the hierarchy is also adapted when the cpuset isolation changes.
Due to how the timer migration algorithm works, any CPU that is part of the
hierarchy can have its global timers pulled by remote CPUs and has to
pull remote timers itself; only skipping the pulling of remote timers would
break the logic.
For this reason, prevent isolated CPUs from pulling remote global
timers, but also the other way around: any global timer started on an
isolated CPU will run there. This does not break the concept of
isolation (global timers don't come from outside the CPU) and, if
considered inappropriate, can usually be mitigated with other isolation
techniques (e.g. IRQ pinning).
This effect was noticed on a 128-core machine running oslat on the
isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs,
and the CPU with the lowest count in a timer migration hierarchy (here 1
and 65) appears as always active and continuously pulls global timers
from the housekeeping CPUs. This ends up moving driver work (e.g.
delayed work) to isolated CPUs and causes latency spikes:
before the change:
# oslat -c 1-31,33-63,65-95,97-127 -D 62s
...
Maximum: 1203 10 3 4 ... 5 (us)
after the change:
# oslat -c 1-31,33-63,65-95,97-127 -D 62s
...
Maximum: 10 4 3 4 3 ... 5 (us)
The same behaviour was observed with rtla-osnoise-top on a machine with as
few as 20 cores / 40 threads, with isolcpus set to 1-9,11-39.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: John B. Wyatt IV <jwyatt@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251120145653.296659-8-gmonaco@redhat.com
Now we can simplify code that allocates cpumasks for local needs.
Automatic variables have to be initialized at declaration, or at least
before any possibility for the logic to return, so that the compiler
won't try to call an associated destructor function on a random stack
value.
Because cpumask_var_t, depending on the CPUMASK_OFFSTACK config, is
either a pointer or an array, a macro is needed for initialization.
So define a CPUMASK_VAR_NULL macro, which allows initializing the struct
cpumask pointer with NULL when CPUMASK_OFFSTACK is enabled, and is
effectively a no-op when CPUMASK_OFFSTACK is disabled (the initialisation
is optimised out with -O2).
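A hedged sketch of the described semantics (not necessarily the exact
definition used by the patch):
  #ifdef CONFIG_CPUMASK_OFFSTACK
  #define CPUMASK_VAR_NULL        NULL    /* cpumask_var_t is a pointer */
  #else
  #define CPUMASK_VAR_NULL        { }     /* cpumask_var_t is an on-stack array */
  #endif

  /* usage: the variable is well defined even if the function returns
   * before the cpumask is actually allocated */
  cpumask_var_t mask = CPUMASK_VAR_NULL;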
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251120145653.296659-7-gmonaco@redhat.com
Currently the user can set up isolcpus and nohz_full in such a way that
leaves no housekeeping CPU (i.e. no CPU that is neither domain isolated
nor nohz full). This can be a problem for other subsystems (e.g. the
timer wheel migration).
Prevent this configuration by invalidating the last setting in case the
union of isolcpus (domain) and nohz_full covers all CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251120145653.296659-6-gmonaco@redhat.com
update_unbound_workqueue_cpumask() updates unbound workqueue settings
when there's a change in isolated CPUs, but it can be used for other
subsystems requiring updates when isolated CPUs change.
Generalise the name to update_isolation_cpumasks() to prepare for other
functions unrelated to workqueues to be called in that spot.
[longman: Change the function name to update_isolation_cpumasks()]
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Chen Ridong <chenridong@huaweicloud.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251120145653.296659-5-gmonaco@redhat.com
Cleanup tmigr_clear_cpu_available() and tmigr_set_cpu_available() to
prepare for easier checks on the available flag.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120145653.296659-4-gmonaco@redhat.com
Keep track of the CPUs available for timer migration in a cpumask. This
prepares the ground to generalise the concept of unavailable CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120145653.296659-3-gmonaco@redhat.com
The timer migration hierarchy excludes offline CPUs via the
tmigr_is_not_available function, which is essentially checking the
online bit for the CPU.
Rename the online bit to available and all references in function names
and tracepoint to generalise the concept of available CPUs.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251120145653.296659-2-gmonaco@redhat.com
The commit e7bafbf717 ("arm64: mm: Add top-level dispatcher for
internal mem_encrypt API") adds ARCH_HAS_MEM_ENCRYPT. And then the
commit 42be24a417 ("arm64: Enable memory encrypt for Realms") adds
duplicate config. Just remove it.
Fixes: 42be24a417 ("arm64: Enable memory encrypt for Realms")
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Simplify the loop looking up a candidate idle state in the case when an
intercept is likely to occur by adding a search for the state index limit
if the tick is stopped before it.
First, call tick_nohz_tick_stopped() just once and if it returns true,
look for the shallowest state index below the current candidate one with
target residency at least equal to the tick period length.
Next, simply look for a state that is not shallower than the one found
in the previous step and satisfies the intercepts majority condition (if
there are no such states, the shallowest state that is not shallower
than the one found in the previous step becomes the new candidate).
Since teo_state_ok() has no callers any more after the above changes,
drop it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Changelog clarification and code comment edit ]
Link: https://patch.msgid.link/2418792.ElGaqSPkdT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Convert the Renesas R-Car Gen3 thermal driver from SIMPLE_DEV_PM_OPS()
to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr(). This lets us drop the
__maybe_unused annotation from its resume callback, and reduces kernel
size in case CONFIG_PM or CONFIG_PM_SLEEP is disabled.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/813ad36fdc8561cf1c396230436e8ff3ff903a1f.1763117455.git.geert+renesas@glider.be
Convert the Renesas R-Car thermal driver from SIMPLE_DEV_PM_OPS() to
DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr(). This lets us drop the
check for CONFIG_PM_SLEEP, and reduces kernel size in case CONFIG_PM or
CONFIG_PM_SLEEP is disabled, while increasing build coverage.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/ee03ec71d10fd589e7458fa1b0ada3d3c19dbb54.1763117351.git.geert+renesas@glider.be
The condition deciding whether or not to increase cpu_data->tick_intercepts
in teo_update() is reversed, so fix it.
Fixes: d619b5cc67 ("cpuidle: teo: Simplify counting events used for tick management")
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 0796ddf4a7f0: cpuidle: teo: Use this_cpu_ptr() where possible
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 8f3f01082d7a: cpuidle: governors: teo: Use s64 consistently in teo_update()
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: b54df61c7428: cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
Cc: 6.14+ <stable@vger.kernel.org> # 6.14+: 083654ded547: cpuidle: governors: teo: Rework the handling of tick wakeups
Cc: 6.14+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5085160.31r3eYUQgx@rafael.j.wysocki
If the wakeup pattern is clearly dominated by tick wakeups, count those
wakeups as hits on the deepest available idle state to increase the
likelihood of stopping the tick, especially on systems where there are
only 2 usable idle states and the tick can only be stopped when the
deeper state is selected.
This change is expected to reduce power on some systems where state 0 is
selected relatively often even though they are almost idle. Without it,
the governor may end up selecting the shallowest idle state all the time
even if the system is almost completely idle due to all tick wakeups being
counted as hits on that state and preventing the tick from being stopped
at all.
Fixes: 4b20b07ce7 ("cpuidle: teo: Don't count non-existent intercepts")
Reported-by: Reka Norman <rekanorman@chromium.org>
Closes: https://lore.kernel.org/linux-pm/CAEmPcwsNMNnNXuxgvHTQ93Mx-q3Oz9U57THQsU_qdcCx1m4w5g@mail.gmail.com/
Tested-by: Reka Norman <rekanorman@chromium.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 92ce5c07b7a1: cpuidle: teo: Reorder candidate state index checks
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: ea185406d1ed: cpuidle: teo: Combine candidate state index checks against 0
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: b9a6af26bd83: cpuidle: teo: Drop local variable prev_intercept_idx
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: e24f8a55de50: cpuidle: teo: Clarify two code comments
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: d619b5cc6780: cpuidle: teo: Simplify counting events used for tick management
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 13ed5c4a6d9c: cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: ddcfa7964677: cpuidle: teo: Simplify handling of total events count
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 65e18e654475: cpuidle: teo: Replace time_span_ns with a flag
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 0796ddf4a7f0: cpuidle: teo: Use this_cpu_ptr() where possible
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: 8f3f01082d7a: cpuidle: governors: teo: Use s64 consistently in teo_update()
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+: b54df61c7428: cpuidle: governors: teo: Decay metrics below DECAY_SHIFT threshold
Cc: 6.11+ <stable@vger.kernel.org> # 6.11+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Rebase on commit 0796ddf4a7, changelog update ]
Link: https://patch.msgid.link/6228387.lOV4Wx5bFT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use cpumask_weighted_or() instead of cpumask_or() and cpumask_weight() on
the result, which walks the same bitmap twice. Results in 10-20% less
cycles, which reduces the runqueue lock hold time.
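A sketch of the change, with illustrative variable names and assuming the
signature introduced by the companion patch below:
  /* before: the resulting bitmap is walked twice */
  cpumask_or(mm_allowed, mm_allowed, cpus_allowed);
  weight = cpumask_weight(mm_allowed);

  /* after: OR and weight computed in a single pass */
  weight = cpumask_weighted_or(mm_allowed, mm_allowed, cpus_allowed);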
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.511736272@linutronix.de
CID management OR's two cpumasks and then calculates the weight on the
result. That's inefficient as that has to walk the same stuff twice. As
this is done with runqueue lock held, there is a real benefit of speeding
this up. Depending on the system this results in 10-20% less cycles spent
with runqueue lock held for a 4K cpumask.
Provide cpumask_weighted_or() and the corresponding bitmap functions which
return the weight of the OR result right away.
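A minimal sketch of the bitmap-level helper's shape (illustrative; the
in-tree version also has to handle small-constant optimisations and
trailing bits in the last word):
  unsigned int bitmap_weighted_or(unsigned long *dst, const unsigned long *src1,
                                  const unsigned long *src2, unsigned int nbits)
  {
          unsigned int k, w = 0;

          for (k = 0; k < BITS_TO_LONGS(nbits); k++) {
                  dst[k] = src1[k] | src2[k];
                  w += hweight_long(dst[k]);
          }
          return w;
  }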
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.448263340@linutronix.de
mm_update_cpus_allowed() is not required to be invoked for affinity changes
due to migrate_disable() and migrate_enable().
migrate_disable() restricts the task temporarily to a CPU on which the task
was already allowed to run, so nothing changes. migrate_enable() restores
the actual task affinity mask.
If that mask changed between migrate_disable() and migrate_enable() then
that change was already accounted for.
Move the invocation to the proper place to avoid that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.385208276@linutronix.de
This is only used in the scheduler core code, so there is no point to have
it in a global header.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.321259077@linutronix.de
With whitespace checks enabled in the editor this makes eyes bleed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.258651925@linutronix.de
Both the per CPU storage and the data in mm_struct are heavily used in
context switch. As they can end up next to other frequently modified data,
they are subject to false sharing.
Make them cache line aligned.
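An illustration of the annotation in question; the structure below is a
placeholder, not the actual scheduler data:
  /* Give each instance its own cache line so that frequently written
   * neighbours cannot cause false sharing. */
  struct pcpu_cid_example {
          int     cid;
          int     recent_cid;
  } ____cacheline_aligned_in_smp;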
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.194111661@linutronix.de
Having a lot of CID functionality specific members in struct task_struct
and struct mm_struct is not really making the code easier to read.
Encapsulate the CID specific parts in data structures and keep them
separate from the stuff they are embedded in.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de
The CID management is a complex beast, which affects both scheduling and
task migration. The compaction mechanism forces random tasks of a process
into task work on exit to user space, causing latency spikes.
Revert to the initial simple bitmap allocation mechanics, which are
known to have scalability issues, as that allows gradually building up
replacement functionality in a reviewable way.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251119172549.068197830@linutronix.de
The upcoming imh_edac driver for Intel Diamond Rapids servers cannot
use skx_get_hi_lo() in skx_common to retrieve the TOHM (Top of High
Memory) and TOLM (Top of Low Memory) parameters. Instead, it obtains
these parameters within its own EDAC driver. To accommodate this,
prepare skx_set_hi_lo() to allow the driver to notify skx_common of
these parameters.
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-4-qiuxu.zhuo@intel.com
The Intel EDAC library 'skx_common' maintains the Intel server EDAC device
list for {skx, i10nm}_edac drivers, which use skx_get_all_bus_mappings()
to build and retrieve the EDAC device list.
However, the upcoming Intel EDAC driver, imh_edac, for Diamond Rapids
servers is designed for memory controllers that are MMIO-based devices
rather than PCI devices. Consequently, it can't use
skx_get_all_bus_mappings() due to the absence of a PCI bus. To accommodate
this, prepare skx_get_edac_list() to enable the upcoming imh_edac driver
to obtain the EDAC device list from the skx_common library and build the
EDAC device list independently.
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-3-qiuxu.zhuo@intel.com
Memory controllers in the new Intel server CPUs, such as Diamond Rapids,
are presented as MMIO-based devices rather than PCI devices.
Modify skx_register_mci() to be independent of 'pci_dev' and use a generic
'dev' of 'struct device' to prepare for support of such MMIO-based memory
controllers.
Tested-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20251119134132.2389472-2-qiuxu.zhuo@intel.com
Create a maintainer entry for the new MPAM Driver. Add myself and
James Morse as maintainers. James created the driver and I have
taken up the later versions of his series.
Cc: James Morse <james.morse@arm.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When features are mismatched between MSCs, the way features are combined
into the class determines whether resctrl can support this SoC.
Add some tests to illustrate the sort of thing that is expected to
work, and those that must be removed.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The bitmap reset code has been a source of bugs. Add a unit test.
This currently has to be built in, as the rest of the driver is
built in.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
resctrl expects to reset the bandwidth counters when the filesystem
is mounted.
To allow this, add a helper that clears the saved mbwu state. Instead
of cross-calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zero'd on
the next read. This is easily done by forcing a configuration update.
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Peter Newman <peternewman@google.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the larger counter sizes are probed, make use of them.
Callers of mpam_msmon_read() may not know (or care!) about the different
counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
driver pick the counter to use.
Only 32bit accesses to the MSC are required to be supported by the
spec, but these registers are 64bits. The lower half may overflow
into the higher half between two 32bit reads. To avoid this, use
a helper that reads the top half multiple times to check for overflow.
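A hedged sketch of such a helper; the struct, register accessor and
register names are illustrative:
  static u64 mbwu_read_counter(struct mpam_msc *msc)
  {
          u32 hi, lo, prev;

          hi = msc_read32(msc, MSMON_MBWU_H);
          do {
                  prev = hi;
                  lo = msc_read32(msc, MSMON_MBWU_L);
                  hi = msc_read32(msc, MSMON_MBWU_H);
          } while (hi != prev);   /* the low half wrapped between the reads */

          return ((u64)hi << 32) | lo;
  }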
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit, added explicit counter selection ]
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Peter Newman <peternewman@google.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
mpam v0.1 and versions above v1.0 support optional long counters for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters.
Probe these feature bits.
The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
monitors are supported. Instead of muddling this with the size of the
bandwidth monitors, add an explicit 31 bit counter feature.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[ morse: Added 31bit counter feature to simplify later logic ]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Use the overflow status bit to track overflow on each bandwidth counter
read and add the counter size to the correction when overflow is detected.
This assumes that only a single overflow has occurred since the last read
of the counter. Overflow interrupts, on hardware that supports them,
could be used to remove this limitation.
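A minimal sketch of the correction described above, with made-up names for
the saved state (struct mbwu_state is the tracking structure added earlier
in this series):

  /*
   * Sketch only: 'overflow' is the monitor's overflow status bit and
   * 'counter_max' the counter's maximum value. At most one wrap since
   * the last read is assumed.
   */
  static u64 mbwu_corrected_value(struct mbwu_state *m, u64 hw_value,
                                  bool overflow, u64 counter_max)
  {
      if (overflow)
          m->correction += counter_max + 1;

      return m->correction + hw_value;
  }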
Cc: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bandwidth counters need to run continuously to correctly reflect the
bandwidth.
Save the counter state when the hardware is reset due to CPU hotplug.
Add struct mbwu_state to track the bandwidth counter. Support for
tracking overflow with the same structure will be added in a
subsequent commit.
Cc: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSC may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.
Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
not ready, then wait the full timeout value before trying again.
CC: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MPAM's MSC support a number of monitors, each of which supports
bandwidth counters, or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MPAM supports more features than are going to be exposed to resctrl.
For partids other than 0, the reset values of these controls aren't
known.
Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.
PARTID narrowing allows MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce the partid_max to make it usable. This allows us
to map a PARTID to itself.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When CPUs come online the MSC's original configuration should be restored.
Add struct mpam_config to hold the configuration. For each component, this
has a bitmap of features that have been changed from the reset values. The
mpam_config is also used on RIS reset where all bits are set to ensure all
features are reset.
Once the maximum partid is known, allocate a configuration array for each
component, and reprogram each RIS configuration from this.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Shaopeng Tan (Fujitsu) <tan.shaopeng@fujitsu.com>
Cc: Peter Newman <peternewman@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.
After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.
Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.
Only the irq handler accesses the MPAMF_ESR register, so no locking is
needed. The work to disable MPAM after an error needs to happen in process
context as it takes a mutex. It also unregisters the interrupts, meaning
it can't be done from the threaded part of a threaded interrupt.
Instead, mpam_disable() gets scheduled.
Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.
Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
CC: Rohit Mathew <rohit.mathew@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.
Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.
Add a helper that schedules the provided function if necessary.
Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.
Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.
MSC are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.
If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.
To reset, write the maximum values for all discovered controls.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.
Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some MSC
that make up a component or class.
If bitmap properties are mismatched within a component we
cannot support the mismatched feature.
Care has to be taken as vMSC may hold mismatched RIS.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Expand the probing support with the control and monitor types
we can use with resctrl.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The MSC MON_SEL register needs to be accessed from hardirq for the overflow
interrupt, and when taking an IPI to access these registers on platforms
where MSC are not accessible from every CPU. This makes an irqsave
spinlock the obvious lock to protect these registers. On systems with SCMI
or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
The SCMI or PCC platforms can't support an overflow interrupt, and
can't access the registers from hardirq context.
Clearly these two can't exist for one MSC at the same time.
Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
only support 'real' MMIO platforms.
In the future this lock will be split in two allowing SCMI/PCC platforms
to take a mutex. Because there are contexts where the SCMI/PCC platforms
can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
this now, so that all the error handling on these paths is present. This
allows the relevant paths to fail if they are needed on a platform where
this isn't possible, instead of having to make explicit checks of the
interface type.
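A sketch of the helper shape this describes, assuming a per-MSC raw spinlock
(the field names are illustrative):

  /*
   * Sketch only: on 'real' MMIO platforms the lock can always be taken,
   * so the helper returns true. The later split allows SCMI/PCC
   * platforms to take a mutex instead, or to fail in atomic context.
   */
  static bool mpam_mon_sel_lock(struct mpam_msc *msc)
  {
      raw_spin_lock_irqsave(&msc->mon_sel_lock, msc->mon_sel_flags);
      return true;
  }

  static void mpam_mon_sel_unlock(struct mpam_msc *msc)
  {
      raw_spin_unlock_irqrestore(&msc->mon_sel_lock, msc->mon_sel_flags);
  }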
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may also have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
The limits from requestors (e.g. CPUs) also need to be taken into account.
While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.
This adds the low level MSC write accessors.
While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.
Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled; walk all the MSCs and probe those we can
reach that haven't already been probed as each CPU's online call is made.
This adds the low-level MSC register read accessors.
Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced. The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.
The architecture's context switch code will be enabled by a static-key,
this can be set by mpam_enable(), but must be done from process context,
not a cpuhp callback because both take the cpuhp lock.
Whenever a new MSC has been probed, the mpam_enable() work is scheduled
to test if all the MSCs have been probed. If probing fails, mpam_disable()
is scheduled to unregister the cpuhp callbacks and free memory.
CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.
Add the definitions for these registers as offsets within the page(s).
Link: https://developer.arm.com/documentation/ihi0099/aa/
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.
To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: a class is a set of components, which
are visible to user-space, as there are likely to be multiple instances
of the L2 cache (e.g. one per cluster or package).
Add support for creating and destroying structures to allow a hierarchy
of resources to be created.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.
Start with driver probe/remove and mapping the MSC.
Cc: Carl Worth <carl@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.
This happens in two stages. Platform devices are created first for the
MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
to discover the RIS entries the MSC contains.
For now the MPAM hook mpam_ris_create() is stubbed out, but will update
the MPAM driver with optional discovered data about the RIS entries.
CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Define a cleanup helper for use with __free to release the acpi table when
the pointer goes out of scope. Also, introduce the helper
acpi_get_table_pointer() to simplify a commonly used pattern involving
acpi_get_table().
These are first used in a subsequent commit.
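For illustration, the pattern these helpers enable looks roughly like the
following; the exact signature of acpi_get_table_pointer() and its error
convention are assumptions here:

  DEFINE_FREE(acpi_table, struct acpi_table_header *,
              if (!IS_ERR_OR_NULL(_T)) acpi_put_table(_T))

  /* Sketch only: the table is put automatically when 'table' goes out
   * of scope, including on early returns. */
  struct acpi_table_header *table __free(acpi_table) =
          acpi_get_table_pointer(ACPI_SIG_MPAM, 0);

  if (IS_ERR(table))
      return PTR_ERR(table);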
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Define a cleanup helper for use with __free to destroy platform devices
automatically when the pointer goes out of scope. This is only intended to
be used in error cases and so should be used with return_ptr() or
no_free_ptr() directly to avoid the automatic destruction on success.
A first use of this is introduced in a subsequent commit.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 tree is the most natural home for the Kconfig
option.
This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register properties
of CPUs with the MPAM driver.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
CC: Dave Martin <dave.martin@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MPAM identifies CPUs by the cache_id in the PPTT cache structure.
The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.
Add a helper to pull this information out of the PPTT.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.
Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.
Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then returns the level.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In actbl2.h, acpi_pptt_cache describes the fields in the original
Cache Type Structure. In PPTT table version 3 a new field was added at the
end, cache_id. This is described in acpi_pptt_cache_v1 but rather than
including all v1 fields it just includes this one.
In lieu of this being fixed in acpica, introduce acpi_pptt_cache_v1_full to
contain all the fields of the Cache Type Structure. Update the existing
code to use this new struct. This simplifies the code and removes a
non-standard use of ACPI_ADD_PTR.
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter. The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.
Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.
Get rid of the levels parameter, which has no remaining purpose.
Fix acpi_get_cache_info() to match.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ACPI MPAM table uses the UID of a processor container specified in
the PPTT to indicate the subset of CPUs and cache topology that can
access each MPAM System Component (MSC).
This information is not directly useful to the kernel. The equivalent
cpumask is needed instead.
Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.
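A rough sketch of that walk; the parentage check is a placeholder for the
actual PPTT lookup:

  /*
   * Sketch only: uid_is_ancestor_of_cpu() stands in for the PPTT walk
   * that tests whether the processor container identified by 'uid' is
   * an ancestor of this CPU's PPTT node.
   */
  static void fill_container_cpumask(u32 uid, cpumask_t *affinity)
  {
      int cpu;

      for_each_possible_cpu(cpu) {
          if (uid_is_ancestor_of_cpu(uid, cpu))
              cpumask_set_cpu(cpu, affinity);
      }
  }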
CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Zeng Heng <zengheng4@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A multi-threaded customer workload with a large memory footprint uses
fork()/exec() to run some external programs every tens of seconds. When
running the workload on an arm64 server machine, it's observed that
quite some CPU cycles are spent in the TLB flushing functions. While
running the workload on the x86_64 server machine, it's not. This
causes the performance on arm64 to be much worse than that on x86_64.
During the workload running, after fork()/exec() write-protects all
pages in the parent process, memory writing in the parent process
will cause a write protection fault. Then the page fault handler
will make the PTE/PDE writable if the page can be reused, which is
almost always true in the workload. On arm64, to avoid the write
protection fault on other CPUs, the page fault handler flushes the TLB
globally with TLBI broadcast after changing the PTE/PDE. However, this
isn't always necessary. Firstly, it's safe to leave some stale
read-only TLB entries as long as they will be flushed finally.
Secondly, it's quite possible that the original read-only PTE/PDEs
aren't cached in remote TLB at all if the memory footprint is large.
In fact, on x86_64, the page fault handler doesn't flush the remote
TLB in this situation, which benefits the performance a lot.
To improve the performance on arm64, make the write protection fault
handler flush the TLB locally instead of globally via TLBI broadcast
after making the PTE/PDE writable. If there are stale read-only TLB
entries in the remote CPUs, the page fault handler on these CPUs will
regard the page fault as spurious and flush the stale TLB entries.
To test the patchset, make the usemem.c from
vm-scalability (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
support calling fork()/exec() periodically. To mimic the behavior of
the customer workload, run usemem with 4 threads, access 100GB memory,
and call fork()/exec() every 40 seconds. Test results show that with
the patchset the score of usemem improves ~40.6%. The cycles% of TLB
flush functions reduces from ~50.5% to ~0.3% in perf profile.
Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The page faults may be spurious because of the racy access to the page
table. For example, a non-populated virtual page is accessed on 2
CPUs simultaneously, thus the page faults are triggered on both CPUs.
However, it's possible that one CPU (say CPU A) cannot find the reason
for the page fault if the other CPU (say CPU B) has changed the page
table before the PTE is checked on CPU A. Most of the time, the
spurious page faults can be ignored safely. However, if the page
fault is for the write access, it's possible that a stale read-only
TLB entry exists in the local CPU and needs to be flushed on some
architectures. This is called the spurious page fault fixing.
In the current kernel, there is spurious fault fixing support for pte,
but not for huge pmd because no architectures need it. But in the
next patch in the series, we will change the write protection fault
handling logic on arm64, so that some stale huge pmd entries may
remain in the TLB. These entries need to be flushed via the huge pmd
spurious fault fixing mechanism.
Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A new resource group is intended to be created with sane defaults. For a cache
resource this means all cache portions the new group could possibly allocate
into. This includes unused cache portions and shareable cache portions used by
other groups and hardware.
New resource group creation does not take sparse masks into account. After
determining the bitmask reflecting the new group's possible allocations, the
bitmask is forced to be contiguous even if the system supports sparse masks.
For example, a new group could by default allocate into a large portion of
cache represented by 0xff0f, but it is instead created with a mask of 0xf.
Do not force a contiguous allocation range if the system supports sparse masks.
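In terms of resctrl's existing capability flag, the intended behaviour is
roughly the following (the exact call site and helper may differ):

  /*
   * Sketch only: keep the computed default mask as-is when sparse
   * bitmasks are supported, otherwise force it contiguous as before.
   */
  if (!r->cache.arch_has_sparse_bitmasks)
      cfg->new_ctrl = cbm_ensure_valid(cfg->new_ctrl, r);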
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/abbbb008bc09d982d715e79d3b885c10f92c64e0.1763426240.git.reinette.chatre@intel.com
Linear Address Space Separation (LASS) mitigates a class of side-channel
attacks that rely on speculative access across the user/kernel boundary.
Enable LASS along with similar security features if the platform
supports it.
While at it, remove the comment above the SMAP/SMEP/UMIP/LASS setup
instead of updating it, as the whole sequence is quite self-explanatory.
Some EFI runtime and boot services may rely on 1:1 mappings in the lower
half during early boot and even after SetVirtualAddressMap(). To avoid
tripping LASS, the initial CR4 programming would need to be delayed
until EFI has completely finished entering virtual mode (including
efi_free_boot_services()). Also, LASS would need to be temporarily
disabled while switching to efi_mm to avoid potential faults on stray
runtime accesses.
Similarly, legacy vsyscall page accesses are flagged by LASS resulting
in a #GP (instead of a #PF). Without LASS, the #PF handler emulates the
accesses and returns the appropriate values. Equivalent emulation
support is required in the #GP handler with LASS enabled. In case of
vsyscall XONLY (execute only) mode, the faulting address is readily
available in the RIP which would make it easier to reuse the #PF
emulation logic.
For now, keep it simple and disable LASS if either of those are compiled
in. Though not ideal, this makes it easier to start testing LASS support
in some environments. In future, LASS support can easily be expanded to
support EFI and legacy vsyscalls.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-9-sohil.mehta%40intel.com
Some of the vsyscall selftests expect a #PF when vsyscalls are disabled.
However, with LASS enabled, an invalid access results in a SIGSEGV due
to a #GP instead of a #PF. One such negative test fails because it is
expecting X86_PF_INSTR to be set.
Update the failing test to expect either a #GP or a #PF. Also, update
the printed messages to show the trap number (denoting the type of
fault) instead of assuming a #PF.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-8-sohil.mehta%40intel.com
A LASS violation typically results in a #GP. With LASS active, any
invalid access to user memory (including the first page frame) would be
reported as a #GP, instead of a #PF.
Unfortunately, the #GP error messages provide limited information about
the cause of the fault. This could be confusing for kernel developers
and users who are accustomed to the friendly #PF messages.
To make the transition easier, enhance the #GP Oops message to include a
hint about LASS violations. Also, add a special hint for kernel NULL
pointer dereferences to match with the existing #PF message.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-7-sohil.mehta%40intel.com
The relocate kernel mechanism uses an identity mapping to copy the new
kernel, which leads to a LASS violation when executing from a low
address.
LASS must be disabled after the original CR4 value is saved because
kexec paths that preserve context need to restore CR4.LASS. But,
disabling it along with CET during identity_mapped() is too late. So,
disable LASS immediately after saving CR4, along with PGE, and before
jumping to the identity-mapped page.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-6-sohil.mehta%40intel.com
For patching, the kernel initializes a temporary mm area in the lower
half of the address range. LASS blocks these accesses because its
enforcement relies on bit 63 of the virtual address as opposed to SMAP
which depends on the _PAGE_BIT_USER bit in the page table. Disable LASS
enforcement by toggling the RFLAGS.AC bit during patching to avoid
triggering a #GP fault.
Introduce LASS-specific STAC/CLAC helpers to set the AC bit only on
platforms that need it. Name the wrappers as lass_stac()/_clac() instead
of lass_disable()/_enable() because they only control the kernel data
access enforcement. The entire LASS mechanism (including instruction
fetch enforcement) is controlled by the CR4.LASS bit.
Describe the usage of the new helpers in comparison to the ones used for
SMAP. Also, add comments to explain when the existing stac()/clac()
should be used. While at it, move the duplicated "barrier" comment to
the same block.
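A minimal sketch of these helpers, patched via alternatives so that they are
no-ops on hardware without LASS (the exact implementation may differ):

  /* Sketch only: toggle RFLAGS.AC, but only when the CPU has LASS. */
  static __always_inline void lass_stac(void)
  {
      alternative("", "stac", X86_FEATURE_LASS);
  }

  static __always_inline void lass_clac(void)
  {
      alternative("", "clac", X86_FEATURE_LASS);
  }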
The text poking functions use standard memcpy()/memset() while patching
kernel code. However, objtool complains about calling such dynamic
functions within an AC=1 region. See warning #9, regarding function
calls with UACCESS enabled, in tools/objtool/Documentation/objtool.txt.
To pacify objtool, one option is to add memcpy() and memset() to the
list of allowed-functions. However, that would provide a blanket
exemption for all usages of memcpy() and memset(). Instead, replace the
standard calls in the text poking functions with their unoptimized,
always-inlined versions. Considering that patching is usually small,
there is no performance impact expected.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-5-sohil.mehta%40intel.com
Provide inline memcpy and memset functions that can be used instead of
the GCC builtins when necessary. The immediate use case is for the text
poking functions to avoid the standard memcpy()/memset() calls because
objtool complains about such dynamic calls within an AC=1 region. See
tools/objtool/Documentation/objtool.txt, warning #9, regarding function
calls with UACCESS enabled.
Some user copy functions such as copy_user_generic() and __clear_user()
have similar rep_{movs,stos} usages. But, those are highly specialized
and hard to combine or reuse for other things. Define these new helpers
for all other usages that need a completely unoptimized, strictly inline
version of memcpy() or memset().
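As a sketch of what such a helper can look like on x86, using a plain
rep movsb (details here are illustrative, not necessarily the final form):

  /* Sketch only: a strictly inline, unoptimized copy. */
  static __always_inline void *__inline_memcpy(void *to, const void *from,
                                               size_t len)
  {
      void *ret = to;

      asm volatile("rep movsb"
                   : "+D" (to), "+S" (from), "+c" (len)
                   : : "memory");
      return ret;
  }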
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-4-sohil.mehta%40intel.com
With LASS enabled, any kernel data access to userspace typically results
in a #GP, or a #SS in some stack-related cases. When the kernel needs to
access user memory, it can suspend LASS enforcement by toggling the
RFLAGS.AC bit. Most of these cases are already covered by the
stac()/clac() pairs used to avoid SMAP violations.
Even though LASS could potentially be enabled independently, it would be
very painful without SMAP and the related stac()/clac() calls. There is
no reason to support such a configuration because all future hardware
with LASS is expected to have SMAP as well. Also, the STAC/CLAC
instructions are architected to:
#UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.
So, make LASS depend on SMAP to conveniently reuse the existing AC bit
toggling already in place.
Note: Additional STAC/CLAC would still be needed for accesses such as
text poking which are not flagged by SMAP. This is because such mappings
are in the lower half but do not have the _PAGE_USER bit set which SMAP
uses for enforcement.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-3-sohil.mehta%40intel.com
Linear Address Space Separation (LASS) is a security feature that
mitigates a class of side-channel attacks relying on speculative access
across the user/kernel boundary.
Privilege mode based access protection already exists today with paging
and features such as SMEP and SMAP. However, to enforce these
protections, the processor must traverse the paging structures in
memory. An attacker can use timing information resulting from this
traversal to determine details about the paging structures, and to
determine the layout of the kernel memory.
LASS provides the same mode-based protections as paging but without
traversing the paging structures. Because the protections are enforced
prior to page-walks, an attacker will not be able to derive paging-based
timing information from the various caching structures such as the TLBs,
mid-level caches, page walker, data caches, etc.
LASS enforcement relies on the kernel implementation to divide the
64-bit virtual address space into two halves:
Addr[63]=0 -> User address space
Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically
results in a #GP fault, with an #SS generated in some rare cases. The
LASS enforcement for kernel data accesses is dependent on CR4.SMAP being
set. The enforcement can be disabled by toggling the RFLAGS.AC bit
similar to SMAP.
Define the CPU feature bits to enumerate LASS. Also, disable the feature
at compile time on 32-bit kernels. Use a direct dependency on X86_32
(instead of !X86_64) to make it easier to combine with similar 32-bit
specific dependencies in the future.
LASS mitigates a class of side-channel speculative attacks, such as
Spectre LAM, described in the paper, "Leaky Address Masking: Exploiting
Unmasked Spectre Gadgets with Noncanonical Address Translation".
Add the "lass" flag to /proc/cpuinfo to indicate that the feature is
supported by hardware and enabled by the kernel. This allows userspace
to determine if the system is secure against such attacks.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251118182911.2983253-2-sohil.mehta%40intel.com
Currently ISAPNP devices do not generate an uevent, so udev cannot
auto-load the driver modules needed for a Creative SoundBlaster or Gravis
UltraSound to just work.
Signed-off-by: René Rebe <rene@exactco.de>
[ rjw: Subject edits ]
Link: https://patch.msgid.link/20251118.145942.1445519082574147037.rene@exactco.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since irq_set_affinity_notifier() may sleep, interrupts are enabled, so
raw_spin_lock_irqsave() can be replaced with raw_spin_lock_irq().
Signed-off-by: Chengkaitao <chengkaitao@kylinos.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251118012754.61805-1-pilgrimtao@gmail.com
This if statement is indented weirdly. It's a duplicate and doesn't
affect runtime (beyond wasting a little time). Delete it.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/aRxP3YcwscrP1BU_@stanley.mountain
To eliminate some code duplication from the intel_pstate driver,
move the core_get_val() function body to a new function called
get_perf_ctl_val() and make both core_get_val() and atom_get_val()
invoke it to carry out the same computation.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/2829273.mvXUDI8C0e@rafael.j.wysocki
Add RAPL support for Intel Nova Lake and Nova Lake L processors using
the core defaults configuration.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits, rebase ]
Link: https://patch.msgid.link/20251028101814.3482508-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Properly use masked_user_read_access_begin() and
masked_user_write_access_begin() instead of masked_user_access_begin() in
order to match user_read_access_end() and user_write_access_end(). This is
important for architectures like PowerPC that enable separately user reads
and user writes.
That means masked_user_read_access_begin() is used when user memory is
exclusively read during the window and masked_user_write_access_begin()
is used when user memory is exclusively written during the window.
masked_user_access_begin() remains and is used when both reads and
writes are performed during the open window. Each of them is expected
to be terminated by the matching user_read_access_end(),
user_write_access_end() and user_access_end().
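As an illustration of the intended pairing for a read-only window (a hedged
sketch, not a specific call site from the series):

  /* Sketch only: open a read-only user access window. */
  if (can_do_masked_user_access())
      ptr = masked_user_read_access_begin(ptr);
  else if (!user_read_access_begin(ptr, sizeof(*ptr)))
      return -EFAULT;

  unsafe_get_user(val, ptr, Efault);
  user_read_access_end();
  return 0;

  Efault:
      user_read_access_end();
      return -EFAULT;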
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/cb5e4b0fa49ea9c740570949d5e3544423389757.1763396724.git.christophe.leroy@csgroup.eu
Replace the open coded implementation with the scoped user access guard.
That also corrects the imbalance between masked_user_access_begin() and
user_write_access_end(), which would affect PowerPC when it gains masked
user access support.
No functional change intended.
[ tglx: Amend change log ]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/793219313f641eda09a892d06768d2837246bf9f.1763396724.git.christophe.leroy@csgroup.eu
The results of "access_ok()" can be mis-speculated. The result is that
the CPU can end up speculatively executing:
if (access_ok(from, size))
// Right here
For the same reason as done in copy_from_user() in commit 74e19ef0ff
("uaccess: Add speculation barrier to copy_from_user()"), add a speculation
barrier to copy_from_user_iter().
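The added barrier follows the same pattern as copy_from_user(); roughly
(a sketch, not the exact hunk):

  if (access_ok(from, size)) {
      /* Prevent speculation past the access_ok() check. */
      barrier_nospec();
      res = raw_copy_from_user(to, from, size);
  }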
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/6b73e69cc7168c89df4eab0a216e3ed4cca36b0a.1763396724.git.christophe.leroy@csgroup.eu
copy_from_user_iter() lacks a speculation barrier, which will degrade
performance on some architecture like x86, which would be unfortunate as
copy_from_user_iter() is a critical hotpath function.
Convert copy_from_user_iter() to using masked user access on architecture
that support it. This allows to add the speculation barrier without
impacting performance.
This is similar to what was done for copy_from_user() in commit
0fc810ae3a ("x86/uaccess: Avoid barrier_nospec() in 64-bit
copy_from_user()")
[ tglx: Massage change log ]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/58e4b07d469ca68a2b9477fe2c1ccc8a44cef131.1763396724.git.christophe.leroy@csgroup.eu
On a system which supports SME but not SVE we can now disable streaming mode
via ptrace by writing FPSIMD formatted data through NT_ARM_SVE with a VL of
0. Extend fp-ptrace to cover rather than skip these cases: relax the check
for SVE writes of FPSIMD format data to not skip if SME is supported and
accept 0 as the VL when performing the ptrace write.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order to allow exiting streaming mode on systems with SME but not SVE
we allow writes of FPSIMD format data via NT_ARM_SVE even when SVE is not
supported; add a test case that covers this to sve-ptrace.
We do not support reads.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently it is not possible to disable streaming mode via ptrace on SME
only systems, the interface for doing this is to write via NT_ARM_SVE but
such writes will be rejected on a system without SVE support. Enable this
functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data
via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such
writes currently error since we require that a vector length is specified
which should minimise the risk that existing software is relying on current
behaviour.
Reads are not supported since I am not aware of any use case for this and
there is some risk that an existing userspace application may be confused if
it reads NT_ARM_SVE on a system without SVE. Existing kernels will return
FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not
stored, for example if the task has not used SVE. Returning a vector length
of 0 would create a risk that software would try to do things like allocate
space for register state with zero sizes, while returning a vector length of
128 bits would look like SVE is supported. It seems safer to just not make
the changes to add read support.
It remains possible for userspace to detect a SME only system via the ptrace
interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will succeed while
reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in
non-streaming mode is available via REGSET_FPR.
sve_set_common() already avoids allocating SVE storage when doing a FPSIMD
formatted write and allocating SME storage when doing a NT_ARM_SVE write so
we change the function to validate the new case and skip setting a vector
length for it.
The aim is to make a minimally invasive change: no operation that would
previously have succeeded will be affected, and we use a previously
defined interface in new circumstances rather than defining completely new
ABI.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: David Spickett <david.spickett@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Several static functions in kernel/power/swap.c were described using the
kernel-doc comment style (/** ... */) even though they are not exported
or referenced by generated documentation. This led to kernel-doc warnings
and stylistic inconsistencies.
Convert these unnecessary kernel-doc blocks to regular C comments,
remove comment blocks that are no longer useful, relocate comments to
more appropriate positions where needed, and fix a few "Return:"
descriptions that were either missing or incorrectly formatted.
No functional changes.
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
[ rjw: Subject adjustment, changelog edits, comment edits ]
Link: https://patch.msgid.link/20251114220438.52448-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With commit 1204777867 ("s390/debug: keep debug data on resize")
the behavior of a debug area resize operation was changed. Update the
associated documentation to reflect this change.
Fixes: 1204777867 ("s390/debug: keep debug data on resize")
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens says:
====================
Remove s390 compat support to allow for code simplification and especially
reduced test effort. To the best of our knowledge there aren't any 31 bit
binaries out in the world anymore that would matter for newer kernels or
newer distributions.
Distributions have not provided compat packages for quite some time, or even
have CONFIG_COMPAT disabled.
Instead of adding deprecation warnings to the config option, or adding kernel
messages, just remove the code. Deprecation warnings haven't proven to be
useful. If it turns out there is still a reason to keep the compat support
this series can be reverted at any time in the future.
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The s390 syscall.tbl format differs slightly from most others, and
therefore requires an s390 specific system call table generation
script.
With compat support gone use the opportunity to switch to generic
system call table generation. The abi for all 64 bit system calls is
now common, since there is no need to specify if system call entry
points are only for 64 bit anymore.
Furthermore create the system call table in C instead of assembler
code in order to get type checking for all system call functions
contained within the table.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With compat support gone there is only one system call table
left. Therefore remove the sys_call_table pointer from
thread_struct and use the sys_call_table directly.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Since the kernel does not support running 31 bit / compat binaries
anymore, remove also the corresponding 31 bit support from uapi header
files.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
There shouldn't be any 31 bit code around anymore that matters.
Remove the compat layer support required to run 31 bit code.
Reason for removal is code simplification and reduced test effort.
Note that this comes without any deprecation warnings added to config
options, or kernel messages, since most likely those would be ignored
anyway.
If it turns out there is still a reason to keep the compat layer this
can be reverted at any time in the future.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove s390 compat support from everything within tools, since s390 compat
support will be removed from the kernel.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Weißschuh <linux@weissschuh.net> # tools/nolibc selftests/nolibc
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> # selftests/vDSO
Acked-by: Alexei Starovoitov <ast@kernel.org> # bpf bits
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
All system call wrappers should match the sys_call_ptr_t type. This is not
the case for system calls without parameters. Add the missing pt_regs
parameter there too.
Note: this is currently not a problem, since the parameter is unused.
However it prevents creating a correctly typed system call table in
C. With the current assembler implementation this works because of
missing type checking.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
kvm_s390_handle_lpsw() makes use of the psw_compat_t type even though
the code has nothing to do with CONFIG_COMPAT, for which the type is
supposed to be used. Use psw32_t instead.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use a standard "_t" suffix for psw_t32 and rename it to psw32_t.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use the exact enum name when documenting "enum sgx_attribute" to fix a
warning if the file is fed into kernel-doc processing:
WARNING: ./arch/x86/include/asm/sgx.h:139 expecting prototype for enum
sgx_attributes. Prototype was for enum sgx_attribute instead
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-6-seanjc%40google.com
Drop an asterisk from a file-level copyright comment so that the comment
isn't interpreted as a kernel-doc comment.
E.g. if arch/x86/include/asm/sgx.h is fed into kernel-doc processing:
WARNING: ./arch/x86/include/asm/sgx.h:2 This comment starts with '/**',
but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-5-seanjc%40google.com
Use '@' to document structure members and enum values in kernel-doc markup,
as per Documentation/doc-guide/kernel-doc.rst and flagged by make htmldocs.
WARNING: arch/x86/include/uapi/asm/sgx.h:17 Enum value 'SGX_PAGE_MEASURE'
not described in enum 'sgx_page_flags'
Opportunistically add a missing ':' for SGX_CHILD_PRESENT.
Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-4-seanjc%40google.com
Add kernel-doc markup for the register parameters passed by the vDSO blob
to the user handler to suppress build warnings, e.g.
WARNING: arch/x86/include/uapi/asm/sgx.h:157 function parameter 'r8' not
described in 'sgx_enclave_user_handler_t'
Call out that except for RSP, the registers are undefined on asynchronous
exits as far as the vDSO ABI is concerned. E.g. the vDSO's exception
handler clobbers RDX, RDI, and RSI, and the kernel doesn't guarantee that
R8 or R9 will be zero (the synthetic value loaded by the CPU).
Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-3-seanjc%40google.com
Add a missing ':' for the description of sgx_enclave_run.reserved so that
documentation for the member is correctly generated:
WARNING: arch/x86/include/uapi/asm/sgx.h:184 struct member 'reserved' not
described in 'sgx_enclave_run'
Closes: https://lore.kernel.org/all/20251106145506.145fc620@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112160708.1343355-2-seanjc%40google.com
Use new PM_RUNTIME_ACQUIRE() and PM_RUNTIME_ACQUIRE_ERR() wrapper macros
to make the code look more straightforward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
[ rjw: Typo fix in the changelog ]
Link: https://patch.msgid.link/3932581.kQq0lBPeGt@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use new PM_RUNTIME_ACQUIRE() and PM_RUNTIME_ACQUIRE_ERR() wrapper macros
to make the code look more straightforward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
[ rjw: Typo fix in the changelog ]
Link: https://patch.msgid.link/2040585.PYKUYFuaPT@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add wrapper macros for ACQUIRE()/ACQUIRE_ERR() and runtime PM
usage counter guards introduced recently: pm_runtime_active_try,
pm_runtime_active_auto_try, pm_runtime_active_try_enabled, and
pm_runtime_active_auto_try_enabled.
The new macros should be more straightforward to use.
For example, they can be used for rewriting a piece of code like below:
ACQUIRE(pm_runtime_active_try, pm)(dev);
if ((ret = ACQUIRE_ERR(pm_runtime_active_try, &pm)))
return ret;
in the following way:
PM_RUNTIME_ACQUIRE(dev, pm);
if ((ret = PM_RUNTIME_ACQUIRE_ERR(&pm)))
return ret;
If the original code does not care about the specific error code
returned when attempting to resume the device:
ACQUIRE(pm_runtime_active_try, pm)(dev);
if (ACQUIRE_ERR(pm_runtime_active_try, &pm))
return -ENXIO;
it may be changed like this:
PM_RUNTIME_ACQUIRE(dev, pm);
if (PM_RUNTIME_ACQUIRE_ERR(&pm))
return -ENXIO;
Link: https://lore.kernel.org/linux-pm/5068916.31r3eYUQgx@rafael.j.wysocki/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/3400866.aeNJFYEL58@rafael.j.wysocki
If interrupted by a signal clock_nanosleep() returns the remaining time
into the structure pointed to by the rmtp parameter. So far this
functionality was not tested by the timer selftests.
Extend the nanosleep selftest to cover this feature.
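For reference, the behaviour under test looks like this from userspace (a
minimal sketch assuming <time.h>, <errno.h> and <stdio.h>):

  struct timespec req = { .tv_sec = 10 }, rem = { 0 };

  /* Returns EINTR (not -1/errno) when interrupted by a signal handler;
   * for a relative sleep the remaining time is stored in 'rem'. */
  int err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem);
  if (err == EINTR)
          printf("%lld.%09ld seconds left\n",
                 (long long)rem.tv_sec, rem.tv_nsec);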
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251106-nanosleep-rtmp-selftest-v1-1-f9212fb295fe@linutronix.de
Several tests in the posix_timers selftest which test timer behavior
related to SIG_IGN fail on kernels older than 6.13. This is due to
a refactoring of signal handling in commit caf77435dd ("signal:
Handle ignored signals in do_sigaction(action != SIG_IGN)").
A previous attempt to fix this by adding a kernel version check to each
of the nine affected tests was suboptimal, as it resulted in emitting
the same skip message nine times.
Following the suggestion from Thomas Gleixner, this is refactored to
perform a single version check in main(). To satisfy the kselftest
framework's requirement for the test count to match the declared plan,
the plan is now conditionally set to 10 (for older kernels) or 19.
While setting the plan conditionally may seem complex, it is the
better approach to avoid the alternatives: either running tests on
unsupported kernels that are known to fail, or emitting a noisy series
of nine identical skip messages. A single informational message is now
printed instead when the tests are skipped.
Signed-off-by: Wake Liu <wakel@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250807085042.1690931-1-wakel@google.com/
Link: https://patch.msgid.link/20251103114502.584940-1-wakel@google.com
Get rid of the forward declarations of the mitigation functions by
moving their single caller below them.
No functional changes.
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20251105200447.GBaQut3w4dLilZrX-z@fat_crate.local
Several functions in kernel/time/tick-oneshot.c are missing parameter and
return value descriptions in their kernel-doc comments. This causes
warnings during doc generation.
Update the kernel-doc blocks to include detailed @param and Return:
descriptions for better clarity and to fix kernel-doc warnings. No
functional code changes are made.
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251106113938.34693-3-adelodunolaoluwa@yahoo.com
Modify the suspend_test() function to allow the test delay to be
interrupted by wakeup events.
This improves the responsiveness of the system during suspend testing
when wakeup events occur, allowing the suspend process to proceed
without waiting for the full test delay to complete when wakeup events
are detected.
Additionally, using msleep() instead of mdelay() avoids potential soft
lockup "CPU stuck" issues when long test delays are configured.
Co-developed-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: xiongxin <xiongxin@kylinos.cn>
Signed-off-by: Riwen Lu <luriwen@kylinos.cn>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251113012638.1362013-1-luriwen@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the PM core uses hibernation callbacks for shutdown, drivers
will receive PM_EVENT_POWEROFF and should handle it the same way
as if PM_EVENT_HIBERNATE had been used.
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
[ rjw: Changelog adjustment ]
Link: https://patch.msgid.link/20251112224025.2051702-4-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the PM core uses hibernation callbacks for powering off the
system, drivers will receive PM_EVENT_POWEROFF and should handle
it the same as they previously handled PM_EVENT_HIBERNATE.
Support this case in the scsi driver. No functional changes.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251112224025.2051702-3-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PMSG_POWEROFF will be used for the PM core to allow differentiating between
a hibernation or shutdown sequence when re-using callbacks for common code.
Hibernation is started by writing a hibernation method (such as
'platform', 'shutdown', or 'reboot') to use into /sys/power/disk and
writing 'disk' to /sys/power/state.
Shutdown is initiated with the reboot() syscall with arguments on whether
to halt the system or power it off.
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251112224025.2051702-2-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The hrtimer core uses ktime_t to represent times, use that also for the
restart block. CPU timers internally use nanoseconds instead of ktime_t
but use the same restart block, so use the correct accessors for those.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251110-restart-block-expiration-v1-3-5d39cc93df4f@linutronix.de
If a given governor metric falls below a certain value (8 for
DECAY_SHIFT equal to 3), it will not decay any more due to the
simplistic decay implementation. This may in some cases lead to
subtle inconsistencies in the governor behavior, so change the
decay implementation to take it into account and set the metric
at hand to 0 in that case.
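To illustrate the arithmetic (a sketch, not the literal governor code):

  #define DECAY_SHIFT 3

  /* Old behavior: once metric < (1 << DECAY_SHIFT), the shifted term is 0
   * and the value never decays any further. */
  metric -= metric >> DECAY_SHIFT;

  /* New behavior: treat such small residual values as fully decayed. */
  if (metric >= (1 << DECAY_SHIFT))
          metric -= metric >> DECAY_SHIFT;
  else
          metric = 0;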
Suggested-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2819353.mvXUDI8C0e@rafael.j.wysocki
Two local variables in teo_update() are defined as u64, but their
values are then compared with s64 values, so it is more consistent
to use s64 as their data type.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3026616.e9J7NaK4W3@rafael.j.wysocki
The last no_poll parameter of teo_find_shallower_state() is always
false, so drop it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/2253109.irdbgypaU6@rafael.j.wysocki
When the target residency of the current candidate idle state is
greater than the expected time till the closest timer (the sleep
length), it does not matter whether or not the tick has already been
stopped or if it is going to be stopped. The closest timer will
trigger anyway at its due time, so if an idle state with target
residency above the sleep length is selected, energy will be wasted
and there may be excess latency.
Of course, if the closest timer were canceled before it could trigger,
a deeper idle state would be more suitable, but this is not expected
to happen (generally speaking, hrtimers are not expected to be
canceled as a rule).
Accordingly, the teo_state_ok() check done in that case causes energy to
be wasted more often than it allows any energy to be saved (if it allows
any energy to be saved at all), so drop it and let the governor use the
teo_find_shallower_state() return value as the new candidate idle state
index.
Fixes: 21d28cd2fa ("cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/5955081.DvuYhMxLoT@rafael.j.wysocki
In case of a kernel crash caused by a protection exception, print the
unmodified PSW address as reported by the CPU. The protection exception
handler modifies the PSW address in order to keep fault handling easy,
however that leads to misleading call traces.
Therefore restore the original PSW address before printing it.
Before this change the output in case of a protection exception looks like
this:
Oops: 0004 ilc:2 [#1]SMP
Krnl PSW : 0704c00180000000 000003ffe0b40d78 (sysrq_handle_crash+0x28/0x40)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
...
Krnl Code: 000003ffe0b40d66: e3e0f0980024 stg %r14,152(%r15)
000003ffe0b40d6c: c010fffffff2 larl %r1,000003ffe0b40d50
#000003ffe0b40d72: c0200046b6bc larl %r2,000003ffe1417aea
>000003ffe0b40d78: 92021000 mvi 0(%r1),2
000003ffe0b40d7c: c0e5ffae03d6 brasl %r14,000003ffe0101528
With this change it looks like this:
Oops: 0004 ilc:2 [#1]SMP
Krnl PSW : 0704c00180000000 000003ffe0b40dfc (sysrq_handle_crash+0x2c/0x40)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
...
Krnl Code: 000003ffe0b40dec: c010fffffff2 larl %r1,000003ffe0b40dd0
000003ffe0b40df2: c0200046b67c larl %r2,000003ffe1417aea
*000003ffe0b40df8: 92021000 mvi 0(%r1),2
>000003ffe0b40dfc: c0e5ffae03b6 brasl %r14,000003ffe0101568
000003ffe0b40e02: 0707 bcr 0,%r7
Note that with this change the PSW address points to the instruction behind
the instruction which caused the exception like it is expected for
protection exceptions.
This also replaces the '#' marker in the disassembly with '*', which allows
to distinguish between new and old behavior.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With adjust_psw_addr() the uprobes code contains more or less a private
__forward_psw() implementation. Switch it to use __forward_psw(), and
remove adjust_psw_addr().
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Similar to __rewind_psw() add the counter part __forward_psw(). This
helps to make code more readable if a PSW address has to be forwarded,
since it is more natural to write
addr = __forward_psw(psw, ilen);
instead of
addr = __rewind_psw(psw, -ilen);
This also renames the ilc parameter of __rewind_psw() to ilen, since
the parameter reflects an instruction length, and not an instruction
length code. Also change the type of ilen from unsigned long to long
so it reflects that lengths can be negative or positive.
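Conceptually the new helper is just the mirrored call (a sketch; the
actual implementation may differ in detail):

  static inline unsigned long __forward_psw(psw_t psw, long ilen)
  {
          return __rewind_psw(psw, -ilen);
  }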
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The PAI extension 1 control block area is 512 bytes in total.
It currently contains three address pointers which refer to counter
memory blocks followed by a reserved area.
Calculate the reserved area instead of hardcoding its size. This
makes the code more readable and maintainable.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Jan Polensky <japo@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Let dump_fault_info() print additional information to make debugging
easier:
Print "FSI" if the access-exception-fetch/store-indication facility is
installed. If it is installed the TEID may also indicate if an exception
happened because of a fetch or a store operation.
Print "SOP", "ESOP-1", or "ESOP-2" depending on the type of the installed
Suppression-on-Protection facility. This also gives additional information
about the validity and meaning of the TEID bits.
The output is changed from something like:
Failing address: 0000000000000000 TEID: 0000000000000803
to
Failing address: 0000000000000000 TEID: 0000000000000803 ESOP-2 FSI
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The comments in do_protection() give the impression that a TEID, where bit
61 is zero, indicates a low address protection exception. This is not
necessarily true; what it actually means depends on the type of
Suppression-on-Protection facility of the machine (see Principles of Operation).
Rework the comments and the die() message to reflect this. This may also
help to avoid confusion.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
flush_tlb() exists for historic reasons and was never used. Remove it.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Thomas Richter says:
====================
The PAI PMUs pai_crypto and pai_ext both operate on memory
mapped counters supported by z16 and follow-on machines.
These memory mapped counters have a lot in common, like:
- validation, installing and removing events
- starting and stopping events
- retrieving counter values
- collecting sample data.
However both PMU drivers have slightly different parameters,
for example:
- different mapped memory size
- different number of supported counters
- different counter numbers and names
- different bits in the CR0 register
- different anchor address in lowcore
Due to these different parameters, two independent
PMUs have been developed. However both PMU drivers
have very much in common and most of the PMU call back
functions look very similar and are sometimes identical.
This patch set combines both independent PMU device drivers
perf_pai_crypto.c and perf_pai_ext.c into one device driver.
The new device driver operates on a table which contains
the different parameters and uses common functions for
event operations.
Result is one PAI PMU driver which supports both PMUs.
It is also extendable to support new PAI PMUs.
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Rename perf_pai_crypto.c to perf_pai.c. The new perf_pai.c
contains both PAI device drivers:
- pai_crypto for PAI crypto counter set
- pai_ext for PAI NNPA counter set
The rename reflects this common driver supporting both PMUs.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Combine PAI cryptography and PAI extension (NNPA) PMUs in one driver.
Remove file perf_pai_ext.c and registration of PMU "pai_ext"
from perf_pai_crypto.c.
Includes:
- Shared alloc/free and sched_task handling
- NNPA events with exclude_kernel enforced, exclude_user rejected
- Setup CR0 bits for both PMUs
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Introduce PAI crypto specific event delete function to handle
additional actions to be done at event removal.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Prepare the common PAI PMU driver to handle multiple PMUs.
Convert pai_root into an array indexed by PAI_PMU_IDX(event)
so that per-CPU state becomes per-PMU. Adjust all call sites
accordingly. Rename KMSG_COMPONENT and the s390dbf buffer from
"pai_crypto" to "pai" for consistent naming.
No functional change intended beyond log identifiers.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_copy() to pai_copy() to indicate its common usage.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_stop() for the event on a CPU.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_stop() for the event on a CPU.
Call this common pai_stop() from paicrypt_del().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_add() for the event on a CPU.
Call this common pai_add() from paicrypt_add().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_start() to start the event on a CPU.
The function expects a PAI PMU specific read function as second
parameter to read out the start value for an event.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Add a common usable function pai_read() to read counter values.
The function expects a PAI PMU specific read function as second
parameter.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Unify naming and logic for PAI PMU drivers to support both PMUs
pai_crypto and pai_ext.
Rename paicrypt_push_sample() to pai_push_sample() to reflect
its common usage. Add detailed comments about invocation context
and scheduler callbacks. Use struct pai_pmu to determine area_size
instead of PAGE_SIZE for counter backup.
Remove obsolete variable paicrypt_cnt.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_have_samples() to pai_have_samples() to reflect
its common usage. No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_getctr() to pai_getctr() to reflect its common
purpose. pai_getctr() now uses the pai_pmu table to extract PAI PMU
characteristics such as kernel_offset inside the counter area page.
Also rename paicrypt_have_sample() to pai_have_sample().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename paicrypt_getdata() to pai_getdata(). Use the PAI PMU
characteristics in the pai_pmu table to determine the number
of counters to be extracted.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename functions
- paicrypt_free() -> pai_free()
- paicrypt_destroy_event() -> pai_destroy_event()
- paicrypt_destroy_event_cpu() -> pai_destroy_event_cpu()
to reflect their future common usage.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rework PAI crypto event initialization. Add a common
function for event initialization. It uses the PAI characteristics
stored in the pai_pmu table instead of hardcoded values.
Enlarge pai_event_valid() to check all event validation aspects.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Create and add a PMU characteristics table to store the parameters
of the PAI crypto PMU. This table contains PMU details such as
- number of available counters
- name of these counters to export to /sysfs
- size of the memory mapped counter area
- base number of the first counter
- etc
Also define a PMU specific initialization function to be called when
a PAI PMU feature is supported. At device driver initialization
test these features and if available use instruction qpaci to
retrieve the number of available counters. Also export these counter
names to /sysfs and register this PMU.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename functions paicrypt_root_alloc() and paicrypt_root_free()
to pai_root_alloc() and pai_root_free().
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_root to pai_root.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_map to pai_map.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To support one common PAI PMU device driver which handles
both PMUs pai_crypto and pai_ext, use a common naming scheme
for structures and variables suitable for both device drivers.
Rename structure paicrypt_mapptr to pai_mapptr.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Rename member page in struct paicrypt_map to area. This rename
creates consistent naming for both PMU drivers, pai_crypto and
pai_ext. No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The global variable cfm_dbg points to the s390dbf debug buffer.
Rename it to paidbg to better reflect its purpose.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add the ICH_VMCR_EL2 register, which is required for the upcoming
GICv5 KVM support. This register has two different field encodings,
based on whether it is used for GICv3 or GICv5-based VMs. The
GICv5-specific field encodings are generated with a FEAT_GCIE prefix.
This register is already described in the GICv3 KVM code
directly. This will be ported across to use the generated encodings as
part of an upcoming change.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The RESx and UNKN define generation happens in two places
(EndSysreg and EndSysregFields), and was using nearly identical
code. Split this out into a function, and call that instead, rather
than keeping the duplicated code.
There are no changes to the generated sysregs as part of this change.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some system register field encodings change based on, for example the
in-use architecture features, or the context in which they are
accessed. In order to support these different field encodings,
introduce the Prefix descriptor (Prefix, EndPrefix) for describing
such sysregs.
The Prefix descriptor can be used in the following way:
Sysreg EXAMPLE 0 1 2 3 4
Prefix FEAT_A
Field 63:0 Foo
EndPrefix
Prefix FEAT_B
Field 63:1 Bar
Res0 0
EndPrefix
Field 63:0 Baz
EndSysreg
This will generate a single set of system register encodings (REG_,
SYS_, ...), and then generate three sets of field definitions for the
system register called EXAMPLE. The first set is prefixed by FEAT_A,
e.g. FEAT_A_EXAMPLE_Foo. The second set is prefixed by FEAT_B, e.g.,
FEAT_B_EXAMPLE_Bar. The third set is not given a prefix at all,
e.g. EXAMPLE_BAZ. For each set, a corresponding set of defines for
Res0, Res1, and Unkn is generated.
The intent for the final prefix-less fields is to describe default or
legacy field encodings. This ensures that prefixed encodings can be
added to already-present sysregs without affecting existing legacy
code. Prefixed fields must be defined before those without a prefix,
and this is checked by the generator. This ensures consistent ordering
within the sysregs definitions.
The Prefix descriptor can be used within Sysreg or SysregFields
blocks. Field, Res0, Res1, Unkn, Rax, SignedEnum, Enum can all be used
within a Prefix block. Fields and Mapping can not. Fields that vary
with features must be described as part of a SysregFields block,
instead. Mappings, which are just a code comment, make little sense in
this context, and have hence not been included.
There are no changes to the generated system register definitions as
part of this change.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The checks for incomplete sysreg definitions were checking if the
next_bit was greater than 0, which is incorrect and missed occasions
where bit 0 hasn't been defined for a sysreg. The reason is that
next_bit is -1 when all bits have been processed (LSB - 1).
Change the checks to use >= 0, instead. Also, set next_bit in Mapping
to -1 instead of 0 to match these new checks.
There are no changes to the generated sysreg definitions as part of
this change, and conveniently no existing sysregs lack a definition for
bit 0.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The commit fcf8dda8cc ("arm64: pageattr: Explicitly bail out when changing
permissions for vmalloc_huge mappings") made the permission update for a
partial range more robust. But the linear mapping permission update
still assumes updating the whole range by iterating from the first page
all the way to the last page of the area.
Make it more robust by updating the linear mapping permission starting
from the page mapped by the start address and for the corresponding
number of pages (numpages).
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently arm64 does an unconditional TLB flush in mprotect(). This is not
required for some cases, for example, when changing from PROT_NONE to
PROT_READ | PROT_WRITE (a real usecase - glibc malloc does this to emulate
growing into the non-main heaps), and unsetting uffd-wp in a range.
Therefore, implement pte_needs_flush() for arm64, which is already
implemented by some other arches as well.
Running a userspace program changing permissions back and forth between
PROT_NONE and PROT_READ | PROT_WRITE, and measuring the average time taken
for the none->rw transition, I get a reduction from 3.2 microseconds to
2.85 microseconds, giving a 12.3% improvement.
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With BUG_ON in pgd_pgtable_alloc_init_mm moved up to higher layer,
gfp flags are the only difference between try_pgd_pgtable_alloc_init_mm
and pgd_pgtable_alloc_init_mm. Hence rename the "try" version
to pgd_pgtable_alloc_init_mm_gfp.
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Linu Cherian <linu.cherian@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arch_add_memory() is used to hotplug memory into a system but as a part
of its implementation it calls __create_pgd_mapping(), which uses
pgtable_alloc() in order to build intermediate page tables. As this path
was initally only used during early boot pgtable_alloc() is designed to
BUG_ON() on failure. However, in the event that memory hotplug is
attempted when the system's memory is extremely tight and the allocation
were to fail, it would lead to panicking the system, which is not
desirable. Hence update __create_pgd_mapping and all it's callers to be
non void and propagate -ENOMEM on allocation failure to allow system to
fail gracefully.
But during early boot if there is an allocation failure, we want the
system to panic, hence create a wrapper around __create_pgd_mapping()
called early_create_pgd_mapping() which is designed to panic, if ret
is non zero value. All the init calls are updated to use this wrapper
rather than the modified __create_pgd_mapping() to restore
functionality.
Fixes: 4ab2150615 ("arm64: Add memory hotplug support")
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Signed-off-by: Linu Cherian <linu.cherian@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This just replaces all used TCR_EL1 field macros with tools sysreg variant
based fields and subsequently drops them from the header (pgtable-hwdef.h),
while retaining the ones used for KVM (represented via the sysreg
tools format).
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The Amlogic S6/S7/S7D SoCs support GPIO interrupt lines:
S6 IRQ Number:
- 99:98 2 pins on bank CC
- 97 1 pin on bank TESTN
- 96:81 16 pins on bank A
- 80:65 16 pins on bank Z
- 64:45 20 pins on bank X
- 44:37 8 pins on bank H offs H1
- 36:32 5 pins on bank F
- 31:25 7 pins on bank D
- 24:22 3 pins on bank E
- 21:14 8 pins on bank C
- 13:0 14 pins on bank B
S7 IRQ Number:
- 83:82 2 pins on bank CC
- 81 1 pin on bank TESTN
- 80:68 13 pins on bank Z
- 67:48 20 pins on bank X
- 47:36 12 pins on bank H
- 35:24 12 pins on bank D
- 23:22 2 pins on bank E
- 21:14 8 pins on bank C
- 13:0 14 pins on bank B
S7D IRQ Number:
- 83:82 2 pins on bank CC
- 81:75 7 pins on bank DV
- 74 1 pin on bank TESTN
- 73:61 13 pins on bank Z
- 60:41 20 pins on bank X
- 40:29 12 pins on bank H
- 28:24 5 pins on bank D
- 23:22 2 pins on bank E
- 21:14 8 pins on bank C
- 13:0 14 pins on bank B
Add the required compatibles and interrupt count initializers.
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-2-b4d1fe4781c1@amlogic.com
Update the device tree binding document for GPIO interrupt controller of
Amlogic S6 S7 and S7D SoCs.
Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251105-irqchip-gpio-s6-s7-s7d-v1-1-b4d1fe4781c1@amlogic.com
Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(),
and use the helper macro to export symbols for KVM throughout x86 if and
only if KVM will build one or more modules, and only for those modules.
To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be
built (because no vendor modules are selected), let arch code #define
EXPORT_SYMBOL_FOR_KVM to suppress/override the exports.
Note, the set of symbols to restrict to KVM was generated by manual search
and audit; any "misses" are due to human error, not some grand plan.
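A rough sketch of the shape of such a wrapper (the macro body, config
checks and module list below are illustrative assumptions, not the actual
implementation):

  #if IS_MODULE(CONFIG_KVM_INTEL) || IS_MODULE(CONFIG_KVM_AMD)
  # define EXPORT_SYMBOL_FOR_KVM(sym) \
          EXPORT_SYMBOL_GPL_FOR_MODULES(sym, "kvm,kvm-intel,kvm-amd")
  #else
  # define EXPORT_SYMBOL_FOR_KVM(sym)     /* no KVM modules, nothing to export */
  #endif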
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com
Don't export "ptdump_walk_pgd_level_debugfs" as its sole user is
arch/x86/mm/debug_pagetables.c, which can't be built as a module.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-4-seanjc%40google.com
Don't export "mtrr_state" as usage is limited to arch/x86/kernel/cpu/mtrr
(and nothing outside of that directory even includes the local mtrr.h).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-3-seanjc%40google.com
Don't export x86_spec_ctrl_base as it's used only in bugs.c and process.c,
neither of which can be built into a module.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112173944.1380633-2-seanjc%40google.com
acpi_fwnode_graph_parse_endpoint() calls fwnode_get_parent() to obtain the
parent fwnode but returns without calling fwnode_handle_put() on it. This
potentially leads to a fwnode refcount leak and prevents the parent node
from being released properly.
Call fwnode_handle_put() on the parent fwnode before returning to prevent
the leak from occurring.
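The pattern of the fix, sketched (not the literal driver diff):

  struct fwnode_handle *parent = fwnode_get_parent(fwnode);

  /* ... use 'parent' to fill in the endpoint information ... */

  fwnode_handle_put(parent);      /* balance fwnode_get_parent() */
  return ret;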
Fixes: 3b27d00e7b ("device property: Move fwnode graph ops to firmware specific locations")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
[ rjw: Changelog edits ]
Link: https://patch.msgid.link/20251111075000.1828-1-vulab@iscas.ac.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add support for DLVR (Digital Linear Voltage Regulator) for Nova Lake.
There are no new sysfs attributes or differences in operation compared
to prior generations.
MMIO offset and bit positions are changed. Also no mapping is required
as units are already in MHz.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20251111004552.137984-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace sprintf() calls with sysfs_emit() in sysfs "show" functions to
follow current kernel coding standards.
sysfs_emit() is the preferred method for formatting sysfs output as it
provides better bounds checking and is more secure.
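The conversion follows the usual pattern (a sketch; example_show() and
example_value are made-up names):

  static ssize_t example_show(struct device *dev,
                              struct device_attribute *attr, char *buf)
  {
          /* sysfs_emit() knows buf is a full page and enforces the bound */
          return sysfs_emit(buf, "%u\n", example_value);
  }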
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject adjustments, changelog edits ]
Link: https://patch.msgid.link/20251030053410.311656-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace sizeof() with a symbolic constant for UUID matching to maintain
existing ABI behavior while improving code clarity. The current behavior
of comparing only the first 7 characters is sufficient to distinguish
all UUIDs and changing to full string comparison would alter the kernel
ABI, potentially breaking existing userspace applications.
Use a defined constant to make the truncated comparison explicit and
maintainable.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject adjustments ]
Link: https://patch.msgid.link/20251030035955.62171-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make __cpuidle_driver_init() fail if the exit latency of one of the
driver's idle states is less than its target residency which would
break cpuidle assumptions.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Changelog fix ]
Link: https://patch.msgid.link/12779486.O9o76ZdvQC@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After a recent change, wakeup_source_activate() will warn that the given
wakeup source is "unregistered" after its timer has been shut down
in wakeup_source_remove() which may be somewhat confusing, so change
the warning message to say that the wakeup source is "unusable".
Accordingly, rename wakeup_source_not_registered() to
wakeup_source_not_usable() and update the comment in it
to also mention the removal of the wakeup source.
Also restore the comment in wakeup_source_remove() regarding the warning
in wakeup_source_activate() that may trigger after shutting down the
wakeup source timer.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12788103.O9o76ZdvQC@rafael.j.wysocki
Use guard(mutex)(&intel_pstate_driver_lock), or the scoped variant of
it, wherever intel_pstate_driver_lock needs to be held.
This allows some local variables and goto statements to be dropped as
they are not necessary any more.
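For illustration, a converted helper ends up looking roughly like this
(example_get(), the check and the error code are made up; only the
guard(mutex)() usage from <linux/cleanup.h> is the point):

  static int example_get(void)
  {
          guard(mutex)(&intel_pstate_driver_lock);  /* dropped on every return */

          if (!intel_pstate_driver)
                  return -EAGAIN;         /* no goto/unlock dance needed */

          return 0;
  }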
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://patch.msgid.link/2807232.mvXUDI8C0e@rafael.j.wysocki
* optimizations for parameter array handling
* fix for mode changes with offline CPUs
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEECwtuSU6dXvs5GA2aLRkspiR3AnYFAmkSy/cTHHN1cGVybTFA
a2VybmVsLm9yZwAKCRAtGSymJHcCdqSlD/4nv4PXetfH2Sul/ivGX9tWbGEMBxtD
iVOGPmeVVDKEkUyptBbIMbN7elWz5BN6VF0EM3FGD2IZcwYDhPRAENoZeotl9isS
QRgl8427M3UgZ3FbX7QU+SCZRbU6wmuTSxR9mlXtmuVpJ+QQT4BzYXFh0Jcxg9Mp
6XrL5WFZtzSPrW5RnN/NUm26A+qRwIRZuRTAhxVV9K9Vd6ZlEq7/B+hTrlIz1GW4
XqSlGnoOQTTTpIlUXlQTtYsn9bquqb+YoPpvEbme7LCWWOxIr9GrETrQ4Yj2ljW6
cnHvFM1ky2Ld6+yfOMwAdS9aiSgcBtWD41UTxHecVVJyOVNXrw+yOxIpT9K8poWy
2/iy8qlBz2z5q9VC2ZOM2GNDGlfoiSdKlZpFV8A3O9nb/ZTTKgwN782aycSHMNGC
5Mj95tgaP00/TU6SIGHyaTg934vn9wnumzVb+cXjLNR35vQU8zfX24ER/Fo4lLSq
ntTDo1byjCAKi78Ghp0Zlh3wqcWN1cKguqTZ04IRWB5I/D+NoYQvDldm8LGrUkpW
YS11TxOMupuHFV5Jp7hraJeZDQB7/3wha15SdNUaOaMm+Bb++9zZj7tNAmW0sEz9
291REgM5Iba1CPK7YGE/4/xtq0nVRzlK4PB2PDJxcTZfbqteiJfeR+9788vWVRIk
QJIwYJ4sprrlVA==
=zQeO
-----END PGP SIGNATURE-----
Merge tag 'amd-pstate-v6.19-2025-11-10' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux
Pull amd-pstate content for 6.19 (11/10/25) from Mario Limonciello:
"* optimizations for parameter array handling
* fix for mode changes with offline CPUs"
* tag 'amd-pstate-v6.19-2025-11-10' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux:
cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUs
cpufreq/amd-pstate: Add static asserts for EPP indices
cpufreq/amd-pstate: Fix some whitespace issues
cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status()
cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULL
cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_string
cpufreq/amd-pstate: Use sysfs_match_string() for epp
This patch corrects several minor typographical and spelling errors
in comments across multiple arm64 source files.
No functional changes.
Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Update the AST2700 interrupt controller binding to match the actual
hardware and the irq-aspeed-intc driver behavior.
- Interrupts:
First-level INTC banks request multiple interrupt lines to the root
GIC, with a maximum of 10 per bank. Second-level INTC banks request
only one interrupt line to their parent INTC-IC. Therefore, set the
interrupts property to allow a minimum of 1 and a maximum of 10
entries.
- #interrupt-cells:
Set '#interrupt-cells' to <1> since the aspeed intc driver does not
support specifying a trigger type; only the interrupt index is used.
Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251030060155.2342604-2-ryan_chen@aspeedtech.com
Add SSWI support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with a
TIMER unit compliant with the ACLINT specification.
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-6-5478db4f664a@pigmoral.tech
Add MSWI support for Anlogic DR1V90 SoC, which uses Nuclei UX900 with a
TIMER unit compliant with the ACLINT specification.
Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20251021-dr1v90-basic-dt-v3-5-5478db4f664a@pigmoral.tech
reg_mask_status() is not referenced anywhere leading to W=1 warning:
irq-bcm7038-l1.c:85:28: error: unused function 'reg_mask_status' [-Werror,-Wunused-function]
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20251106155200.337399-2-krzysztof.kozlowski@linaro.org
The code path for M-Mode linux that disables interrupts for other contexts
was missed when refactoring __plic_toggle().
Since the new version caches updates to the state for the primary context,
its use in this codepath is no longer desirable even if it could be made
correct.
Replace the calls to __plic_toggle() with a loop that simply disables all
of the interrupts in groups of 32 with a direct mmio write.
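The replacement loop is along these lines (a sketch; the enable register
base and interrupt count placeholders come from the driver's private data):

  /* disable all sources, 32 per enable register, bypassing the cache */
  for (hwirq = 0; hwirq < nr_irqs; hwirq += 32)
          writel(0, enable_base + (hwirq / 32) * sizeof(u32));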
Fixes: 14ff9e54dd ("irqchip/sifive-plic: Cache the interrupt enable state")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251103161813.2437427-1-cmirabil@redhat.com
Closes: https://lore.kernel.org/oe-kbuild-all/202510271316.AQM7gCCy-lkp@intel.com/
Add unlikely() hint to the _TIF_MTE_ASYNC_FAULT flag check in
el0_svc_common() since asynchronous MTE faults are expected to be
rare occurrences during normal system call execution.
This optimization helps the compiler to improve instruction caching
and branch prediction for the common case where no asynchronous
MTE faults are pending, while maintaining correct behavior for
the exceptional case where such faults need to be handled prior
to system call execution.
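For illustration, the kind of annotation this refers to looks roughly like
the sketch below; the flag name matches the text above, the helper name is
purely hypothetical:

    if (unlikely(flags & _TIF_MTE_ASYNC_FAULT))
        handle_async_mte_fault();       /* hypothetical rare-path helper */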
Signed-off-by: Li Qiang <liqiang01@kylinos.cn>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize now
on the __ASSEMBLER__ macro that is provided by the compilers.
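As a rough illustration of the pattern (a generic example, not taken from a
specific header), a uapi header guards its C-only parts like this:

    #ifndef __ASSEMBLER__               /* provided by the compiler itself */
    struct example_regs {               /* C-only declarations live here */
            unsigned long pc;
    };
    #endif /* __ASSEMBLER__ */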
This is a mostly mechanical patch (done with a simple "sed -i"
statement), except for the following files where comments with
mis-spelled macros were tweaked manually:
arch/arm64/include/asm/stacktrace/frame.h
arch/arm64/include/asm/kvm_ptrauth.h
arch/arm64/include/asm/debug-monitors.h
arch/arm64/include/asm/esr.h
arch/arm64/include/asm/scs.h
arch/arm64/include/asm/memory.h
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
__ASSEMBLY__ is only defined by the Makefile of the kernel, so
this is not really useful for uapi headers (unless the userspace
Makefile defines it, too). Let's switch to __ASSEMBLER__ which
gets set automatically by the compiler when compiling assembly
code.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The conversion to generic IRQ entry left some functions
in the EL1 (kernel) IRQ entry path very shallow, so drop
the __inner_functions() where appropriate, saving some
time and stack.
This is not a fix but an optimization.
Drop stale comments about irqentry_enter/exit() while we
are at it.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Although the comment clearly states the PGD table's alignment requirement
(when PA_BITS = 52), the subsequent BUILD_BUG_ON() tests a size comparison
against 64 bytes instead. Change it into an actual alignment test.
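A minimal, self-contained sketch of the difference (the type and values here
are illustrative only, not the actual arm64 code):

    struct demo_pgd { unsigned long entries[8]; } __attribute__((aligned(64)));

    /* old style: only checks the size */
    _Static_assert(sizeof(struct demo_pgd) == 64, "size check");
    /* new style: checks the documented 64-byte alignment */
    _Static_assert(_Alignof(struct demo_pgd) == 64, "alignment check");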
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The only remaining reason why EFI runtime services are invoked with
preemption disabled is the fact that the mm is swapped out behind the
back of the context switching code.
The kernel no longer disables preemption in kernel_neon_begin().
Furthermore, the EFI spec is being clarified to explicitly state that
only baseline FP/SIMD is permitted in EFI runtime service
implementations, and so the existing kernel mode NEON context switching
code is sufficient to preserve and restore the execution context of an
in-progress EFI runtime service call.
Most EFI calls are made from the efi_rts_wq, which is serviced by a
kthread. As kthreads never return to user space, they usually don't have
an mm, and so we can use the existing infrastructure to swap in the
efi_mm while the EFI call is in progress. This is visible to the
scheduler, which will therefore reactivate the selected mm when
switching out the kthread and back in again.
Given that the EFI spec explicitly permits runtime services to be called
with interrupts enabled, firmware code is already required to tolerate
interruptions. So rather than disable preemption, disable only migration
so that EFI runtime services are less likely to cause scheduling delays.
To avoid potential issues where runtime services are interrupted while
polling the secure firmware for async completions, keep migration
disabled so that a runtime service invocation does not resume on a
different CPU from the one it was started on.
Note, though, that the firmware executes at the same privilege level as
the kernel, and is therefore able to disable interrupts altogether.
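Conceptually the arch wrapper then looks like the following sketch; the call
name is illustrative, only migrate_disable()/migrate_enable() are the real
interfaces:

    migrate_disable();      /* stay on this CPU, but remain preemptible */
    status = efi_call_runtime_service(args);
    migrate_enable();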
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
efi_set_pgd() will no longer be called when invoking EFI runtime
services via the efi_rts_wq work queue, but the uaccess en/disable are
still needed when using PAN emulation using TTBR0 switching. So move
these into the callers.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since commit
5894cf571e ("acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers")
all EFI runtime calls on arm64 are routed via the EFI runtime wrappers,
which are serialized using the efi_runtime_lock semaphore.
This means the efi_rt_lock spinlock in the arm64 arch wrapper code has
become redundant, and can be dropped. For robustness, replace it with an
assert that the EFI runtime lock is in fact held by 'current'.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, may_use_simd() will return false when called from a context
where IRQs are disabled. One notable case where this happens is when
calling the ResetSystem() EFI runtime service from the reboot/poweroff
code path. For this case alone, there is a substantial amount of FP/SIMD
support code to handle the corner case where a EFI runtime service is
invoked with IRQs disabled.
The only reason kernel mode SIMD is not allowed when IRQs are disabled
is that re-enabling softirqs in this case produces a noisy diagnostic
when lockdep is enabled. The warning is valid, in the sense that
delivering pending softirqs over the back of the call to
local_bh_enable() is problematic when IRQs are disabled.
While the API lacks a facility to simply mask and unmask softirqs
without triggering their delivery, disabling softirqs is not needed to
begin with when IRQs are disabled, given that softirqs are only ever
taken asynchronously over the back of a hard IRQ.
So dis/enable softirq processing conditionally, based on whether IRQs
are enabled, and relax the check in may_use_simd().
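A simplified sketch of the conditional dis/enable described above:

    bool irqs_on = !irqs_disabled();

    if (irqs_on)
        local_bh_disable();     /* softirqs can only arrive when IRQs are on */
    /* ... kernel mode FP/SIMD work ... */
    if (irqs_on)
        local_bh_enable();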
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Kernel mode FP/SIMD no longer requires preemption to be disabled, so
only warn on uses of FP/SIMD from preemptible context if the fallback
path is taken for cases where kernel mode NEON would not be allowed
otherwise.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The EFI runtime wrappers use a file local semaphore to serialize access
to the EFI runtime services. This means that any calls to the arch
wrappers around the runtime services will also be serialized, removing
the need for redundant locking.
For robustness, add a facility that allows those arch wrappers to assert
that the semaphore was taken by the current task.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
amd_pstate_change_mode_without_dvr_change() calls cppc_set_auto_sel()
for all the present CPUs.
However, this callpath eventually calls cppc_set_reg_val() which
accesses the per-cpu cpc_desc_ptr object. This object is initialized
only for online CPUs via acpi_soft_cpu_online() -->
__acpi_processor_start() --> acpi_cppc_processor_probe().
Hence, restrict calling cppc_set_auto_sel() to only the online CPUs.
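Roughly, the loop becomes something like the following (a sketch only,
assuming the cppc_set_auto_sel(cpu, enable) signature referenced above):

    int cpu, ret;

    for_each_online_cpu(cpu) {  /* cpc_desc_ptr is only set up for online CPUs */
        ret = cppc_set_auto_sel(cpu, enable);
        if (ret)
            return ret;
    }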
Fixes: 3ca7bc818d ("cpufreq: amd-pstate: Add guided mode control support via sysfs")
Suggested-by: Mario Limonciello (AMD) (kernel.org) <superm1@kernel.org>
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
In case a new index is introduced, add a static assert to make sure
that strings and values are updated.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Add whitespace around the equals and remove leading space.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
get_mode_idx_from_str() already checks the upper boundary for a string
sent. Drop the extra check in amd_pstate_update_status() and pass
the return code if there is a failure.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
amd_pstate_get_mode_string() is only used by amd-pstate-ut. Set the
failure path to use AMD_PSTATE_UNDEFINED ("undefined") to avoid showing
"(null)" as a string when running test suite.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
None of the users actually look for the NULL value. To avoid the risk of
a regression when introducing a new value but forgetting to add a string,
add a static assert to verify that AMD_PSTATE_MAX matches the array size.
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Rather than scanning the buffer and manually matching the string
use the sysfs macros.
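For illustration, the store path then boils down to something like this
sketch (the buffer and table names are illustrative):

    ret = sysfs_match_string(amd_pstate_mode_string, buf);
    if (ret < 0)
        return ret;             /* no entry matched */
    /* ret is now the matching mode index */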
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
ie31200_register_mci() calls device_initialize() for priv->dev
unconditionally. However, in the error path, put_device() is not
called, leading to an imbalance. Similarly, in the unload path,
put_device() is missing.
Although edac_mc_free() eventually frees the memory, it does not
release the device initialized by device_initialize(). For code
readability and proper pairing of device_initialize()/put_device(),
add put_device() calls in both error and unload paths.
Found by code review.
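The resulting pattern is roughly the following sketch, with an illustrative
setup step standing in for the real probe path:

    device_initialize(&priv->dev);

    ret = example_setup(priv);          /* hypothetical failing step */
    if (ret) {
        put_device(&priv->dev);         /* balance device_initialize() */
        return ret;
    }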
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://patch.msgid.link/20251106084735.35017-1-make24@iscas.ac.cn
The R-Car Gen3 thermal driver supports both R-Car Gen3 and Gen4 SoCs
as well as RZ/G2. Update the driver comment. No functional change.
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Link: https://patch.msgid.link/20251110143029.10940-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Document the Temperature Sensor (TSENS) on the Kaanapali Platform.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manaf.pallikunhi@oss.qualcomm.com>
Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251021-b4-knp-tsens-v2-1-7b662e2e71b4@oss.qualcomm.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The Renesas RZ/V2H SoC includes a Thermal Sensor Unit (TSU) block designed
to measure the junction temperature. The device provides real-time
temperature measurements for thermal management, utilizing two dedicated
channels for temperature sensing.
The Renesas RZ/V2H SoC is using the same TSU IP found on the RZ/G3E SoC,
the only difference being that it has two channels instead of one.
Add new compatible string "renesas,r9a09g057-tsu" for RZ/V2H and use
"renesas,r9a09g047-tsu" as a fallback compatible to indicate hardware
compatibility with the RZ/G3E implementation.
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20251020143107.13974-3-ovidiu.panait.rb@renesas.com
Use BIT_WORD() and BIT_MASK() macros from <linux/bits.h>
in <arch/x86/include/asm/percpu.h> instead of open-coding them.
No functional change intended.
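For reference, the two forms are equivalent (a generic sketch, not the
percpu.h code itself):

    /* open-coded */
    addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);

    /* using the <linux/bits.h> helpers */
    addr[BIT_WORD(nr)] |= BIT_MASK(nr);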
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20250907184915.78041-1-ubizjak@gmail.com
Currently, if a user enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-CPU workqueue), while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
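In practice a caller that previously relied on the implicit per-CPU default
now spells it out, roughly as in this sketch (the workqueue name is
illustrative):

    struct workqueue_struct *wq;

    wq = alloc_workqueue("example_wq", WQ_PERCPU, 0);
    if (!wq)
        return -ENOMEM;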
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ Viresh: Fixed Subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
On some IPQ806x SoCs SMEM might not be initialized by SBL. This is the
case for some Google devices (the OnHub family) that can't make use of
SMEM to detect the SoC ID (and socinfo can't be used either as it
depends on SMEM presence).
To handle this specific case, check whether SMEM is not initialized (by
checking if qcom_smem_get_soc_id() returns -ENODEV) and fall back to
OF machine compatible checking to identify the SoC variant.
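The fallback logic is roughly the following sketch (the compatible string
and constant here are only examples):

    u32 soc_id;
    int ret = qcom_smem_get_soc_id(&soc_id);

    if (ret == -ENODEV) {
        /* SMEM not initialized by SBL, fall back to the machine compatible */
        if (of_machine_is_compatible("qcom,ipq8064"))
            soc_id = QCOM_ID_IPQ8064;   /* illustrative constant */
    }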
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Replace timer_delete_sync() with timer_shutdown_sync() and move
it before list_del_rcu() in wakeup_source_remove() to improve the
cleanup ordering and code clarity.
This ensures that the timer is stopped before removing the wakeup
source from the events list, providing a more logical cleanup
sequence.
While the current ordering is functionally correct, stopping the
timer first makes the cleanup flow more intuitive and follows the
general pattern of disabling active components before removing data
structures.
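The reordered cleanup then reads roughly as follows (a sketch; the field
names follow the usual wakeup source layout):

    timer_shutdown_sync(&ws->timer);    /* stop and invalidate the timer first */
    list_del_rcu(&ws->entry);           /* then unlink the wakeup source */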
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251027044127.2456365-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Replace kfree() with ACPI_FREE() in pch_fivr_read() to follow ACPICA
memory management conventions.
While functionally equivalent in Linux (ACPI_FREE() is implemented
as kfree()), using ACPI_FREE() maintains consistency with ACPICA
coding standards for deallocating ACPI buffer objects.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
[ rjw: Subject and changelog edits ]
Link: https://patch.msgid.link/20251028051554.2862049-1-kaushlendra.kumar@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
These TCR_EL1 helpers don't have any other callers. Drop these redundant
indirections completely, thus making this code more compact and readable.
No functional change.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The zt-test output is awkward to read, as the 'Expected' value isn't
dumped on its own line and isn't aligned with the 'Got' value beneath.
For example:
Mismatch: PID=5281, iteration=3270249 Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
SVCR: 2
Add a newline, matching the other FPSIMD/SVE/SME tests, so that we get
output that can be read more easily:
Mismatch: PID=5281, iteration=3270249
Expected [00a1146901a1146902a1146903a1146904a1146905a1146906a1146907a1146908a1146909a114690aa114690ba114690ca114690da114690ea114690fa11469]
Got [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
SVCR: 2
Admittedly this isn't all that important when the 'Got' value is all
zeroes, but otherwise this would be a major help for identifying which
portion of the 'Got' value is not as expected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kselftest@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This is only relevant to the FLATMEM memory model, which isn't an option
since commit 782276b4d0 ("arm64: Force SPARSEMEM_VMEMMAP as the only
memory management model").
Signed-off-by: Omar Sandoval <osandov@fb.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For those architectures with HAVE_SOFTIRQ_ON_OWN_STACK use
their dedicated softirq stack when !PREEMPT_RT. This condition
is ensured by SOFTIRQ_ON_OWN_STACK.
Let arm64 use SOFTIRQ_ON_OWN_STACK as well to select its
usage of the stack.
Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
CONFIG_VMAP_STACK is selected by the arm64 arch unconditionally since commit
ef6861b8e6 ("arm64: Mandate VMAP_STACK").
Remove the redundant assertion and headers.
Signed-off-by: Dawei Li <dawei.li@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Using the IS_ENABLED() macro in the int340x_thermal_handler_attach()
forces the kernel to be recompiled when thermal drivers are enabled
or disabled, which is a significant limitation of its modularity.
The IS_ENABLED() macro is particularly problematic for the Android
Generic Kernel Image (GKI) project which uses unified core kernel
while SoC/board support is moved to loadable vendor modules.
The Intel Dynamic Platform and Thermal Framework (DPTF) requires
thermal drivers to be loaded at runtime, thus ACPI bus scan handler
is not needed and acpi_default_enumeration() may create all platform
devices, regardless of the actual setting of CONFIG_INT340X_THERMAL.
Signed-off-by: Slawomir Rosek <srosek@google.com>
Link: https://patch.msgid.link/20251103162516.2606158-3-srosek@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The IRQ used by the Intel SoC DTS thermal device for critical
overheating notification is listed in _CRS of device INT3401 which
therefore needs to be enumerated for Intel SoC DTS thermal to work.
The enumeration happens by binding the int3401_thermal driver to the
INT3401 platform device. Thus CONFIG_INT340X_THERMAL is in fact
necessary for enumerating it, so checking CONFIG_INTEL_SOC_DTS_THERMAL
in int340x_thermal_handler_attach() is pointless and INT340X_THERMAL
may as well be selected by INTEL_SOC_DTS_THERMAL.
Signed-off-by: Slawomir Rosek <srosek@google.com>
[ rjw: New subject ]
Link: https://patch.msgid.link/20251103162516.2606158-2-srosek@google.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_processor_setup_cstates() and acpi_processor_setup_cpuidle_cx()
are called after successfully obtaining power information. Among other
things, these setup functions check the C-state count against zero.
However, that check is done by acpi_processor_get_power_info_cst()
which will cause acpi_processor_get_power_info() to fail if it does
not pass, so the checks in the two functions mentioned above are
redundant.
Drop those redundant checks.
No intentional functional impact.
Signed-off-by: Huisong Li <lihuisong@huawei.com>
[ rjw: Subject and changelog rewrite ]
Link: https://patch.msgid.link/20251105093647.3557248-1-lihuisong@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Somehow CONFIG_PSTORE_FIRMWARE ended up in this document when I intended
it to be CONFIG_CHROMEOS_PSTORE. Correct the configuration option and
make it clear that not all options are required.
Fixes: b1f02f005a ("Documentation: power: Add document on debugging shutdown hangs")
Reported-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
[ rjw: Fixes: tag ]
Link: https://patch.msgid.link/20251106142524.3841343-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The arm_pmu driver is using topology_core_has_smt() for retrieving
the SMT implementation which depends on CONFIG_GENERIC_ARCH_TOPOLOGY.
The config is optional on arm platforms so provide a
!CONFIG_GENERIC_ARCH_TOPOLOGY stub for topology_core_has_smt().
Fixes: c3d78c34ad ("perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511041757.vuCGOmFc-lkp@intel.com/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Yicong Yang <yangyccccc@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
LKP points out an operator precedence oversight in the new NoC S3
support that, annoyingly, my local W=1 build didn't flag. In fixing
that, we can also take the similarly-missed opportunity to cache the
version check itself at event_init time.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511041749.ok8zDP6u-lkp@intel.com/
Fixes: 8fa08f8835 ("perf/arm-ni: Add NoC S3 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Switch to using system_percpu_wq because system_wq is going away as part of
a workqueue restructuring.
Currently if a user enqueues a work item using schedule_delayed_work() the
used workqueue is "system_wq" (per-cpu workqueue) while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use of
WORK_CPU_UNBOUND again.
This lack of consistency cannot be addressed without refactoring the API.
For more details see those commits and the Link tag below.
128ea9f6cc ("workqueue: Add system_percpu_wq and system_dfl_wq")
930c2ea566 ("workqueue: Add new WQ_PERCPU flag")
[ bp: Massage commit message. ]
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de
When the global kernel command line parameter
memory_hotplug.memmap_on_memory is set to false, per-memory-block
memmap_on_memory setting can still be set to true. However, when
configuring memory block, add_memory_resource() would configure it
without memmap_on_memory.
i.e.
Even if the MHP_MEMMAP_ON_MEMORY flag is set,
mhp_supports_memmap_on_memory() returns false unless the kernel command
line parameter "memory_hotplug.memmap_on_memory" is enabled. When both
the flag and the cmdline parameter are set, the memory block can be
configured with or without memmap_on_memory support.
To ensure consistent behavior, permit configuring per-memory-block
memmap_on_memory only when the memory_hotplug.memmap_on_memory kernel
command line parameter is enabled.
This is similar to commit 73954d379e ("dax: add a sysfs knob to
control memmap_on_memory behavior")
Fixes: ff18dcb19a ("s390/sclp: Add support for dynamic (de)configuration of memory")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Higher steal time thresholds favor low utilization scenarios, which is not
the common case for s390. Set steal time threshold to a lower value to
prioritize vertical high and medium CPUs sooner and allow high utilization
scenarios to benefit from it.
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
pcpu_delegate() never returns to its caller. If the target CPU is the
current CPU, it calls __pcpu_delegate(), whose delegate function is not
supposed to return. In any case, even if __pcpu_delegate() unexpectedly
returns, pcpu_delegate() sends SIGP_STOP to the current CPU and waits
in an infinite loop. Annotate pcpu_delegate() with the __noreturn
attribute to improve compiler optimizations.
Also annotate smp_call_ipl_cpu() accordingly since it always calls
pcpu_delegate().
[hca: Merge two patches from Thorsten Blum]
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
s390_handle_damage() ends by calling the non-returning function
disabled_wait() and therefore also never returns. Annotate it with the
__noreturn compiler attribute to improve compiler optimizations.
Remove the unreachable infinite while loop.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Heiko Carstens says:
====================
Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.
Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows changing the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.
Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires to change valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.
The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this
to the architecture level set shouldn't cause any problems.
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The CSPG instruction is part of the Dat-Enhancement facility 1, which
is always available. Given that it can be used everywhere where also
the CSP instruction can be used, replace CSP with CSPG everywhere.
This allows removing the csp() inline assembly. Also remove the
unused gmap_pmdp_csp() function.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove cpu_has_idte(). The IDTE instruction is part of the
Dat-Enhancement facility 1, which is always available.
Therefore remove the helper and now superfluous code.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add the Dat-Enhancement facility 1 to the list of facilities which are
required to start the kernel. The facility provides the CSPG and IDTE
instructions. In particular the CSPG instruction can be used to replace a
valid page table entry with a different page table entry, which also
differs in the page frame real address.
Without the CSPG instruction it is possible to use the CSP instruction to
change valid page table entries, however it only allows changing the lower
or higher 32 bits of such entries, which means it cannot be used to change
the page frame real address of valid page table entries.
Given that there is code around (e.g. HugeTLB vmemmap optimization) which
requires to change valid page table entries of the kernel mapping, without
the detour over an invalid page table entry, make the CSPG instruction
unconditionally available.
The Dat-Enhancement facility 1 is available since z990, which is older than
the currently supported minimum architecture (z10). Therefore adding this
to the architecture level set shouldn't cause any problems.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit
1997471069 ("edac: add a new per-dimm API and make the old per-virtual-rank API obsolete")
introduced a new per-DIMM sysfs interface for EDAC making the old
per-virtual-rank sysfs interface obsolete.
Since this new sysfs interface was introduced more than a decade ago, remove
the obsolete legacy interface.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com
Currently, the NUM_CONTROLLERS macro is used to limit the number of memory
controllers (UMCs) available per node. The number of UMCs available per node,
however, is already cached by the max_mcs variable of struct amd64_pvt.
Allocate the relevant data structures dynamically using the variable instead
of static allocation through the macro.
The max_mcs variable is used for legacy systems too. These systems have a max
of 2 controllers. Since the default value of max_mcs, set in per_family_init(),
is 2, these legacy systems are also covered.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com
Currently, the ctl_name string is statically assigned based on the family and
model of the SOC when the amd64_edac module is loaded.
The same, however, is not exactly needed as the string can be generated and
assigned at runtime through scnprintf().
Remove all static assignments and generate the string at runtime. Also,
clean up the switch cases which became defunct and consolidate identical cases.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251106015727.1987246-1-avadhut.naik@amd.com
Prepare for CMCI storm support by moving the common bank/block iterator code
to a helper function.
Include a parameter to switch the interrupt enable. This will be used by the
CMCI storm handling function.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
AMD systems optionally support MCA thresholding which provides the ability for
hardware to send an interrupt when a set error threshold is reached. This
feature counts errors of all severities, but it is commonly used to report
correctable errors with an interrupt rather than polling.
Scalable MCA systems allow the platform to take control of this feature. In
this case, the OS will not see the feature configuration and control bits in
the MCA_MISC* registers. The OS will not receive the MCA thresholding
interrupt, and it will need to poll for correctable errors.
A "corrected error interrupt" will be available on Scalable MCA systems. This
will be used in the same configuration where the platform controls MCA
thresholding. However, the platform will now be able to send the MCA
thresholding interrupt to the OS.
Check for, and enable, this feature during per-CPU SMCA init.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
The cpupower Makefile built and installed libcpupower as a shared
library (libcpupower.so) without passing `STATIC=true`, but did not
build a static version of the library even with `STATIC=true`. (Only the
programs were static). Thus, out-of-tree programs using libcpupower
were unable to link statically against the library without having access
to intermediate object files produced during the build.
This fixes that situation by ensuring that libcpupower.a is built and
installed when `STATIC=true` is specified.
Link: https://lore.kernel.org/r/x7geegquiks3zndiavw2arihdc2rk7e2dx3lk7yxkewqii6zpg@tzjijqxyzwmu
Signed-off-by: Zuo An <zuoan.penguin@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3 and
applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on x86_64
includes not just the physical address but also flags Bits above the
physical address width of the system (i.e. above __PHYSICAL_MASK_SHIFT) are
also not masked.
- The pgd value is masked by PAGE_SIZE which doesn't take into account the
higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out
of CR3 (like SME or LAM bits). All remaining bits are real address bits or
reserved and must be 0.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags
above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above
the max physical address are reserved and must be 0.
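Put together, the access pattern becomes roughly the following sketch
(assuming the identity-mapped early boot context, where a physical address
can be used directly as a pointer):

    pgd_t *pgd = (pgd_t *)native_read_cr3_pa();   /* CR3 with flags masked off */
    unsigned long p4d_pa = native_pgd_val(pgd[0]) & PTE_PFN_MASK;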
Fixes: cb1c9e02b0 ("x86/efistub: Perform 4/5 level paging switch from the stub")
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://patch.msgid.link/20251103141002.2280812-3-usamaarif642@gmail.com
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3 and
applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on x86_64
includes not just the physical address but also flags. Bits above the
physical address width of the system (i.e. above __PHYSICAL_MASK_SHIFT) are
also not masked.
- The PGD entry is masked by PAGE_SIZE which doesn't take into account the
higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- native_read_cr3_pa(): Uses CR3_ADDR_MASK to additionally mask metadata out
of CR3 (like SME or LAM bits). All remaining bits are real address bits or
reserved and must be 0.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for flags
above bit 51 (_PAGE_BIT_NOPTISHADOW in particular). Bits below 51, but above
the max physical address are reserved and must be 0.
Fixes: e9d0e6330e ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/a482fd68-ce54-472d-8df1-33d6ac9f6bb5@intel.com
Scalable MCA systems have a per-CPU register that gives the APIC LVT offset
for the thresholding and deferred error interrupts.
Currently, this register is read once to set up the deferred error interrupt
and then read again for each thresholding block. Furthermore, the APIC LVT
registers are configured each time, but they only need to be configured once
per-CPU.
Move the APIC LVT setup to the early part of CPU init, so that the registers
are set up once. Also, this ensures that the kernel is ready to service the
interrupts before the individual error sources (each MCA bank) are enabled.
Apply this change only to SMCA systems to avoid breaking any legacy behavior.
The deferred error interrupt is technically advertised by the SUCCOR feature.
However, this was first made available on SMCA systems. Therefore, only set
up the deferred error interrupt on SMCA systems and simplify the code.
Guidance from hardware designers is that the LVT offsets provided from the
platform should be used. The kernel should not try to enforce specific values.
However, the kernel should check that an LVT offset is not reused for multiple
sources.
Therefore, remove the extra checking and value enforcement from the MCE code.
The "reuse/conflict" case is already handled in setup_APIC_eilvt().
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
AMD systems optionally support a deferred error interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar to
how other MCA interrupts are handled.
Deferred errors do not require any special handling related to the interrupt,
e.g. resetting or rearming the interrupt, etc.
However, Scalable MCA systems include a pair of registers, MCA_DESTAT and
MCA_DEADDR, that should be checked for valid errors. This check should be done
whenever MCA registers are polled. Currently, the deferred error interrupt
does this check, but the MCA polling function does not.
Call the MCA polling function when handling the deferred error interrupt. This
keeps all "polling" cases in a common function.
Add an SMCA status check helper. This will do the same status check and
register clearing that the interrupt handler has done. And it extends the
common polling flow to find AMD deferred errors.
Clear the MCA_DESTAT register at the end of the handler rather than the
beginning. This maintains the procedure that the 'status' register must be
cleared as the final step.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
AMD systems optionally support an MCA thresholding interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar to
how the Intel Corrected Machine Check interrupt (CMCI) is handled.
AMD MCA thresholding is managed using the MCA_MISC registers within an MCA
bank. The OS will need to modify the hardware error count field in order to
reset the threshold limit and rearm the interrupt. Management of the MCA_MISC
register should be done as a follow up to the basic MCA polling flow. It
should not be the main focus of the interrupt handler.
Furthermore, future systems will have the ability to send an MCA thresholding
interrupt to the OS even when the OS does not manage the feature, i.e.
MCA_MISC registers are Read-as-Zero/Locked.
Call the common MCA polling function when handling the MCA thresholding
interrupt. This will allow the OS to find any valid errors whether or not the
MCA thresholding feature is OS-managed. Also, this allows the common MCA
polling options and kernel parameters to apply to AMD systems.
Add a callback to the MCA polling function to check and reset any threshold
blocks that have reached their threshold limit.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
While restricting access,
a7e1f67ed2 ("x86/msr: Filter MSR writes")
also added warning and started tainting the kernel.
But the warning message never mentioned tainting. Moreover, this uses the
"CPU_OUT_OF_SPEC" flag which is not clearly related to MSRs: that flag is
overloaded by several, fairly different situations, including some much
scarier ones.
So, without an expert around (thank you Dave Hansen), it would have been
practically impossible to root cause the tainting from just the log file at
hand. It would therefore be prudent to explicitly mention in the logs when the
tainting happens so that debugging crashes can be made easier.
Fix this by simply appending the CPU_OUT_OF_SPEC flag to the warning message.
This readability issue happened when staring at logs involving the Intel
Memory Latency Checker (among many other things going on in that log). The MLC
disables hardware prefetch.
[ bp: Massage and extend commit message. ]
Signed-off-by: Marc Herbert <marc.herbert@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251101-tainted-msr-v1-1-e00658ba04d4@linux.intel.com
Quotation marks in cpufeatures.h comments are special and when the
comment begins with a quoted string, that string lands in /proc/cpuinfo,
turning it into a user-visible one.
The LKGS comment doesn't begin with a quoted string but just in case
drop the quoted "kernel" in there to avoid confusion. And while at it,
simply change the description into what the LKGS instruction does for
more clarity.
No functional changes.
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20251015103548.10194-1-bp@kernel.org
A KASAN build bloats these single load/store helpers such that
the compiler fails to inline them:
vmlinux.o: error: objtool: irqentry_exit+0x5e8: call to instruction_pointer_set() with UACCESS enabled
Make sure the compiler isn't allowed to do stupid.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net
KASAN bloat caused cleanup helper functions to not get inlined:
vmlinux.o: error: objtool: irqentry_exit+0x323: call to class_user_rw_access_destructor() with UACCESS enabled
Force inline all the cleanup helpers like they already are on normal
builds.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251031105435.GU4068168@noisy.programming.kicks-ass.net
TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is suboptimal especially
with the RSEQ fast path depending on it, but not really handling it.
Define a separate TIF_RSEQ in the generic TIF space and enable the full
separation of fast and slow path for architectures which utilize that.
That avoids the hassle with invocations of resume_user_mode_work() from
hypervisors, which clear TIF_NOTIFY_RESUME. It therefore makes the required
re-evaluation at the end of vcpu_run() a NOOP on architectures which
utilize the generic TIF space and have a separate TIF_RSEQ.
The hypervisor TIF handling does not include the separate TIF_RSEQ as there
is no point in doing so. The guest neither knows nor cares about the VMM
host application's RSEQ state. That state is only relevant when the ioctl()
returns to user space.
The fastpath implementation still utilizes TIF_NOTIFY_RESUME for failure
handling, but this only happens within exit_to_user_mode_loop(), so
arguably the hypervisor ioctl() code is long done when this happens.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.903622031@linutronix.de
Separate the interrupt and syscall exit handling. Syscall exit does not
require clearing the user_irq bit as it can't be set. On interrupt exit it
can be set when the interrupt did not result in a scheduling event and
therefore the return path did not invoke the TIF work handling, which would
have cleared it.
The debug check for the event state is also not really required even when
debug mode is enabled via the static key. Debug mode is largely aiding user
space by enabling a larger amount of validation checks, which cause a
segfault when a malformed critical section is detected. In production mode
the critical section handling takes the content mostly as is and lets user
space keep the pieces when it screwed up.
On kernel changes in that area the state check is useful, but that can be
done when lockdep is enabled, which is anyway a required test scenario for
fundamental changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.842785700@linutronix.de
exit_to_user_mode_prepare() is used for both interrupts and syscalls, but
there is extra rseq work, which is only required in the interrupt exit
case.
Split up the function and provide wrappers for syscalls and interrupts,
which allows separating the rseq exit work in the next step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.782234789@linutronix.de
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. In case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.
This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.
This results in the following improvements:
Kernel build          Before               After            Reduction
exit to user        80692981            80514451
signal checks:         32581                 121                  99%
slowpath runs:       1201408  1.49%          198  0.00%          100%
fastpath runs:                            675941  0.84%           N/A
id updates:          1233989  1.53%        50541  0.06%           96%
cs checks:           1125366  1.39%            0  0.00%          100%
cs cleared:          1125366   100%            0                 100%
cs fixup:                  0     0%            0

RSEQ selftests        Before               After            Reduction
exit to user:      386281778           387373750
signal checks:      35661203                   0                 100%
slowpath runs:     140542396 36.38%          100  0.00%          100%
fastpath runs:                           9509789  2.51%           N/A
id updates:        176203599 45.62%      9087994  2.35%           95%
cs checks:         175587856 45.46%      4728394  1.22%           98%
cs cleared:        172359544 98.16%      1319307 27.90%           99%
cs fixup:            3228312  1.84%      3409087 72.10%
The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to
user invocations, they are relative to the actual 'cs check' invocations.
While some of this could have been avoided in the original code, like the
obvious clearing of CS when it's already clear, the main problem of going
through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
notify handler is invoked more than once before going out to user
space. Doing this once when everything has stabilized is the only solution
to avoid this.
The initial attempt to completely decouple it from the TIF work turned out
to be suboptimal for workloads which do a lot of quick and short system
calls. Even if the fast path decision is only 4 instructions (including a
conditional branch), this adds up quickly and becomes measurable when the
rate for actually having to handle rseq is in the low single digit
percentage range of user/kernel transitions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de
Implement the actual logic for handling RSEQ updates in a fast path after
handling the TIF work and at the point where the task is actually returning
to user space.
This is the right point to do that because at this point the CPU and the MM
CID are stable and can no longer change due to yet another reschedule.
That happens when the task is handling it via TIF_NOTIFY_RESUME in
resume_user_mode_work(), which is invoked from the exit to user mode work
loop.
The function is invoked after the TIF work is handled and runs with
interrupts disabled, which means it cannot resolve page faults. It
therefore disables page faults and in case the access to the user space
memory faults, it:
- notes the fail in the event struct
- raises TIF_NOTIFY_RESUME
- returns false to the caller
The caller has to go back to the TIF work, which runs with interrupts
enabled and therefore can resolve the page faults. This happens mostly on
fork() when the memory is marked COW.
If the user memory inspection finds invalid data, the function returns
false as well and sets the fatal flag in the event struct along with
TIF_NOTIFY_RESUME. The slow path notify handler has to evaluate that flag
and terminate the task with SIGSEGV as documented.
The initial decision to invoke any of this is based on one flag in the
event struct: @sched_switch. The decision is in pseudo ASM:
load tsk::event::sched_switch
jnz inspect_user_space
mov $0, tsk::event::events
...
leave
So for the common case where the task was not scheduled out, this really
boils down to three instructions before going out if the compiler is not
completely stupid (and yes, some of them are).
If the condition is true, then it checks whether CPU ID or MM CID have
changed. If so, then the CPU/MM IDs have to be updated and are thereby
cached for the next round. The update unconditionally retrieves the user
space critical section address to spare another user*begin/end() pair. If
that's not zero and tsk::event::user_irq is set, then the critical section
is analyzed and acted upon. If either zero or the entry came via syscall
the critical section analysis is skipped.
If the comparison is false then the critical section has to be analyzed
because the event flag is then only true when entry from user was by
interrupt.
This is provided without the actual hookup to let reviewers focus on the
implementation details. The hookup happens in the next step.
Note: As with quite some other optimizations this depends on the generic
entry infrastructure and is not enabled to be sucked into random
architecture implementations.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.638929615@linutronix.de
After removing the various condition bits earlier it turns out that one
extra piece of information is needed to avoid setting event::sched_switch and
TIF_NOTIFY_RESUME unconditionally on every context switch.
The update of the RSEQ user space memory is only required, when either
the task was interrupted in user space and schedules
or
the CPU or MM CID changes in schedule() independent of the entry mode
Right now only the interrupt from user information is available.
Add an event flag, which is set when the CPU or MM CID or both change.
Evaluate this event in the scheduler to decide whether the sched_switch
event and the TIF bit need to be set.
It's an extra conditional in context_switch(), but the downside of
unconditionally handling RSEQ after a context switch to user is way more
significant. The utilized boolean logic minimizes this to a single
conditional branch.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.578058898@linutronix.de
Replace the whole logic with a new implementation, which is shared with
signal delivery and the upcoming exit fast path.
Contrary to the original implementation, this ignores invocations from
KVM/IO-uring, which invoke resume_user_mode_work() with the @regs argument
set to NULL.
The original implementation updated the CPU/Node/MM CID fields, but that
was just a side effect, which was addressing the problem that this
invocation cleared TIF_NOTIFY_RESUME, which in turn could cause an update
on return to user space to be lost.
This problem has been addressed differently, so that it's no longer
required to do that update before entering the guest.
That might be considered a user visible change when the host's thread TLS
memory is mapped into the guest, but as this was never intentionally
supported, this abuse of kernel internal implementation details is not
considered an ABI break.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.517640811@linutronix.de
Completely separate the signal delivery path from the notify handler as
they have different semantics versus the event handling.
The signal delivery only needs to ensure that the interrupted user context
was not in a critical section or the section is aborted before it switches
to the signal frame context. The signal frame context does not have the
original instruction pointer anymore, so that can't be handled on exit to
user space.
No point in updating the CPU/CID ids as they might change again before the
task returns to user space for real.
The fast path optimization, which checks for the 'entry from user via
interrupt' condition is only available for architectures which use the
generic entry code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.455429038@linutronix.de
Provide a new and straightforward implementation to set the IDs (CPU ID,
Node ID and MM CID), which can later be inlined into the fast path.
It does all operations in one scoped_user_rw_access() section and also
retrieves the critical section member (rseq::cs_rseq) from user space to avoid
another user..begin/end() pair. This is in preparation for optimizing the
fast path to avoid extra work when not required.
On rseq registration set the CPU ID fields to RSEQ_CPU_ID_UNINITIALIZED and
node and MM CID to zero. That's the same as the kernel internal reset
values. That makes the debug validation in the exit code work correctly on
the first exit to user space.
Use it to replace the whole related zoo in rseq.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.393972266@linutronix.de
Make the syscall exit debug mechanism available via the static branch on
architectures which utilize the generic entry code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.333440475@linutronix.de
Disconnect it from the config switch and use the static debug branch. This
is a temporary measure for validating the rework. In the end this check
needs to be hidden behind lockdep as it has nothing to do with the other
debug infrastructure, which mainly aids user space debugging by enabling a
zoo of checks which terminate misbehaving tasks instead of letting them
keep the hard-to-diagnose pieces.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.272660745@linutronix.de
Just utilize the new infrastructure and put the original one to rest.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.212510692@linutronix.de
Provide a straightforward implementation to check for and, if necessary,
clear/fixup critical sections in user space.
The non-debug version only does minimal sanity checks and aims for
efficiency.
There are two attack vectors, which are checked for:
1) An abort IP which is in the kernel address space. That would cause at
least x86 to return to kernel space via IRET.
2) A rogue critical section descriptor with an abort IP pointing to some
arbitrary address, which is not preceded by the RSEQ signature.
If the section descriptors are invalid then the resulting misbehaviour of
the user space application is not the kernel's problem.
The kernel provides a run-time switchable debug slow path, which implements
the full zoo of checks including termination of the task when one of the
gazillion conditions is not met.
Replace the zoo in rseq.c with it and invoke it from the TIF_NOTIFY_RESUME
handler. Move the remainders into the CONFIG_DEBUG_RSEQ section, which will
be replaced and removed in a subsequent step.
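For illustration, a hedged sketch of the two non-debug checks; the function
name is an assumption and the in-tree code is structured differently:

	static bool rseq_cs_sane(struct rseq_cs *cs)
	{
		u32 usig;

		/* 1) The abort IP must not point into the kernel address space */
		if (cs->abort_ip >= TASK_SIZE)
			return false;

		/* 2) The abort IP must be preceded by the RSEQ signature */
		if (get_user(usig, (u32 __user *)(unsigned long)(cs->abort_ip - sizeof(u32))))
			return false;

		return usig == current->rseq_sig;
	}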
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.151465632@linutronix.de
Config based debug is rarely turned on and is not available easily when
things go wrong.
Provide a static branch to allow permanent integration of debug mechanisms
along with the usual toggles in Kconfig, command line and debugfs.
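Illustrative shape of such a switch (the key name is an assumption); the
Kconfig, command line and debugfs toggles would flip the key with
static_branch_enable()/static_branch_disable():

	DEFINE_STATIC_KEY_FALSE(rseq_debug_enabled);

	static __always_inline bool rseq_debug_active(void)
	{
		return static_branch_unlikely(&rseq_debug_enabled);
	}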
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de
Being able to analyze the call frequency without actually using tracing is
helpful for evaluating this infrastructure. The overhead is minimal as it
just increments a per-CPU counter associated with each operation.
The debugfs readout provides a racy sum of all counters.
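Sketch of the counting scheme for illustration; the counter name and the
debugfs plumbing around the show function are assumptions:

	DEFINE_PER_CPU(unsigned long, rseq_stat_slowpath);

	static inline void rseq_stat_inc_slowpath(void)
	{
		raw_cpu_inc(rseq_stat_slowpath);
	}

	static int rseq_stats_show(struct seq_file *m, void *v)
	{
		unsigned long sum = 0;
		int cpu;

		/* Racy by design: a best effort snapshot, not a synchronized total */
		for_each_possible_cpu(cpu)
			sum += per_cpu(rseq_stat_slowpath, cpu);

		seq_printf(m, "slowpath: %lu\n", sum);
		return 0;
	}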
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de
Provide tracepoint wrappers for the upcoming RSEQ exit to user space inline
fast path, so that the header can be safely included by code which defines
actual trace points.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.967114316@linutronix.de
For RSEQ the only relevant reason to inspect and potentially fix up (abort)
user space critical sections is when user space was interrupted and the
task was scheduled out.
If the user-to-kernel entry was from a syscall, no fixup is required. If
user space invokes a syscall from a critical section it can keep the
pieces as documented.
This is only supported on architectures which utilize the generic entry
code. If your architecture does not use it, bad luck.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.905067101@linutronix.de
In preparation for rewriting RSEQ exit to user space handling provide
storage to cache the CPU ID and MM CID values which were written to user
space. That prepares for a quick check, which avoids the update when
nothing changed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.841964081@linutronix.de
There is nothing mm specific in that and including mm.h can cause header
recursion hell.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.778457951@linutronix.de
There is no point to have this as a function which just inlines
enter_from_user_mode(). The function call overhead is larger than the
function itself.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.715309918@linutronix.de
Open code the only user in the x86 syscall code and reduce the zoo of
functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.652839989@linutronix.de
Clean up the include ordering, kernel-doc and other trivialities before
making further changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.590338411@linutronix.de
In preparation for a major rewrite of this code, provide a data structure
for rseq management.
Put all the rseq related data into it (except for the debug part), which
makes it possible to simplify fork/execve by using memset() and memcpy()
instead of adding new fields to initialize over and over.
Create a storage struct for event management as well and put the
sched_switch event and an indicator for RSEQ on a task into it as a
start. That uses a union, which allows masking and clearing the whole lot
efficiently.
The indicators are explicitly not a bit field. Bit fields generate abysmal
code.
The boolean members are defined as u8 as that actually guarantees that it
fits. There seem to be strange architecture ABIs which need more than 8
bits for a boolean.
The has_rseq member is redundant vs. task::rseq, but it turns out that
boolean operations and quick checks on the union generate better code than
fiddling with separate entities and data types.
This struct will be extended over time to carry more information.
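An illustrative layout of the described storage (member names are
assumptions, not the exact mainline definitions):

	struct rseq_event {
		union {
			u16		all;		/* Mask/clear all indicators at once */
			struct {
				u8	sched_switch;	/* Task was scheduled out */
				u8	has_rseq;	/* Task has registered rseq */
			};
		};
	};

	struct rseq_data {
		struct rseq __user	*usrptr;	/* Registered user space struct */
		u32			len;
		u32			sig;
		struct rseq_event	event;
	};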
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.527086690@linutronix.de
There is no need to update these values unconditionally if there is no
event pending.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.462964916@linutronix.de
Hypervisors invoke resume_user_mode_work() before entering the guest, which
clears TIF_NOTIFY_RESUME. The @regs argument is NULL as there is no user
space context available to them, so the rseq notify handler skips
inspecting the critical section, but updates the CPU/MM CID values
unconditionally so that a possibly pending rseq event is not lost on the
way to user space.
This is a pointless exercise as the task might be rescheduled before
actually returning to user space and it creates unnecessary work in the
vcpu_run() loops.
It's way more efficient to ignore that invocation based on @regs == NULL
and let the hypervisors re-raise TIF_NOTIFY_RESUME after returning from the
vcpu_run() loop before returning from the ioctl().
This ensures that a pending RSEQ update is not lost and the IDs are updated
before returning to user space.
Once the RSEQ handling is decoupled from TIF_NOTIFY_RESUME, this turns into
a NOOP.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251027084306.399495855@linutronix.de
Since commit 0190e4198e ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_*
flags") the bits in task::rseq_event_mask are meaningless and just extra
work in terms of setting them individually.
Aside from that, the only relevant point where an event has to be raised is
context switch. Neither the CPU nor MM CID can change without going through
a context switch.
Collapse them all into a single boolean which simplifies the code a lot and
remove the pointless invocations which have been sprinkled all over the
place for no value.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.336978188@linutronix.de
There is no point to read the critical section element in the newly
registered user space RSEQ struct first in order to clear it.
Just clear it and be done with it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.274661227@linutronix.de
There is no point in this being visible in the resume_to_user_mode()
handling.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.211520245@linutronix.de
Move the comment which documents the RSEQ algorithm to the top of the file,
so it does not create horrible diffs later when the actual implementation
is fed into the mincer.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.149519580@linutronix.de
Scrolling over tons of pointless
{
}
lines to find the actual code is annoying at best.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.085971048@linutronix.de
The RSEQ critical section mechanism only clears the event mask when a
critical section is registered, otherwise it is stale and collects
bits.
That means once a critical section is installed the first invocation of
that code when TIF_NOTIFY_RESUME is set will abort the critical section,
even when the TIF bit was not raised by the rseq preempt/migrate/signal
helpers.
This also has a performance implication because TIF_NOTIFY_RESUME is a
multiplexing TIF bit, which is utilized by quite a lot of infrastructure.
That means every invocation of __rseq_notify_resume() goes unconditionally
through the heavy lifting of user space access and consistency checks even
if there is no reason to do so.
Keeping the stale event mask around when exiting to user space also
prevents it from being utilized by the upcoming time slice extension
mechanism.
Avoid this by reading and clearing the event mask before doing the user
space critical section access with interrupts or preemption disabled, which
ensures that the read and clear operation is CPU local atomic versus
scheduling and the membarrier IPI.
This is correct as after re-enabling interrupts/preemption any relevant
event will set the bit again and raise TIF_NOTIFY_RESUME, which makes the
user space exit code take another round of TIF bit clearing.
If the event mask was non-zero, invoke the slow path. On debug kernels the
slow path is invoked unconditionally and the result of the event mask
evaluation is handed in.
Add an exit path check after the TIF bit loop, which validates on debug
kernels that the event mask is zero before exiting to user space.
While at it, reword the convoluted comment explaining why the pt_regs
pointer can be NULL under certain circumstances.
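A rough sketch of the read-and-clear step for illustration; rseq_slowpath()
stands in for the user space access and fixup work and is an assumed name:

	static void rseq_slowpath(struct pt_regs *regs, unsigned long event_mask);

	static void rseq_notify_resume_sketch(struct pt_regs *regs)
	{
		unsigned long event_mask;

		/*
		 * Read and clear CPU-local atomically versus scheduling and the
		 * membarrier IPI: any event raised afterwards sets the bit again
		 * and re-raises TIF_NOTIFY_RESUME.
		 */
		preempt_disable();
		event_mask = current->rseq_event_mask;
		current->rseq_event_mask = 0;
		preempt_enable();

		/* Invoke the slow path only when an event was actually recorded */
		if (event_mask || IS_ENABLED(CONFIG_DEBUG_RSEQ))
			rseq_slowpath(regs, event_mask);
	}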
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084306.022571576@linutronix.de
Replace the open coded implementation with the scoped user access guard.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.862419776@linutronix.de
Replace the open coded implementation with the scoped user access guards.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251027083745.799714344@linutronix.de
Replace the open coded implementation with the new get/put_user_inline()
helpers. This might be replaced by a regular get/put_user(), but that needs
a proper performance evaluation.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251027083745.736737934@linutronix.de
Provide convenience wrappers around scoped user access similar to
put/get_user(), which reduce the usage sites to:
	if (!get_user_inline(val, ptr))
		return -EFAULT;
Should only be used if there is a demonstrable performance benefit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.609031602@linutronix.de
User space access regions are tedious and require similar code patterns all
over the place:
	if (!user_read_access_begin(from, sizeof(*from)))
		return -EFAULT;
	unsafe_get_user(val, from, Efault);
	user_read_access_end();
	return 0;
Efault:
	user_read_access_end();
	return -EFAULT;
This got worse with the recent addition of masked user access, which
optimizes the speculation prevention:
	if (can_do_masked_user_access())
		from = masked_user_read_access_begin((from));
	else if (!user_read_access_begin(from, sizeof(*from)))
		return -EFAULT;
	unsafe_get_user(val, from, Efault);
	user_read_access_end();
	return 0;
Efault:
	user_read_access_end();
	return -EFAULT;
There have been issues with using the wrong user_*_access_end() variant in
the error path and other typical Copy&Pasta problems, e.g. using the wrong
fault label in the user accessor which ends up using the wrong access end
variant.
These patterns beg for scopes with automatic cleanup. The resulting outcome
is:
	scoped_user_read_access(from, Efault)
		unsafe_get_user(val, from, Efault);
	return 0;
Efault:
	return -EFAULT;
The scope guarantees the proper cleanup for the access mode is invoked both
in the success and the failure (fault) path.
The scoped_user_$MODE_access() macros are implemented as self-terminating
nested for() loops. Thanks to Andrew Cooper for pointing me at them. The
scope can therefore be left with 'break', 'goto' and 'return'. Even
'continue' "works" due to the self-termination mechanism. Both GCC and
clang optimize the convoluted macro maze away, and with clang the above
results in:
b80: f3 0f 1e fa endbr64
b84: 48 b8 ef cd ab 89 67 45 23 01 movabs $0x123456789abcdef,%rax
b8e: 48 39 c7 cmp %rax,%rdi
b91: 48 0f 47 f8 cmova %rax,%rdi
b95: 90 nop
b96: 90 nop
b97: 90 nop
b98: 31 c9 xor %ecx,%ecx
b9a: 8b 07 mov (%rdi),%eax
b9c: 89 06 mov %eax,(%rsi)
b9e: 85 c9 test %ecx,%ecx
ba0: 0f 94 c0 sete %al
ba3: 90 nop
ba4: 90 nop
ba5: 90 nop
ba6: c3 ret
Which looks as compact as it gets. The NOPs are placeholders for STAC/CLAC.
GCC emits the fault path separately:
bf0: f3 0f 1e fa endbr64
bf4: 48 b8 ef cd ab 89 67 45 23 01 movabs $0x123456789abcdef,%rax
bfe: 48 39 c7 cmp %rax,%rdi
c01: 48 0f 47 f8 cmova %rax,%rdi
c05: 90 nop
c06: 90 nop
c07: 90 nop
c08: 31 d2 xor %edx,%edx
c0a: 8b 07 mov (%rdi),%eax
c0c: 89 06 mov %eax,(%rsi)
c0e: 85 d2 test %edx,%edx
c10: 75 09 jne c1b <afoo+0x2b>
c12: 90 nop
c13: 90 nop
c14: 90 nop
c15: b8 01 00 00 00 mov $0x1,%eax
c1a: c3 ret
c1b: 90 nop
c1c: 90 nop
c1d: 90 nop
c1e: 31 c0 xor %eax,%eax
c20: c3 ret
The fault labels for the scoped*() macros and the fault labels for the
actual user space accessors can be shared and must be placed outside of the
scope.
If masked user access is enabled on an architecture, then the pointer
handed in to scoped_user_$MODE_access() can be modified to point to a
guaranteed faulting user address. This modification is only scope local as
the pointer is aliased inside the scope. When the scope is left, the alias
is no longer in effect. IOW, the original pointer value is preserved so it
can be used e.g. for fixup or diagnostic purposes in the fault path.
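A small usage sketch (illustrative types and function name): several
accesses share one scope and one fault label, and the matching access_end()
runs on both the success and the fault path:

	struct pair { u32 a, b; };

	static int copy_pair_from_user(struct pair __user *from, struct pair *to)
	{
		scoped_user_read_access(from, Efault) {
			unsafe_get_user(to->a, &from->a, Efault);
			unsafe_get_user(to->b, &from->b, Efault);
		}
		return 0;
	Efault:
		return -EFAULT;
	}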
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.546420421@linutronix.de
Clang propagates a provided label, which is outside of a cleanup scope to
ASM GOTO despite the fact that __raw_get_mem() has a local label for that
purpose:
"error: cannot jump from this asm goto statement to one of its possible targets"
Using the unsafe wrapper with the extra local label indirection cures that.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the kernel hangs while shutting down, ideally a UART log should
be captured to debug the problem. However if one isn't available,
users can use the pstore functionality to retrieve logs.
Add a document explaining how this works to make it more accessible
to users.
Tested-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251025004341.2386868-1-superm1@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The intel_pstate docs use the standard reST construct (`Section title`_) for
cross-referencing sections (internal linking), rather than for external
links. Incorrect cross-references are not caught when these are written
in that syntax, however (fortunately docutils 0.22 raises duplicate
target warnings that get fixed in cb908f8b0a ("Documentation:
intel_pstate: fix duplicate hyperlink target errors")).
Convert the cross-references to use :ref: directive, which doesn't
exhibit this problem.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Changelog tweak ]
Link: https://patch.msgid.link/20251101055614.32270-1-bagasdotme@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the used wq is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
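Illustrative conversion at an allocation site ("my_wq" and the init
function are placeholders):

	static struct workqueue_struct *my_wq;

	static int __init my_driver_init(void)
	{
		/*
		 * Previously relied on the implicit per-CPU default:
		 *	my_wq = alloc_workqueue("my_wq", 0, 0);
		 * Now the per-CPU behaviour is spelled out explicitly:
		 */
		my_wq = alloc_workqueue("my_wq", WQ_PERCPU, 0);
		return my_wq ? 0 : -ENOMEM;
	}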
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251030154739.262582-6-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the used wq is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251030154739.262582-5-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the used wq is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251030154739.262582-4-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the used wq is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_wq should be the per-cpu workqueue, yet in this name nothing makes
that clear, so replace system_wq with system_percpu_wq.
The old wq (system_wq) will be kept for a few release cycles.
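Illustrative call-site conversion ("my_work" is a placeholder):

	static void kick_my_work(struct work_struct *my_work)
	{
		/* Before: queue_work(system_wq, my_work); */
		queue_work(system_percpu_wq, my_work);
	}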
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251030154739.262582-3-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the used wq is "system_wq" (per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Add system_dfl_wq to encourage its use when unbound work should be used.
The old system_unbound_wq will be kept for a few release cycles.
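Illustrative call-site conversion ("my_dwork" is a placeholder):

	static void kick_my_delayed_work(struct delayed_work *my_dwork)
	{
		/* Before: queue_delayed_work(system_unbound_wq, my_dwork, HZ); */
		queue_delayed_work(system_dfl_wq, my_dwork, HZ);
	}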
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251030154739.262582-2-marco.crivellari@suse.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:
	bool foo(u32 __user *p, u32 val)
	{
		scoped_guard(pagefault)
			unsafe_put_user(val, p, efault);
		return true;
	efault:
		return false;
	}
It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.
S390 is not affected for unsafe_*_user() as it uses its own local label
already, but __get/put_kernel_nofault() lack that.
Rename them to arch_*_kernel_nofault() which makes the generic uaccess
header wrap it with a local label that makes both compilers emit correct
code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://patch.msgid.link/20251027083745.483079889@linutronix.de
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:
	bool foo(u32 __user *p, u32 val)
	{
		scoped_guard(pagefault)
			unsafe_put_user(val, p, efault);
		return true;
	efault:
		return false;
	}
It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.
Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.419351819@linutronix.de
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:
	bool foo(u32 __user *p, u32 val)
	{
		scoped_guard(pagefault)
			unsafe_put_user(val, p, efault);
		return true;
	efault:
		return false;
	}
It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.
Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.356628509@linutronix.de
ASM GOTO is miscompiled by GCC when it is used inside an auto cleanup scope:
	bool foo(u32 __user *p, u32 val)
	{
		scoped_guard(pagefault)
			unsafe_put_user(val, p, efault);
		return true;
	efault:
		return false;
	}
It ends up leaking the pagefault disable counter in the fault path. clang
at least fails the build.
Rename unsafe_*_user() to arch_unsafe_*_user() which makes the generic
uaccess header wrap it with a local label that makes both compilers emit
correct code. Same for the kernel_nofault() variants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027083745.294359925@linutronix.de
When CONFIG_CPU_SPECTRE=n, get_user() is missing the 8-byte ASM variant
for no real good reason. This prevents using get_user(u64) in generic code.
Implement it as a sequence of two 4-byte reads with LE/BE awareness and
make the unsigned long (or long long) type of the intermediate variable to
read into dependent on the target type.
The __long_type() macro and idea were lifted from PowerPC. Thanks to
Christophe for pointing it out.
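Conceptual sketch of the fallback (the helper name is illustrative and
fault handling details are omitted): two 4-byte reads combined with LE/BE
awareness:

	static inline int get_user_u64_sketch(u64 *val, const u32 __user *p)
	{
		u32 lo = 0, hi = 0;
		int ret;

		if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
			ret = get_user(hi, p) ?: get_user(lo, p + 1);
		else
			ret = get_user(lo, p) ?: get_user(hi, p + 1);

		*val = ((u64)hi << 32) | lo;
		return ret;
	}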
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509120155.pFgwfeUD-lkp@intel.com/
Link: https://patch.msgid.link/20251027083745.168468637@linutronix.de
Add CPU PMU compatible strings for Cortex-A320, Cortex-A520AE,
Cortex-A720AE, and C1 Nano/Premium/Pro/Ultra.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
driver_find_device() calls get_device() to increment the reference
count once a matching device is found. device_release_driver()
releases the driver, but it does not decrease the reference count that
was incremented by driver_find_device(). At the end of the loop, there
is no put_device() to balance the reference count. To avoid reference
count leakage, add put_device() to decrease the reference count.
Found by code review.
Cc: stable@vger.kernel.org
Fixes: bfc653aa89 ("perf: arm_cspmu: Separate Arm and vendor module")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Will Deacon <will@kernel.org>
NoC S3 and its SI L1 sibling look largely similar to their predecessors,
but add the notion of subfeatures to the discovery process, which we now
use to find the event muxes for each device node. Plus, as ever, more
mildly annoying shuffling around of some of the PMU registers (this time
it's the counters...)
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Support NVIDIA PMU that utilizes the optional event filter2 register.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
Distinguish NVIDIA devices by the revision and variant bits
in the PMIIDR register, in addition to the product ID.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
The PMIIDR value is composed of the values in the PMPIDR registers.
We can use the PMPIDR registers as an alternative for device
identification on systems that do not implement PMIIDR.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
An implementer may need to reset a filter configuration when
stopping a counter, so add a callback for this.
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
one of the SMT siblings is not idle on a multi-threaded implementation.
So don't use it on SMT cores.
Introduce topology_core_has_smt() to detect an SMT implementation and
cache it in arm_pmu::has_smt during allocation.
When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
patch we'll get:
[root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
Performance counter stats for 'CPU(s) 2-3':
CPU2 2880457316 cycles
CPU3 2880459810 cycles
1.254688470 seconds time elapsed
With this patch the idle state of CPU3 is observed as expected:
[root@client1 ~]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
Performance counter stats for 'CPU(s) 2-3':
CPU2 2558580492 cycles
CPU3 305749 cycles
1.113626410 seconds time elapsed
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
Crystal reports that the PCIe Advanced Error Reporting driver gets stuck
in an infinite loop on PREEMPT_RT:
Both the primary interrupt handler aer_irq() as well as the secondary
handler aer_isr() are forced into threads with identical priority.
Crystal writes that on the ARM system in question, the primary handler
has to clear an error in the Root Error Status register...
"before the next error happens, or else the hardware will set the
Multiple ERR_COR Received bit. If that bit is set, then aer_isr()
can't rely on the Error Source Identification register, so it scans
through all devices looking for errors -- and for some reason, on
this system, accessing the AER registers (or any Config Space above
0x400, even though there are capabilities located there) generates
an Unsupported Request Error (but returns valid data). Since this
happens more than once, without aer_irq() preempting, it causes
another multi error and we get stuck in a loop."
The issue does not show on non-PREEMPT_RT because the primary handler
runs in hardirq context and thus can preempt the threaded secondary
handler, clear the Root Error Status register and prevent the secondary
handler from getting stuck.
Emulate the same behavior on PREEMPT_RT by assigning a lower default
priority to the secondary handler if the primary handler is forced into
a thread.
Reported-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/f6dcdb41be2694886b8dbf4fe7b3ab89e9d5114c.1761569303.git.lukas@wunner.de
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
Idle migrators don't walk the whole tree in order to find out if there
are timers to migrate because they recorded the next deadline to be
verified within a single check in tmigr_requires_handle_remote().
Remove the related dead code and data.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-7-frederic@kernel.org
The CPU doing the prepare work for a remote target must be online from
the tree point of view and its hierarchy must be active, otherwise
propagating its active state up to the new root branch would be either
incorrect or racy.
Assert those conditions with more sanity checks.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-5-frederic@kernel.org
When a CPU from a new node boots, the old root may happen to be
connected to the new root even if their nodes mismatch, as depicted in
the following scenario:
1) CPU 0 boots and creates the first group for node 0.
                        [GRP0:0]
                         node 0
                            |
                          CPU 0
2) CPU 1 from node 1 boots and creates a new top that corresponds to
node 1, but it also connects the old root from node 0 to the new root
from node 1 by mistake.
                        [GRP1:0]
                         node 1
                        /      \
                       /        \
                [GRP0:0]        [GRP0:1]
                 node 0          node 1
                    |               |
                  CPU 0           CPU 1
3) This eventually leads to an imbalanced tree where some node 0 CPUs
migrate node 1 timers (and vice versa) way before reaching the
crossnode groups, resulting in more frequent remote memory accesses
than expected.
                            [GRP2:0]
                          NUMA_NO_NODE
                         /            \
                  [GRP1:0]            [GRP1:1]
                   node 1              node 0
                  /      \                |
                 /        \             [...]
          [GRP0:0]        [GRP0:1]
           node 0          node 1
              |               |
           CPU 0...        CPU 1...
A balanced tree should only contain groups having children that belong
to the same node:
                            [GRP2:0]
                          NUMA_NO_NODE
                         /            \
                  [GRP1:0]            [GRP1:0]
                   node 0              node 1
                  /      \            /      \
                 /        \          /        \
          [GRP0:0]       [...]    [...]       [GRP0:1]
           node 0                               node 1
              |                                    |
           CPU 0...                             CPU 1...
In order to fix this, the hierarchy must be unfolded up to the crossnode
level as soon as a node mismatch is detected. For example the stage 2
above should lead to this layout:
                            [GRP2:0]
                          NUMA_NO_NODE
                         /            \
                  [GRP1:0]            [GRP1:1]
                   node 0              node 1
                  /      \
                 /        \
          [GRP0:0]        [GRP0:1]
           node 0          node 1
              |               |
            CPU 0           CPU 1
This means that not only GRP1:0 must be created but also GRP1:1 and
GRP2:0 in order to prepare a balanced tree for next CPUs to boot.
Fixes: 7ee9887703 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-4-frederic@kernel.org
Initializing the tmc's group, the group's number of children and the
group's parent can all be done without locking because:
1) Reading the group's parent and its group mask is done locklessly.
2) The connections prepared for a given CPU hierarchy are visible to the
target CPU once online, thanks to the CPU hotplug enforced memory
ordering.
3) In case of a newly created upper level, the new root and its
connections and initialization are made visible by the CPU which made
the connections. When that CPU goes idle in the future, the new link
is published by tmigr_inactive_up() through the atomic RmW on
->migr_state.
4) If CPUs were still walking up the active hierarchy, they could observe
the new root earlier. In this case the ordering is enforced by an
early initialization of the group mask and by barriers that maintain
address dependency as explained in:
b729cc1ec2 ("timers/migration: Fix another race between hotplug and idle entry/exit")
de3ced72a7 ("timers/migration: Enforce group initialization visibility to tree walkers")
5) Timers are propagated by a chain of group locking from the bottom to
the top. And while doing so, the tree also propagates groups links
and initialization. Therefore remote expiration, which also relies
on group locking, will observe those links and initialization while
holding the root lock before walking the tree remotely and update
remote timers. This is especially important for migrators in the
active hierarchy that may observe the new root early.
Therefore the locking is unnecessary at initialization. If anything, it
just brings confusion. Remove it.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-3-frederic@kernel.org
Both the "do while" and "while" loops in tmigr_setup_groups() eventually
mimic the behaviour of "for" loops.
Simplify accordingly.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024132536.39841-2-frederic@kernel.org
On large NUMA systems, while running a test program that saturates the
inter-processor and inter-NUMA links, acquiring the jiffies_lock can be
very expensive.
If the cpu designated to do jiffies updates (tick_do_timer_cpu) gets
delayed and other cpus decide to do the jiffies update themselves, a large
number of them decide to do so at the same time.
The inexpensive check against tick_next_period is far quicker than actually
acquiring the lock, so most of these get in line to obtain the lock. If
obtaining the lock is slow enough, this spirals into the vast majority of
CPUs continuously being stuck waiting for this lock, just to obtain it and
find out that time has already been updated by another cpu. For example, on
one random entry to kdb by manually-injected NMI, 2912 of 3840 CPUs were
observed to be stuck there.
To avoid this, allow only one non-timekeeper CPU to call
tick_do_update_jiffies64() at any given time, resetting ts->stalled_jiffies
only if the jiffies update function is actually called.
With this change, when manually interrupting the test, at most two CPUs are
observed to invoke tick_do_update_jiffies64() - the timekeeper and one
other.
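Sketch of the single-helper idea using an atomic flag; illustrative only,
not the literal mainline change:

	static atomic_t jiffies_update_helper = ATOMIC_INIT(0);

	static void tick_try_update_jiffies(ktime_t now)
	{
		/* Let only one non-timekeeper CPU attempt the update at a time */
		if (atomic_cmpxchg(&jiffies_update_helper, 0, 1))
			return;

		tick_do_update_jiffies64(now);
		atomic_set(&jiffies_update_helper, 0);
	}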
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20251027183456.343407-1-steve.wahl@hpe.com
Reading /proc/irq/N/smp_affinity* races with irq_set_affinity() and
irq_move_masked_irq(), leading to old or torn output for users.
After a user writes a new CPU mask to /proc/irq/N/affinity*, the syscall
returns success, yet a subsequent read of the same file immediately returns
a value different from what was just written.
That's due to a race between show_irq_affinity() and irq_move_masked_irq()
which lets the read observe a transient, inconsistent affinity mask.
Cure it by guarding the read with irq_desc::lock.
[ tglx: Massaged change log ]
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251028090408.76331-1-songmuchun@bytedance.com
Stephen points out that some of the percpu_devid irq affinity
documentation is either missing or not matching the data structures.
Address all the issues in one go.
Fixes: 87b0031f7f ("irqdomain: Add firmware info reporting interface")
Fixes: 258e7d28a3 ("genirq: Add affinity to percpu_devid interrupt requests")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251030143032.2035987-1-maz@kernel.org
Prevent intel_pstate from loading when Out-of-Band (OOB) P-states mode
is enabled.
The OOB identification mechanism for Diamond Rapids servers is the same
as for prior generation CPUs such as Granite Rapids. Add the Diamond
Rapids CPU model to intel_pstate_cpu_oob_ids[] to ensure correct OOB
handling.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://patch.msgid.link/20251022215425.3566218-1-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cgroup1 freezer piggybacks on the PM freezer, which inadvertently allowed
userspace to produce uninterruptible tasks at will. To avoid the issue,
cgroup2 freezer switched to a separate job control based mechanism. While
this happened a long time ago, the code and comment haven't been updated,
making it confusing to people who aren't familiar with the history.
Rename cgroup_freezing() to cgroup1_freezing() and update comments on top of
freezing() and frozen() to clarify that cgroup2 freezer isn't covered by the
PM freezer mechanism.
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Qu Wenruo <wqu@suse.com>
Link: https://patch.msgid.link/aPZ3q6Hm865NicBC@slm.duckdns.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a sysfs attribute `/sys/power/hibernate_compression_threads` to
allow runtime configuration of the number of threads used for
compressing and decompressing hibernation images.
The new sysfs interface enables dynamic adjustment at runtime:
# cat /sys/power/hibernate_compression_threads
3
# echo 4 > /sys/power/hibernate_compression_threads
This change provides greater flexibility for debugging and performance
tuning of hibernation without requiring a reboot.
Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
Link: https://patch.msgid.link/c68c62f97fabf32507b8794ad8c16cd22ee656ac.1761046167.git.luoxueqin@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The number of compression/decompression threads has a direct impact on
hibernate image generation and resume latency. Using more threads can
reduce overall resume time, but on systems with fewer CPU cores it may
also introduce contention and reduce efficiency.
Performance was evaluated on an 8-core ARM system, averaged over 10 runs:
  Threads   Hibernate(s)   Resume(s)
  -----------------------------------
     3          12.14         18.86
     4          12.28         17.48
     5          11.09         16.77
     6          11.08         16.44
With 5–6 threads, resume latency improves by approximately 12% compared
to the default 3-thread configuration, with negligible impact on
hibernate time.
Introduce a new kernel parameter `hibernate_compression_threads=` that
allows users and integrators to tune the number of
compression/decompression threads at boot. This provides a way to
balance performance and CPU utilization across a wide range of hardware
without recompiling the kernel.
Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
Link: https://patch.msgid.link/f24b3ca6416e230a515a154ed4c121d72a7e05a6.1761046167.git.luoxueqin@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Convert crc->unc_len and crc->unc from fixed-size arrays to dynamically
allocated arrays, sized according to the actual number of threads selected
at runtime. This removes the fixed limit imposed by CMP_THREADS.
Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn>
Link: https://patch.msgid.link/b5db63bb95729482d2649b12d3a11cb7547b7fcc.1761046167.git.luoxueqin@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a guest issues a CPUID instruction for Fn0000000D_x01, the hypervisor may
be intercepting the CPUID instruction and need to access the guest XSS value.
For SEV-ES, the XSS value is encrypted and needs to be included in the GHCB to
be visible to the hypervisor.
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/all/20250924200852.4452-3-john.allen@amd.com/
The boot_{rdmsr,wrmsr}() helpers are *just* the barebones MSR access
functionality, without any tracing or exception handling glue as it is done in
kernel proper.
Move these helpers to asm/shared/msr.h and rename to raw_{rdmsr,wrmsr}() to
indicate what they are.
[ bp: Correct the reason why those helpers exist. I should've caught that in
the original patch that added them:
176db62257 ("x86/boot: Introduce helpers for MSR reads/writes"
but oh well...
- fixup include path delimiters to <> ]
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://patch.msgid.link/all/20250924200852.4452-2-john.allen@amd.com
Fix section mismatch warning reported by modpost:
.text:early_parse_cmdline() -> .init.data:boot_command_line
The function early_parse_cmdline() is only called during init and accesses
init data, so mark it __init to match its usage.
[ bp: This happens only when the toolchain fails to inline the function and
I haven't been able to reproduce it with any toolchain I'm using. Patch is
obviously correct regardless. ]
Signed-off-by: Yu Peng <pengyu@kylinos.cn>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/all/20251030123757.1410904-1-pengyu@kylinos.cn
All microcode patches up to the proper BIOS Entrysign fix are loaded
only after the sha256 signature carried in the driver has been verified.
Microcode patches after the Entrysign fix has been applied do not need
that signature verification anymore.
In order to not abandon machines which haven't received the BIOS update
yet, add the capability to select which microcode patch to load.
The corresponding microcode container supplied through firmware-linux
has been modified to carry two patches per CPU type
(family/model/stepping) so that the proper one gets selected.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Waiman Long <longman@redhat.com>
Link: https://patch.msgid.link/20251027133818.4363-1-bp@kernel.org
Fix reST warnings in
Documentation/admin-guide/pm/intel_pstate.rst caused by missing explicit
hyperlink labels for section titles.
Before this change, the following errors were printed during
`make htmldocs`:
Documentation/admin-guide/pm/intel_pstate.rst:401:
ERROR: Indirect hyperlink target (id="id6") refers to target
"passive mode", which is a duplicate, and cannot be used as a
unique reference.
Documentation/admin-guide/pm/intel_pstate.rst:517:
ERROR: Indirect hyperlink target (id="id9") refers to target
"active mode", which is a duplicate, and cannot be used as a
unique reference.
Documentation/admin-guide/pm/intel_pstate.rst:611:
ERROR: Indirect hyperlink target (id="id15") refers to target
"global attributes", which is a duplicate, and cannot be used as
a unique reference.
ERROR: Duplicate target name, cannot be used as a unique reference:
"passive mode", "active mode", "global attributes".
These errors occurred because the sections "Active Mode",
"Active Mode With HWP", "Passive Mode", and "Global Attributes"
did not define explicit hyperlink labels. As a result, Sphinx
auto-generated duplicate anchors when the same titles appeared
multiple times within the document.
Because of this, the generated HTML documentation contained
broken references such as:
`active mode <Active Mode_>`_
`passive mode <Passive Mode_>`_
`global attributes <Global Attributes_>`_
This patch adds explicit hyperlink labels for the affected sections,
ensuring all references are unique and correctly resolved.
After applying this patch, `make htmldocs` completes without
any warnings, and all hyperlinks in intel_pstate.html render properly.
Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Subject adjustment ]
Link: https://patch.msgid.link/20251029134737.42229-1-swarajgaikwad1925@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When em_create_perf_table() returns failure, pd is freed, so dev->em_pd
is not valid. Accessing dev->em_pd->node will then trigger a kernel panic
in em_dev_register_pd_no_update(). So return early if 'ret' is non-zero.
Kernel dump:
cpu cpu0: EM: invalid power: 0
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008
Mem abort info:
pc : em_dev_register_pd_no_update+0xb4/0x79c
lr : em_dev_register_pd_no_update+0x9c/0x79c
Call trace:
em_dev_register_pd_no_update+0xb4/0x79c (P)
em_dev_register_perf_domain+0x18/0x58
scmi_cpufreq_register_em+0x84/0xb8
cpufreq_online+0x48c/0xb74
cpufreq_add_dev+0x80/0x98
subsys_interface_register+0x100/0x11c
cpufreq_register_driver+0x158/0x278
scmi_cpufreq_probe+0x1f8/0x2e0
scmi_dev_probe+0x28/0x3c
really_probe+0xbc/0x29c
__driver_probe_device+0x78/0x12c
driver_probe_device+0x3c/0x15c
__device_attach_driver+0xb8/0x134
bus_for_each_drv+0x84/0xe4
Fixes: cbe5aeedec ("PM: EM: Assign a unique ID when creating a performance domain")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251028-fix-energy-v1-1-ab854fd6a97c@nxp.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During the CPPC FIE initialization, reading perf counters on offline CPUs
is expected to fail. Don't warn in this case.
Also, change the error log level to debug since FIE is optional.
Co-developed-by: Bowen Yu <yubowen8@huawei.com>
Signed-off-by: Bowen Yu <yubowen8@huawei.com> # Changing loglevel to debug
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
[ Viresh: Added back the dropped comment. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
There are two reference count leaks in this driver:
1. In nforce2_fsb_read(): pci_get_subsys() increases the reference count
of the PCI device, but pci_dev_put() is never called to release it,
thus leaking the reference.
2. In nforce2_detect_chipset(): pci_get_subsys() gets a reference to the
nforce2_dev which is stored in a global variable, but the reference
is never released when the module is unloaded.
Fix both by:
- Adding pci_dev_put(nforce2_sub5) in nforce2_fsb_read() after reading
the configuration.
- Adding pci_dev_put(nforce2_dev) in nforce2_exit() to release the
global device reference.
Found via static analysis.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Microsoft has designed a set of extensions for the ACPI fan device
allowing the OS to specify a set of fan speed trip points. The
platform firmware will then notify the ACPI fan device when one
of the trip points is triggered.
Unfortunately, some device manufacturers (like HP) blindly assume
that the OS will use said extensions and thus only update the values
returned by the _FST control method when receiving such a
notification. As a result, the ACPI fan driver is currently unusable
on such machines, always reporting a constant value.
Fix this by adding support for the Microsoft extensions.
During probe and when resuming from suspend, the driver will attempt to
trigger an initial notification that will update the values returned by
_FST.
Said trip points will be updated each time a notification is received
from the platform firmware to ensure that the values returned by the
_FST control method are updated.
Link: https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/design-guide
Closes: https://github.com/lm-sensors/lm-sensors/issues/506
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
[ rjw: Edits of the new code comments ]
Link: https://patch.msgid.link/20251024183824.5656-4-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The platform firmware can notify the ACPI fan device that the fan
speed has changed. Relay this notification to the hwmon device if
present so that userspace applications can react to it.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251024183824.5656-3-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ACPI specification states that the platform firmware can notify
the ACPI fan device that the fan speed has changed and that the _FST
control method should be reevaluated. Add support for this mechanism
to prepare for future changes.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://patch.msgid.link/20251024183824.5656-2-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use guard pm_runtime_active_try to simplify runtime PM cleanup and
implement runtime resume error handling in multiple places.
Also use guard pm_runtime_noresume to simplify acpi_tad_remove().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/13881356.uLZWGnKmhe@rafael.j.wysocki
It is not necessary to resume the device upfront in acpi_tad_remove()
because both acpi_tad_disable_timer() and acpi_tad_clear_status()
attempt to resume it, but it is better to prevent it from suspending
between these calls by incrementing its runtime PM usage counter.
Accordingly, replace the pm_runtime_get_sync() call in acpi_tad_remove()
with a pm_runtime_get_noresume() one and put the latter right before the
first invocation of acpi_tad_disable_timer().
In addition, use pm_runtime_put_noidle() to drop the device's runtime
PM usage counter after using pm_runtime_get_noresume() to bump it up
to follow a common pattern and use pm_runtime_suspend() for suspending
the device afterward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/5031965.GXAFRqVoOG@rafael.j.wysocki
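A hedged sketch of the resulting ordering in acpi_tad_remove() (the TAD helper signatures are simplified here; the pm_runtime_*() calls are the real runtime PM API):
  pm_runtime_get_noresume(dev);  /* keep the device from suspending in between */
  acpi_tad_disable_timer(dev);   /* resumes the device itself if necessary (simplified) */
  acpi_tad_clear_status(dev);    /* likewise */
  pm_runtime_put_noidle(dev);    /* drop the usage counter again */
  pm_runtime_suspend(dev);       /* suspend the device afterward */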
All types in `bindings` implement `Zeroable` if they can, so use
`pin_init::zeroed` instead of relying on `unsafe` code.
If this ends up not compiling in the future, something in bindgen or on
the C side changed and is most likely incorrect.
Link: https://github.com/Rust-for-Linux/linux/issues/1189
Suggested-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Siyuan Huang <huangsiyuan@kylinos.cn>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Kunwu Chan <chentao@kylinos.cn>
Link: https://patch.msgid.link/20251020031204.78917-1-huangsiyuan@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All of the evaluations of objects in the ACPI namespace are carried out
under the namespace lock and interpreter lock in ACPICA, so it is not
necessary to put any additional locks around them for synchronization.
However, the ACPI battery driver does just that, so remove the
redundant locking around ACPI object evaluation from it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2344462.iZASKD2KPV@rafael.j.wysocki
Currently, the AMD Address Translation Library will fail to load for new,
unrecognized systems (based on Data Fabric revision). The intention is to
prevent the code from executing on new systems and returning incorrect
results.
Recent AMD systems, however, may provide UEFI PRM handlers for address
translation. This is code provided by the platform through BIOS tables. These
are the preferred method for translation, and the Linux native code can be
used as a fallback.
Future AMD systems are expected to provide PRM handlers by default, and the
Linux native code will not be used.
Adjust the ATL init code so that new, unrecognized systems will default to
using PRM handlers only.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: "Mario Limonciello (AMD)" <superm1@kernel.org>
Link: https://patch.msgid.link/all/20251017-wip-atl-prm-v2-2-7ab1df4a5fbc@amd.com
Having removed the use of the cpu_armpmu per-CPU variable from the
interrupt handling, the only user left is the BRBE scheduler hook.
It is easy to drop the use of this variable by following the pointer to the
generic PMU structure, and get the arm_pmu structure from there.
Perform the conversion and kill cpu_armpmu altogether.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-27-maz@kernel.org
There are no in-tree users of this helper since b13b41cc3d ("misc:
ti_fpc202: Switch to of_fwnode_handle()"); it has been replaced by
of_fwnode_handle().
Get rid of it.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-26-maz@kernel.org
These two helpers do not have any users anymore, and can be removed,
together with the affinity field kept in the irqdesc structure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-25-maz@kernel.org
This code is now completely unused, and nobody will ever miss it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-24-maz@kernel.org
Similarly to what has been done for GICv3, drop the irq partitioning
support from the AIC driver, effectively merging the two per-cpu interrupts
for the PMU.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Sven Peter <sven@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-23-maz@kernel.org
The only thing getting in the way of correctly handling PPIs the way they
were intended is the GICv3 hack that deals with PPI partitions.
Remove that code, allowing the common code to kick in.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-22-maz@kernel.org
Let the TRBE driver request interrupts with an affinity mask matching the
TRBE implementation affinity.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://patch.msgid.link/20251020122944.3074811-21-maz@kernel.org
Let the SPE driver request interrupts with an affinity mask matching the SPE
implementation affinity.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-20-maz@kernel.org
Let the PMU driver request both NMIs and normal interrupts with an affinity mask
matching the PMU affinity.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-19-maz@kernel.org
While it would be nice to simply make request_percpu_irq() take an affinity
mask, the churn is likely to be on the irritating side given that most
drivers do not give a damn about affinities.
So take the more innocuous path to provide a helper that parallels
request_percpu_irq(), with an affinity as a bonus argument.
Yes, request_percpu_irq_affinity() is a bit of a mouthful.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-18-maz@kernel.org
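A hedged usage sketch, assuming the new helper mirrors request_percpu_irq() with the affinity mask passed as an additional argument (the handler and device names and the exact parameter order are assumptions):
  /* Request a per-CPU IRQ only for the CPUs the device actually spans. */
  err = request_percpu_irq_affinity(irq, pmu_irq_handler, "arm-pmu",
                                    &pmu->supported_cpus, /* assumed affinity position */
                                    hw_events);
  if (err)
      pr_err("unable to request per-CPU IRQ %d\n", irq);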
Interrupt sharing for percpu-devid interrupts is forbidden, and for good
reasons. These are interrupts generated *from* a CPU and handled by itself
(timer, for example). Nobody in their right mind would put two devices on
the same pin (and if they have, they get to keep the pieces...).
But this also prevents more benign cases, where devices are connected
to groups of CPUs, and for which the affinities are not overlapping.
Effectively, the only thing they share is the interrupt number, and
nothing else.
Tweak the definition of IRQF_SHARED applied to percpu_devid interrupts to
allow this particular use case. This results in extra validation at the
point of the interrupt being setup and freed, as well as a tiny bit of
extra complexity for interrupts at handling time (to pick the correct
irqaction).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-17-maz@kernel.org
Continue spreading the notion of affinity to the per CPU interrupt request
code by updating the call sites that use request_percpu_nmi() (all two of
them) to take an affinity pointer. This pointer is firmly NULL for now.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-16-maz@kernel.org
Add an affinity field to both the irqaction structure and the interrupt
request primitives. Nothing is making use of it yet, and the only value
used is NULL, which serves as a shorthand for cpu_possible_mask.
This will shortly get used with actual affinities.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-15-maz@kernel.org
Move the code creating a per-cpu irqaction into its own helper, so that
future changes to this code can be kept localised.
At the same time, fix the documentation which appears to say the wrong
thing when it comes to interrupts being automatically enabled
(percpu_devid interrupts never are).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-14-maz@kernel.org
When irqaction::percpu_dev_id was introduced, it was hoped that it could be
part of an anonymous union with dev_id, as the two fields are mutually
exclusive.
However, toolchains used at the time were often showing terrible support
for anonymous unions, breaking the build on a number of architectures. It
was therefore decided to keep the two fields separate and address this down
the line.
14 years later, the compiler dark age is over, and there is universal
support for anonymous unions. Get a whole pointer back that can immediately
be spent on something else.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-13-maz@kernel.org
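Conceptually (surrounding members elided; field names are the ones mentioned above), the change folds the two mutually exclusive pointers into a single slot:
  struct irqaction {
      irq_handler_t handler;
      union {
          void *dev_id;                   /* regular interrupts */
          void __percpu *percpu_dev_id;   /* percpu_devid interrupts */
      };
      /* ... remaining members unchanged ... */
  };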
There is no in-tree user of this flow handler anymore, so simply remove it.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-12-maz@kernel.org
It so appears that handle_percpu_devid_irq() is extremely similar to
handle_percpu_devid_fasteoi_nmi(), and that the differences do not justify
the horrid machinery in the GICv3 driver to handle the flow handler switch.
Stick with the standard flow handler, even for NMIs.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-11-maz@kernel.org
Now that the relevant interrupt controllers are equipped with a callback
returning the affinity of per-CPU interrupts, switch the ARM SPE driver
over to this new method.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-10-maz@kernel.org
Now that the relevant interrupt controllers are equipped with a callback
returning the affinity of per-CPU interrupts, switch the OF side of the ARM
PMU driver over to this new method.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-9-maz@kernel.org
Now that the relevant interrupt controllers are equipped with a callback
returning the affinity of per-CPU interrupts, switch the TRBE driver over
to this new method.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://patch.msgid.link/20251020122944.3074811-8-maz@kernel.org
Plug the new .get_fwspec_info() callback into the Apple AIC driver, using
some of the existing FIQ affinity handling infrastructure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Acked-by: Sven Peter <sven@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-7-maz@kernel.org
Plug the new .get_fwspec_info() callback into the GICv3 core driver, using
some of the existing PPI affinity handling infrastructure.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-6-maz@kernel.org
Expand platform_get_irq_optional() to also return an affinity if available,
renaming it to platform_get_irq_affinity() in the process.
platform_get_irq_optional() is preserved with its current semantics by
calling into the new helper with a NULL affinity pointer.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-5-maz@kernel.org
Plug the irq_populate_fwspec_info() helper into the OF layer to offer an
interrupt affinity reporting function.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-4-maz@kernel.org
Plug the irq_populate_fwspec_info() helper into the ACPI layer to offer an
interrupt affinity reporting function. This is currently only supported for
the CONFIG_ACPI_GENERIC_GSI configurations, but could later be extended to
legacy architectures if necessary.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20251020122944.3074811-3-maz@kernel.org
Add an irqdomain callback to report firmware-provided information that is
otherwise not available in a generic way. This is reported using a new data
structure (struct irq_fwspec_info).
This callback is optional and the only information that can be reported
currently is the affinity of an interrupt. However, the containing
structure is designed to be extensible, allowing other potentially relevant
information to be reported in the future.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Will Deacon <will@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251020122944.3074811-2-maz@kernel.org
Add a helper function to check if a PRM handler/module is present.
This can be used during init time by code that depends on a particular
handler. If the handler is not present, then the code does not need to
be loaded.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: "Mario Limonciello (AMD)" <superm1@kernel.org>
Acked-by: "Rafael J. Wysocki (Intel)" <rafael@kernel.org>
Link: https://patch.msgid.link/all/20251017-wip-atl-prm-v2-1-7ab1df4a5fbc@amd.com
On virtualized PowerPC (pseries) systems, where only one polling state
(Snooze) and one deep state (CEDE) are available, selecting CEDE when
the predicted idle duration is less than the target residency of CEDE
state can hurt performance. In such cases, the entry/exit overhead of
CEDE outweighs the power savings, leading to unnecessary state
transitions and higher latency.
Menu governor currently contains a special-case rule that prioritizes
the first non-polling state over polling, even when its target residency
is much longer than the predicted idle duration. On PowerPC/pseries,
where the gap between the polling state (Snooze) and the first non-polling
state (CEDE) is large, this behavior causes performance regressions.
Refine that special case by adding an extra requirement: the first
non-polling state can only be chosen if its target residency is below
the defined RESIDENCY_THRESHOLD_NS. If this condition is not satisfied,
polling is allowed instead, avoiding suboptimal non-polling state
entries.
This change is limited to the single special-case rule for the first
non-polling state. The general non-polling state selection logic in the
menu governor remains unchanged.
Performance improvement observed with pgbench on PowerPC (pseries)
system:
+---------------------------+------------+------------+------------+
| Metric | Baseline | Patched | Change (%) |
+---------------------------+------------+------------+------------+
| Transactions/sec (TPS) | 495,210 | 536,982 | +8.45% |
| Avg latency (ms) | 0.163 | 0.150 | -7.98% |
+---------------------------+------------+------------+------------+
CPUIdle state usage:
+--------------+--------------+-------------+
| Metric | Baseline | Patched |
+--------------+--------------+-------------+
| Total usage | 12,735,820 | 13,918,442 |
| Above usage | 11,401,520 | 1,598,210 |
| Below usage | 20,145 | 702,395 |
+--------------+--------------+-------------+
Above/Total and Below/Total usage percentages:
+------------------------+-----------+---------+
| Metric | Baseline | Patched |
+------------------------+-----------+---------+
| Above % (Above/Total) | 89.56% | 11.49% |
| Below % (Below/Total) | 0.16% | 5.05% |
| Total cpuidle miss (%) | 89.72% | 16.54% |
+------------------------+-----------+---------+
The results indicate that restricting CEDE selection to cases where
its residency matches the predicted idle time reduces mispredictions,
lowers unnecessary state transitions, and improves overall throughput.
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
[ rjw: Changelog edits, rebase ]
Link: https://patch.msgid.link/20251006013954.17972-1-aboorvad@linux.ibm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
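A condensed sketch of the refined rule (variable names are simplified relative to the actual menu governor code; target_residency_ns and RESIDENCY_THRESHOLD_NS are the fields named above):
  /* Prefer the first non-polling state over polling only when its target
   * residency is small enough; otherwise polling remains a valid choice. */
  if (drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
      return idx;        /* first non-polling state */
  return polling_idx;    /* keep the polling state instead */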
a Clang build
- Make sure a Retbleed mitigation message is printed only when necessary
- Correct the last Zen1 microcode revision for which Entrysign sha256 check is
needed
- Fix a NULL ptr deref when mounting the resctrl fs on a system which supports
assignable counters but where L3 total and local bandwidth monitoring has
been disabled at boot
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmj+FSYACgkQEsHwGGHe
VUqh5RAAwTAfMsEs57v6gQqnm/rbNjGXoZuNcT9xhk4jbRC7xCcyJrZVyYA+mWIe
5rgGOuSThOsOgqJHwVqn4kdym9yUwLradZS8gn5vHFIlDVXDoMRYJuvm8U7PdTug
UWJv0uw0B393RNb+7yCeEN7Zpe2bvbh25PF66uh/7dQYKmWIaiTVlDhrZ+Ba51IK
mmJzbVb6zqWrSP3heISZRjfV3rv+/SifUb+wIgWcCzcAb36fFIlUKaEYd/g5249R
BBcEY5n/eUUKjMJVOki4vDqJyQdPdJCz9yH3qdZaz661Wh9/FVy/rLCQC/O1ruwt
Ovoi6UJAjleb0OXfi00Hl1LT3v92xH/OwyVCamBAYyaIhTdPaoQS6YADGstt3PTx
RUc/BG5wHyaOWsG94zVEvqK9MElyjW3DPiBH4E+O7OB348WAfhsbrUDnnaveDSym
n2LivNnkiaXi8DpPhWL7XsJJjYAy1fi2piDrh952I5oVfhf5iYeNwFjNdtgAft7G
wNr01qraqdPKfMYHZHdkaqrPH/Qy9DlLuDuTjQqtjGm8lsZK/g+txzQLfeXoDJSe
RtKtRYlq0bVCOnAuA8MN4xi9H2WaKAZNgavJxywZslmaQuQzh21g7ISwxcAFe07n
nevcypF1s/dnCUPK8yuKTmFzkwbg7I2OgrmX0RKZdFxY8uzg4Co=
=EZGc
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.18_rc3' into x86/microcode
Pick up the below urgent upstream change in order to base more work
on top:
- Correct the last Zen1 microcode revision for which Entrysign sha256 check is
needed
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Add a new compatible for the plic found in UltraRISC DP1000 with a quirk to
work around a known hardware bug with IRQ claiming in the UR-CP100 cores.
When claiming an interrupt on UR-CP100 cores, all other interrupts must be
disabled before the claim register is accessed to prevent incorrect
handling of the interrupt. This is a hardware bug in the CP100 core
implementation, not specific to the DP1000 SoC.
When the PLIC_QUIRK_CP100_CLAIM_REGISTER_ERRATUM flag is present, a
specialized handler (plic_handle_irq_cp100) disables all interrupts except
for the first pending one before reading the claim register, and then
restores the interrupts before further processing of the claimed interrupt
continues.
This implementation leverages the enable_save optimization, which maintains
the current interrupt enable state in memory, avoiding additional register
reads during the workaround.
The driver matches on "ultrarisc,cp100-plic" to apply the quirk to all
SoCs using UR-CP100 cores, regardless of the specific SoC implementation.
This has no impact on other platforms.
[ tglx: Condensed the code a bit, massaged change log and comments ]
Co-developed-by: Zhang Xincheng <zhangxincheng@ultrarisc.com>
Signed-off-by: Zhang Xincheng <zhangxincheng@ultrarisc.com>
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://patch.msgid.link/20251024083647.475239-5-lzampier@redhat.com
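In pseudocode form (the masking/restore helpers and the register offset are assumptions; only the ordering around the claim register read matters):
  /* CP100 erratum: leave only the first pending source enabled, read the
   * claim register, then restore the cached enable state and handle the IRQ. */
  plic_mask_all_but_first_pending(handler);                   /* assumed helper */
  hwirq = readl(handler->hart_base + CONTEXT_CLAIM);          /* offset assumed */
  plic_restore_enable_state(handler, handler->enable_save);   /* assumed helper */
  generic_handle_domain_irq(priv->irqdomain, hwirq);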
There are two implementations on 64-bit, depending on CONFIG_DEBUG_VIRTUAL,
but they differ only regarding the presence of VIRTUAL_BUG_ON, which is
already ifdef'd on CONFIG_DEBUG_VIRTUAL.
To avoid adding a function call on non-LTO non-DEBUG_VIRTUAL builds, move the
function into the header. (Note the function is already only used on 64-bit).
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/all/20250813-phys-addr-cleanup-v1-1-19e334b1c466@google.com/
This old alias for in_hardirq() has been marked as deprecated since
2020; remove the stragglers.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024180654.1691095-1-willy@infradead.org
Optimize the PLIC driver by maintaining the interrupt enable state in the
handler's enable_save array during normal operation rather than only during
suspend/resume. This eliminates the need to read enable registers during
suspend and makes the enable state immediately available for other
purposes.
Let __plic_toggle() update both the hardware registers and the cached
enable_save state atomically within the existing enable_lock protection.
That allows to remove the suspend-time enable register reading since
handler::enable_save now always reflects the current state.
[ tglx: Massaged change log ]
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251024083647.475239-4-lzampier@redhat.com
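A hedged sketch of the idea (the function signature and field layout are approximations of the driver; callers hold the existing enable_lock):
  static void __plic_toggle(u32 __iomem *enable_base, int hwirq, int enable,
                            u32 *enable_save)
  {
      u32 __iomem *reg = enable_base + hwirq / 32;
      u32 mask = BIT(hwirq % 32);
      u32 val = readl(reg);
      val = enable ? (val | mask) : (val & ~mask);
      writel(val, reg);
      enable_save[hwirq / 32] = val;  /* cached copy is now always current */
  }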
Add compatible strings for the PLIC found in UltraRISC DP1000 SoC.
The PLIC is part of the UR-CP100 core and has a hardware bug requiring
a workaround.
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Signed-off-by: Lucas Zampieri <lzampier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20251024083647.475239-3-lzampier@redhat.com
Use early_param() to get the value of the tsx= command line parameter. It is
an early parameter, because it must be parsed before tsx_init(), which is
called long before kernel_init(), where normal parameters are parsed.
Although cmdline_find_option() from tsx_init() works fine, the option is later
reported as unknown and passed to user space. The latter is not a real issue,
but the former is confusing and makes people wonder if the tsx= parameter had
any effect and double-check for typos unnecessarily.
The behavior changes slightly if "tsx" is given without any argument (which is
invalid syntax). Until now, the kernel logged an error message and disabled
TSX. Now, the kernel still issues a warning (Malformed early option 'tsx'),
but TSX state is unchanged. The new behavior is consistent with other
parameters, e.g. "tsx_async_abort".
[ bp: Fixup minor formatting request during review. ]
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/all/cover.1758906115.git.ptesarik@suse.com
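A minimal sketch of the mechanism (the control variable and its values are assumptions; early_param() itself is the real interface):
  static int __init tsx_parse_cmdline(char *str)
  {
      if (!str)
          return -EINVAL;  /* "tsx" given without an argument */
      if (!strcmp(str, "on"))
          tsx_ctrl_state = TSX_CTRL_ENABLE;   /* variable/enum names assumed */
      else if (!strcmp(str, "off"))
          tsx_ctrl_state = TSX_CTRL_DISABLE;
      return 0;
  }
  early_param("tsx", tsx_parse_cmdline);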
Move all definitions related to tsx_ctrl_state to tsx.c. They are
never referenced outside this file.
No functional change.
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/all/cover.1758906115.git.ptesarik@suse.com
If no AP instructions are available the AP bus module leaks registered
debug feature files. Change function call order to fix this.
Fixes: cccd85bfb7 ("s390/zcrypt: Rework debug feature invocations.")
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The psw_bits() macro makes use of typecheck() without typecheck.h being
included. Add the missing include to avoid potential future compile
problems.
[hca@linux.ibm.com: change commit message]
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Some firmware implementations use the "Ones" ASL opcode to produce
an integer with all bits set in order to indicate missing speed or
power readings. This however only works when using 32-bit integers,
as the ACPI spec requires a 32-bit integer (0xFFFFFFFF) to be
returned for missing speed/power readings. With 64-bit integers the
"Ones" opcode produces a 64-bit integer with all bits set, violating
the ACPI spec regarding the placeholder value for missing readings.
Work around such buggy firmware implementation by also checking for
64-bit integers with all bits set when reading _FST.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
[ rjw: Typo fix in the changelog ]
Link: https://patch.msgid.link/20251007234149.2769-3-W_Armin@gmx.de
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
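The workaround amounts to treating the 64-bit all-ones pattern the same way as the spec-mandated 32-bit one; a hedged sketch (the helper name is hypothetical):
  /* Both 0xFFFFFFFF (per the ACPI spec) and ~0ULL (buggy firmware using the
   * "Ones" opcode with 64-bit integers) mean the reading is not available. */
  static bool acpi_fan_fst_value_missing(u64 val)
  {
      return val == 0xFFFFFFFFULL || val == ~0ULL;
  }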
Use `Option` combinators to make this a bit less noisy.
Wrap the `dev_pm_opp_set_config` operation in a closure and use type
ascription to leverage the compiler to check for use after free.
Signed-off-by: Tamir Duberstein <tamird@kernel.org>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The AP bus udev event BINDINGS=complete is sent out the first time
all devices detected by the AP bus scan have been bound to device
drivers. This is the ideal time to, for example, change the AP bus
masks apmask and aqmask in order to re-apply a persistent decision
about which cards/domains should be available to the host and which
should go into the pool for KVM guests.
However, if this initial udev event is sent out early in the boot
process, a udev rule may not have been established yet and the event
will never be recognized. To provide some indication of whether AP
bus binding has already completed, this patch exposes the internal
ap_bindings_complete_count counter via sysfs.
Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In case SCLP CPU detection does not work a fallback mechanism using SIGP is
in place. Since a cleanup this does not work correctly anymore: new CPUs
are only considered if their type matches the boot CPU.
Before the cleanup the information if a CPU type should be considered was
also part of a structure generated by the fallback mechanism and indicated
that a CPU type should not be considered when adding CPUs.
Since the rework a global SCLP state is used instead. If the global SCLP
state indicates that the CPU type should be considered and the fallback
mechanism is used, there may be a mismatch with CPU types if CPUs are
added. This can lead to a system with only a single CPU even though there
are many more CPUs.
Address this by simply copying the boot CPU type into the generated data
structure from the fallback mechanism.
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Fixes: d08d94306e ("s390/smp: cleanup core vs. cpu in the SCLP interface")
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Emit an error log when a PCI function cannot be enabled for use, despite
being reported as configured to the system.
This brings to attention situations where functions might go missing
without notice. Functions added to the system through hotplug are less
likely to go unnoticed, but they will produce the same error log.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add support to use the OPP table from DT in the Tegra186 cpufreq driver.
Tegra SoCs receive the frequency lookup table (LUT) from BPMP-FW.
Cross-check the OPPs present in DT against the LUT from BPMP-FW
and enable only those DT OPPs which are also present in the LUT.
The OPP table in DT has a CPU frequency to bandwidth mapping where
the bandwidth value is per MC channel. DRAM bandwidth depends on the
number of MC channels, which can vary with the boot configuration.
This per-channel bandwidth from the OPP table will later be converted by
the MC driver to the final bandwidth value by multiplying it with the
number of channels, before being handled in the EMC driver.
If the OPP table is not present in DT, then use the LUT from BPMP-FW
directly as the CPU frequency table and do not do the DRAM frequency
scaling, which is the same as the current behavior.
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
[ Viresh: Fix _free() definitions ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
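A hedged sketch of the cross-check (the LUT structure is illustrative; dev_pm_opp_enable() is the real OPP API for activating an OPP at a given frequency):
  /* Enable only those DT OPPs whose rate also appears in the BPMP-FW LUT. */
  for (i = 0; i < lut->num_rates; i++) {
      ret = dev_pm_opp_enable(cpu_dev, lut->rates[i]);
      if (ret && ret != -ENODEV)  /* tolerating rates without a DT OPP is an assumption */
          return ret;
  }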
Add the compatible strings for supporting the generic
cpufreq driver on the StarFive JH7110S SoC.
Signed-off-by: Hal Feng <hal.feng@starfivetech.com>
Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In function `s5pv210_cpu_init`, a possible refcount inconsistency has
been identified, causing a resource leak.
Why it is a bug:
1. For every clk_get, there should be a matching clk_put on every
subsequent error handling path.
2. After `dmc1_clk` is obtained via clk_get(), it is not released
if any later error happens.
How it is fixed: an extra goto label is added so that `dmc1_clk` is
released on every failing path.
Signed-off-by: Shuhao Fu <sfual@cse.ust.hk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
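A condensed sketch of the resulting unwind order (label and clock names are illustrative; clk_get()/clk_put() are the real clk API):
  dmc0_clk = clk_get(NULL, "sclk_dmc0");
  if (IS_ERR(dmc0_clk))
      return PTR_ERR(dmc0_clk);
  dmc1_clk = clk_get(NULL, "hclk_msys");
  if (IS_ERR(dmc1_clk)) {
      ret = PTR_ERR(dmc1_clk);
      goto out_dmc0;
  }
  if (some_later_check_fails) {  /* placeholder for the later error paths */
      ret = -EINVAL;
      goto out_dmc1;             /* new label: dmc1_clk is released as well */
  }
  return 0;
  out_dmc1:
      clk_put(dmc1_clk);
  out_dmc0:
      clk_put(dmc0_clk);
      return ret;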
Uninitialized pointers with `__free` attribute can cause undefined
behaviour as the memory allocated to the pointer is freed automatically
when the pointer goes out of scope.
The OPP core doesn't have any bugs related to this as of now, but it is
better to initialize pointers marked with `__free` attribute at
declaration to simplify the code and ensure proper scope-based cleanup.
Reported-by: Joe Perches <joe@perches.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
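The preferred pattern looks roughly like this (the struct name is a placeholder; __free()/kzalloc() come from the kernel's scope-based cleanup and allocation helpers):
  /* Allocate at the point of declaration so the pointer is never live but
   * uninitialized; kfree() runs automatically when it goes out of scope. */
  struct dev_pm_opp_config *config __free(kfree) =
      kzalloc(sizeof(*config), GFP_KERNEL);
  if (!config)
      return -ENOMEM;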
Send an event to userspace when a performance domain is created or deleted,
or its energy model is updated.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-11-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Implement two event notifications when a performance domain is created
(EM_CMD_PD_CREATED) and updated (EM_CMD_PD_UPDATED). The message format
of these two event notifications is the same as EM_CMD_GET_PD_TABLE --
containing the performance domain's ID and its energy model table.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-10-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the event notification infrastructure and implement the event
notification for when a performance domain is deleted (EM_CMD_PD_DELETED).
The event contains the ID of the performance domain (EM_A_PD_TABLE_PD_ID)
so the userspace can identify the changed performance domain for further
processing.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-9-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When userspace requests EM_CMD_GET_PD_TABLE with the ID of a performance
domain, the kernel reports back the energy model table of the specified
performance domain. The message format of the response is as follows:
  EM_A_PD_TABLE_PD_ID (NLA_U32)
  EM_A_PD_TABLE_PS (NLA_NESTED)*
    EM_A_PS_PERFORMANCE (NLA_U64)
    EM_A_PS_FREQUENCY (NLA_U64)
    EM_A_PS_POWER (NLA_U64)
    EM_A_PS_COST (NLA_U64)
    EM_A_PS_FLAGS (NLA_U64)
where EM_A_PD_TABLE_PS can be repeated as many times as there are
performance states (struct em_perf_state).
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-8-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When userspace requests EM_CMD_GET_PDS, the kernel responds with
information on all performance domains. The message format of the
response is as follows:
  EM_A_PDS_PD (NLA_NESTED)*
    EM_A_PD_PD_ID (NLA_U32)
    EM_A_PD_FLAGS (NLA_U64)
    EM_A_PD_CPUS (NLA_STRING)
where EM_A_PDS_PD can be repeated as many times as there are performance
domains, and EM_A_PD_CPUS is a hexadecimal string representing a CPU
bitmask.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-7-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add an iterator function (for_each_em_perf_domain) that iterates all the
performance domains in the global list. A passed callback function (cb) is
called for each performance domain.
Additionally, add a lookup function (em_perf_domain_get_by_id) that
searches for a performance domain by matching the ID in the global list.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-6-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a boilerplate code for netlink notification to register the new
protocol family. Also, initialize and register the netlink during booting.
The initialization is called at the postcore level, which is late enough
after the generic netlink is initialized.
Finally, update MAINTAINERS to include new files.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-5-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a generic netlink spec in YAML format and autogenerate boilerplate
code using ynl-regen.sh to introduce a generic netlink for the energy
model. It allows a userspace program to read the performance domain and
its energy model. It notifies the userspace program when a performance
domain is created or deleted or its energy model is updated through a
multicast interface.
Specifically, it supports two commands:
- EM_CMD_GET_PDS: Get the list of information for all performance
domains.
- EM_CMD_GET_PD_TABLE: Get the energy model table of a performance
domain.
Also, it supports three notification events:
- EM_CMD_PD_CREATED: When a performance domain is created.
- EM_CMD_PD_DELETED: When a performance domain is deleted.
- EM_CMD_PD_UPDATED: When the energy model table of a performance domain
is updated.
Finally, update MAINTAINERS to include new files.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-4-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For ease of debugging, let's expose the assigned ID of a performance
domain through debugfs (e.g., /sys/kernel/debug/energy_model/cpu0/id).
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-3-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is necessary to refer to a specific performance domain from
userspace, for example, when the energy model of a particular
performance domain is updated.
To this end, assign a unique ID to each performance domain to address it,
and manage them in a global linked list to look up a specific one by
matching ID. IDA is used for ID assignment, and the mutex is used to
protect the global list from concurrent access.
Note that the mutex (em_pd_list_mutex) is not supposed to hold while
holding em_pd_mutex to avoid ABBA deadlock.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/20251020220914.320832-2-changwoo@igalia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
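A hedged sketch of the bookkeeping (field and variable names are assumptions; ida_alloc(), the list helpers and the mutex are the real primitives):
  static DEFINE_IDA(em_pd_ida);
  static LIST_HEAD(em_pd_list);
  static DEFINE_MUTEX(em_pd_list_mutex);
  /* On creation: assign a unique ID and publish the domain on the list. */
  pd->id = ida_alloc(&em_pd_ida, GFP_KERNEL);
  if (pd->id < 0)
      return pd->id;
  mutex_lock(&em_pd_list_mutex);   /* never taken while em_pd_mutex is held */
  list_add_tail(&pd->node, &em_pd_list);
  mutex_unlock(&em_pd_list_mutex);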
Rework the code obtaining the next endpoint in
acpi_graph_get_next_endpoint(). The resulting code removes unnecessary
conditionals and should be easier to follow.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20251001104320.1272752-4-sakari.ailus@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Calling fwnode_get_next_child_node() in the ACPI implementation of the fwnode
property API is somewhat problematic as the latter is used in the
implementation of the former. Instead of using
fwnode_get_next_child_node() in acpi_graph_get_next_endpoint(), call
acpi_get_next_subnode() directly.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251001104320.1272752-3-sakari.ailus@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_get_next_subnode() is only used in drivers/acpi/property.c. Remove
its prototype from include/linux/acpi.h and make it static.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251001104320.1272752-2-sakari.ailus@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since acpi_processor_setup_cstates() is a more logical place for setting
idle state flags than acpi_processor_setup_cpuidle_cx(), move that code
from the latter to the former.
It also allows direct references to acpi_idle_driver in
acpi_processor_setup_cpuidle_cx() to be avoided.
No intentional functional impact.
Signed-off-by: Huisong Li <lihuisong@huawei.com>
[ rjw: Subject and changelog rewrite ]
Link: https://patch.msgid.link/20250929093754.3998136-5-lihuisong@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Correct the spelling of "CString" to make the link work.
Fixes: ce32e2d47c ("rust: opp: Add abstractions for the configuration options")
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Harald Freudenberger says:
====================
Investigations related to the runtime of crypto requests have revealed a lack
of performance and runtime information for crypto requests. There are two
zcrypt ioctl trace events covering the entry and exit of an ioctl with
crypto requests, giving the overall runtime within the kernel. However,
there is no way to figure out the time during which a request is enqueued
on the AP bus queue but not yet pushed into the firmware queue. Then there
is no information about the runtime of a request during processing in the
firmware. And finally some info about pulling the reply from the firmware
and delivering it into user space is missing.
This series aims to provide a way to collect measurements which can be
used to cover this runtime information for each crypto request/reply.
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Introduce two new AP bus related tracepoint events:
- There is a tracepoint s390_ap_nqap event immediately after a request
has been pushed into the AP firmware queue with the NQAP AP command.
- The other tracepoint s390_ap_dqap event fires immediately after a
reply has been pulled out of the AP firmware queue via DQAP AP
command.
Both events are triggered unconditionally and may need filtering.
Filtering can be done based on the status value which is part of
the nqap and dqap trace. So for example a
echo "!(status & 0x00ff0000)" >.../s390_ap_dqap/filter
filters out all trace events which have a response_code != 0
leaving just the successful nqap and dqap invocations.
These two trace events focus on performance, measuring the runtime
of a crypto request/reply as close to the firmware level as possible.
In combination with the two zcrypt tracepoints (see the zcrypt.h
trace event definition file) this gives measurement data about the
runtime of a request/reply within the zcrypt and AP bus layers.
However, with the status of these AP commands at hand, other uses
may be possible as well.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sometimes a different view of the AP status word is needed. So here
is a slight rework of struct ap_queue_status to open up the
possibility of accessing the AP status bits and fields in different
ways.
The new struct ap_queue_status
  struct ap_queue_status {
      union {
          unsigned int value : 32;
          struct {
              unsigned int status_bits : 8;
              unsigned int rc : 8;
              unsigned int : 16;
          };
          struct {
              unsigned int queue_empty : 1;
              unsigned int replies_waiting : 1;
              unsigned int queue_full : 1;
              unsigned int : 3;
              unsigned int async : 1;
              unsigned int irq_enabled : 1;
              unsigned int response_code : 8;
              unsigned int : 16;
          };
      };
  };
comprises the old struct ap_queue_status but extends it so that the
whole value is also accessible as an unsigned int, which is required,
for example, for a simple print or trace of the whole status word.
Note that this rework is fully backward compatible with the existing
code using struct ap_queue_status.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
This is a slight rework of the s390_zcrypt_req and s390_zcrypt_rep
trace event:
- the psmid has been added to the s390_zcrypt_rep
- "dev" renamed to "card"
- "domain" renamed to "dom"
The motivation of these changes is to make these traces more
aligned to new upcoming traces for AP bus related trace events.
Additionally the psmid is needed to match the reply (and thus
indirectly the request) to AP bus related trace events where only
the psmid uniquely identifies AP messages.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The pt_dump_seq_puts() macro incorrectly uses seq_printf() instead of
seq_puts(). This is both a performance issue and conceptually wrong,
as the macro name suggests plain string output (puts) but the
implementation uses formatted output (printf).
The macro is used in dump_pagetables.c:67-68 and 131 to output
constant strings. Using seq_printf() adds unnecessary overhead for
format string parsing.
Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
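Assuming the macro has the usual conditional seq_file shape (the exact body is an assumption), the change boils down to:
  /* before: formatted output even for constant strings */
  #define pt_dump_seq_puts(m, fmt)    \
  ({                                  \
      if (m)                          \
          seq_printf(m, fmt);         \
  })
  /* after: plain string output, no format string parsing */
  #define pt_dump_seq_puts(m, str)    \
  ({                                  \
      if (m)                          \
          seq_puts(m, str);           \
  })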
Jan Höppner says:
====================
The tape device driver is limited to a block size of 65535 bytes since a
single CCW can only transfer up to 64K-1 bytes (the count field is a
16-bit value). This series introduces data chaining for all read/write
functions to support block sizes larger than 65535 bytes.
The tape device type 3490 (emulated) and 3590/3592 can handle up to
256K. [1]
[1] https://www.ibm.com/docs/en/zos/3.1.0?topic=blksize-system-determined-block-size
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The tape device type 3590/3592 and emulated 3490 VTS can handle a block
size of up to 256K bytes. Currently the tape device driver is limited to
a block size of 65535 bytes (64K-1). This limitation stems from the
maximum of 65535 bytes of data that can be transferred with one
Channel-Command Word (CCW).
To work around this limitation, data chaining is used, which uses several
CCWs to transfer an entire 256K block of data. A single CCW holds a
maximum of 65535 bytes of data.
Set MAX_BLOCKSIZE to 262144 (= 256K) to allow for data transfers with
larger block sizes. The read_block() and write_block() discipline
functions calculate the number of CCWs required based on the IDAL buffer
array size that was created for a given block size. If there is more
than one CCW required for the data transfer, the new helper function
tape_ccw_dc_idal() is used to build the data chain accordingly.
The Interruption-Response Block (irb) is added to the tape_request
struct so that the tapechar_read/write() functions can analyze what data
was read or written accordingly.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The tape device driver uses a single idal_buffer for I/O. While the
buffer itself can be arbitrarily big, the limit for data transfer with a
single Channel-Command Word is 65535 bytes (64K-1) since the count
field specifying the amount of data designated by the CCW is a 16-bit
unsigned value.
Provide functionality that allocates an array of multiple IDAL buffers
with the limitation mentioned above in mind.
A call to idal_buffer_array_alloc() allocates an array containing a
number of IDAL buffers determined by the total @size. Each individual
buffer is limited to a size of CCW_MAX_BYTE_COUNT
(65535 bytes).
Add helper functions that determine the size (# of elements) and the
total data size covered by the array as well.
Current users of the single IDAL buffer are adapted to use the new
functions with one buffer to allocate.
The single IDAL buffer is removed from the tape_char_data struct.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
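The sizing logic described above reduces to a simple division; a hedged sketch (DIV_ROUND_UP() is the real helper, the variable names are illustrative):
  /* Number of IDAL buffers needed so that each one stays within the
   * single-CCW limit of CCW_MAX_BYTE_COUNT (65535) bytes. */
  nr_buffers = DIV_ROUND_UP(size, CCW_MAX_BYTE_COUNT);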
Currently tapechar_check_idalbuffer() is part of tape_char.c and is used
to ensure the idal buffer is big enough for the requested I/O and
reallocates a new one if required. The same is done in tape_std.c when a
fixed block size is set using the mtsetblk command. This is essentially
duplicate code.
The allocation of the buffer that is required for I/O can be considered
core functionality. Move the idal buffer allocation to tape_core.c,
make it generally available, and reduce code duplication.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
In contrast to all other helper functions used to build CCW chains,
tape_ccw_cc_idal() and tape_ccw_end_idal() return values using
post-increments, which results in returning the same CCW pointer.
However, the intent of the CCW helper functions is to return the _next_
CCW in the chain, which can then be processed.
There is currently no actual issue, as tape_ccw_cc_idal() is not used
yet and tape_ccw_end_idal() is only used at the end of a chain.
Change both functions' return statements to ccw + 1 and bring them in line
with the other helper functions.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The Read Opposite error recovery code required 2 extra CCWs to be
allocated in order to transform the request. As this error recovery code
for both 34xx and 3590 was removed, the additional allocation isn't
required anymore. Reduce the allocation to two CCWs.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
On old native type 3590 tape devices a Read Opposite error recovery
procedure on Error Recovery Action Code (ERA) 26 was issued if a Read
Forward command failed. This recovery procedure was implemented with the
Read Backward command. This is no longer supported.
Remove 3590 ERA 26 and Read Backward related recovery code.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
On old native type 3490 tape devices a Read Opposite error recovery
procedure on Error Recovery Action Code (ERA) 26 was issued if a Read
Forward command failed. This recovery procedure was implemented with the
Read Backward command.
As a preparation for a subsequent commit, that adds support for bigger
block sizes, remove the 34xx ERA 26 related recovery code. The recovery
code would need to be adapted to the bigger block sizes, without any
possibility to be tested, as modern Virtual Tape Servers (VTS) do
neither report ERA 26 on a Read Forward command failure nor support the
error recovery procedure anymore.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The count parameter of the read/write_block discipline functions was
never used. Remove it.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sumanth Korikkar says:
====================
Provide a new interface for dynamic configuration and
deconfiguration of hotplug memory on s390, with or without
memmap_on_memory support. It is a follow-up on the discussion with David
when memmap_on_memory support was introduced for s390, about supporting
dynamic (de)configuration of memory:
https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/
https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/
The original motivation for introducing memmap_on_memory on s390 was to
avoid using online memory to store struct pages metadata, particularly
for standby memory blocks. This became critical in cases where there was
an imbalance between standby and online memory, potentially leading to
boot failures due to insufficient memory for metadata allocation.
To address this, memmap_on_memory was utilized on s390. However, in its
current form, it adds struct pages metadata at the start of each memory
block at the time of addition (only standby memory), and this
configuration is static. It cannot be changed at runtime (when the user
needs contiguous physical memory).
In order to provide more flexibility to the user and overcome the above
limitation, add an option to dynamically configure and deconfigure
hotpluggable memory blocks with/without memmap_on_memory.
With the new interface, s390 will not add all possible hotplug memory in
advance, like before, to make it visible in sysfs for online/offline
actions. Instead, before a memory block can be set online, it has to be
configured via a new interface in /sys/firmware/memory/memoryX/config,
which makes s390 similar to other architectures, i.e. adding of
hotpluggable memory is controlled by the user instead of it being added at boot time.
s390 kernel sysfs interface to configure/deconfigure memory with
memmap_on_memory (with upcoming lsmem changes):
* Initial memory layout:
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Configure memory
echo 1 > /sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0x87ffffff 128M offline 16 yes yes
0x88000000-0xffffffff 1.9G offline 17-31 no yes
* Deconfigure memory
echo 0 > /sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Enable memmap_on_memory and online it.
(Deconfigure first)
echo 0 > /sys/devices/system/memory/memory5/online
echo 0 > /sys/firmware/memory/memory5/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M offline 5 no no
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
(Enable memmap_on_memory and online it)
echo 1 > /sys/firmware/memory/memory5/memmap_on_memory
echo 1 > /sys/firmware/memory/memory5/config
echo 1 > /sys/devices/system/memory/memory5/online
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M online 5 yes yes
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Disable memmap_on_memory and online it.
(Deconfigure first)
echo 0 > /sys/devices/system/memory/memory5/online
echo 0 > /sys/firmware/memory/memory5/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M offline 5 no yes
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
(Disable memmap_on_memory and online it)
echo 0 > /sys/firmware/memory/memory5/memmap_on_memory
echo 1 > /sys/firmware/memory/memory5/config
echo 1 > /sys/devices/system/memory/memory5/online
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Userspace changes:
lsmem/chmem tool is also changed to use the new interface. I will send
it to util-linux soon.
Patch 1 adds support for removal of boot-allocated memory blocks.
Patch 2 provides option to dynamically configure and deconfigure memory
with/without memmap_on_memory.
Patch 3 removes MHP_OFFLINE_INACCESSIBLE from s390. The mhp flag was
used to mark memory as not accessible until memory hotplug online phase
begins. However, with patch 2, it is no longer essential. Memory can be
brought to accessible state before adding memory, as the memory is added
during runtime now instead of boot time.
Patch 4 removes the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers. They
are no longer needed. Memory can be brought to accessible state before
adding memory now, with runtime (de)configuration of memory.
Note: The patches apply to the linux-next branch.
v3:
Thanks David
* Avoid goto label in create_standby_sclp_mems().
* Use unsigned long instead of u64.
* Add Acked-by.
v2:
Thanks David
* Rename struct mblock/mblock_arg with struct sclp_mem/sclp_mem_arg.
* Rename all mblocks/mblock references with sclp_mems/sclp_mem -
structures, functions.
* Rename create_online_mblock() with create_configured_sclp_mem().
* Rename config_mblock_show()/config_mblock_store() with
config_sclp_mem_show()/config_sclp_mem_store().
* Remove contains_standby_increment() and
sclp_mem_notifier. sclp mem state change is performed when
adding/removing memory. sclp memory notifier - no longer needed with
this patchset.
* Recover sclp mem state when add_memory() fails.
* Refactor and add function init_sclp_mem().
* Use unsigned long instead of unsigned long long.
* Simplify and correct kobj handling. Thanks Heiko.
====================
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.
This prevents theoretical buffer overflows, but also avoids that people
again and again spend time to figure out if the code is actually safe.
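As a minimal illustration of the pattern (buffer name and contents are
made up, not taken from the converted code):
  char buf[16];
  int len;

  /*
   * scnprintf() never writes past the given size and returns the number
   * of characters actually stored, excluding the trailing '\0'.
   */
  len = scnprintf(buf, sizeof(buf), "%d\n", value);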
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.
This prevents theoretical buffer overflows, but also avoids that people
again and again spend time to figure out if the code is actually safe.
Reviewed-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.
This prevents theoretical buffer overflows, but also avoids that people
again and again spend time to figure out if the code is actually safe.
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.
This prevents theoretical buffer overflows, but also avoids that people
again and again spend time to figure out if the code is actually safe.
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use scnprintf() instead of sprintf() for those cases where the destination
is an array and the size of the array is known at compile time.
This prevents theoretical buffer overflows, but also avoids that people
again and again spend time to figure out if the code is actually safe.
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Adjust the return statement in paicrypt_copy() to the same statement
as in paiext_copy(). Use one common style. No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Replace sprintf() with snprintf() when formatting symlink target name
to prevent potential buffer overflow. The link_to buffer is only 10
bytes, and using snprintf() ensures proper bounds checking if the
topology nesting limit value is unexpectedly large.
Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Replace unsafe sprintf() calls with snprintf() in segment_save() to
prevent potential buffer overflows. The function builds command strings
by repeatedly appending to a fixed-size buffer, which could overflow if
segment ranges are numerous or values are large.
Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Replace sprintf() with scnprintf() in cmm_timeout_handler() to prevent
potential buffer overflow. The scnprintf() function ensures we don't
write beyond the buffer size and provides safer string formatting.
Signed-off-by: Josephine Pfeiffer <hi@josie.lol>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Some debug messages generated by intel_pstate on a given hybrid system
are only printed for some CPUs, which is confusing, so modify the driver
to print them for all CPUs. Also change those messages to avoid
printing local variable names in them.
Moreover, some debug messages printed by intel_pstate are quite hard
to understand without looking at the code printing them, so make them
somewhat clearer while at it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8609836.T7Z3S40VBb@rafael.j.wysocki
Instead of using HWP-to-frequency scaling factors for computing cost
coefficients in the energy model used on hybrid systems, which is
fragile, rely on CPU type information that is easily accessible now and
the information on whether or not L3 cache is present for this purpose.
This also allows the cost coefficients for P-cores to be adjusted so
that they start to be populated somewhat earlier (that is, before
E-cores are loaded up to their full capacity).
In addition to the above, replace an inaccurate comment regarding the
reason why the freq value is added to the cost in hybrid_get_cost().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Link: https://patch.msgid.link/5932894.DvuYhMxLoT@rafael.j.wysocki
Introduce a function for checking whether or not a given CPU has L3
cache, called hybrid_has_l3(), and use it in hybrid_get_cost() for
computing cost coefficients associated with a given perf domain.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/13884343.uLZWGnKmhe@rafael.j.wysocki
Introduce a function for identifying the type of a given CPU in a
hybrid system, called hybrid_get_cpu_type(), and use it for hybrid
scaling factor determination in hwp_get_cpu_scaling().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1954386.tdWV9SEqCh@rafael.j.wysocki
During S3/S4 suspend and resume, cpufreq policies are not freed or
recreated; the freq_table and policy structure remain intact. However,
set_freq_table_sorted() currently resets policy->freq_table_sorted to
UNSORTED unconditionally, which is unnecessary since the table order
does not change across suspend/resume.
This patch adds a check to skip validation if policy->freq_table_sorted
is already ASCENDING or DESCENDING. This avoids unnecessary traversal
of the frequency table on S3/S4 resume or repeated online events,
reducing overhead while preserving correctness.
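A minimal sketch of the added early return (placement within
set_freq_table_sorted() and the return value are approximate):
  if (policy->freq_table_sorted != CPUFREQ_TABLE_UNSORTED)
          return 0;   /* table was already validated and classified */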
Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
Link: https://patch.msgid.link/20251011072420.11495-1-zhangzihuan@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The messages printed by swsusp_save() are basically only useful for
debug, so printing them every time a hibernation image is created at
the "info" log level is not particularly useful. Also printing a
message on a failing memory allocation is redundant.
Use pm_deferred_pr_dbg() for printing those messages so they will only
be printed when requested and the "deferred" variant is used because
this code runs in a deeply atomic context (one CPU with interrupts
off, no functional devices). Also drop the useless message printed
when a memory allocation fails.
While at it, extend one of the messages in question so it is less
cryptic.
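For illustration, the kind of conversion described (format string and
variable names are illustrative, not the actual messages):
  /* before: printed at "info" level on every image creation */
  pr_info("Image created (%d pages copied)\n", nr_pages);

  /*
   * after: only emitted when PM debug messages are enabled; the deferred
   * variant is used because this runs with one CPU and interrupts off
   */
  pm_deferred_pr_dbg("Image created (%d pages copied)\n", nr_pages);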
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ rjw: Dropped a useless colon at the end of one of the messages ]
Link: https://patch.msgid.link/10750389.nUPlyArG6x@rafael.j.wysocki
The evaluation of LPS0 _DSM Function 1 in lps0_device_attach() may be
useless if pm_debug_messages_on is never set.
For this reason, instead of evaluating it in lps0_device_attach(), do
that in a new .begin() callback for s2idle, acpi_s2idle_begin_lps0(),
only when pm_debug_messages_on is set at that point.
However, never attempt to evaluate LPS0 _DSM Function 1 more than once
to avoid recurring failures.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/3027060.e9J7NaK4W3@rafael.j.wysocki
The LPS0 callback functions in x86/s2idle.c can be made static, so do
that and remove their declarations from sleep.h.
While at it, add the _lps0 suffix to their names to indicate that
they are LPS0-specific.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/2254836.irdbgypaU6@rafael.j.wysocki
Drop unused function acpi_get_lps0_constraint().
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/5032801.GXAFRqVoOG@rafael.j.wysocki
Add dpm_watchdog_all_cpu_backtrace module parameter which
controls all CPU backtrace dump before the DPM watchdog panics
the system.
This is expected to help understand what might have caused a device
timeout.
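A hedged sketch of how such a module parameter is typically wired up (the
exact declaration and permissions in the patch may differ):
  static bool dpm_watchdog_all_cpu_backtrace;
  module_param(dpm_watchdog_all_cpu_backtrace, bool, 0644);
  MODULE_PARM_DESC(dpm_watchdog_all_cpu_backtrace,
                   "Dump backtraces from all CPUs before the DPM watchdog panics");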
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20251007063551.3147937-1-senozhatsky@chromium.org
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add CALL_PM_OP() macro to eliminate a repetitive code pattern in
power management generic operations.
Replace analogous driver PM callback invocation logic across all
pm_generic_*() functions with a single macro that handles the NULL
pointer checks and function calls.
This reduces code size while maintaining the same functionality and
improving code maintainability.
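A hedged sketch of the kind of macro described; the actual definition in
the patch may differ in detail:
  #define CALL_PM_OP(dev, op)                                     \
  ({                                                              \
          const struct dev_pm_ops *pm = (dev)->driver ?           \
                          (dev)->driver->pm : NULL;               \
          pm && pm->op ? pm->op(dev) : 0;                         \
  })

  int pm_generic_suspend(struct device *dev)
  {
          return CALL_PM_OP(dev, suspend);
  }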
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Link: https://patch.msgid.link/20250919124437.3075016-1-kaushlendra.kumar@intel.com
[ rjw: Subject and changelog edits, adjust white space ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The pm_vt_switch_required() function fails silently when memory
allocation fails, offering no indication to callers that the operation
was unsuccessful. This behavior prevents drivers from handling allocation
errors correctly or implementing retry mechanisms. By ensuring that
failures are reported back to the caller, drivers can make informed
decisions, improve robustness, and avoid unexpected behavior during
critical power management operations.
Change the function signature to return an integer error code and modify
the implementation to return -ENOMEM when kmalloc() fails. Update both
the function declaration and the inline stub in include/linux/pm.h to
maintain consistency across CONFIG_VT_CONSOLE_SLEEP configurations.
The function now returns:
- 0 on success (including when updating existing entries)
- -ENOMEM when memory allocation fails
This change improves error reporting without breaking existing callers,
as the current callers in drivers/video/fbdev/core/fbmem.c already
ignore the return value, making this a backward-compatible improvement.
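A hedged sketch of the reworked allocation path (list, lock and structure
names follow the existing code, but details may differ):
  int pm_vt_switch_required(struct device *dev, bool required)
  {
          struct pm_vt_switch *entry;

          mutex_lock(&vt_switch_mutex);
          list_for_each_entry(entry, &pm_vt_switch_list, head) {
                  if (entry->dev == dev) {
                          entry->required = required;
                          goto out;
                  }
          }

          entry = kmalloc(sizeof(*entry), GFP_KERNEL);
          if (!entry) {
                  mutex_unlock(&vt_switch_mutex);
                  return -ENOMEM;       /* report the failure to the caller */
          }

          entry->dev = dev;
          entry->required = required;
          list_add(&entry->head, &pm_vt_switch_list);
  out:
          mutex_unlock(&vt_switch_mutex);
          return 0;
  }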
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Malaya Kumar Rout <mrout@redhat.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Link: https://patch.msgid.link/20251013193028.89570-1-mrout@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The "_probe" suffix of the driver structure name prevents modpost from
warning about section mismatches so replace it to catch any future
issues like the recently fixed probe function being incorrectly marked
as __init.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pass up error codes from helper functions rather than discarding them.
Suggested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
== Background ==
ENCLS[EUPDATESVN] is a new SGX instruction [1] which allows enclave
attestation to include information about updated microcode SVN without a
reboot. Before an EUPDATESVN operation can be successful, all SGX memory
(aka. EPC) must be marked as “unused” in the SGX hardware metadata
(aka. EPCM). This requirement ensures that no compromised enclave can
survive the EUPDATESVN procedure and provides an opportunity to generate
new cryptographic assets.
== Solution ==
Attempt to execute ENCLS[EUPDATESVN] every time the first file descriptor
is obtained via sgx_(vepc_)open(). In the most common case the microcode
SVN is already up-to-date, and the operation succeeds without updating SVN.
Note: while in such cases the underlying crypto assets are regenerated, it
does not affect enclaves' visible keys obtained via EGETKEY instruction.
If it fails with any error code other than SGX_INSUFFICIENT_ENTROPY, this
is considered unexpected and the *open() returns an error. This should not
happen in practice.
On the contrary, SGX_INSUFFICIENT_ENTROPY might happen due to pressure on
the system's DRNG (RDSEED), and therefore the *open() can be safely
retried to allow normal enclave operation.
[1] Runtime Microcode Updates with Intel Software Guard Extensions,
https://cdrdv2.intel.com/v1/dl/getContent/648682
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
All running enclaves and cryptographic assets (such as internal SGX
encryption keys) are assumed to be compromised whenever an SGX-related
microcode update occurs. To mitigate this assumed compromise the new
supervisor SGX instruction ENCLS[EUPDATESVN] can generate fresh
cryptographic assets.
Before executing EUPDATESVN, all SGX memory must be marked as unused. This
requirement ensures that no potentially compromised enclave survives the
update and allows the system to safely regenerate cryptographic assets.
Add the method to perform ENCLS[EUPDATESVN]. However, until the follow up
patch that wires calling sgx_update_svn() from sgx_inc_usage_count(), this
code is not reachable.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
Add error codes for ENCLS[EUPDATESVN] so that the SGX CPUSVN update
process can know the execution state of EUPDATESVN and notify userspace.
EUPDATESVN will only be called when it is guaranteed that there are no
active SGX users, so only add the error codes that can legally happen in
that case. E.g., it could also fail due to "SGX not ready" when there are
SGX users, but that cannot happen in this implementation.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
Add a flag indicating whether the ENCLS[EUPDATESVN] SGX instruction is
supported. This will be used by the SGX driver to perform CPU SVN updates.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
Currently, when SGX is compromised and the microcode update fix is applied,
the machine needs to be rebooted to invalidate the old SGX crypto-assets
and bring SGX into an updated, safe state. That is not friendly for the
cloud.
To avoid having to reboot, a new instruction, ENCLS[EUPDATESVN], is
introduced to update the SGX environment at runtime. This process needs to
be done when there are no SGX users, to make sure that no compromised
enclave can survive the update, and to allow the system to regenerate the
crypto-assets.
For now there is no counter to track the active SGX users of the host
enclaves and virtual EPC. Introduce such a counter mechanism so that
EUPDATESVN can be done only when there are no SGX users.
Define placeholder functions sgx_inc/dec_usage_count() that are used to
increment and decrement such a counter. Also, wire the call sites for
these functions. Encapsulate the current sgx_(vepc_)open() to
__sgx_(vepc_)open() to make the new sgx_(vepc_)open() easy to read.
The definition of the counter itself and the actual implementation of
sgx_inc/dec_usage_count() functions come next.
Note: The EUPDATESVN, which may fail, will be done in
sgx_inc_usage_count(). Make it return 'int' to make subsequent patches
which implement EUPDATESVN easier to review. For now it always returns
success.
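A hedged sketch of the encapsulation pattern (names simplified to the
host-enclave case, error handling reduced; the actual code differs):
  static int sgx_open(struct inode *inode, struct file *file)
  {
          int ret = sgx_inc_usage_count();  /* placeholder for now, later
                                               the point where EUPDATESVN runs */

          if (ret)
                  return ret;

          ret = __sgx_open(inode, file);
          if (ret)
                  sgx_dec_usage_count();    /* undo the count on failure */

          return ret;
  }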
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Nataliia Bondarevska <bondarn@google.com>
pci_msi_create_irq_domain() is now unused. Delete it.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
This function is only called from one place, which is in the interrupt
handling hot path. Inline it to improve code generation and to take
advantage of this_cpu operations. lpriv and imsic->base_domain can never be
NULL because irq_set_chained_handler() is called after they are allocated.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reduce pointer chasing and the number of allocations by using a flexible
array member for the vector array instead of a separate allocation.
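As an illustration of the layout change (structure and field names are
hypothetical, not the driver's own):
  struct vec_table {
          unsigned int nr;
          struct vec {
                  unsigned int hwirq;
          } vecs[];                       /* flexible array member */
  };

  static struct vec_table *vec_table_alloc(unsigned int nr)
  {
          /* one allocation instead of a struct plus a separate vector array */
          struct vec_table *t = kzalloc(struct_size(t, vecs, nr), GFP_KERNEL);

          if (t)
                  t->nr = nr;
          return t;
  }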
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
imsic_irq_set_affinity() already takes the irq_data pointer as a
parameter, so it is pointless to look it up again from the IRQ number.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The driver has never supported anything but OF probing so drop the
unused platform alias.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The driver has never supported anything but OF probing so drop the
unused platform alias.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
The driver has never supported anything but OF probing so drop the
unused platform alias that was erroneously added by commit a947aa00ed
("irqchip/meson-gpio: Make it possible to build as a module").
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There seems to be nothing preventing the Broadcom drivers from being
compile tested so enable that for wider build coverage.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
The IRQCHIP_PLATFORM_DRIVER macros can be used to convert OF irqchip
drivers to platform drivers but currently reuse the OF init callback
prototype that only takes OF nodes as arguments. This forces drivers to
do reverse lookups of their struct devices during probe if they need
them for things like dev_printk() and device managed resources.
Half of the drivers doing reverse lookups also currently fail to release
the additional reference taken during the lookup, while other drivers
have had the reference leak plugged in various ways (e.g. using
non-intuitive cleanup constructs which still confuse static checkers).
Switch to using a probe callback that takes a platform device as its
first argument to simplify drivers and plug the remaining (mostly
benign) reference leaks.
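A hedged sketch of the direction of the change (driver name and compatible
string are made up; the new callback signature is as described above):
  static int my_intc_probe(struct platform_device *pdev)
  {
          struct device *dev = &pdev->dev;

          /* no reverse lookup needed: dev_*() and devm_*() can be used directly */
          dev_info(dev, "probing %pOF\n", dev->of_node);
          return 0;
  }

  IRQCHIP_PLATFORM_DRIVER_BEGIN(my_intc)
  IRQCHIP_MATCH("vendor,my-intc", my_intc_probe)
  IRQCHIP_PLATFORM_DRIVER_END(my_intc)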
Fixes: 32c6c05466 ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller")
Fixes: 70afdab904 ("irqchip: Add IMX MU MSI controller driver")
Fixes: a6199bb514 ("irqchip: Add Qualcomm MPM controller driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Fix kernel-doc warnings by adding the missing '*' to each line.
Warning: include/asm/idtentry.h:395 bad line: when raised from kernel mode
Warning: include/asm/idtentry.h:405 bad line: when raised from user mode
Since this is in a kernel-doc block, these lines need a leading
" *" on each line to prevent the warnings.
Fixes: a13644f3a5 ("x86/entry/64: Add entry code for #VC handler")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Drop some unnecessary brackets in platform_irqchip_probe() mistakenly
left by commit 9322d1915f ("irqchip: Plug a OF node reference leak in
platform_irqchip_probe()").
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Platform drivers can be probed after their init sections have been
discarded so the probe callback must not live in init.
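Schematically (the function name is illustrative), the fix amounts to
dropping the annotation:
  /* before: discarded together with the init sections after boot */
  static int __init combiner_probe(struct platform_device *pdev);

  /* after: stays resident, so deferred or late probing remains safe */
  static int combiner_probe(struct platform_device *pdev);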
Fixes: f20cc9b00c ("irqchip/qcom: Add IRQ combiner driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callback must not live in init.
Fixes: e4e5350361 ("irqchip: Add StarFive external interrupt controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.
Fixes: d011c022ef ("irqchip/renesas-rzg2l: Add support for RZ/Five SoC")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.
Fixes: 70afdab904 ("irqchip: Add IMX MU MSI controller driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.
Fixes: 51d9db5c8f ("irqchip/irq-brcmstb-l2: Switch to IRQCHIP_PLATFORM_DRIVER")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callbacks must not live in init.
Fixes: 3ac268d5ed ("irqchip/irq-bcm7120-l2: Switch to IRQCHIP_PLATFORM_DRIVER")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callback must not live in init.
Fixes: c057c799e3 ("irqchip/irq-bcm7038-l1: Switch to IRQCHIP_PLATFORM_DRIVER")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Platform drivers can be probed after their init sections have been
discarded so the irqchip init callback must not live in init.
Fixes: 32c6c05466 ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
The init callback must not decrement the reference count of the provided
irqchip OF node.
This should not cause any trouble currently, but if the driver ever
starts probe deferring it could lead to warnings about reference
underflow and saturation.
Fixes: 32c6c05466 ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
With staging support implemented, enable it when the CPU reports the
feature.
[ bp: Sort in the MSR properly. ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
The functions for sending microcode data and retrieving the next offset
were previously placeholders, as they need to handle a specific mailbox
format.
While the kernel supports similar mailboxes, none of them are compatible
with this one. Attempts to share code led to unnecessary complexity, so
add a dedicated implementation instead.
[ bp: Sort the include properly. ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
Previously, per-package staging invocations and their associated state
data were established. The next step is to implement the actual staging
handler according to the specified protocol. Below are key aspects to
note:
(a) Each staging process must begin by resetting the staging hardware.
(b) The staging hardware processes up to a page-sized chunk of the
microcode image per iteration, requiring software to submit data
incrementally.
(c) Once a data chunk is processed, the hardware responds with an
offset in the image for the next chunk.
(d) The offset may indicate completion or request retransmission of an
already transferred chunk. As long as the total transferred data
remains within the predefined limit (twice the image size),
retransmissions should be acceptable.
Incorporate them in the handler, while data transmission and mailbox
format handling are implemented separately.
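Put together, the flow looks roughly like the sketch below; all helper
names and the completion sentinel are illustrative, not the actual
implementation:
  static int stage_image(void __iomem *mbox, const u8 *image, size_t size)
  {
          size_t offset = 0, total_sent = 0;

          staging_reset(mbox);                            /* (a) */

          while (offset != STAGING_OFFSET_DONE) {         /* (c) */
                  size_t chunk = min_t(size_t, PAGE_SIZE, size - offset);

                  if (total_sent > 2 * size)              /* (d) resend limit */
                          return -ETIMEDOUT;

                  staging_send_chunk(mbox, image + offset, chunk);  /* (b) */
                  total_sent += chunk;
                  offset = staging_next_offset(mbox);     /* may request a resend */
          }

          return 0;
  }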
[ bp: Sort the headers in a reversed name-length order. ]
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
Define a staging_state struct to simplify function prototypes by consolidating
relevant data, instead of passing multiple local variables.
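A hedged sketch of the consolidation (field names are assumptions, not the
actual structure):
  struct staging_state {
          void __iomem *mmio_base;   /* per-package staging mailbox */
          const void *ucode_ptr;     /* microcode image being staged */
          size_t ucode_len;
          size_t offset;             /* next offset requested by the hardware */
          size_t bytes_sent;         /* total transferred, for the resend limit */
  };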
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which maps to a set of 32-bit registers.
Prepare staging with the following steps:
1. Ensure the microcode image is 32-bit aligned to match the MMIO
register size.
2. Identify each MMIO interface based on its per-package scope.
3. Invoke the staging function for each identified interface, which
will be implemented separately.
[ bp: Improve error logging. ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/all/871pznq229.ffs@tglx
As microcode patch sizes continue to grow, late-loading latency spikes can
lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational solution is
needed to address the issue.
To mitigate the problem, introduce a microcode staging feature. This option
processes most of the microcode update (excluding activation) on
a non-critical path, allowing CPUs to remain operational during the majority
of the update. By offloading work from the critical path, staging can
significantly reduce latency spikes.
Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.
Staging follows an opportunistic model:
* If successful, it reduces CPU rendezvous time
* Even if it fails, the process falls back to the legacy path to finish
loading, but with potentially higher latency.
Extend struct microcode_ops to incorporate staging properties, which will be
implemented in the vendor code separately.
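A hedged sketch of what such an extension could look like (member names
are assumptions, not the actual fields):
  struct microcode_ops {
          /* ... existing callbacks ... */
          bool use_staging;                 /* vendor opts in to staging */
          void (*stage_microcode)(void);    /* called from load_late_stop_cpus()
                                               before the CPU rendezvous */
  };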
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
cpu_primary_thread_mask is only defined when CONFIG_SMP=y. However, even
in UP kernels there is always exactly one CPU, which can reasonably be
treated as the primary thread.
Historically, topology_is_primary_thread() always returned true with
CONFIG_SMP=n. A recent commit:
4b455f5994 ("cpu/SMT: Provide a default topology_is_primary_thread()")
replaced it with a generic implementation with the note:
"When disabling SMT, the primary thread of the SMT will remain
enabled/active. Architectures that have a special primary thread (e.g.
x86) need to override this function. ..."
For consistency and clarity, make the primary thread mask available
regardless of SMP, similar to cpu_possible_mask and cpu_present_mask.
Move __cpu_primary_thread_mask into common code to prevent build issues.
Let cpu_mark_primary_thread() configure the mask even for UP kernels,
alongside other masks. Then, topology_is_primary_thread() can
consistently reference it.
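With the mask always available, the check can be expressed uniformly; a
minimal sketch (the actual helper may differ):
  static inline bool topology_is_primary_thread(unsigned int cpu)
  {
          return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
  }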
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250320234104.8288-1-chang.seok.bae@intel.com
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers were introduced
to prepare the transition of memory to and from a physically accessible
state. This enhancement was crucial for implementing the "memmap on memory"
feature for s390.
With the introduction of dynamic (de)configuration of hotpluggable memory,
memory can be brought to an accessible state before add_memory() and to an
inaccessible state before remove_memory(). Hence, there is no need for the
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers anymore.
This basically reverts commit
c5f1e2d189 ("mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers")
Additionally, apply minor adjustments to the function parameters of
move_pfn_range_to_zone() and mhp_supports_memmap_on_memory() to ensure
compatibility with the latest branch.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
mhp_flag MHP_OFFLINE_INACCESSIBLE was used to mark memory as not
accessible until memory hotplug online phase begins.
Earlier, standby memory blocks were added upfront at boot time, and the
MHP_OFFLINE_INACCESSIBLE flag avoided page_init_poison() on the memmap
during the mhp addition phase.
However, with dynamic runtime configuration of memory, standby memory can
be brought to an accessible state before performing add_memory(). Hence,
remove MHP_OFFLINE_INACCESSIBLE.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Provide a new interface for dynamic configuration and deconfiguration of
hotplug memory, with or without memmap_on_memory support. It is a
follow-up to the discussion with David when memmap_on_memory support for
s390 and dynamic (de)configuration of memory were introduced:
https://lore.kernel.org/all/ee492da8-74b4-4a97-8b24-73e07257f01d@redhat.com/
https://lore.kernel.org/all/20241202082732.3959803-1-sumanthk@linux.ibm.com/
The original motivation for introducing memmap_on_memory on s390 was to
avoid using online memory to store struct pages metadata, particularly
for standby memory blocks. This became critical in cases where there was
an imbalance between standby and online memory, potentially leading to
boot failures due to insufficient memory for metadata allocation.
To address this, memmap_on_memory was utilized on s390. However, in its
current form, it adds struct pages metadata at the start of each memory
block at the time of addition, and this configuration is static: it cannot
be changed at runtime (for instance, when the user needs contiguous
physical memory).
In order to provide more flexibility to the user and overcome the above
limitation, add an option to dynamically configure and deconfigure
hotpluggable memory blocks with/without memmap_on_memory.
With the new interface, s390 will no longer add all possible hotplug
memory in advance, as before, just to make it visible in sysfs for
online/offline actions. Instead, before a memory block can be set online,
it has to be configured via a new interface in
/sys/firmware/memory/memoryX/config, which makes s390 similar to other
architectures: adding hotpluggable memory is controlled by the user
instead of happening at boot time.
The s390 kernel sysfs interface to configure and deconfigure memory is
as follows (considering the upcoming lsmem changes):
* Initial memory layout:
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Configure memory
sys="/sys"
echo 1 > $sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0x87ffffff 128M offline 16 yes yes
0x88000000-0xffffffff 1.9G offline 17-31 no yes
* Deconfigure memory
echo 0 > $sys/firmware/memory/memory16/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Enable memmap_on_memory and online it. (Deconfigure first.)
echo 0 > $sys/devices/system/memory/memory5/online
echo 0 > $sys/firmware/memory/memory5/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M offline 5 no no
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
echo 1 > $sys/firmware/memory/memory5/memmap_on_memory
echo 1 > $sys/firmware/memory/memory5/config
echo 1 > $sys/devices/system/memory/memory5/online
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M online 5 yes yes
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
* Disable memmap_on_memory and online it. (Deconfigure first.)
echo 0 > $sys/devices/system/memory/memory5/online
echo 0 > $sys/firmware/memory/memory5/config
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x27ffffff 640M online 0-4 yes no
0x28000000-0x2fffffff 128M offline 5 no yes
0x30000000-0x7fffffff 1.3G online 6-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
echo 0 > $sys/firmware/memory/memory5/memmap_on_memory
echo 1 > $sys/firmware/memory/memory5/config
echo 1 > $sys/devices/system/memory/memory5/online
lsmem -o RANGE,SIZE,STATE,BLOCK,CONFIGURED,MEMMAP_ON_MEMORY
RANGE SIZE STATE BLOCK CONFIGURED MEMMAP_ON_MEMORY
0x00000000-0x7fffffff 2G online 0-15 yes no
0x80000000-0xffffffff 2G offline 16-31 no yes
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
On s390, memory blocks are not currently removed via
arch_remove_memory(). With upcoming dynamic memory (de)configuration
support, runtime removal of memory blocks is possible. This internally
involves tearing down identity mapping, virtual memory mappings and
freeing the physical memory backing the struct pages metadata.
During early boot, physical memory used to back the struct pages
metadata in vmemmap is allocated through:
setup_arch()
-> sparse_init()
-> sparse_init_nid()
-> __populate_section_memmap()
-> vmemmap_alloc_block_buf()
-> sparse_buffer_alloc()
-> memblock_alloc()
Here, sparse_init_nid() sets up virtual-to-physical mapping for struct
pages backed by memblock_alloc(). This differs from runtime addition of
hotplug memory which uses the buddy allocator later.
To correctly free identity mappings and vmemmap mappings during
hot-remove, boot-time and runtime allocations must be distinguished, which
is done using the PageReserved bit:
* Boot-time memory, such as identity-mapped page tables allocated via
boot_crst_alloc() and reserved via reserve_pgtables() is marked
PageReserved in memmap_init_reserved_pages().
* Physical memory backing vmemmap (struct pages from memblock_alloc())
is also marked PageReserved similarly.
During teardown, the PageReserved bit is checked to distinguish between
boot-time allocation and buddy allocation.
This is similar to commit 645d5ce2f7 ("powerpc/mm/radix: Fix PTE/PMD
fragment count for early page table mappings")
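A hedged illustration of the distinction made at teardown time (the helper
for the boot-time case is hypothetical, the real code differs):
  static void free_vmemmap_page(struct page *page)
  {
          if (PageReserved(page))
                  free_boot_backing(page);  /* memblock-allocated at boot time */
          else
                  __free_pages(page, 0);    /* buddy-allocated at runtime */
  }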
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
FRED can only be enabled in Long Mode. This is the 64bit mode (as opposed to
compatibility mode) identifier, rather than being something hard-wired at 1.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Replace snprintf() with scnprintf() in show_trace_dev_match() to simplify
buffer length handling. The scnprintf() function returns the number of
characters actually written (excluding the null terminator), which
eliminates the need for manual length checking and clamping.
This change removes the redundant size check since scnprintf() guarantees
that the return value will never exceed the buffer size, making the code
cleaner and less error-prone.
Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com>
Link: https://patch.msgid.link/20250922055231.3523680-1-kaushlendra.kumar@intel.com
[ rjw: Subject adjustment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, if a user enqueues a work item using schedule_delayed_work(),
the wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds the WQ_UNBOUND flag to pm_wq, to make it explicit that
this workqueue can be unbound and that it does not benefit from per-cpu
work.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
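Concretely, this amounts to something like the following in the pm_wq
setup (a sketch; the exact flags in the tree may differ):
  pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);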
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
That cpuinfo_x86.x86_capability[] element was supposed to mirror CPUID flags
from CPUID_0x80000007_EBX, but to this day that leaf has only three bits
defined in it. So move those bits to scattered.c and free the capability
element for synthetic flags.
No functional changes.
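For reference, entries in scattered.c take roughly this form (the value
shown is an illustrative example of a bit from that leaf, not necessarily
the exact entry added by the patch):
  /* { feature, CPUID register, bit, leaf, sub-leaf } */
  { X86_FEATURE_SMCA, CPUID_EBX, 3, 0x80000007, 0 },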
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>