Commit Graph

507 Commits

Author SHA1 Message Date
Christian Loehle de51589f9b cpufreq: intel_pstate: Use CPUFREQ_POLICY_UNKNOWN
epp_policy uses the same values as cpufreq_policy.policy and resets
to CPUFREQ_POLICY_UNKNOWN during offlining. Be consistent about
it and initialize to CPUFREQ_POLICY_UNKNOWN instead of 0, too.

No functional change intended.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20241211122605.3048503-3-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-18 15:49:25 +01:00
Rafael J. Wysocki 20e20f83dd cpufreq: intel_pstate: Drop Arrow Lake from "scaling factor" list
Since HYBRID_SCALING_FACTOR_MTL is not going to be suitable for Arrow
Lake in general, drop it from the "known hybrid scaling factors" list of
platforms, so the scaling factor for it will be determined with the
help of information provided by the platform firmware via CPPC.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2307515.iZASKD2KPV@rjwysocki.net
2024-12-10 20:08:36 +01:00
Rafael J. Wysocki 9b18d536b1 cpufreq: intel_pstate: Use CPPC to get scaling factors
The perf-to-frequency scaling factors are used by intel_pstate on hybrid
platforms to convert performance levels to frequencies for the different
types of CPUs, which is needed because the generic cpufreq sysfs interface
works in the frequency domain.

For some hybrid platforms already in the field, the scaling factors are
known, but for others (including some upcoming ones) they most likely
will be different and the only way to get them that scales is to use
information provided by the platform firmware.  In this particular case,
the requisite information can be obtained via CPPC.

If the P-core hybrid scaling factor for the given processor model is not
known, use CPPC to compute hybrid scaling factors for all CPUs.

Since the current default hybrid scaling factor is only suitable for a
few early hybrid platforms, add intel_hybrid_scaling_factor[] entries
for them and initialize the scaling factor to zero ("unknown") by
default.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8476313.T7Z3S40VBb@rjwysocki.net
2024-12-10 20:08:36 +01:00
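As a rough illustration of the commit above, a per-CPU scaling factor could in principle be derived from CPPC data by dividing the CPPC nominal frequency by the CPPC nominal performance level. This is a hedged sketch only; the helper name is illustrative and the derivation is an assumption, not the driver's actual code:

    #include <linux/errno.h>
    #include <acpi/cppc_acpi.h>

    /* Sketch only: kHz of frequency per HWP performance unit for one CPU. */
    static int example_cppc_scaling_khz(int cpu)
    {
        struct cppc_perf_caps caps;

        if (cppc_get_perf_caps(cpu, &caps) || !caps.nominal_perf)
            return -ENODEV;

        /* CPPC reports nominal_freq in MHz; return kHz per perf unit. */
        return caps.nominal_freq * 1000 / caps.nominal_perf;
    }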
Rafael J. Wysocki a8aaea4f6e Merge back cpufreq material for 6.13 2024-11-14 12:23:32 +01:00
Srinivas Pandruvada 00e2c199cb cpufreq: intel_pstate: Update Balance-performance EPP for Granite Rapids
Update EPP default for balance_performance to 32.

This will give better performance out of the box using Intel P-State
powersave governor while still offering power savings compared to
performance governor.

This is in line with what has already been done for Emerald Rapids and
Sapphire Rapids.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20241112235946.368082-1-srinivas.pandruvada@linux.intel.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-13 12:59:02 +01:00
Rafael J. Wysocki 1a1030d10a cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
Notice that hybrid_init_cpu_capacity_scaling() only needs to hold
hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling()
calls, so introduce a "locked" wrapper around the latter and call
it from the former.  This allows a local variable and a label that
are no longer needed to be dropped.

Also, rename __hybrid_init_cpu_capacity_scaling() to
__hybrid_refresh_cpu_capacity_scaling() for consistency.

Interestingly enough, this fixes a locking issue introduced by commit
929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on
hybrid systems") that put an arch_enable_hybrid_capacity_scale() call
under hybrid_capacity_lock, which was a mistake because the latter is
acquired in CPU hotplug paths and so it cannot be held around
cpus_read_lock() calls.

Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/
Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Link: https://patch.msgid.link/12554508.O9o76ZdvQC@rjwysocki.net
[ rjw: Changelog update ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-11 15:18:41 +01:00
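The "locked wrapper" pattern described in the commit above can be sketched as follows. This is a simplified illustration using names from the changelog, not the literal driver code (the lock type and bodies are assumptions):

    #include <linux/mutex.h>

    static DEFINE_MUTEX(hybrid_capacity_lock);

    /* Must be called with hybrid_capacity_lock held. */
    static void __hybrid_refresh_cpu_capacity_scaling(void)
    {
        /* ... recompute the per-CPU capacities ... */
    }

    static void hybrid_refresh_cpu_capacity_scaling(void)
    {
        mutex_lock(&hybrid_capacity_lock);
        __hybrid_refresh_cpu_capacity_scaling();
        mutex_unlock(&hybrid_capacity_lock);
    }

    /*
     * Anything that ends up calling cpus_read_lock(), such as
     * arch_enable_hybrid_capacity_scale(), must stay outside the lock,
     * because the lock is also acquired in CPU hotplug paths.
     */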
Rafael J. Wysocki 92447aa5f6 cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
Commit 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU
capacity on hybrid systems") overlooked a corner case in which some
CPUs may be offline to start with and brought back online later,
after the intel_pstate driver has been registered, so their asymmetric
capacity will not be set.

Address this by calling hybrid_update_capacity() in the CPU
initialization path that is executed instead of the online path
for those CPUs.

Note that this asymmetric capacity update will be skipped during
driver initialization and mode switches because hybrid_max_perf_cpu
is NULL in those cases.

Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1913414.tdWV9SEqCh@rjwysocki.net
2024-11-04 22:51:10 +01:00
Rafael J. Wysocki a97e293e07 cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registration
Modify intel_pstate_register_driver() to clear hybrid_max_perf_cpu
before calling cpufreq_register_driver(), so that asymmetric CPU
capacity scaling is not updated until hybrid_init_cpu_capacity_scaling()
runs down the road.  This is done in preparation for a subsequent
change adding asymmetric CPU capacity computation to the CPU init path
to handle CPUs that are initially offline.

The information on whether or not hybrid_max_perf_cpu was NULL before
it has been cleared is passed to hybrid_init_cpu_capacity_scaling(),
so full initialization of CPU capacity scaling can be skipped if it
has been carried out already.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4616631.LvFx2qVVIh@rjwysocki.net
2024-11-04 22:51:10 +01:00
Uwe Kleine-König 8b4865cd90 cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
notify_hwp_interrupt() is called via sysvec_thermal() ->
smp_thermal_vector() -> intel_thermal_interrupt() in hard irq context.
For this reason it must not use a regular spinlock, which becomes a
sleeping lock when PREEMPT_RT is enabled. So convert it to a raw spinlock.

Reported-by: xiao sheng wen <atzlinux@sina.com>
Link: https://bugs.debian.org/1076483
Signed-off-by: Uwe Kleine-König <ukleinek@debian.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: xiao sheng wen <atzlinux@sina.com>
Link: https://patch.msgid.link/20240919081121.10784-2-ukleinek@debian.org
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01 20:43:39 +02:00
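A minimal sketch of the conversion described in the commit above; the lock and function names follow the changelog, with the body simplified:

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(hwp_notify_lock);

    void notify_hwp_interrupt(void)
    {
        unsigned long flags;

        /*
         * This runs in hard IRQ context.  A raw spinlock never sleeps,
         * whereas a regular spinlock becomes a sleeping lock when
         * PREEMPT_RT is enabled, which would be invalid here.
         */
        raw_spin_lock_irqsave(&hwp_notify_lock, flags);
        /* ... handle the HWP notification ... */
        raw_spin_unlock_irqrestore(&hwp_notify_lock, flags);
    }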
Rafael J. Wysocki 929ebc93cc cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems
Make intel_pstate use the HWP_HIGHEST_PERF values from
MSR_HWP_CAPABILITIES to set asymmetric CPU capacity information
via the previously introduced arch_set_cpu_capacity() on hybrid
systems without SMT.

Setting asymmetric CPU capacity is generally necessary to allow the
scheduler to compute task sizes in a consistent way across all CPUs
in a system where they differ by capacity.  That, in turn, should help
to improve scheduling decisions.  It is also necessary for the schedutil
cpufreq governor to operate as expected on hybrid systems where tasks
migrate between CPUs of different capacities.

The underlying observation is that intel_pstate already uses
MSR_HWP_CAPABILITIES to get CPU performance information which is
exposed by it via sysfs and CPU performance scaling is based on it.
Thus using this information for setting asymmetric CPU capacity is
consistent with what the driver has been doing already.  Moreover,
HWP_HIGHEST_PERF reflects the maximum capacity of a given CPU including
both the instructions-per-cycle (IPC) factor and the maximum turbo
frequency and the units in which that value is expressed are the same
for all CPUs in the system, so the maximum capacity ratio between two
CPUs can be obtained by computing the ratio of their HWP_HIGHEST_PERF
values.  Of course, in principle that capacity ratio need not be
directly applicable at lower frequencies, so using it for providing the
asymmetric CPU capacity information to the scheduler is a rough
approximation, but it is as good as it gets.  Also, measurements
indicate that this approximation is not too bad in practice.

If the given system is hybrid and non-SMT, the new code disables ITMT
support in the scheduler (because it may get in the way of the asymmetric
CPU capacity code in the scheduler that is enabled automatically by setting
asymmetric CPU capacity) after initializing all online CPUs, and it finds
the one with the maximum HWP_HIGHEST_PERF value.  Next, it computes the
capacity number for each (online) CPU by dividing the product of its
HWP_HIGHEST_PERF and SCHED_CAPACITY_SCALE by the maximum HWP_HIGHEST_PERF.

When a CPU goes offline, its capacity is reset to SCHED_CAPACITY_SCALE
and if it is the one with the maximum HWP_HIGHEST_PERF value, the
capacity numbers for all of the other online CPUs are recomputed.  This
also takes care of a cleanup during driver operation mode changes.

Analogously, when a new CPU goes online, its capacity number is updated
and if its HWP_HIGHEST_PERF value is greater than the current maximum
one, the capacity numbers for all of the other online CPUs are
recomputed.

The case when the driver is notified of a CPU capacity change, either
through the HWP interrupt or through an ACPI notification, is handled
similarly to the CPU online case above, except that if the target CPU
is the current highest-capacity one and its capacity is reduced, the
capacity numbers for all of the other online CPUs need to be recomputed
as well.

If the driver's "no_turbo" sysfs attribute is updated, all of the CPU
capacity information is computed from scratch to reflect the new turbo
status.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # scale invariance
Link: https://patch.msgid.link/1979653.PYKUYFuaPT@rjwysocki.net
[ rjw: Fixed a typo in the changelog ]
[ rjw: Renamed 3 new functions and added a comment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-04 13:45:35 +02:00
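The capacity computation described in the commit above reduces to simple integer arithmetic. A runnable user-space sketch with example HWP_HIGHEST_PERF values (the numbers are illustrative only):

    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE 1024

    int main(void)
    {
        /* Illustrative HWP_HIGHEST_PERF values: P-core 62, E-core 40. */
        unsigned int max_highest_perf = 62;
        unsigned int e_core_highest_perf = 40;

        unsigned int p_core_cap = max_highest_perf * SCHED_CAPACITY_SCALE /
                                  max_highest_perf;
        unsigned int e_core_cap = e_core_highest_perf * SCHED_CAPACITY_SCALE /
                                  max_highest_perf;

        /* Prints: P-core capacity 1024, E-core capacity 660 */
        printf("P-core capacity %u, E-core capacity %u\n",
               p_core_cap, e_core_cap);
        return 0;
    }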
Srinivas Pandruvada 3ca2a3d1e7 cpufreq: intel_pstate: Support Granite Rapids and Sierra Forest OOB mode
Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is
enabled.

The OOB identifying bits are the same as for prior generation CPUs like
Emerald Rapids servers. Add Granite Rapids and Sierra Forest CPU models to
intel_pstate_cpu_oob_ids[].

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240802184839.1909091-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-19 15:39:22 +02:00
Pedro Henrique Kopper 64a66f4a3c cpufreq: intel_pstate: Update Balance performance EPP for Emerald Rapids
On Intel Emerald Rapids machines, we ship the Energy Performance Preference
(EPP) default for balance_performance as 128. However, during an internal
investigation together with Intel, we have determined that 32 is a more
suitable value. This leads to significant improvements in both performance
and energy:

POV-Ray: 32% faster | 12% less energy
OpenSSL: 12% faster | energy within 1%
Build Linux Kernel: 29% faster | 18% less energy

Therefore, we should move the default EPP for balance_performance to 32.
This is in line with what has already been done for Sapphire Rapids.

Signed-off-by: Pedro Henrique Kopper <pedro.kopper@canonical.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/Zqu6zjVMoiXwROBI@capivara
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02 14:40:13 +02:00
Rafael J. Wysocki 7ad9eab9d4 ARM cpufreq updates for 6.11
- cpufreq: Add Loongson-3 CPUFreq driver support (Huacai Chen).
 - Make exit() callback return void (Lizhe and Viresh Kumar).
 - Minor cleanups and fixes in several drivers (Bryan Brattlof,
   Javier Carrasco, Jagadeesh Kona, Jeff Johnson, Nícolas F. R. A. Prado,
   Primoz Fiser, Raphael Gallais-Pou, and Riwen Lu).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmaM3agACgkQ0rkcPK6B
 Ehw2QA//W+HaHbEf3zOFvwDgG23h3ampEzIoZ1LTznU7rsK7as1XgJ12pHk3uZyy
 L9OppUeN0zH9LaIgOCG5C5oVnRujl30LK3jo/vyBkGROdpng6w4Wci/2XIqPEZFJ
 sMC3om+VgbXGu1UaxSTX/fBjuWeuoLY6rrGHjkDcAh52bgEWuRTzgOIrcRTRpcvb
 G8Gy1YU/t2j/UocYkiR3s5JAFyujmiWcoD4fO4wt+JaYRnDmfQXSrE9X0dpjN+Vp
 wxftLn3RgbuIXGmrDnnwUiDa/e6YSTLKgkrdzshSyOeHUzW7SoMfkMqb26bnFsLY
 m2FKnTtT2uQIPdFwrPPseXhUvjklyOAeIZH6tO/QGoteXU3SVWB1kBQNcVbztWF5
 hHGL/qERACIt3xU/WQ0h1nvTMf46+1vc944uArh6F6t/XvmcoXv05YDRymyZBWLx
 mNRqG89gDex/TB+R15GBbXibK2UEGB26Bu84m7nFgbo5B0oM+OPebm49133gfz3V
 b8XaxzQMMFgdV3CpqRxQTNSnPWiwspttBZE7hYULONDxj8Ys/yfY7Gq8khjQxEBO
 xxQ4QRtlwkLSilyNb19i5LM9F+HpmkxdjO6su3SgZW5QVUUKsNA/aY0CbrXuIRiS
 dBGwBz8/EZ/7+/bK+TIU5tdR8UCSrVifF/bVGaQnWRWvB/2gPhw=
 =qMmS
 -----END PGP SIGNATURE-----

Merge tag 'cpufreq-arm-updates-6.11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Merge ARM cpufreq updates for 6.11 from Viresh Kumar:

"- cpufreq: Add Loongson-3 CPUFreq driver support (Huacai Chen).
 - Make exit() callback return void (Lizhe and Viresh Kumar).
 - Minor cleanups and fixes in several drivers (Bryan Brattlof,
   Javier Carrasco, Jagadeesh Kona, Jeff Johnson, Nícolas F. R. A. Prado,
   Primoz Fiser, Raphael Gallais-Pou, and Riwen Lu)."

* tag 'cpufreq-arm-updates-6.11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (21 commits)
  cpufreq: sti: fix build warning
  cpufreq: mediatek: Use dev_err_probe in every error path in probe
  cpufreq: Add Loongson-3 CPUFreq driver support
  cpufreq: Make cpufreq_driver->exit() return void
  cpufreq: pcc: Remove empty exit() callback
  cpufreq: loongson2: Remove empty exit() callback
  cpufreq: nforce2: Remove empty exit() callback
  cpufreq: sti: add missing MODULE_DEVICE_TABLE entry for stih418
  cpufreq: ti: update OPP table for AM62Px SoCs
  cpufreq: ti: update OPP table for AM62Ax SoCs
  cpufreq: sun50i: add Allwinner H700 speed bin
  cpufreq/cppc: Don't compare desired_perf in target()
  OPP: ti: Fix ti_opp_supply_probe wrong return values
  cpufreq: ti-cpufreq: Handle deferred probe with dev_err_probe()
  cpufreq: dt-platdev: add missing MODULE_DESCRIPTION() macro
  cpufreq: longhaul: Fix kernel-doc param for longhaul_setstate
  cpufreq: qcom-nvmem: eliminate uses of of_node_put()
  cpufreq: qcom-nvmem: fix memory leaks in probe error paths
  cpufreq: scmi: Avoid overflow of target_freq in fast switch
  cpufreq: sun50i: replace of_node_put() with automatic cleanup handler
  ...
2024-07-09 17:58:20 +02:00
Lizhe b4b1ddc9df cpufreq: Make cpufreq_driver->exit() return void
The cpufreq core doesn't check the return type of the exit() callback
and there is not much the core can do on failures at that point. Just
drop the returned value and make it return void.

Signed-off-by: Lizhe <sensor1010@163.com>
[ Viresh: Reworked the patches to fix all missing changes together. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek
Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress
Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com> # omap
2024-07-09 08:45:30 +05:30
Srinivas Pandruvada d845cd901b cpufreq: intel_pstate: Support highest performance change interrupt
On some systems, the HWP (Hardware P-states) highest performance level
can change from the value set at boot-up. This behavior can lead to two
issues:

- The 'cpuinfo_max_freq' within the 'cpufreq' sysfs will not reflect
the CPU's highest achievable performance.
- Even if the CPU's highest performance level is increased after booting,
the CPU may not reach the full expected performance.

The availability of this feature is indicated by the CPUID instruction:
if CPUID[6].EAX[15] is set to 1, the feature is supported. When supported,
setting bit 2 of the MSR_HWP_INTERRUPT register enables notifications of
the highest performance level changes. Therefore, as part of enabling the
HWP interrupt, bit 2 of the MSR_HWP_INTERRUPT should also be set when this
feature is supported.

Upon a change in the highest performance level, a new HWP interrupt is
generated, with bit 3 of the MSR_HWP_STATUS register set, and the
MSR_HWP_CAPABILITIES register is updated with the new highest performance
limit.

The processing of the interrupt is the same as the guaranteed performance
change. Notify the cpufreq core of the change and update MSR_HWP_REQUEST
with the new performance limits.

The current driver implementation already takes care of the highest
performance change as part of:
commit dfeeedc1bf ("cpufreq: intel_pstate: Update cpuinfo.max_freq
on HWP_CAP changes")

For example:
Before the highest performance change interrupt:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3700000
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
3700000

After the highest performance change interrupt:
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
3900000
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3900000

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240624161109.1427640-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-28 20:54:13 +02:00
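A hedged sketch of the detection and enabling sequence described in the commit above; the helper name is illustrative, and the real driver folds this into its existing HWP interrupt setup:

    #include <linux/bits.h>
    #include <asm/processor.h>
    #include <asm/msr.h>

    static void example_enable_highest_perf_change_irq(void)
    {
        u64 value;

        /* CPUID[6].EAX[15]: highest performance change interrupt supported. */
        if (!(cpuid_eax(6) & BIT(15)))
            return;

        /*
         * Bit 0 enables the guaranteed performance change notification,
         * bit 2 additionally enables the highest performance change one.
         * On such a change, bit 3 of MSR_HWP_STATUS is set and
         * MSR_HWP_CAPABILITIES carries the new highest performance limit.
         */
        rdmsrl(MSR_HWP_INTERRUPT, value);
        wrmsrl(MSR_HWP_INTERRUPT, value | BIT(2) | BIT(0));
    }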
Rafael J. Wysocki b11ec63abe Merge back cpufreq material for v6.11. 2024-06-27 21:20:10 +02:00
Srinivas Pandruvada acfc429e42 cpufreq: intel_pstate: Replace boot_cpu_has()
Replace boot_cpu_has() with cpu_feature_enabled().

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240624162714.1431182-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-25 21:00:15 +02:00
Rafael J. Wysocki a1ff59784b cpufreq: intel_pstate: Use HWP to initialize ITMT if CPPC is missing
It is reported that single-thread performance on some hybrid systems
dropped significantly after commit 7feec7430e ("ACPI: CPPC: Only probe
for _CPC if CPPC v2 is acked") which prevented _CPC from being used if
the support for it had not been confirmed by the platform firmware.

The problem is that if the platform firmware does not confirm CPPC v2
support, cppc_get_perf_caps() returns an error which prevents the
intel_pstate driver from enabling ITMT.  Consequently, the scheduler
does not get any hints on CPU performance differences, so in a hybrid
system some tasks may run on CPUs with lower capacity even though they
should be running on high-capacity CPUs.

To address this, modify intel_pstate to use the information from
MSR_HWP_CAPABILITIES to enable ITMT if CPPC is not available (which is
done already if the highest performance number coming from CPPC is not
realistic).

Fixes: 7feec7430e ("ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked")
Closes: https://lore.kernel.org/linux-acpi/d01b0a1f-bd33-47fe-ab41-43843d8a374f@kfocus.org
Link: https://lore.kernel.org/linux-acpi/ZnD22b3Br1ng7alf@kf-XE
Reported-by: Aaron Rainbolt <arainbolt@kfocus.org>
Tested-by: Aaron Rainbolt <arainbolt@kfocus.org>
Cc: 5.19+ <stable@vger.kernel.org> # 5.19+
Link: https://patch.msgid.link/12460110.O9o76ZdvQC@rjwysocki.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-24 20:55:11 +02:00
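A rough sketch of the fallback described in the commit above: when CPPC is unavailable, the highest performance level from MSR_HWP_CAPABILITIES can still be used as the ITMT priority. This is a simplified illustration with an illustrative function name, not the driver's exact code:

    #include <asm/msr.h>
    #include <asm/topology.h>   /* sched_set_itmt_core_prio() */

    static void example_set_itmt_prio_from_hwp(int cpu)
    {
        u64 cap;

        if (rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap))
            return;

        /*
         * Bits 7:0 of HWP_CAPABILITIES hold the highest performance level;
         * feeding it to ITMT lets the scheduler prefer the faster cores.
         */
        sched_set_itmt_core_prio(cap & 0xff, cpu);
    }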
Srinivas Pandruvada ede951c27d cpufreq: intel_pstate: Update Lunar Lake hybrid scaling factor
Change hybrid scaling factor for Lunar Lake.

Scaling factor is 1.15 for P-cores compared to E-cores.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240618055221.446108-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19 21:12:58 +02:00
Srinivas Pandruvada 6c30b137c0 cpufreq: intel_pstate: Update Arrow Lake hybrid scaling factor
Arrow Lake uses the same hybrid scaling factor as Meteor Lake, so reuse it.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240618055221.446108-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19 21:12:58 +02:00
Rafael J. Wysocki db404525c1 Merge back new cpufreq material for v6.11. 2024-06-14 15:32:31 +02:00
Rafael J. Wysocki 350cbb5d2f cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()
After recent changes in intel_pstate, global.turbo_disabled is only set
at the initialization time and never changed.  However, it turns out
that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE,
the initial state of which is reflected by global.turbo_disabled, can be
flipped later and there should be a way to take that into account (other
than checking that MSR every time the driver runs which is costly and
useless overhead on the vast majority of systems).

For this purpose, notice that before the changes in question,
store_no_turbo() contained a turbo_is_disabled() check that was used
for updating global.turbo_disabled if the "turbo disabled" bit in
MSR_IA32_MISC_ENABLE had been flipped and that functionality can be
restored.  Then, users will be able to reset global.turbo_disabled
by writing 0 to no_turbo, which used to work on systems where the
"turbo disabled" bit gets flipped.

This guarantees that the driver state remains in sync, but READ_ONCE()
annotations need to be added in two places where global.turbo_disabled
is accessed locklessly, so modify the driver to make that happen.

Fixes: 0940f1a801 ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization")
Closes: https://lore.kernel.org/linux-pm/bf3ebf1571a4788e97daf861eb493c12d42639a3.camel@xry111.site
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reported-by: Xi Ruoyao <xry111@xry111.site>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-12 14:11:50 +02:00
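A sketch of the restored check in the commit above, using the internal names mentioned in the changelog (global, intel_pstate_driver_lock, turbo_is_disabled()); the function signature is simplified and this is not the literal driver code:

    /* "input" is the value written to the no_turbo sysfs attribute. */
    static int example_store_no_turbo(bool input)
    {
        mutex_lock(&intel_pstate_driver_lock);

        /* Re-read MSR_IA32_MISC_ENABLE so a flipped bit is noticed. */
        global.turbo_disabled = turbo_is_disabled();
        if (global.turbo_disabled && !input) {
            mutex_unlock(&intel_pstate_driver_lock);
            return -EPERM;  /* turbo is disabled by the firmware */
        }

        WRITE_ONCE(global.no_turbo, input);
        mutex_unlock(&intel_pstate_driver_lock);
        return 0;
    }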
Srinivas Pandruvada 7e1c3f584e cpufreq: intel_pstate: Support Emerald Rapids OOB mode
Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is
enabled in Emerald Rapids.

The OOB identifying bits are the same as for prior generation CPUs
like Sapphire Rapids servers, so also add Emerald Rapids to the
intel_pstate_cpu_oob_ids[] list.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:19:11 +02:00
Srinivas Pandruvada e2ae7893b7 cpufreq: intel_pstate: Use Meteor Lake EPPs for Arrow Lake
Use the same default EPPs as Meteor Lake generation.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:12:06 +02:00
Srinivas Pandruvada 8bdab3c8f2 cpufreq: intel_pstate: Update Meteor Lake EPPs
Update the default balance_performance EPP to 64. This gives better
performance and also better perf/watt compared to the current value of 115.

For example:

Speedometer 2.1
	score: +19%
	Perf/watt: +5.25%

Webxprt 4 score
	score: +12%
	Perf/watt: +6.12%

3DMark Wildlife extreme unlimited score
	score: +3.2%
	Perf/watt: +11.5%

Geekbench6 MT
	score: +2.14%
	Perf/watt: +0.32%

Also update balance_power EPP default to 179. With this change:
	Video Playback power is reduced by 52%
	Team video conference power is reduced by 35%

Since the power profile daemon now sets the balance_power EPP on DC
instead of balance_performance, updating the balance_power EPP default
will help extend battery life.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:05:09 +02:00
Tony Luck ca8752384c cpufreq: intel_pstate: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 20:33:46 +02:00
Srinivas Pandruvada 1e24c31351 cpufreq: intel_pstate: Fix unchecked HWP MSR access
Fix unchecked MSR access error for processors with no HWP support. On
such processors, maximum frequency can be changed by the system firmware
using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results
in accessing HWP MSR 0x771.

Call Trace:
	<TASK>
	generic_exec_single+0x58/0x120
	smp_call_function_single+0xbf/0x110
	rdmsrl_on_cpu+0x46/0x60
	intel_pstate_get_hwp_cap+0x1b/0x70
	intel_pstate_update_limits+0x2a/0x60
	acpi_processor_notify+0xb7/0x140
	acpi_ev_notify_dispatch+0x3b/0x60

HWP MSR 0x771 can only be read on a CPU that supports HWP and has it enabled.
Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is
true.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/
Fixes: e8217b4bec ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently")
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03 18:00:23 +02:00
Jeff Johnson 0a206fe35d cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc
make C=1 currently gives the following warning:

drivers/cpufreq/intel_pstate.c:262: warning: Function parameter or struct member 'epp_cached' not described in 'cpudata'

Add the missing ":" to fix the trivial kernel-doc syntax error.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-07 15:02:43 +02:00
Arnd Bergmann 8c556541a5 cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]
The reference to this variable is hidden in an #ifdef:

drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=]

Use the same check around the definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-03 16:41:59 +02:00
Rafael J. Wysocki e8217b4bec cpufreq: intel_pstate: Update the maximum CPU frequency consistently
There are 3 places at which the maximum CPU frequency may change,
store_no_turbo(), intel_pstate_update_limits() (when called by the
cpufreq core) and intel_pstate_notify_work() (when handling a HWP
change notification).  Currently, cpuinfo.max_freq is only updated by
store_no_turbo() and intel_pstate_notify_work(), although in principle
it may be necessary to update it in intel_pstate_update_limits() as well.

Make all of them mutually consistent.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:22 +02:00
Rafael J. Wysocki f32587dcbe cpufreq: intel_pstate: Replace three global.turbo_disabled checks
Replace the global.turbo_disabled check in __intel_pstate_update_max_freq()
with a global.no_turbo one to make store_no_turbo() actually update the maximum
CPU frequency on turbo preference changes, which needs to be consistent
with arch_set_max_freq_ratio() called from there.

For more consistency, replace the global.turbo_disabled checks in
__intel_pstate_cpu_init() and intel_cpufreq_adjust_perf() with
global.no_turbo checks as well.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:17 +02:00
Rafael J. Wysocki 9558fae8ce cpufreq: intel_pstate: Read global.no_turbo under READ_ONCE()
Because global.no_turbo is generally not read under intel_pstate_driver_lock,
make store_no_turbo() use WRITE_ONCE() for updating it (this is the only
place at which it is updated except for the initialization) and make the
majority of places reading it use READ_ONCE().

Also remove redundant global.turbo_disabled checks from places that
depend on the 'true' value of global.no_turbo because it can only be
'true' if global.turbo_disabled is also 'true'.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:10 +02:00
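This is the usual pattern for a flag that is updated under a lock but read locklessly elsewhere; a minimal self-contained sketch with illustrative names:

    #include <linux/compiler.h>

    static bool example_no_turbo;

    /* Writer side: runs with the driver lock (or similar) held. */
    static void example_set_no_turbo(bool val)
    {
        WRITE_ONCE(example_no_turbo, val);
    }

    /* Reader side: may run without the lock, e.g. in a hot path. */
    static bool example_no_turbo_enabled(void)
    {
        return READ_ONCE(example_no_turbo);
    }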
Rafael J. Wysocki c626a43845 cpufreq: intel_pstate: Rearrange show_no_turbo() and store_no_turbo()
Now that global.turbo_disabled can only change at the cpufreq driver
registration time, initialize global.no_turbo at that time too so they
are in sync to start with (if the former is set, the latter cannot be
updated later anyway).

That allows show_no_turbo() to be simplified because it does not need
to check global.turbo_disabled and store_no_turbo() can be rearranged
to avoid doing anything if the new value of global.no_turbo is equal
to the current one and only return an error on attempts to clear
global.no_turbo when global.turbo_disabled is set.

While at it, eliminate the redundant ret variable from store_no_turbo().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:04 +02:00
Rafael J. Wysocki 0940f1a801 cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization
The global.turbo_disabled is updated quite often, especially in the
passive mode in which case it is updated every time the scheduler calls
into the driver.  However, this is generally not necessary and it adds
MSR read overhead to scheduler code paths (and that particular MSR is
slow to read).

For this reason, make the driver read MSR_IA32_MISC_ENABLE_TURBO_DISABLE
just once at the cpufreq driver registration time and remove all of the
in-flight updates of global.turbo_disabled.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:56:58 +02:00
Rafael J. Wysocki 032c5565eb cpufreq: intel_pstate: Fold intel_pstate_max_within_limits() into caller
Fold intel_pstate_max_within_limits() into its only caller.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:56:51 +02:00
Rafael J. Wysocki e97a98238d cpufreq: intel_pstate: Use __ro_after_init for three variables
There are at least 3 variables in intel_pstate that do not get updated
after they have been initialized, so annotate them with __ro_after_init.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:43 +02:00
Rafael J. Wysocki 0f2828e17b cpufreq: intel_pstate: Get rid of unnecessary READ_ONCE() annotations
Drop two redundant checks involving READ_ONCE() from notify_hwp_interrupt()
and make it check hwp_active without READ_ONCE() which is not necessary,
because that variable is only set once during the early initialization of
the driver.

In order to make that clear, annotate hwp_active with __ro_after_init.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:40 +02:00
Rafael J. Wysocki 432acb219a cpufreq: intel_pstate: Wait for canceled delayed work to complete
Make intel_pstate_disable_hwp_interrupt() wait for canceled delayed work
to complete to avoid leftover work items running when it returns which
may be during driver unregistration and may confuse things going forward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:36 +02:00
Rafael J. Wysocki 12ebba42d2 cpufreq: intel_pstate: Simplify spinlock locking
Because intel_pstate_enable/disable_hwp_interrupt() are only called from
thread context, they need not save the IRQ flags when using a spinlock
as interrupts are guaranteed to be enabled when they run, so make them
use spin_lock/unlock_irq().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:33 +02:00
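A sketch of the simplification described in the commit above: in thread context with interrupts known to be enabled, spin_lock_irq()/spin_unlock_irq() can replace the flag-saving variants. Illustrative only, with a hypothetical lock name:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(example_hwp_lock);

    /* Called from thread context only, so interrupts are enabled here. */
    static void example_enable_hwp_interrupt(void)
    {
        spin_lock_irq(&example_hwp_lock);
        /* ... program MSR_HWP_INTERRUPT for this CPU ... */
        spin_unlock_irq(&example_hwp_lock);
    }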
Rafael J. Wysocki f186b2dace cpufreq: intel_pstate: Drop redundant locking from intel_pstate_driver_cleanup()
Remove the spinlock locking from intel_pstate_driver_cleanup() as it is
not necessary because no other code accessing all_cpu_data[] can run in
parallel with that function.

Had the locking been necessary, though, it would have been incorrect
because the lock in question is acquired from a hardirq handler and
it cannot be acquired from thread context without disabling interrupts.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:08 +02:00
Rafael J. Wysocki e4d0d7f194 Merge back cpufreq material for 6.9-rc1. 2024-03-11 15:08:45 +01:00
Srinivas Pandruvada 1f4b7fdd71 cpufreq: intel_pstate: Update default EPPs for Meteor Lake
Update the default balance_performance EPP to 115 and the performance EPP to 16.

Changing the balance_performance EPP gives better performance/watt
compared to the default power-up EPP value of 128.

Changing the performance EPP to 0x10 shows reduced power with performance
similar to EPP 0. On small form factor devices this is beneficial,
as lower power results in lower CPU and skin temperature. This
results in reduced thermal throttling and higher performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:02:33 +01:00
Srinivas Pandruvada 240a8da623 cpufreq: intel_pstate: Allow model specific EPPs
The current implementation allows a model specific EPP override only for
balance_performance. Add a feature to allow model specific EPPs for all
predefined EPP strings. For example, for some CPU models even changing the
performance EPP has benefits.

Use a mask of EPPs as driver_data instead of just balance_performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:02:33 +01:00
Doug Smythies f0a0fc10ab cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
There is a loophole in P-state limit clamping for the intel_cpufreq CPU
frequency scaling driver (intel_pstate in passive mode) with the schedutil
CPU frequency scaling governor and HWP (Hardware P-state) control enabled,
when the adjust_perf callback path is used.

Fix it.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:01:59 +01:00
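The fix described in the commit above amounts to clamping the P-state requested via the adjust_perf path to the per-policy limits before it is programmed. A rough sketch assuming the driver's min_perf_ratio/max_perf_ratio fields; not the exact patch:

    #include <linux/minmax.h>

    /* Sketch only: enforce the scaling limits in the adjust_perf path too. */
    static int example_clamp_target_pstate(struct cpudata *cpu, int target_pstate)
    {
        return clamp_t(int, target_pstate,
                       cpu->min_perf_ratio, cpu->max_perf_ratio);
    }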
Jiri Slaby (SUSE) 4615ac9010 cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowait
Commit 09c448d3c6 ("cpufreq: intel_pstate: Use IOWAIT flag in Atom
algorithm") removed the last user of cpudata::prev_cummulative_iowait.
Remove the member too.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 13:47:35 +01:00
Rafael J. Wysocki 192cdb1c90 cpufreq: intel_pstate: Refine computation of P-state for given frequency
On systems using HWP, if a given frequency is equal to the maximum turbo
frequency or the maximum non-turbo frequency, the HWP performance level
corresponding to it is already known and can be used directly without
any computation.

Accordingly, adjust the code to use the known HWP performance levels in
the cases mentioned above.

This also helps to avoid limiting CPU capacity artificially in some
cases when the BIOS produces the HWP_CAP numbers using a different
E-core-to-P-core performance scaling factor than expected by the kernel.

Fixes: f5c8cf2a49 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores")
Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-22 15:18:11 +01:00
Srinivas Pandruvada bde4f5ff82 cpufreq: intel_pstate: Update hybrid scaling factor for Meteor Lake
On some Meteor Lake platforms, the maximum one-core turbo frequency is not
reached. During the hybrid performance-to-frequency conversion, the computed
maximum frequency is 100 MHz too low, which results in requesting a maximum
frequency that is 100 MHz below the actual one.

For example, when the max one-core turbo is 4.9 GHz:
MSR HWP_CAPABILITIES shows the highest performance ratio for P-cores as 0x3E.
With the current scaling factor of 78741 (1.27x for converting frequency
to performance), this results in a max frequency of 4.8 GHz, capping the
max scaling frequency at 4.8 GHz, which is 100 MHz less than desired.

Add the capability to define a per-CPU-model scaling factor and define a
scaling factor of 80000 (1.25x for converting frequency to performance for
P-cores) for Meteor Lake.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Debug message adjustment, subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-10 15:02:25 +01:00
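The effect of the two scaling factors can be checked with the numbers from the changelog above. A runnable user-space sketch, assuming frequencies are rounded down to a 100 MHz boundary as the example implies:

    #include <stdio.h>

    int main(void)
    {
        unsigned int hwp_cap_ratio = 0x3E;      /* 62, from HWP_CAPABILITIES */
        unsigned int generic_scaling = 78741;   /* kHz per ratio unit */
        unsigned int mtl_scaling = 80000;       /* kHz per ratio unit */

        /* Round down to a 100 MHz (100000 kHz) boundary. */
        unsigned int f_old = hwp_cap_ratio * generic_scaling / 100000 * 100000;
        unsigned int f_new = hwp_cap_ratio * mtl_scaling / 100000 * 100000;

        /* Prints: old 4800000 kHz, new 4900000 kHz */
        printf("old %u kHz, new %u kHz\n", f_old, f_new);
        return 0;
    }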
Zhenguo Yao e95013156a cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate will give up
unless the CPU model is explicitly supported.

See also the following past commits:

 - commit df51f287b5 ("cpufreq: intel_pstate: Add Sapphire Rapids support
   in no-HWP mode")
 - commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers support")
 - commit 706c532885 ("cpufreq: intel_pstate: Add Cometlake support in
   no-HWP mode")
 - commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in
   no-HWP mode")
 - commit 71bb5c82aa ("cpufreq: intel_pstate: Add Tigerlake support in
   no-HWP mode")

Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19 21:38:09 +01:00
Srinivas Pandruvada 2719675fa8 cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPP
The platform firmware can provide a balance performance EPP value by
enabling HWP and programming the EPP to the desired value.

However, currently this only takes effect for processors listed in
intel_epp_balance_perf[], so in order to enable a new processor model
to utilize this mechanism, that table needs to be updated.  It arguably
should not be necessary to modify the kernel to work properly with
every new generation of processors, though, and distributions that don't
always ship the most recent kernels should be able to run reasonably well
on new hardware without code changes.

For this reason, move the check that avoids updating the EPP when the balance
performance EPP is unmodified from the power-up default of 0x80 so that it
runs after the check that retrieves the firmware-provided balance performance
EPP value.  This will cause the code to always look for the firmware-provided
value before consulting intel_epp_balance_perf[], and the handling of new
hardware will not depend on whether or not that table has been updated yet.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-05 21:29:47 +01:00
Srinivas Pandruvada 37b6ddba96 cpufreq: intel_pstate: Revise global turbo disable check
Setting the global turbo flag based on CPU 0's P-state limits is problematic,
as it limits the max P-state request on every CPU in the system based solely
on that one CPU's P-state limits.

There are two cases in which global.turbo_disabled flag is set:
- When the MSR_IA32_MISC_ENABLE_TURBO_DISABLE bit is set to 1
in the MSR MSR_IA32_MISC_ENABLE. This bit can be only changed by
the system BIOS before power up.
- When the max non-turbo P-state is the same as the max turbo P-state for CPU 0.

The second check is not valid for deciding the global turbo state based on
CPU 0. The CPU 0 max turbo P-state can be the same as its max non-turbo P-state,
but for other CPUs this may not be true.

There is no guarantee that the max P-state limits are the same for every CPU.
It is possible that the max P-state for a CPU is constrained during fusing.
Also, with the Intel Speed Select performance profile, CPU 0 may not be present
in all profiles. In this case the max non-turbo and turbo P-states can be
set to the lowest possible P-state by the hardware when switching to such
a profile. Since the max non-turbo and turbo P-states are then the same,
the global.turbo_disabled flag will be set.

Once global.turbo_disabled is set, any scaling max and min frequency
update for any CPU will result in its max P-state being constrained to
the max non-turbo P-state.

Hence remove the check that compares the max non-turbo P-state of CPU 0 with
its max turbo P-state when setting the global turbo disabled flag.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-11 20:18:17 +02:00
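After this change, the remaining check reads a single bit from MSR_IA32_MISC_ENABLE. A minimal kernel-side sketch of that check (the function name is illustrative):

    #include <asm/msr.h>

    static bool example_turbo_is_disabled(void)
    {
        u64 misc_en;

        /* Bit 38: turbo disabled by the system BIOS before power up. */
        rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
        return !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
    }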