Commit Graph

507 Commits

Author SHA1 Message Date
Christian Loehle de51589f9b cpufreq: intel_pstate: Use CPUFREQ_POLICY_UNKNOWN
epp_policy uses the same values as cpufreq_policy.policy and resets
to CPUFREQ_POLICY_UNKNOWN during offlining. Be consistent about
it and initialize to CPUFREQ_POLICY_UNKNOWN instead of 0, too.

No functional change intended.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20241211122605.3048503-3-christian.loehle@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-18 15:49:25 +01:00
Rafael J. Wysocki 20e20f83dd cpufreq: intel_pstate: Drop Arrow Lake from "scaling factor" list
Since HYBRID_SCALING_FACTOR_MTL is not going to be suitable for Arrow
Lake in general, drop it from the "known hybrid scaling factors" list of
platforms, so the scaling factor for it will be determined with the
help of information provided by the platform firmware via CPPC.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2307515.iZASKD2KPV@rjwysocki.net
2024-12-10 20:08:36 +01:00
Rafael J. Wysocki 9b18d536b1 cpufreq: intel_pstate: Use CPPC to get scaling factors
The perf-to-frequency scaling factors are used by intel_pstate on hybrid
platforms to convert performance levels to frequencies for the different
types of CPUs, which is needed because the generic cpufreq sysfs interface
works in the frequency domain.

For some hybrid platforms already in the field, the scaling factors are
known, but for others (including some upcoming ones) they most likely
will be different and the only way to get them that scales is to use
information provided by the platform firmware.  In this particular case,
the requisite information can be obtained via CPPC.

If the P-core hybrid scaling factor for the given processor model is not
known, use CPPC to compute hybrid scaling factors for all CPUs.

Since the current default hybrid scaling factor is only suitable for a
few early hybrid platforms, add intel_hybrid_scaling_factor[] entries
for them and initialize the scaling factor to zero ("unknown") by
default.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8476313.T7Z3S40VBb@rjwysocki.net
2024-12-10 20:08:36 +01:00
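As a rough illustration of the commit above, a per-CPU scaling factor could in principle be derived from CPPC data by dividing the CPPC nominal frequency by the CPPC nominal performance level. This is a hedged sketch only; the helper name is illustrative and the derivation is an assumption, not the driver's actual code:

    #include <linux/errno.h>
    #include <acpi/cppc_acpi.h>

    /* Sketch only: kHz of frequency per HWP performance unit for one CPU. */
    static int example_cppc_scaling_khz(int cpu)
    {
        struct cppc_perf_caps caps;

        if (cppc_get_perf_caps(cpu, &caps) || !caps.nominal_perf)
            return -ENODEV;

        /* CPPC reports nominal_freq in MHz; return kHz per perf unit. */
        return caps.nominal_freq * 1000 / caps.nominal_perf;
    }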
Rafael J. Wysocki a8aaea4f6e Merge back cpufreq material for 6.13 2024-11-14 12:23:32 +01:00
Srinivas Pandruvada 00e2c199cb cpufreq: intel_pstate: Update Balance-performance EPP for Granite Rapids
Update EPP default for balance_performance to 32.

This will give better performance out of the box using Intel P-State
powersave governor while still offering power savings compared to
performance governor.

This is in line with what has already been done for Emerald Rapids and
Sapphire Rapids.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20241112235946.368082-1-srinivas.pandruvada@linux.intel.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-13 12:59:02 +01:00
Rafael J. Wysocki 1a1030d10a cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
Notice that hybrid_init_cpu_capacity_scaling() only needs to hold
hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling()
calls, so introduce a "locked" wrapper around the latter and call
it from the former.  This allows a local variable and a label that
are no longer needed to be dropped.

Also, rename __hybrid_init_cpu_capacity_scaling() to
__hybrid_refresh_cpu_capacity_scaling() for consistency.

Interestingly enough, this fixes a locking issue introduced by commit
929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on
hybrid systems") that put an arch_enable_hybrid_capacity_scale() call
under hybrid_capacity_lock, which was a mistake because the latter is
acquired in CPU hotplug paths and so it cannot be held around
cpus_read_lock() calls.

Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/
Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Link: https://patch.msgid.link/12554508.O9o76ZdvQC@rjwysocki.net
[ rjw: Changelog update ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-11 15:18:41 +01:00
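The "locked wrapper" pattern described in the commit above can be sketched as follows. This is a simplified illustration using names from the changelog, not the literal driver code (the lock type and bodies are assumptions):

    #include <linux/mutex.h>

    static DEFINE_MUTEX(hybrid_capacity_lock);

    /* Must be called with hybrid_capacity_lock held. */
    static void __hybrid_refresh_cpu_capacity_scaling(void)
    {
        /* ... recompute the per-CPU capacities ... */
    }

    static void hybrid_refresh_cpu_capacity_scaling(void)
    {
        mutex_lock(&hybrid_capacity_lock);
        __hybrid_refresh_cpu_capacity_scaling();
        mutex_unlock(&hybrid_capacity_lock);
    }

    /*
     * Anything that ends up calling cpus_read_lock(), such as
     * arch_enable_hybrid_capacity_scale(), must stay outside the lock,
     * because the lock is also acquired in CPU hotplug paths.
     */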
Rafael J. Wysocki 92447aa5f6 cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
Commit 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU
capacity on hybrid systems") overlooked a corner case in which some
CPUs may be offline to start with and brought back online later,
after the intel_pstate driver has been registered, so their asymmetric
capacity will not be set.

Address this by calling hybrid_update_capacity() in the CPU
initialization path that is executed instead of the online path
for those CPUs.

Note that this asymmetric capacity update will be skipped during
driver initialization and mode switches because hybrid_max_perf_cpu
is NULL in those cases.

Fixes: 929ebc93cc ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1913414.tdWV9SEqCh@rjwysocki.net
2024-11-04 22:51:10 +01:00
Rafael J. Wysocki a97e293e07 cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registration
Modify intel_pstate_register_driver() to clear hybrid_max_perf_cpu
before calling cpufreq_register_driver(), so that asymmetric CPU
capacity scaling is not updated until hybrid_init_cpu_capacity_scaling()
runs down the road.  This is done in preparation for a subsequent
change adding asymmetric CPU capacity computation to the CPU init path
to handle CPUs that are initially offline.

The information on whether or not hybrid_max_perf_cpu was NULL before
it has been cleared is passed to hybrid_init_cpu_capacity_scaling(),
so full initialization of CPU capacity scaling can be skipped if it
has been carried out already.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4616631.LvFx2qVVIh@rjwysocki.net
2024-11-04 22:51:10 +01:00
Uwe Kleine-König 8b4865cd90 cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
notify_hwp_interrupt() is called via sysvec_thermal() ->
smp_thermal_vector() -> intel_thermal_interrupt() in hard irq context.
For this reason it must not use a regular spinlock, which becomes a
sleeping lock when PREEMPT_RT is enabled. So convert it to a raw spinlock.

Reported-by: xiao sheng wen <atzlinux@sina.com>
Link: https://bugs.debian.org/1076483
Signed-off-by: Uwe Kleine-König <ukleinek@debian.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: xiao sheng wen <atzlinux@sina.com>
Link: https://patch.msgid.link/20240919081121.10784-2-ukleinek@debian.org
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01 20:43:39 +02:00
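A minimal sketch of the conversion described in the commit above; the lock and function names follow the changelog, with the body simplified:

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(hwp_notify_lock);

    void notify_hwp_interrupt(void)
    {
        unsigned long flags;

        /*
         * This runs in hard IRQ context.  A raw spinlock never sleeps,
         * whereas a regular spinlock becomes a sleeping lock when
         * PREEMPT_RT is enabled, which would be invalid here.
         */
        raw_spin_lock_irqsave(&hwp_notify_lock, flags);
        /* ... handle the HWP notification ... */
        raw_spin_unlock_irqrestore(&hwp_notify_lock, flags);
    }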
Rafael J. Wysocki 929ebc93cc cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems
Make intel_pstate use the HWP_HIGHEST_PERF values from
MSR_HWP_CAPABILITIES to set asymmetric CPU capacity information
via the previously introduced arch_set_cpu_capacity() on hybrid
systems without SMT.

Setting asymmetric CPU capacity is generally necessary to allow the
scheduler to compute task sizes in a consistent way across all CPUs
in a system where they differ by capacity.  That, in turn, should help
to improve scheduling decisions.  It is also necessary for the schedutil
cpufreq governor to operate as expected on hybrid systems where tasks
migrate between CPUs of different capacities.

The underlying observation is that intel_pstate already uses
MSR_HWP_CAPABILITIES to get CPU performance information which is
exposed by it via sysfs and CPU performance scaling is based on it.
Thus using this information for setting asymmetric CPU capacity is
consistent with what the driver has been doing already.  Moreover,
HWP_HIGHEST_PERF reflects the maximum capacity of a given CPU including
both the instructions-per-cycle (IPC) factor and the maximum turbo
frequency and the units in which that value is expressed are the same
for all CPUs in the system, so the maximum capacity ratio between two
CPUs can be obtained by computing the ratio of their HWP_HIGHEST_PERF
values.  Of course, in principle that capacity ratio need not be
directly applicable at lower frequencies, so using it for providing the
asymmetric CPU capacity information to the scheduler is a rough
approximation, but it is as good as it gets.  Also, measurements
indicate that this approximation is not too bad in practice.

If the given system is hybrid and non-SMT, the new code disables ITMT
support in the scheduler (because it may get in the way of the asymmetric
CPU capacity code in the scheduler that is enabled automatically by setting
asymmetric CPU capacity) after initializing all online CPUs, and it finds
the one with the maximum HWP_HIGHEST_PERF value.  Next, it computes the
capacity number for each (online) CPU by dividing the product of its
HWP_HIGHEST_PERF and SCHED_CAPACITY_SCALE by the maximum HWP_HIGHEST_PERF.

When a CPU goes offline, its capacity is reset to SCHED_CAPACITY_SCALE
and if it is the one with the maximum HWP_HIGHEST_PERF value, the
capacity numbers for all of the other online CPUs are recomputed.  This
also takes care of a cleanup during driver operation mode changes.

Analogously, when a new CPU goes online, its capacity number is updated
and if its HWP_HIGHEST_PERF value is greater than the current maximum
one, the capacity numbers for all of the other online CPUs are
recomputed.

The case when the driver is notified of a CPU capacity change, either
through the HWP interrupt or through an ACPI notification, is handled
similarly to the CPU online case above, except that if the target CPU
is the current highest-capacity one and its capacity is reduced, the
capacity numbers for all of the other online CPUs need to be recomputed
as well.

If the driver's "no_turbo" sysfs attribute is updated, all of the CPU
capacity information is computed from scratch to reflect the new turbo
status.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> # scale invariance
Link: https://patch.msgid.link/1979653.PYKUYFuaPT@rjwysocki.net
[ rjw: Fixed a typo in the changelog ]
[ rjw: Renamed 3 new functions and added a comment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-04 13:45:35 +02:00
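The capacity computation described in the commit above reduces to simple integer arithmetic. A runnable user-space sketch with example HWP_HIGHEST_PERF values (the numbers are illustrative only):

    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE 1024

    int main(void)
    {
        /* Illustrative HWP_HIGHEST_PERF values: P-core 62, E-core 40. */
        unsigned int max_highest_perf = 62;
        unsigned int e_core_highest_perf = 40;

        unsigned int p_core_cap = max_highest_perf * SCHED_CAPACITY_SCALE /
                                  max_highest_perf;
        unsigned int e_core_cap = e_core_highest_perf * SCHED_CAPACITY_SCALE /
                                  max_highest_perf;

        /* Prints: P-core capacity 1024, E-core capacity 660 */
        printf("P-core capacity %u, E-core capacity %u\n",
               p_core_cap, e_core_cap);
        return 0;
    }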
Srinivas Pandruvada 3ca2a3d1e7 cpufreq: intel_pstate: Support Granite Rapids and Sierra Forest OOB mode
Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is
enabled.

The OOB identifying bits are the same as for prior generation CPUs like
Emerald Rapids servers. Add Granite Rapids and Sierra Forest CPU models to
intel_pstate_cpu_oob_ids[].

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240802184839.1909091-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-19 15:39:22 +02:00
Pedro Henrique Kopper 64a66f4a3c cpufreq: intel_pstate: Update Balance performance EPP for Emerald Rapids
On Intel Emerald Rapids machines, we ship the Energy Performance Preference
(EPP) default for balance_performance as 128. However, during an internal
investigation together with Intel, we have determined that 32 is a more
suitable value. This leads to significant improvements in both performance
and energy:

POV-Ray: 32% faster | 12% less energy
OpenSSL: 12% faster | energy within 1%
Build Linux Kernel: 29% faster | 18% less energy

Therefore, we should move the default EPP for balance_performance to 32.
This is in line with what has already been done for Sapphire Rapids.

Signed-off-by: Pedro Henrique Kopper <pedro.kopper@canonical.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/Zqu6zjVMoiXwROBI@capivara
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02 14:40:13 +02:00
Rafael J. Wysocki 7ad9eab9d4 ARM cpufreq updates for 6.11
- cpufreq: Add Loongson-3 CPUFreq driver support (Huacai Chen).
 - Make exit() callback return void (Lizhe and Viresh Kumar).
 - Minor cleanups and fixes in several drivers (Bryan Brattlof,
   Javier Carrasco, Jagadeesh Kona, Jeff Johnson, Nícolas F. R. A. Prado,
   Primoz Fiser, Raphael Gallais-Pou, and Riwen Lu).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmaM3agACgkQ0rkcPK6B
 Ehw2QA//W+HaHbEf3zOFvwDgG23h3ampEzIoZ1LTznU7rsK7as1XgJ12pHk3uZyy
 L9OppUeN0zH9LaIgOCG5C5oVnRujl30LK3jo/vyBkGROdpng6w4Wci/2XIqPEZFJ
 sMC3om+VgbXGu1UaxSTX/fBjuWeuoLY6rrGHjkDcAh52bgEWuRTzgOIrcRTRpcvb
 G8Gy1YU/t2j/UocYkiR3s5JAFyujmiWcoD4fO4wt+JaYRnDmfQXSrE9X0dpjN+Vp
 wxftLn3RgbuIXGmrDnnwUiDa/e6YSTLKgkrdzshSyOeHUzW7SoMfkMqb26bnFsLY
 m2FKnTtT2uQIPdFwrPPseXhUvjklyOAeIZH6tO/QGoteXU3SVWB1kBQNcVbztWF5
 hHGL/qERACIt3xU/WQ0h1nvTMf46+1vc944uArh6F6t/XvmcoXv05YDRymyZBWLx
 mNRqG89gDex/TB+R15GBbXibK2UEGB26Bu84m7nFgbo5B0oM+OPebm49133gfz3V
 b8XaxzQMMFgdV3CpqRxQTNSnPWiwspttBZE7hYULONDxj8Ys/yfY7Gq8khjQxEBO
 xxQ4QRtlwkLSilyNb19i5LM9F+HpmkxdjO6su3SgZW5QVUUKsNA/aY0CbrXuIRiS
 dBGwBz8/EZ/7+/bK+TIU5tdR8UCSrVifF/bVGaQnWRWvB/2gPhw=
 =qMmS
 -----END PGP SIGNATURE-----

Merge tag 'cpufreq-arm-updates-6.11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Merge ARM cpufreq updates for 6.11 from Viresh Kumar:

"- cpufreq: Add Loongson-3 CPUFreq driver support (Huacai Chen).
 - Make exit() callback return void (Lizhe and Viresh Kumar).
 - Minor cleanups and fixes in several drivers (Bryan Brattlof,
   Javier Carrasco, Jagadeesh Kona, Jeff Johnson, Nícolas F. R. A. Prado,
   Primoz Fiser, Raphael Gallais-Pou, and Riwen Lu)."

* tag 'cpufreq-arm-updates-6.11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (21 commits)
  cpufreq: sti: fix build warning
  cpufreq: mediatek: Use dev_err_probe in every error path in probe
  cpufreq: Add Loongson-3 CPUFreq driver support
  cpufreq: Make cpufreq_driver->exit() return void
  cpufreq: pcc: Remove empty exit() callback
  cpufreq: loongson2: Remove empty exit() callback
  cpufreq: nforce2: Remove empty exit() callback
  cpufreq: sti: add missing MODULE_DEVICE_TABLE entry for stih418
  cpufreq: ti: update OPP table for AM62Px SoCs
  cpufreq: ti: update OPP table for AM62Ax SoCs
  cpufreq: sun50i: add Allwinner H700 speed bin
  cpufreq/cppc: Don't compare desired_perf in target()
  OPP: ti: Fix ti_opp_supply_probe wrong return values
  cpufreq: ti-cpufreq: Handle deferred probe with dev_err_probe()
  cpufreq: dt-platdev: add missing MODULE_DESCRIPTION() macro
  cpufreq: longhaul: Fix kernel-doc param for longhaul_setstate
  cpufreq: qcom-nvmem: eliminate uses of of_node_put()
  cpufreq: qcom-nvmem: fix memory leaks in probe error paths
  cpufreq: scmi: Avoid overflow of target_freq in fast switch
  cpufreq: sun50i: replace of_node_put() with automatic cleanup handler
  ...
2024-07-09 17:58:20 +02:00
Lizhe b4b1ddc9df cpufreq: Make cpufreq_driver->exit() return void
The cpufreq core doesn't check the return type of the exit() callback
and there is not much the core can do on failures at that point. Just
drop the returned value and make it return void.

Signed-off-by: Lizhe <sensor1010@163.com>
[ Viresh: Reworked the patches to fix all missing changes together. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek
Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress
Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com> # omap
2024-07-09 08:45:30 +05:30
Srinivas Pandruvada d845cd901b cpufreq: intel_pstate: Support highest performance change interrupt
On some systems, the HWP (Hardware P-states) highest performance level
can change from the value set at boot-up. This behavior can lead to two
issues:

- The 'cpuinfo_max_freq' within the 'cpufreq' sysfs will not reflect
the CPU's highest achievable performance.
- Even if the CPU's highest performance level is increased after booting,
the CPU may not reach the full expected performance.

The availability of this feature is indicated by the CPUID instruction:
if CPUID[6].EAX[15] is set to 1, the feature is supported. When supported,
setting bit 2 of the MSR_HWP_INTERRUPT register enables notifications of
the highest performance level changes. Therefore, as part of enabling the
HWP interrupt, bit 2 of the MSR_HWP_INTERRUPT should also be set when this
feature is supported.

Upon a change in the highest performance level, a new HWP interrupt is
generated, with bit 3 of the MSR_HWP_STATUS register set, and the
MSR_HWP_CAPABILITIES register is updated with the new highest performance
limit.

The processing of the interrupt is the same as the guaranteed performance
change. Notify the cpufreq core of the change and update MSR_HWP_REQUEST
with the new performance limits.

The current driver implementation already takes care of the highest
performance change as part of:
commit dfeeedc1bf ("cpufreq: intel_pstate: Update cpuinfo.max_freq
on HWP_CAP changes")

For example:
Before the highest performance change interrupt:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3700000
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
3700000

After the highest performance change interrupt:
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
3900000
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
3900000

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240624161109.1427640-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-28 20:54:13 +02:00
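A hedged sketch of the detection and enabling sequence described in the commit above; the helper name is illustrative, and the real driver folds this into its existing HWP interrupt setup:

    #include <linux/bits.h>
    #include <asm/processor.h>
    #include <asm/msr.h>

    static void example_enable_highest_perf_change_irq(void)
    {
        u64 value;

        /* CPUID[6].EAX[15]: highest performance change interrupt supported. */
        if (!(cpuid_eax(6) & BIT(15)))
            return;

        /*
         * Bit 0 enables the guaranteed performance change notification,
         * bit 2 additionally enables the highest performance change one.
         * On such a change, bit 3 of MSR_HWP_STATUS is set and
         * MSR_HWP_CAPABILITIES carries the new highest performance limit.
         */
        rdmsrl(MSR_HWP_INTERRUPT, value);
        wrmsrl(MSR_HWP_INTERRUPT, value | BIT(2) | BIT(0));
    }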
Rafael J. Wysocki b11ec63abe Merge back cpufreq material for v6.11. 2024-06-27 21:20:10 +02:00
Srinivas Pandruvada acfc429e42 cpufreq: intel_pstate: Replace boot_cpu_has()
Replace boot_cpu_has() with cpu_feature_enabled().

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240624162714.1431182-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-25 21:00:15 +02:00
Rafael J. Wysocki a1ff59784b cpufreq: intel_pstate: Use HWP to initialize ITMT if CPPC is missing
It is reported that single-thread performance on some hybrid systems
dropped significantly after commit 7feec7430e ("ACPI: CPPC: Only probe
for _CPC if CPPC v2 is acked") which prevented _CPC from being used if
the support for it had not been confirmed by the platform firmware.

The problem is that if the platform firmware does not confirm CPPC v2
support, cppc_get_perf_caps() returns an error which prevents the
intel_pstate driver from enabling ITMT.  Consequently, the scheduler
does not get any hints on CPU performance differences, so in a hybrid
system some tasks may run on CPUs with lower capacity even though they
should be running on high-capacity CPUs.

To address this, modify intel_pstate to use the information from
MSR_HWP_CAPABILITIES to enable ITMT if CPPC is not available (which is
done already if the highest performance number coming from CPPC is not
realistic).

Fixes: 7feec7430e ("ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked")
Closes: https://lore.kernel.org/linux-acpi/d01b0a1f-bd33-47fe-ab41-43843d8a374f@kfocus.org
Link: https://lore.kernel.org/linux-acpi/ZnD22b3Br1ng7alf@kf-XE
Reported-by: Aaron Rainbolt <arainbolt@kfocus.org>
Tested-by: Aaron Rainbolt <arainbolt@kfocus.org>
Cc: 5.19+ <stable@vger.kernel.org> # 5.19+
Link: https://patch.msgid.link/12460110.O9o76ZdvQC@rjwysocki.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
2024-06-24 20:55:11 +02:00
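A rough sketch of the fallback described in the commit above: when CPPC is unavailable, the highest performance level from MSR_HWP_CAPABILITIES can still be used as the ITMT priority. This is a simplified illustration with an illustrative function name, not the driver's exact code:

    #include <asm/msr.h>
    #include <asm/topology.h>   /* sched_set_itmt_core_prio() */

    static void example_set_itmt_prio_from_hwp(int cpu)
    {
        u64 cap;

        if (rdmsrl_on_cpu(cpu, MSR_HWP_CAPABILITIES, &cap))
            return;

        /*
         * Bits 7:0 of HWP_CAPABILITIES hold the highest performance level;
         * feeding it to ITMT lets the scheduler prefer the faster cores.
         */
        sched_set_itmt_core_prio(cap & 0xff, cpu);
    }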
Srinivas Pandruvada ede951c27d cpufreq: intel_pstate: Update Lunar Lake hybrid scaling factor
Change hybrid scaling factor for Lunar Lake.

Scaling factor is 1.15 for P-cores compared to E-cores.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240618055221.446108-2-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19 21:12:58 +02:00
Srinivas Pandruvada 6c30b137c0 cpufreq: intel_pstate: Update Arrow Lake hybrid scaling factor
Arrow Lake uses the same hybrid scaling factor as Meteor Lake, so reuse it.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/20240618055221.446108-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-19 21:12:58 +02:00
Rafael J. Wysocki db404525c1 Merge back new cpufreq material for v6.11. 2024-06-14 15:32:31 +02:00
Rafael J. Wysocki 350cbb5d2f cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()
After recent changes in intel_pstate, global.turbo_disabled is only set
at the initialization time and never changed.  However, it turns out
that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE,
the initial state of which is reflected by global.turbo_disabled, can be
flipped later and there should be a way to take that into account (other
than checking that MSR every time the driver runs which is costly and
useless overhead on the vast majority of systems).

For this purpose, notice that before the changes in question,
store_no_turbo() contained a turbo_is_disabled() check that was used
for updating global.turbo_disabled if the "turbo disabled" bit in
MSR_IA32_MISC_ENABLE had been flipped and that functionality can be
restored.  Then, users will be able to reset global.turbo_disabled
by writing 0 to no_turbo, which used to work on systems where the
"turbo disabled" bit gets flipped.

This guarantees that the driver state remains in sync, but READ_ONCE()
annotations need to be added in two places where global.turbo_disabled
is accessed locklessly, so modify the driver to make that happen.

Fixes: 0940f1a801 ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization")
Closes: https://lore.kernel.org/linux-pm/bf3ebf1571a4788e97daf861eb493c12d42639a3.camel@xry111.site
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reported-by: Xi Ruoyao <xry111@xry111.site>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-12 14:11:50 +02:00
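A sketch of the restored check in the commit above, using the internal names mentioned in the changelog (global, intel_pstate_driver_lock, turbo_is_disabled()); the function signature is simplified and this is not the literal driver code:

    /* "input" is the value written to the no_turbo sysfs attribute. */
    static int example_store_no_turbo(bool input)
    {
        mutex_lock(&intel_pstate_driver_lock);

        /* Re-read MSR_IA32_MISC_ENABLE so a flipped bit is noticed. */
        global.turbo_disabled = turbo_is_disabled();
        if (global.turbo_disabled && !input) {
            mutex_unlock(&intel_pstate_driver_lock);
            return -EPERM;  /* turbo is disabled by the firmware */
        }

        WRITE_ONCE(global.no_turbo, input);
        mutex_unlock(&intel_pstate_driver_lock);
        return 0;
    }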
Srinivas Pandruvada 7e1c3f584e cpufreq: intel_pstate: Support Emerald Rapids OOB mode
Prevent intel_pstate from loading when OOB (Out Of Band) P-states mode is
enabled in Emerald Rapids.

The OOB identifying bits are the same as for prior generation CPUs
like Sapphire Rapids servers, so also add Emerald Rapids to the
intel_pstate_cpu_oob_ids[] list.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:19:11 +02:00
Srinivas Pandruvada e2ae7893b7 cpufreq: intel_pstate: Use Meteor Lake EPPs for Arrow Lake
Use the same default EPPs as Meteor Lake generation.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:12:06 +02:00
Srinivas Pandruvada 8bdab3c8f2 cpufreq: intel_pstate: Update Meteor Lake EPPs
Update the default balance_performance EPP to 64. This gives better
performance and also better perf/watt compared to the current value of 115.

For example:

Speedometer 2.1
	score: +19%
	Perf/watt: +5.25%

Webxprt 4 score
	score: +12%
	Perf/watt: +6.12%

3DMark Wildlife extreme unlimited score
	score: +3.2%
	Perf/watt: +11.5%

Geekbench6 MT
	score: +2.14%
	Perf/watt: +0.32%

Also update balance_power EPP default to 179. With this change:
	Video Playback power is reduced by 52%
	Team video conference power is reduced by 35%

Since the power profile daemon now sets the balance_power EPP on DC
instead of balance_performance, updating the balance_power EPP default
will help extend battery life.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 21:05:09 +02:00
Tony Luck ca8752384c cpufreq: intel_pstate: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 20:33:46 +02:00
Srinivas Pandruvada 1e24c31351 cpufreq: intel_pstate: Fix unchecked HWP MSR access
Fix unchecked MSR access error for processors with no HWP support. On
such processors, maximum frequency can be changed by the system firmware
using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results
in accessing HWP MSR 0x771.

Call Trace:
	<TASK>
	generic_exec_single+0x58/0x120
	smp_call_function_single+0xbf/0x110
	rdmsrl_on_cpu+0x46/0x60
	intel_pstate_get_hwp_cap+0x1b/0x70
	intel_pstate_update_limits+0x2a/0x60
	acpi_processor_notify+0xb7/0x140
	acpi_ev_notify_dispatch+0x3b/0x60

HWP MSR 0x771 can only be read on a CPU that supports HWP and has it enabled.
Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is
true.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/
Fixes: e8217b4bec ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently")
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03 18:00:23 +02:00
Jeff Johnson 0a206fe35d cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc
make C=1 currently gives the following warning:

drivers/cpufreq/intel_pstate.c:262: warning: Function parameter or struct member 'epp_cached' not described in 'cpudata'

Add the missing ":" to fix the trivial kernel-doc syntax error.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-07 15:02:43 +02:00
Arnd Bergmann 8c556541a5 cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]
The reference to this variable is hidden in an #ifdef:

drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=]

Use the same check around the definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-03 16:41:59 +02:00
Rafael J. Wysocki e8217b4bec cpufreq: intel_pstate: Update the maximum CPU frequency consistently
There are 3 places at which the maximum CPU frequency may change,
store_no_turbo(), intel_pstate_update_limits() (when called by the
cpufreq core) and intel_pstate_notify_work() (when handling a HWP
change notification).  Currently, cpuinfo.max_freq is only updated by
store_no_turbo() and intel_pstate_notify_work(), although in principle
it may be necessary to update it in intel_pstate_update_limits() as well.

Make all of them mutually consistent.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:22 +02:00
Rafael J. Wysocki f32587dcbe cpufreq: intel_pstate: Replace three global.turbo_disabled checks
Replace the global.turbo_disabled check in __intel_pstate_update_max_freq()
with a global.no_turbo one to make store_no_turbo() actually update the maximum
CPU frequency on turbo preference changes, which needs to be consistent
with arch_set_max_freq_ratio() called from there.

For more consistency, replace the global.turbo_disabled checks in
__intel_pstate_cpu_init() and intel_cpufreq_adjust_perf() with
global.no_turbo checks as well.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:17 +02:00
Rafael J. Wysocki 9558fae8ce cpufreq: intel_pstate: Read global.no_turbo under READ_ONCE()
Because global.no_turbo is generally not read under intel_pstate_driver_lock,
make store_no_turbo() use WRITE_ONCE() for updating it (this is the only
place at which it is updated except for the initialization) and make the
majority of places reading it use READ_ONCE().

Also remove redundant global.turbo_disabled checks from places that
depend on the 'true' value of global.no_turbo because it can only be
'true' if global.turbo_disabled is also 'true'.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:10 +02:00
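This is the usual pattern for a flag that is updated under a lock but read locklessly elsewhere; a minimal self-contained sketch with illustrative names:

    #include <linux/compiler.h>

    static bool example_no_turbo;

    /* Writer side: runs with the driver lock (or similar) held. */
    static void example_set_no_turbo(bool val)
    {
        WRITE_ONCE(example_no_turbo, val);
    }

    /* Reader side: may run without the lock, e.g. in a hot path. */
    static bool example_no_turbo_enabled(void)
    {
        return READ_ONCE(example_no_turbo);
    }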
Rafael J. Wysocki c626a43845 cpufreq: intel_pstate: Rearrange show_no_turbo() and store_no_turbo()
Now that global.turbo_disabled can only change at the cpufreq driver
registration time, initialize global.no_turbo at that time too so they
are in sync to start with (if the former is set, the latter cannot be
updated later anyway).

That allows show_no_turbo() to be simplified because it does not need
to check global.turbo_disabled and store_no_turbo() can be rearranged
to avoid doing anything if the new value of global.no_turbo is equal
to the current one and only return an error on attempts to clear
global.no_turbo when global.turbo_disabled is set.

While at it, eliminate the redundant ret variable from store_no_turbo().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:57:04 +02:00
Rafael J. Wysocki 0940f1a801 cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization
The global.turbo_disabled is updated quite often, especially in the
passive mode in which case it is updated every time the scheduler calls
into the driver.  However, this is generally not necessary and it adds
MSR read overhead to scheduler code paths (and that particular MSR is
slow to read).

For this reason, make the driver read MSR_IA32_MISC_ENABLE_TURBO_DISABLE
just once at the cpufreq driver registration time and remove all of the
in-flight updates of global.turbo_disabled.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:56:58 +02:00
Rafael J. Wysocki 032c5565eb cpufreq: intel_pstate: Fold intel_pstate_max_within_limits() into caller
Fold intel_pstate_max_within_limits() into its only caller.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2024-04-02 12:56:51 +02:00
Rafael J. Wysocki e97a98238d cpufreq: intel_pstate: Use __ro_after_init for three variables
There are at least 3 variables in intel_pstate that do not get updated
after they have been initialized, so annotate them with __ro_after_init.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:43 +02:00
Rafael J. Wysocki 0f2828e17b cpufreq: intel_pstate: Get rid of unnecessary READ_ONCE() annotations
Drop two redundant checks involving READ_ONCE() from notify_hwp_interrupt()
and make it check hwp_active without READ_ONCE() which is not necessary,
because that variable is only set once during the early initialization of
the driver.

In order to make that clear, annotate hwp_active with __ro_after_init.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:40 +02:00
Rafael J. Wysocki 432acb219a cpufreq: intel_pstate: Wait for canceled delayed work to complete
Make intel_pstate_disable_hwp_interrupt() wait for canceled delayed work
to complete to avoid leftover work items running when it returns which
may be during driver unregistration and may confuse things going forward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:36 +02:00
Rafael J. Wysocki 12ebba42d2 cpufreq: intel_pstate: Simplify spinlock locking
Because intel_pstate_enable/disable_hwp_interrupt() are only called from
thread context, they need not save the IRQ flags when using a spinlock
as interrupts are guaranteed to be enabled when they run, so make them
use spin_lock/unlock_irq().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:33 +02:00
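A sketch of the simplification described in the commit above: in thread context with interrupts known to be enabled, spin_lock_irq()/spin_unlock_irq() can replace the flag-saving variants. Illustrative only, with a hypothetical lock name:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(example_hwp_lock);

    /* Called from thread context only, so interrupts are enabled here. */
    static void example_enable_hwp_interrupt(void)
    {
        spin_lock_irq(&example_hwp_lock);
        /* ... program MSR_HWP_INTERRUPT for this CPU ... */
        spin_unlock_irq(&example_hwp_lock);
    }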
Rafael J. Wysocki f186b2dace cpufreq: intel_pstate: Drop redundant locking from intel_pstate_driver_cleanup()
Remove the spinlock locking from intel_pstate_driver_cleanup() as it is
not necessary because no other code accessing all_cpu_data[] can run in
parallel with that function.

Had the locking been necessary, though, it would have been incorrect
because the lock in question is acquired from a hardirq handler and
it cannot be acquired from thread context without disabling interrupts.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-02 12:56:08 +02:00
Rafael J. Wysocki e4d0d7f194 Merge back cpufreq material for 6.9-rc1. 2024-03-11 15:08:45 +01:00
Srinivas Pandruvada 1f4b7fdd71 cpufreq: intel_pstate: Update default EPPs for Meteor Lake
Update the default balance_performance EPP to 115 and the performance EPP to 16.

Changing the balance_performance EPP gives better performance/watt
compared to the default power-up EPP value of 128.

Changing the performance EPP to 0x10 shows reduced power with performance
similar to EPP 0. On small form factor devices this is beneficial,
as lower power results in lower CPU and skin temperature. This
results in reduced thermal throttling and higher performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:02:33 +01:00
Srinivas Pandruvada 240a8da623 cpufreq: intel_pstate: Allow model specific EPPs
The current implementation allows a model specific EPP override only for
balance_performance. Add a feature to allow model specific EPPs for all
predefined EPP strings. For example, for some CPU models even changing the
performance EPP has benefits.

Use a mask of EPPs as driver_data instead of just balance_performance.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:02:33 +01:00
Doug Smythies f0a0fc10ab cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
There is a loophole in P-state limit clamping for the intel_cpufreq CPU
frequency scaling driver (intel_pstate in passive mode) with the schedutil
CPU frequency scaling governor and HWP (Hardware P-state) control enabled,
when the adjust_perf callback path is used.

Fix it.

Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-24 15:01:59 +01:00
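The fix described in the commit above amounts to clamping the P-state requested via the adjust_perf path to the per-policy limits before it is programmed. A rough sketch assuming the driver's min_perf_ratio/max_perf_ratio fields; not the exact patch:

    #include <linux/minmax.h>

    /* Sketch only: enforce the scaling limits in the adjust_perf path too. */
    static int example_clamp_target_pstate(struct cpudata *cpu, int target_pstate)
    {
        return clamp_t(int, target_pstate,
                       cpu->min_perf_ratio, cpu->max_perf_ratio);
    }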
Jiri Slaby (SUSE) 4615ac9010 cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowait
Commit 09c448d3c6 ("cpufreq: intel_pstate: Use IOWAIT flag in Atom
algorithm") removed the last user of cpudata::prev_cummulative_iowait.
Remove the member too.

Found by https://github.com/jirislaby/clang-struct.

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-13 13:47:35 +01:00
Rafael J. Wysocki 192cdb1c90 cpufreq: intel_pstate: Refine computation of P-state for given frequency
On systems using HWP, if a given frequency is equal to the maximum turbo
frequency or the maximum non-turbo frequency, the HWP performance level
corresponding to it is already known and can be used directly without
any computation.

Accordingly, adjust the code to use the known HWP performance levels in
the cases mentioned above.

This also helps to avoid limiting CPU capacity artificially in some
cases when the BIOS produces the HWP_CAP numbers using a different
E-core-to-P-core performance scaling factor than expected by the kernel.

Fixes: f5c8cf2a49 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores")
Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-22 15:18:11 +01:00
Srinivas Pandruvada bde4f5ff82 cpufreq: intel_pstate: Update hybrid scaling factor for Meteor Lake
On some Meteor Lake platforms, the maximum one-core turbo frequency is not
reached. During the hybrid performance-to-frequency conversion, the computed
maximum frequency is 100 MHz too low, which results in requesting a maximum
frequency that is 100 MHz below the actual one.

For example, when the max one-core turbo is 4.9 GHz:
MSR HWP_CAPABILITIES shows the highest performance ratio for P-cores as 0x3E.
With the current scaling factor of 78741 (1.27x for converting frequency
to performance), this results in a max frequency of 4.8 GHz, capping the
max scaling frequency at 4.8 GHz, which is 100 MHz less than desired.

Add the capability to define a per-CPU-model scaling factor and define a
scaling factor of 80000 (1.25x for converting frequency to performance for
P-cores) for Meteor Lake.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Debug message adjustment, subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-10 15:02:25 +01:00
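The effect of the two scaling factors can be checked with the numbers from the changelog above. A runnable user-space sketch, assuming frequencies are rounded down to a 100 MHz boundary as the example implies:

    #include <stdio.h>

    int main(void)
    {
        unsigned int hwp_cap_ratio = 0x3E;      /* 62, from HWP_CAPABILITIES */
        unsigned int generic_scaling = 78741;   /* kHz per ratio unit */
        unsigned int mtl_scaling = 80000;       /* kHz per ratio unit */

        /* Round down to a 100 MHz (100000 kHz) boundary. */
        unsigned int f_old = hwp_cap_ratio * generic_scaling / 100000 * 100000;
        unsigned int f_new = hwp_cap_ratio * mtl_scaling / 100000 * 100000;

        /* Prints: old 4800000 kHz, new 4900000 kHz */
        printf("old %u kHz, new %u kHz\n", f_old, f_new);
        return 0;
    }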
Zhenguo Yao e95013156a cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate will give up
unless the CPU model is explicitly supported.

See also the following past commits:

 - commit df51f287b5 ("cpufreq: intel_pstate: Add Sapphire Rapids support
   in no-HWP mode")
 - commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers support")
 - commit 706c532885 ("cpufreq: intel_pstate: Add Cometlake support in
   no-HWP mode")
 - commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in
   no-HWP mode")
 - commit 71bb5c82aa ("cpufreq: intel_pstate: Add Tigerlake support in
   no-HWP mode")

Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19 21:38:09 +01:00
Srinivas Pandruvada 2719675fa8 cpufreq: intel_pstate: Prioritize firmware-provided balance performance EPP
The platform firmware can provide a balance performance EPP value by
enabling HWP and programming the EPP to the desired value.

However, currently this only takes effect for processors listed in
intel_epp_balance_perf[], so in order to enable a new processor model
to utilize this mechanism, that table needs to be updated.  It arguably
should not be necessary to modify the kernel to work properly with
every new generation of processors, though, and distributions that don't
always ship the most recent kernels should be able to run reasonably well
on new hardware without code changes.

For this reason, move the check that avoids updating the EPP when the balance
performance EPP is unmodified from the power-up default of 0x80 so that it
runs after the check that retrieves the firmware-provided balance performance
EPP value.  This will cause the code to always look for the firmware-provided
value before consulting intel_epp_balance_perf[], and the handling of new
hardware will not depend on whether or not that table has been updated yet.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-05 21:29:47 +01:00
Srinivas Pandruvada 37b6ddba96 cpufreq: intel_pstate: Revise global turbo disable check
Setting the global turbo flag based on CPU 0's P-state limits is problematic,
as it limits the max P-state request on every CPU in the system based solely
on that one CPU's P-state limits.

There are two cases in which global.turbo_disabled flag is set:
- When the MSR_IA32_MISC_ENABLE_TURBO_DISABLE bit is set to 1
in the MSR MSR_IA32_MISC_ENABLE. This bit can be only changed by
the system BIOS before power up.
- When the max non-turbo P-state is the same as the max turbo P-state for CPU 0.

The second check is not valid for deciding the global turbo state based on
CPU 0. The CPU 0 max turbo P-state can be the same as its max non-turbo P-state,
but for other CPUs this may not be true.

There is no guarantee that the max P-state limits are the same for every CPU.
It is possible that the max P-state for a CPU is constrained during fusing.
Also, with the Intel Speed Select performance profile, CPU 0 may not be present
in all profiles. In this case the max non-turbo and turbo P-states can be
set to the lowest possible P-state by the hardware when switching to such
a profile. Since the max non-turbo and turbo P-states are then the same,
the global.turbo_disabled flag will be set.

Once global.turbo_disabled is set, any scaling max and min frequency
update for any CPU will result in its max P-state being constrained to
the max non-turbo P-state.

Hence remove the check that compares the max non-turbo P-state of CPU 0 with
its max turbo P-state when setting the global turbo disabled flag.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-11 20:18:17 +02:00
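After this change, the remaining check reads a single bit from MSR_IA32_MISC_ENABLE. A minimal kernel-side sketch of that check (the function name is illustrative):

    #include <asm/msr.h>

    static bool example_turbo_is_disabled(void)
    {
        u64 misc_en;

        /* Bit 38: turbo disabled by the system BIOS before power up. */
        rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
        return !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
    }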