Documentation: thermal: Document thermal throttling on Intel platforms

Add documentation for Intel thermal throttling reporting events.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
[ rjw: Subject adjustment, file name change, minor edits ]
Link: https://patch.msgid.link/20251113212104.221632-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Srinivas Pandruvada 2025-11-13 13:21:03 -08:00 committed by Rafael J. Wysocki
parent 172880f7c9
commit 3402bc010d
2 changed files with 92 additions and 0 deletions


@ -6,3 +6,4 @@ Thermal Subsystem
:maxdepth: 1
intel_powerclamp
intel_thermal_throttle


@ -0,0 +1,91 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
=======================================
Intel thermal throttle events reporting
=======================================
:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Introduction
------------
Intel processors have built-in automatic and adaptive thermal monitoring
mechanisms that force the processor to reduce its power consumption in order
to operate within predetermined temperature limits.
Refer to section "THERMAL MONITORING AND PROTECTION" in the "Intel® 64 and
IA-32 Architectures Software Developer's Manual Volume 3 (3A, 3B, 3C, & 3D):
System Programming Guide" for more details.
In general, there are two mechanisms to control the core temperature of the
processor: "Thermal Monitor 1" (TM1) and "Thermal Monitor 2" (TM2).
The status of the temperature sensor that triggers the thermal monitor (TM1/TM2)
is indicated through the "thermal status flag" and "thermal status log flag" in
MSR_IA32_THERM_STATUS for core level and MSR_IA32_PACKAGE_THERM_STATUS for
package level.
Thermal Status flag, bit 0 — When set, indicates that the processor core
temperature is currently at the trip temperature of the thermal monitor and that
the processor power consumption is being reduced via either TM1 or TM2, depending
on which is enabled. When clear, the flag indicates that the core temperature is
below the thermal monitor trip temperature. This flag is read only.
Thermal Status Log flag, bit 1 — When set, indicates that the thermal sensor has
tripped since the last power-up or reset or since the last time that software
cleared this flag. This flag is a sticky bit; once set it remains set until
cleared by software or until a power-up or reset of the processor. The default
state is clear.
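The two flags can be illustrated with a minimal Python sketch that decodes
bits 0 and 1 from a raw register value. The function name, the dictionary keys
and the sample values are hypothetical and chosen only for illustration; how
the raw value is obtained is deliberately left out.

```python
# Bit positions as described above for MSR_IA32_THERM_STATUS and
# MSR_IA32_PACKAGE_THERM_STATUS.
THERMAL_STATUS_BIT = 0      # "Thermal Status flag", read only
THERMAL_STATUS_LOG_BIT = 1  # "Thermal Status Log flag", sticky


def decode_therm_status(msr_value: int) -> dict:
    """Extract the two flags from a raw register value.

    The key names are illustrative and not part of any kernel interface.
    """
    return {
        "throttling_now": bool((msr_value >> THERMAL_STATUS_BIT) & 1),
        "throttled_since_clear": bool((msr_value >> THERMAL_STATUS_LOG_BIT) & 1),
    }


if __name__ == "__main__":
    # Made-up raw values: nothing set, log flag only, both flags set.
    for raw in (0x0, 0x2, 0x3):
        print(hex(raw), decode_therm_status(raw))
```

A value with only bit 1 set corresponds to a processor that was throttled at
some point in the past but is not being throttled at the time of the read.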
It is possible that TM1/TM2 is not active when a user reads
MSR_IA32_THERM_STATUS or MSR_IA32_PACKAGE_THERM_STATUS. In this case, the
"Thermal Status flag" reads "0" and the "Thermal Status Log flag" is set if
TM1/TM2 was activated previously. However, because the log flag is a single
sticky bit that software must clear, it cannot show how many times TM1/TM2
was activated.
Hence, Linux provides counters of how many times the "Thermal Status flag" was
set, and also reports how long the "Thermal Status flag" was active, in
milliseconds. Using these counters, users can check whether performance was
limited because of thermal events. It is recommended to read these values from
sysfs instead of reading the MSRs directly, because the driver resets the
"Thermal Status Log flag" to implement rate control.
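The difference between a sticky log bit and the counters can be sketched with
a toy model in Python. This is not the kernel implementation; it only assumes
a time-ordered list of hypothetical (timestamp in ms, flag value) samples and
shows why counting 0 to 1 transitions carries more information than a single
sticky bit.

```python
def summarize_throttling(samples):
    """Summarize a time-ordered sequence of (timestamp_ms, status_flag) pairs.

    Toy model only: a period that is still active at the end of the
    sequence is counted in "count" but not in the time totals.
    """
    count = 0            # number of 0 -> 1 transitions of the flag
    total_ms = 0         # cumulative time the flag was set
    max_ms = 0           # longest single period the flag was set
    log_flag = False     # sticky: set once, never cleared in this model
    active_since = None  # timestamp at which the current period started

    for timestamp_ms, flag in samples:
        if flag and active_since is None:
            count += 1
            log_flag = True
            active_since = timestamp_ms
        elif not flag and active_since is not None:
            duration = timestamp_ms - active_since
            total_ms += duration
            max_ms = max(max_ms, duration)
            active_since = None

    return {"log_flag": log_flag, "count": count,
            "total_ms": total_ms, "max_ms": max_ms}


if __name__ == "__main__":
    # Two made-up throttling periods: 10..25 ms and 40..100 ms.
    samples = [(0, 0), (10, 1), (25, 0), (40, 1), (45, 1), (100, 0)]
    print(summarize_throttling(samples))
```

In this model the sticky bit reads the same after one activation as after
many, whereas the count and the time totals distinguish the two cases.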
Sysfs Interface
---------------
Thermal throttling events are presented for each CPU under
"/sys/devices/system/cpu/cpuX/thermal_throttle/", where "X" is the CPU number.
All these counters are read-only and cannot be reset to 0, so they can
potentially overflow after reaching the maximum 64-bit unsigned integer value.
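Since the counters cannot be reset, a monitoring tool would typically work
with the difference between two readings. The following Python sketch shows
one way to compute that difference. It assumes that a counter wraps around
modulo 2^64 on overflow and that it wraps at most once between the two
readings; the function name is hypothetical.

```python
U64_MODULUS = 1 << 64  # assumption: counters wrap modulo 2**64


def counter_delta(previous: int, current: int) -> int:
    """Return how much a counter advanced between two readings.

    The result is only meaningful if the counter wrapped at most once
    between the readings.
    """
    return (current - previous) % U64_MODULUS


if __name__ == "__main__":
    print(counter_delta(5, 12))             # no wrap between readings
    print(counter_delta(U64_MODULUS - 3, 4))  # counter wrapped once
```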
``core_throttle_count``
Shows the number of times the "Thermal Status flag" changed from 0 to 1 for
this CPU since the OS booted and the thermal vector was initialized. This is
a 64-bit counter.
``package_throttle_count``
Shows the number of times the "Thermal Status flag" changed from 0 to 1 for
the package containing this CPU since the OS booted and the thermal vector
was initialized. Package status is broadcast to all CPUs, so all CPUs in the
package increment this count. This is a 64-bit counter.
``core_throttle_max_time_ms``
Shows the maximum amount of time, in milliseconds, for which the "Thermal
Status flag" has been set to 1 for this CPU at the core level since the OS
booted and the thermal vector was initialized.
``package_throttle_max_time_ms``
Shows the maximum amount of time, in milliseconds, for which the "Thermal
Status flag" has been set to 1 for the package containing this CPU since the
OS booted and the thermal vector was initialized.
``core_throttle_total_time_ms``
Shows the cumulative time, in milliseconds, for which the "Thermal Status
flag" has been set to 1 for this CPU at the core level since the OS booted
and the thermal vector was initialized.
``package_throttle_total_time_ms``
Shows the cumulative time, in milliseconds, for which the "Thermal Status
flag" has been set to 1 for the package containing this CPU since the OS
booted and the thermal vector was initialized.
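As an illustration, the attributes listed above could be collected for one CPU
with a short Python sketch such as the following. The function name and the
``sysfs_root`` parameter are hypothetical; the directory layout and the file
names are the ones described in this document. Files that are missing or
unreadable are reported as ``None``.

```python
from pathlib import Path

# Attribute names as listed in this document.
COUNTER_NAMES = (
    "core_throttle_count",
    "package_throttle_count",
    "core_throttle_max_time_ms",
    "package_throttle_max_time_ms",
    "core_throttle_total_time_ms",
    "package_throttle_total_time_ms",
)


def read_throttle_counters(cpu: int, sysfs_root="/sys/devices/system/cpu") -> dict:
    """Read the thermal_throttle attributes of one CPU into a dictionary.

    A value of None means the file could not be read or parsed, for
    example on a platform that does not provide these attributes.
    """
    directory = Path(sysfs_root) / f"cpu{cpu}" / "thermal_throttle"
    counters = {}
    for name in COUNTER_NAMES:
        try:
            counters[name] = int((directory / name).read_text().strip())
        except (OSError, ValueError):
            counters[name] = None
    return counters


if __name__ == "__main__":
    for name, value in read_throttle_counters(0).items():
        print(f"{name}: {value}")
```

Comparing two such snapshots taken some time apart shows whether any
throttling occurred in between.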