Commit Graph

388 Commits

Author SHA1 Message Date
Rafael J. Wysocki aa35e56a52 thermal: core: Introduce .should_bind() thermal zone callback
The current design of the code binding cooling devices to trip points in
thermal zones is convoluted and hard to follow.

Namely, a driver that registers a thermal zone can provide .bind()
and .unbind() operations for it, which are required to call either
thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(),
respectively, or thermal_zone_bind_cooling_device() and
thermal_zone_unbind_cooling_device(), respectively, for every relevant
trip point and the given cooling device.  Moreover, if .bind() is
provided and .unbind() is not, the cleanup necessary during the removal
of a thermal zone or a cooling device may not be carried out.

In other words, the core relies on the thermal zone owners to do the
right thing, which is error prone and far from obvious, even though all
of that is not really necessary.  Specifically, if the core could ask
the thermal zone owner, through a special thermal zone callback, whether
or not a given cooling device should be bound to a given trip point in
the given thermal zone, it might as well carry out all of the binding
and unbinding by itself.  In particular, the unbinding can be done
automatically without involving the thermal zone owner at all because
all of the thermal instances associated with a thermal zone or cooling
device going away must be deleted regardless.

Accordingly, introduce a new thermal zone operation, .should_bind(),
that can be invoked by the thermal core for a given thermal zone,
trip point and cooling device combination in order to check whether
or not the cooling device should be bound to the trip point at hand.
It takes an additional cooling_spec argument allowing the thermal
zone owner to specify the highest and lowest cooling states of the
cooling device and its weight for the given trip point binding.

Make the thermal core use this operation, if present, in the absence of
.bind() and .unbind().  Note that .should_bind() will be called under
the thermal zone lock.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/9334403.CDJkKcVGEf@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki d48005511a thermal: core: Move thermal zone locking out of bind/unbind functions
Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
acquire the thermal zone lock, the locking rules for their callers get
complicated.  In particular, the thermal zone lock cannot be acquired
in any code path leading to one of these functions even though it might
be useful to do so.

To address this, remove the thermal zone locking from both these
functions, add lockdep assertions for the thermal zone lock to both
of them and make their callers acquire the thermal zone lock instead.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/3837835.kQq0lBPeGt@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki eb3591cde1 thermal: core: Drop redundant thermal instance checks
Because the trip and cdev pointers are sufficient to identify a thermal
instance holding them unambiguously, drop the additional thermal zone
checks from two loops walking the list of thermal instances in a
thermal zone.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/10527734.nUPlyArG6x@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki b4e6d39817 thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
It is not necessary to look up the thermal zone and the cooling device
in the respective global lists to check whether or not they are
registered.  It is sufficient to check whether or not their respective
list nodes are empty for this purpose.

Use the above observation to simplify thermal_bind_cdev_to_trip().  In
addition, eliminate an unnecessary ternary operator from it.

Moreover, add lockdep_assert_held() for thermal_list_lock to it because
that lock must be held by its callers when it is running.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/3324214.44csPzL39Z@rjwysocki.net
2024-08-22 17:43:14 +02:00
Rafael J. Wysocki a8bbe6f10f thermal: core: Fold two functions into their respective callers
Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz()
into thermal_zone_device_register_with_trips() to reduce code bloat and
make it somewhat easier to follow the code flow.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/2962184.e9J7NaK4W3@rjwysocki.net
2024-08-22 17:43:13 +02:00
Daniel Lezcano f9ba1e0517 thermal/core: Compute low and high boundaries in thermal_zone_device_update()
In order to set the scene for the thresholds support which have to
manipulate the low and high temperature boundaries for the interrupt
support, we must pass the low and high values to the incoming
thresholds routine.

The variables are set from the thermal_zone_set_trips() where the
function loops the thermal trips to figure out the next and the
previous temperatures to set the interrupt to be triggered when they
are crossed.

These variables will be needed by the function in charge of handling
the thresholds in the incoming changes but they are local to the
aforementioned function thermal_zone_set_trips().

Move the low and high boundaries computation out of the function in
thermal_zone_device_update() so they are accessible from there.

The positive side effect is they are computed in the same loop as
handle_thermal_trip(), so we remove one loop.

Co-developed-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240816081241.1925221-2-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-19 16:06:58 +02:00
Rafael J. Wysocki 6e6f58a170 thermal: gov_bang_bang: Use governor_data to reduce overhead
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.

For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.

However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.

To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Kästle <peter@piie.net>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
2024-08-16 13:13:59 +02:00
Rafael J. Wysocki f7c1b0e4ae thermal: core: Back off when polling thermal zones on errors
Commit a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attampts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes).  However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason.  It is also not really useful to continuously poll thermal zones
that never respond.

To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value.  At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.

Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone.  In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/
Reported-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2962033.e9J7NaK4W3@rjwysocki.net
2024-07-24 12:40:23 +02:00
Rafael J. Wysocki e5f98896ef thermal: trip: Split thermal_zone_device_set_mode()
Pull a wrapper around thermal zone .change_mode() callback out of
thermal_zone_device_set_mode() because it will be used elsewhere
subsequently.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2206793.irdbgypaU6@rjwysocki.net
2024-07-23 14:18:15 +02:00
Rafael J. Wysocki e528be3c87 thermal: core: Allow thermal zones to tell the core to ignore them
The iwlwifi wireless driver registers a thermal zone that is only needed
when the network interface handled by it is up and it wants that thermal
zone to be effectively ignored by the core otherwise.

Before commit a8a2617744 ("thermal: core: Call monitor_thermal_zone()
if zone temperature is invalid") that could be achieved by returning
an error code from the thermal zone's .get_temp() callback because the
core did not really handle errors returned by it almost at all.
However, commit a8a2617744 made the core attempt to recover from the
situation in which the temperature of a thermal zone cannot be
determined due to errors returned by its .get_temp() and is always
invalid from the core's perspective.

That was done because there are thermal zones in which .get_temp()
returns errors to start with due to some difficulties related to the
initialization ordering, but then it will start to produce valid
temperature values at one point.

Unfortunately, the simple approach taken by commit a8a2617744,
which is to poll the thermal zone periodically until its .get_temp()
callback starts to return valid temperature values, is at odds with
the special thermal zone in iwlwifi in which .get_temp() may always
return an error because its network interface may always be down.  If
that happens, every attempt to invoke the thermal zone's .get_temp()
callback resulting in an error causes the thermal core to print a
dev_warn() message to the kernel log which is super-noisy.

To address this problem, make the core handle the case in which
.get_temp() returns 0, but the temperature value returned by it
is not actually valid, in a special way.  Namely, make the core
completely ignore the invalid temperature value coming from
.get_temp() in that case, which requires folding in
update_temperature() into its caller and a few related changes.

On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
and put THERMAL_TEMP_INVALID into the temperature return memory
location instead of returning an error when the firmware is not
running or it is not of the right type.

Also, to clearly separate the handling of invalid temperature
values from the thermal zone initialization, introduce a special
THERMAL_TEMP_INIT value specifically for the latter purpose.

Fixes: a8a2617744 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: 6.10+ <stable@vger.kernel.org> # 6.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net
[ rjw: Rebased on top of the current mainline ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-18 13:35:55 +02:00
Rafael J. Wysocki ab33da3a22 Merge branch 'thermal-core'
Merge updates related to the thermal core for 6.11-rc1:

 - Redesign the .set_trip_temp() thermal zone callback to take a trip
   pointer instead of a trip ID and update its users (Rafael Wysocki).

 - Avoid using invalid combinations of polling_delay and passive_delay
   thermal zone parameters (Rafael Wysocki).

 - Update a cooling device registration function to take a const
   argument (Krzysztof Kozlowski).

 - Make the uniphier thermal driver use thermal_zone_for_each_trip() for
   walking trip points (Rafael Wysocki).

* thermal-core:
  thermal: core: Add sanity checks for polling_delay and passive_delay
  thermal: trip: Fold __thermal_zone_get_trip() into its caller
  thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback
  thermal: imx: Drop critical trip check from imx_set_trip_temp()
  thermal: trip: Add conversion macros for thermal trip priv field
  thermal: helpers: Introduce thermal_trip_is_bound_to_cdev()
  thermal: core: Change passive_delay and polling_delay data type
  thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()
  thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points
2024-07-15 20:43:21 +02:00
Rafael J. Wysocki 3669716401 thermal: core: Add sanity checks for polling_delay and passive_delay
If polling_delay is nonzero and passive_delay is greater than
polling_delay, the thermal zone temperature will be updated less
often when tz->passive is nonzero, which is not as expected.  Make
the thermal zone registration fail with -EINVAL in that case as
this is a clear thermal zone configuration mistake.

If polling_delay is nonzero and passive_delay is 0, which is regarded
as a valid thermal zone configuration, the thermal zone will use polling
except when tz->passive is nonzero.  However, the expected behavior in
that case is to continue temperature polling with the same delay value
regardless of tz->passive, so set passive_delay to the polling_delay
value then.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/5802156.DvuYhMxLoT@rjwysocki.net
2024-07-12 15:14:57 +02:00
Rafael J. Wysocki 462be1c353 Merge back thermal control material for 6.11. 2024-07-10 13:01:38 +02:00
Rafael J. Wysocki d05374dee2 thermal: core: Change passive_delay and polling_delay data type
It is better to use unsigned int as the data type for the passive_delay
and polling_delay arguments of thermal_zone_device_register_with_trips()
because they are implicitly cast to unsigned int anyway in
thermal_set_delay_jiffies() and if they happen to be negative at that
point, the resulting behavior may not be as desired.

Update the thermal_zone_device_register_with_trips() definition
accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/5803791.DvuYhMxLoT@rjwysocki.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-09 18:19:24 +02:00
Rafael J. Wysocki 94eacc1c58 thermal: core: Fix list sorting in __thermal_zone_device_update()
The order in which lists are sorted in __thermal_zone_device_update()
is reverse with respect to what it should be due to a mistake in
thermal_trip_notify_cmp().

Fix it and observe that it is not necessary to sort the lists in
different orders.  They can both be sorted in ascending order if
way_down_list is walked in reverse order which allows the code to
be slightly more straightforward (and less prone to silly mistakes).

Fixes: 7454f2c42c ("thermal: core: Sort trip point crossing notifications by temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12481676.O9o76ZdvQC@rjwysocki.net
2024-07-08 17:24:22 +02:00
Rafael J. Wysocki a8a2617744 thermal: core: Call monitor_thermal_zone() if zone temperature is invalid
Commit 202aa0d4bb ("thermal: core: Do not call handle_thermal_trip()
if zone temperature is invalid") caused __thermal_zone_device_update()
to return early if the current thermal zone temperature was invalid.

This was done to avoid running handle_thermal_trip() and governor
callbacks in that case which led to confusion.  However, it went too
far because monitor_thermal_zone() still needs to be called even when
the zone temperature is invalid to ensure that it will be updated
eventually in case thermal polling is enabled and the driver has no
other means to notify the core of zone temperature changes (for example,
it does not register an interrupt handler or ACPI notifier).

Also if the .set_trips() zone callback is expected to set up monitoring
interrupts for a thermal zone, it has to be provided with valid
boundaries and that can only happen if the zone temperature is known.

Accordingly, to ensure that __thermal_zone_device_update() will
run again after a failing zone temperature check, make it call
monitor_thermal_zone() regardless of whether or not the zone
temperature is valid and make the latter schedule a thermal zone
temperature update if the zone temperature is invalid even if
polling is not enabled for the thermal zone.

Fixes: 202aa0d4bb ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid")
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2764814.mvXUDI8C0e@rjwysocki.net
[ rjw: Changed THERMAL_RECHECK_DELAY_MS to 250 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04 19:01:59 +02:00
Krzysztof Kozlowski 4acab508eb thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()
The 'type' string passed to thermal_of_cooling_device_register() is a
'const char *', so do the same in the devm interface.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240703083141.96013-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04 14:25:20 +02:00
Rafael J. Wysocki 06d55c4278 Merge branch 'thermal-core'
Merge thermal core changes for v6.11:

 - Fix and clean up several minor shortcomings in thermal debug (Rafael
   Wysocki).

 - Rename __thermal_zone_set_trips() to thermal_zone_set_trips() and
   make it use trip thresholds (Rafael Wysocki).

 - Use READ_ONCE() for lockless access to trip temperature and
   hysteresis (Rafael Wysocki).

 - Drop unnecessary cooling device target state checks from the
   Bang-Bang thermal governor (Rafael Wysocki).

 - Avoid invoking thermal governor .trip_crossed() callback for critical
   and hot trip points (Rafael Wysocki).

* thermal-core:
  thermal: core: Avoid calling .trip_crossed() for critical and hot trips
  thermal: gov_bang_bang: Drop unnecessary cooling device target state checks
  thermal: trip: Use READ_ONCE() for lockless access to trip properties
  thermal: trip: Make thermal_zone_set_trips() use trip thresholds
  thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips()
  thermal: trip: Use common set of trip type names
  thermal/debugfs: Move some statements from under thermal_dbg->lock
  thermal/debugfs: Compute maximum temperature for mitigation episode as a whole
  thermal/debugfs: Adjust check for trips without statistics in tze_seq_show()
  thermal/debugfs: Fix up units in "mitigations" files
  thermal/debugfs: Print mitigation timestamp value in milliseconds
  thermal/debugfs: Do not extend mitigation episodes beyond system resume
  thermal/debugfs: Use helper to update trip point overstepping duration
2024-06-25 16:27:38 +02:00
Linus Torvalds fbe7ef3f2f Thermal control fixes for 6.10-rc5
- Remove the filtered mode for mt8188 from lvts_thermal as it is not
    supported on this platform and fail the lvts_thermal initialization
    when the golden temperature is zero as that means the efuse data is
    not correctly set (Julien Panis).
 
  - Update the processor_thermal part of the Intel int340x driver to
    support shared interrupts as the processor thermal device interrupt
    may in fact be shared with PCI devices (Srinivas Pandruvada).
 
  - Synchronize the suspend-prepare and post-suspend actions of the
    thermal PM notifier to avoid a destructive race condition and
    change the priority of that notifier to the minimum to avoid
    interference between the work items spawned by it and the other
    PM notifiers during system resume (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmZ1clgSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxftEQAKE9MJHvo1zTgGq3jU198pYj6/oIf4F8
 GR8wTSIhmSO+YgUINbjcGaUoCbwROeaLzLDyAXsrG0vS1t1/c5TGGujvQnMCnrdz
 rSwGoXtJeRK2yYNUWz/Mt1e/ai04sn/i9/9gZJOagRQaas45SdUIxLqpQs17R5cP
 R/8tFAyjMVyQ1MvI4mP3zK1yvE4fBeC18ZGKXNM57tzGBr0dWcSLrVJH/vS1/3Zx
 947LCzngfw8rPx1v+1LSIaCja62StUHBfmkTHnXDaegCFaWCuyUGgHV7yV3RW5sc
 +jQfMo6WH+O10TWvP28tc9dwamB0kfGr7oZXDH0ucpTwg621tThQZL/urxcAX1BS
 i5HQlATfUD/qWq1c5KUEZMRVje/rU8+SnSbf/xZeQKF4albOoZUWAk6VWqyJiwrr
 VTnMsVVCEZ62Vawyk7tchdMu233GMoMndRPu/Xq+ourKf8cDzzXdzn678cofahpn
 5u+droa/n9O7gkku8Mz4zTmgVyXyJtLi1tMGHYT/M7XM/OrKlwvqmpAnUSuAfaQ8
 C8ywxyrk1GJQoy8LLStaSvUFcAtrdmBpv5hSFPDXkbmy/xHaKw7+vccjL9CybmXo
 P4u2psBQ4opeg0x5I+dS5Tp0WcFXcdo5UGSaw2dRT+7WDxA9ALk3hs/s+vHzv1zO
 Fe7iya1SwbZB
 =qXau
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These fix the Mediatek lvts_thermal driver, the Intel int340x driver,
  and the thermal core (two issues related to system suspend).

  Specifics:

   - Remove the filtered mode for mt8188 from lvts_thermal as it is not
     supported on this platform and fail the lvts_thermal initialization
     when the golden temperature is zero as that means the efuse data is
     not correctly set (Julien Panis)

   - Update the processor_thermal part of the Intel int340x driver to
     support shared interrupts as the processor thermal device interrupt
     may in fact be shared with PCI devices (Srinivas Pandruvada)

   - Synchronize the suspend-prepare and post-suspend actions of the
     thermal PM notifier to avoid a destructive race condition and
     change the priority of that notifier to the minimum to avoid
     interference between the work items spawned by it and the other
     PM notifiers during system resume (Rafael Wysocki)"

* tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: int340x: processor_thermal: Support shared interrupts
  thermal: core: Change PM notifier priority to the minimum
  thermal: core: Synchronize suspend-prepare and post-suspend actions
  thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data
  thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-21 11:16:56 -07:00
Rafael J. Wysocki 3995904d54 Merge back thermal control material for v6.11. 2024-06-21 14:42:30 +02:00
Rafael J. Wysocki 494c7d0550 thermal: core: Change PM notifier priority to the minimum
It is reported that commit 5a5efdaffd ("thermal: core: Resume thermal
zones asynchronously") causes battery data in sysfs on Thinkpad P1 Gen2
to become invalid after a resume from S3 (and it is necessary to reboot
the machine to restore correct battery data).  Some investigation into
the problem indicated that it happened because, after the commit in
question, the ACPI battery PM notifier ran in parallel with
thermal_zone_device_resume() for one of the thermal zones which
apparently confused the platform firmware on the affected system.

While the exact reason for the firmware confusion remains unclear, it
is arguably not particularly relevant, and the expected behavior of the
affected system can be restored by making the thermal PM notifier run
at the lowest priority which avoids interference between work items
spawned by it and the other PM notifiers (that will run before those
work items now).

Fixes: 5a5efdaffd ("thermal: core: Resume thermal zones asynchronously")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218881
Reported-by: fhortner@yahoo.de
Tested-by: fhortner@yahoo.de
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14 17:36:59 +02:00
Rafael J. Wysocki d2278f3533 thermal: core: Synchronize suspend-prepare and post-suspend actions
After commit 5a5efdaffd ("thermal: core: Resume thermal zones
asynchronously") it is theoretically possible that, if a system suspend
starts immediately after a system resume, thermal_zone_device_resume()
spawned by the thermal PM notifier for one of the thermal zones at the
end of the system resume will run after the PM thermal notifier for the
suspend-prepare action.  If that happens, tz->suspended set by the latter
will be reset by the former which may lead to unexpected consequences.

To avoid that race, synchronize thermal_zone_device_resume() with the
suspend-prepare thermal PM notifier with the help of additional bool
field and completion in struct thermal_zone_device.

Note that this also ensures running __thermal_zone_device_update() at
least once for each thermal zone between system resume and the following
system suspend in case it is needed to start thermal mitigation.

Fixes: 5a5efdaffd ("thermal: core: Resume thermal zones asynchronously")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-14 17:36:59 +02:00
Rafael J. Wysocki 72196c20c3 thermal: core: Avoid calling .trip_crossed() for critical and hot trips
Invoking the governor .trip_crossed() callback for critical and hot
trips is pointless because they are handled directly by the core,
so make thermal_governor_trip_crossed() avoid doing that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-11 21:06:44 +02:00
Rafael J. Wysocki 893bae9223 thermal: trip: Make thermal_zone_set_trips() use trip thresholds
Modify thermal_zone_set_trips() to use trip thresholds instead of
computing the low temperature for each trip to avoid deriving both
the high and low temperature levels from the same trip (which may
happen if the zone temperature falls into the hysteresis range of
one trip).

Accordingly, make __thermal_zone_device_update() call
thermal_zone_set_trips() later, when threshold values have been
updated for all trips.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11 21:04:56 +02:00
Rafael J. Wysocki 28d5cc1267 thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips()
Drop the pointless double underline prefix from the function name as per
the subject.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11 21:04:48 +02:00
Rafael J. Wysocki 9b73b5052a thermal/debugfs: Do not extend mitigation episodes beyond system resume
Because thermal zone handling by the thermal core is started from
scratch during resume from system-wide suspend, prevent the debug
code from extending mitigation episodes beyond that point by ending
the mitigation episode currently in progress, if any, for each thermal
zone.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-06-11 21:04:00 +02:00
Rafael J. Wysocki 1af89dedc8 thermal: core: Do not fail cdev registration because of invalid initial state
It is reported that commit 31a0fa0019 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan.  If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
as uses that callback after commit 31a0fa0019.

Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.

Accordingly, modify __thermal_cooling_device_register() to avoid
calling thermal_debug_cdev_add() instead of returning an error if the
initial .get_cur_state() callback invocation fails.

Fixes: 31a0fa0019 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com
Reported-by: Laura Nao <laura.nao@collabora.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Laura Nao <laura.nao@collabora.com>
2024-06-07 13:51:51 +02:00
Rafael J. Wysocki ae2170d6ea thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid
When a trip point becomes invalid after being crossed on the way up,
it is involved in a mitigation episode that needs to be adjusted to
compensate for the trip going away.

For this reason, introduce thermal_zone_trip_down() as a wrapper
around thermal_trip_crossed() and make thermal_zone_set_trip_temp()
call it if the new temperature of the trip at hand is equal to
THERMAL_TEMP_INVALID and it has been crossed on the way up to trigger
all of the necessary adjustments in user space, the thermal debug
code and the zone governor.

Fixes: 8c69a777e4 ("thermal: core: Fix the handling of invalid trip points")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki cb573eec60 thermal: core: Introduce thermal_trip_crossed()
Add a helper function called thermal_trip_crossed() to be invoked by
__thermal_zone_device_update() in order to notify user space, the
thermal debug code and the zone governor about trip crossing.

Subsequently, this will also be used in the case when a trip point
becomes invalid after being crossed on the way up.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-05-27 13:00:00 +02:00
Rafael J. Wysocki 8c69a777e4 thermal: core: Fix the handling of invalid trip points
Commit 9ad18043fb ("thermal: core: Send trip crossing notifications
at init time if needed") overlooked the case when a trip point that
has started as invalid is set to a valid temperature later.  Namely,
the initial threshold value for all trips is zero, so if a previously
invalid trip becomes valid and its (new) low temperature is above the
zone temperature, a spurious trip crossing notification will occur and
it may trigger the WARN_ON() in handle_thermal_trip().

To address this, set the initial threshold for all trips to INT_MAX.

There is also the case when a valid writable trip becomes invalid that
requires special handling.  First, in accordance with the change
mentioned above, the trip's threshold needs to be set to INT_MAX to
avoid the same issue.  Second, if the trip in question is passive and
it has been crossed by the thermal zone temperature on the way up, the
zone's passive count has been incremented and it is in the passive
polling mode, so its passive count needs to be adjusted to allow the
passive polling to be turned off eventually.

Fixes: 9ad18043fb ("thermal: core: Send trip crossing notifications at init time if needed")
Fixes: 042a3d80f1 ("thermal: core: Move passive polling management to the core")
Reported-by: Zhang Rui <zhang.rui@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
2024-05-17 12:37:33 +02:00
Rafael J. Wysocki 042a3d80f1 thermal: core: Move passive polling management to the core
Passive polling is enabled by setting the 'passive' field in
struct thermal_zone_device to a positive value so long as the
'passive_delay_jiffies' field is greater than zero.  It causes
the thermal core to actively check the thermal zone temperature
periodically which in theory should be done after crossing a
passive trip point on the way up in order to allow governors to
react more rapidly to temperature changes and adjust mitigation
more precisely.

However, the 'passive' field in struct thermal_zone_device is currently
managed by governors which is quite problematic.  First of all, only
two governors, Step-Wise and Power Allocator, update that field at
all, so the other governors do not benefit from passive polling,
although in principle they should.  Moreover, if the zone governor is
changed from, say, Step-Wise to Fair-Share after 'passive' has been
incremented by the former, it is not going to be reset back to zero by
the latter even if the zone temperature falls down below all passive
trip points.

For this reason, make handle_thermal_trip() increment 'passive'
to enable passive polling for the given thermal zone whenever a
passive trip point is crossed on the way up and decrement it
whenever a passive trip point is crossed on the way down.  Also
remove the 'passive' field updates from governors and additionally
clear it in thermal_zone_device_init() to prevent passive polling
from being enabled after a system resume just beacuse it was enabled
before suspending the system.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-30 21:16:13 +02:00
Rafael J. Wysocki 202aa0d4bb thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
Make __thermal_zone_device_update() bail out if update_temperature()
fails to update the zone temperature because __thermal_zone_get_temp()
has returned an error and the current zone temperature is
THERMAL_TEMP_INVALID (user space receiving netlink thermal messages,
thermal debug code and thermal governors may get confused otherwise).

Fixes: 9ad18043fb ("thermal: core: Send trip crossing notifications at init time if needed")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-30 21:16:13 +02:00
Rafael J. Wysocki 31a0fa0019 thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
If cdev_dt_seq_show() runs before the first state transition of a cooling
device, it will not print any state residency information for it, even
though it might be reasonably expected to print residency information for
the initial state of the cooling device.

For this reason, rearrange the code to get the initial state of a cooling
device at the registration time and pass it to thermal_debug_cdev_add(),
so that the latter can create a duration record for that state which will
allow cdev_dt_seq_show() to print its residency information.

Fixes: 755113d767 ("thermal/debugfs: Add thermal cooling device debugfs information")
Reported-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-26 15:01:56 +02:00
Rafael J. Wysocki f831892e23 thermal: core: Introduce thermal_governor_trip_crossed()
Add a wrapper around the .trip_crossed() governor callback invocation
to reduce code duplications slightly and improve the code layout in
__thermal_zone_device_update().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-24 20:45:06 +02:00
Rafael J. Wysocki 8dff6e8438 thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
which is a better match for the purpose of the function.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:43:09 +02:00
Rafael J. Wysocki 0a293c7758 thermal/debugfs: Avoid excessive updates of trip point statistics
Since thermal_debug_update_temp() is called before invoking
thermal_debug_tz_trip_down() for the trips that were crossed by the
zone temperature on the way up, it updates the statistics for them
as though the current zone temperature was above the low temperature
of each of them.  However, if a given trip has just been crossed on the
way down, the zone temperature is in fact below its low temperature,
but this is handled by thermal_debug_tz_trip_down() running after the
update of the trip statistics.

The remedy is to call thermal_debug_update_temp() after
thermal_debug_tz_trip_down() has been invoked for all of the
trips in question, but then thermal_debug_tz_trip_up() needs to
be adjusted, so it does not update the statistics for the trips
that has just been crossed on the way up, as that will be taken
care of by thermal_debug_update_temp() down the road.

Modify the code accordingly.

Fixes: 7ef01f228c ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:43:02 +02:00
Rafael J. Wysocki 2ae0998c67 thermal: core: Relocate critical and hot trip handling
Modify handle_thermal_trip() to call handle_critical_trips() only after
finding that the trip temperature has been crossed on the way up and
remove the redundant temperature check from the latter.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:42:39 +02:00
Rafael J. Wysocki ad2f8bccd0 thermal: core: Drop the .throttle() governor callback
Since all of the governors in the tree have been switched over to using
the new callbacks, either .trip_crossed() or .manage(), the .throttle()
governor callback is not used any more, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-24 20:42:31 +02:00
Rafael J. Wysocki 976f44133f thermal: core: Introduce .manage() callback for thermal governors
Introduce a new thermal governor callback called .manage() that will be
invoked once per thermal zone update after processing all of the trip
points in the core.

This will allow governors that look at multiple trip points together
to check all of them in a consistent configuration, so they don't need
to play tricks with skipping .throttle() invocations that they are not
interested in and they can avoid carrying out the same computations for
multiple times in one cycle.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-23 20:38:41 +02:00
Rafael J. Wysocki 80f5fd45c7 thermal: core: Introduce .trip_crossed() callback for thermal governors
Introduce a new thermal governor callback called .trip_crossed()
that will be invoked whenever a trip point is crossed by the zone
temperature, either on the way up or on the way down.

The trip crossing direction information will be passed to it and if
multiple trips are crossed in the same direction during one thermal zone
update, the new callback will be invoked for them in temperature order,
either ascending or descending, depending on the trip crossing
direction.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-19 22:11:06 +02:00
Rafael J. Wysocki 7454f2c42c thermal: core: Sort trip point crossing notifications by temperature
If multiple trip points are crossed in one go and the trips table in
the thermal zone device object is not sorted, the corresponding trip
point crossing notifications sent to user space will not be ordered
either.

Moreover, if the trips table is sorted by trip temperature in ascending
order, the trip crossing notifications on the way up will be sent in that
order too, but the trip crossing notifications on the way down will be
sent in the reverse order.

This is generally confusing and it is better to make the kernel send the
notifications in the order of growing (on the way up) or falling (on the
way down) trip temperature.

To achieve that, instead of sending a trip crossing notification and
recording a trip crossing event in the statistics right away from
handle_thermal_trip(), put the trip in question on a list that will be
sorted by __thermal_zone_device_update() after processing all of the trips
and before sending the notifications and recording trip crossing events.

Link: https://lore.kernel.org/linux-pm/20240306085428.88011-1-daniel.lezcano@linaro.org/
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-08 16:01:20 +02:00
Rafael J. Wysocki 9ad18043fb thermal: core: Send trip crossing notifications at init time if needed
If a trip point is already exceeded by the zone temperature at the
initialization time, no trip crossing notification is send regarding
this even though mitigation should be started then.

Address this by rearranging the code in handle_thermal_trip() to
send a trip crossing notification for trip points already exceeded
by the zone temperature initially which also allows to reduce its
size by using the observation that the initialization and regular
trip crossing on the way up become the same case then.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-08 16:01:20 +02:00
Rafael J. Wysocki f99c1b87a9 thermal: core: Rewrite comments in handle_thermal_trip()
Make the comments regarding trip crossing and threshold updates in
handle_thermal_trip() slightly more clear.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-04-08 16:01:20 +02:00
Rafael J. Wysocki daeeb032f4 thermal: core: Move threshold out of struct thermal_trip
The threshold field in struct thermal_trip is only used internally by
the thermal core and it is better to prevent drivers from misusing it.
It also takes some space unnecessarily in the trip tables passed by
drivers to the core during thermal zone registration.

For this reason, introduce struct thermal_trip_desc as a wrapper around
struct thermal_trip, move the threshold field directly into it and make
the thermal core store struct thermal_trip_desc objects in the internal
thermal zone trip tables.  Adjust all of the code using trip tables in
the thermal core accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-04-08 16:01:20 +02:00
Flavio Suligoi 32abd25087 thermal: core: Remove excess empty line from a comment
The first and the third lines of the kerneldoc comment for:

thermal_zone_device_set_polling()

belong to the same sentences, so join them together.

Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-05 13:17:33 +01:00
Rafael J. Wysocki 4a62d588a8 thermal: core: Eliminate writable trip points masks
All of the thermal_zone_device_register_with_trips() callers pass zero
writable trip points masks to it, so drop the mask argument from that
function and update all of its callers accordingly.

This also removes the artificial trip points per zone limit of 32,
related to using writable trip points masks.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-27 12:04:38 +01:00
Rafael J. Wysocki 5340f76472 thermal: core: Add flags to struct thermal_trip
In order to allow thermal zone creators to specify the writability of
trip point temperature and hysteresis on a per-trip basis, add a flags
field to struct thermal_trip and define flags to represent the desired
trip properties.

Also make thermal_zone_device_register_with_trips() set the
THERMAL_TRIP_FLAG_RW_TEMP flag for all trips covered by the writable
trips mask passed to it and modify the thermal sysfs code to look at
the trip flags instead of using the writable trips mask directly or
checking the presence of the .set_trip_hyst() zone callback.

Additionally, make trip_point_temp_store() and trip_point_hyst_store()
fail with an error code if the trip passed to one of them has
THERMAL_TRIP_FLAG_RW_TEMP or THERMAL_TRIP_FLAG_RW_HYST,
respectively, clear in its flags.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-27 12:02:50 +01:00
Nathan Chancellor da1983355c thermal: core: Move initial num_trips assignment before memcpy()
When booting a CONFIG_FORTIFY_SOURCE=y kernel compiled with a toolchain
that supports __counted_by() (such as clang-18 and newer), there is a
panic on boot:

  [    2.913770] memcpy: detected buffer overflow: 72 byte write of buffer size 0
  [    2.920834] WARNING: CPU: 2 PID: 1 at lib/string_helpers.c:1027 __fortify_report+0x5c/0x74
  ...
  [    3.039208] Call trace:
  [    3.041643]  __fortify_report+0x5c/0x74
  [    3.045469]  __fortify_panic+0x18/0x20
  [    3.049209]  thermal_zone_device_register_with_trips+0x4c8/0x4f8

This panic occurs because trips is counted by num_trips but num_trips is
assigned after the call to memcpy(), so the fortify checks think the
buffer size is zero because tz was allocated with kzalloc().

Move the num_trips assignment before the memcpy() to resolve the panic
and ensure that the fortify checks work properly.

Fixes: 9b0a627586 ("thermal: core: Store zone trips table in struct thermal_zone_device")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-27 11:58:47 +01:00
Rafael J. Wysocki 698a1eb1f7 thermal: core: Store zone ops in struct thermal_zone_device
The current code requires thermal zone creators to pass pointers to
writable ops structures to thermal_zone_device_register_with_trips()
which needs to modify the target struct thermal_zone_device_ops object
if the "critical" operation in it is NULL.

Moreover, the callers of thermal_zone_device_register_with_trips() are
required to hold on to the struct thermal_zone_device_ops object passed
to it until the given thermal zone is unregistered.

Both of these requirements are quite inconvenient, so modify struct
thermal_zone_device to contain struct thermal_zone_device_ops as field and
make thermal_zone_device_register_with_trips() copy the contents of the
struct thermal_zone_device_ops passed to it via a pointer (which can be
const now) to that field.

Also adjust the code using thermal zone ops accordingly and modify
thermal_of_zone_register() to use a local ops variable during
thermal zone registration so ops do not need to be freed in
thermal_of_zone_unregister() any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-23 18:24:48 +01:00
Rafael J. Wysocki 9b0a627586 thermal: core: Store zone trips table in struct thermal_zone_device
The current code expects thermal zone creators to pass a pointer to a
writable trips table to thermal_zone_device_register_with_trips() and
that trips table is then used by the thermal core going forward.

Consequently, the callers of thermal_zone_device_register_with_trips()
are required to hold on to the trips table passed to it until the given
thermal zone is unregistered, at which point the trips table can be
freed, but at the same time they are not expected to access that table
directly.  This is both error prone and confusing.

To address it, turn the trips table pointer in struct thermal_zone_device
into a flex array (counted by its num_trips field), allocate it during
thermal zone device allocation and copy the contents of the trips table
supplied by the zone creator (which can be const now) into it, which
will allow the callers of thermal_zone_device_register_with_trips() to
drop their trip tables right after the zone registration.

This requires the imx thermal driver to be adjusted to store the new
temperature in its internal trips table in imx_set_trip_temp(), because
it will be separate from the core's trips table now and it has to be
explicitly kept in sync with the latter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-02-23 18:24:47 +01:00
Christophe JAILLET 57a427c81c thermal: core: Use kstrdup_const() during cooling device registration
Some *thermal_cooling_device_register() calls pass a string literal as
the 'type' parameter, so kstrdup_const() can be used instead of
kstrdup() to avoid a memory allocation in such cases.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-12 15:38:23 +01:00
Daniel Lezcano 7ef01f228c thermal/debugfs: Add thermal debugfs information for mitigation episodes
The mitigation episodes are recorded. A mitigation episode happens
when the first trip point is crossed the way up and then the way
down. During this episode other trip points can be crossed also and
are accounted for this mitigation episode. The interesting information
is the average temperature at the trip point, the undershot and the
overshot. The standard deviation of the mitigated temperature will be
added later.

The thermal debugfs directory structure tries to stay consistent with
the sysfs one but in a very simplified way:

thermal/
 `-- thermal_zones
     |-- 0
     |   `-- mitigations
     `-- 1
         `-- mitigations

The content of the mitigations file has the following format:

,-Mitigation at 349988258us, duration=130136ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |     130136 |     68227 |     62500 |     75625 |
|    1 |  passive |     75000 |      2000 |     104209 |     74857 |     71666 |     77500 |
,-Mitigation at 272451637us, duration=75000ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |      75000 |     68561 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      60714 |     74820 |     70555 |     77500 |
,-Mitigation at 238184119us, duration=27316ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |      27316 |     73377 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      19468 |     75284 |     69444 |     77500 |
,-Mitigation at 39863713us, duration=136196ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |     136196 |     73922 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      91721 |     74386 |     69444 |     78125 |

More information for a better understanding of the thermal behavior
will be added after. The idea is to give detailed statistics
information about the undershots and overshots, the temperature speed,
etc... As all the information in a single file is too much, the idea
would be to create a directory named with the mitigation timestamp
where all data could be added.

Please note this code is immune against trip ordering but not against
a trip temperature change while a mitigation is happening. However,
this situation should be extremely rare, perhaps not happening and we
might question ourselves if something should be done in the core
framework for other components first.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: White space fixups, rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-12 15:37:49 +01:00
Daniel Lezcano 755113d767 thermal/debugfs: Add thermal cooling device debugfs information
The thermal framework does not have any debug information except a
sysfs stat which is a bit controversial. This one allocates big chunks
of memory for every cooling devices with a high number of states and
could represent on some systems in production several megabytes of
memory for just a portion of it. As the sysfs is limited to a page
size, the output is not exploitable with large data array and gets
truncated.

The patch provides the same information than sysfs except the
transitions are dynamically allocated, thus they won't show more
events than the ones which actually occurred. There is no longer a
size limitation and it opens the field for more debugging information
where the debugfs is designed for, not sysfs.

The thermal debugfs directory structure tries to stay consistent with
the sysfs one but in a very simplified way:

thermal/
 -- cooling_devices
    |-- 0
    |   |-- clear
    |   |-- time_in_state_ms
    |   |-- total_trans
    |   `-- trans_table
    |-- 1
    |   |-- clear
    |   |-- time_in_state_ms
    |   |-- total_trans
    |   `-- trans_table
    |-- 2
    |   |-- clear
    |   |-- time_in_state_ms
    |   |-- total_trans
    |   `-- trans_table
    |-- 3
    |   |-- clear
    |   |-- time_in_state_ms
    |   |-- total_trans
    |   `-- trans_table
    `-- 4
        |-- clear
        |-- time_in_state_ms
        |-- total_trans
        `-- trans_table

The content of the files in the cooling devices directory is the same
as the sysfs one except for the trans_table which has the following
format:

Transition	Hits
1->0      	246
0->1      	246
2->1      	632
1->2      	632
3->2      	98
2->3      	98

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: White space fixups, rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-12 15:34:56 +01:00
Rafael J. Wysocki 2f521890aa thermal: netlink: Pass thermal zone pointer to notify routines
There are several rountines in the thermal netlink API that take a
thermal zone ID or a thermal zone type as their arguments, but from
their callers perspective it would be more convenient to pass a thermal
zone pointer to them and let them extract the necessary data from the
given thermal zone object by themselves.

Modify the code accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-09 13:32:23 +01:00
Rafael J. Wysocki f52557edf0 thermal: netlink: Pass pointers to thermal_notify_tz_trip_up/down()
Instead of requiring the callers of thermal_notify_tz_trip_up/down() to
provide specific values needed to populate struct param in them, make
them extract those values from objects passed by the callers via const
pointers.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-09 13:31:37 +01:00
Rafael J. Wysocki d654362d53 - Converted Mediatek Thermal to the json-schema (Rafał Miłecki)
- Fixed DT bindings issue on Loongson (Binbin Zhou)
 
 - Fixed returning NULL instead of -ENODEV on Loogsoo (Binbin Zhou)
 
 - Added the DT binding for the tsens on SM8650 platform (Neil Armstrong)
 
 - Added a reboot on critical option feature (Fabio Estevam)
 
 - Made usage of DEFINE_SIMPLE_DEV_PM_OPS on AmLogic (Uwe Kleine-König)
 
 - Added the D1/T113s THS controller support on Sun8i (Maxim Kiselev)
 
 - Fixed example in the DT binding for QCom SPMI (Johan Hovold)
 
 - Fixed compilation warning for the tmon utility (Florian Eckert)
 
 - Added interrupt based configuration on Exynos along with a set of
   related cleanups (Mateusz Majewski)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmWT0yUACgkQqDIjiipP
 6E+YjggAsrsQNrsUgiW0/M0i75kfcEBgLfXscyPFsUYwpByb+haWzAJSLrvCBko8
 zPFvNE0or6KTQJCtseWWQDNQKWilAKARurEg7vuahfWo5LmfauPGMxsw+iHM9TgW
 8Ptkc1biy3TNr1zVCpQrCZK9GdLGsG7JRgxHi4Hfr/Hb/FZN9Mm0Yk1pRpFU+pn8
 Ff5UScMcU7NWhhxlQavLOoMmyAmh2k/jvfCSlmXGj7kxRrX8YIC02dTDjFm5cpyP
 Toy2POWkcZr4Xr+kWOh8pxojZVwKU2pN7cYyLJn8+OO+rpUAf0ol4PW0mly7uMgH
 1HCZ0x0hiNJQ8N6SwN+Eptq9TpgCBw==
 =W2kc
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v6.8-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into thermal

Merge thermal control material for 6.8-rc1 from Daniel Lezcano:

"- Converted Mediatek Thermal to the json-schema (Rafał Miłecki)

 - Fixed DT bindings issue on Loongson (Binbin Zhou)

 - Fixed returning NULL instead of -ENODEV on Loogsoo (Binbin Zhou)

 - Added the DT binding for the tsens on SM8650 platform (Neil Armstrong)

 - Added a reboot on critical option feature (Fabio Estevam)

 - Made usage of DEFINE_SIMPLE_DEV_PM_OPS on AmLogic (Uwe Kleine-König)

 - Added the D1/T113s THS controller support on Sun8i (Maxim Kiselev)

 - Fixed example in the DT binding for QCom SPMI (Johan Hovold)

 - Fixed compilation warning for the tmon utility (Florian Eckert)

 - Added interrupt based configuration on Exynos along with a set of
   related cleanups (Mateusz Majewski)"

* tag 'thermal-v6.8-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (24 commits)
  thermal/drivers/exynos: Use set_trips ops
  thermal/drivers/exynos: Use BIT wherever possible
  thermal/drivers/exynos: Split initialization of TMU and the thermal zone
  thermal/drivers/exynos: Stop using the threshold mechanism on Exynos 4210
  thermal/drivers/exynos: Simplify regulator (de)initialization
  thermal/drivers/exynos: Handle devm_regulator_get_optional return value correctly
  thermal/drivers/exynos: Wwitch from workqueue-driven interrupt handling to threaded interrupts
  thermal/drivers/exynos: Drop id field
  thermal/drivers/exynos: Remove an unnecessary field description
  tools/thermal/tmon: Fix compilation warning for wrong format
  dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Clean up examples
  dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Fix example node names
  thermal/drivers/sun8i: Add D1/T113s THS controller support
  dt-bindings: thermal: sun8i: Add binding for D1/T113s THS controller
  thermal: amlogic: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functions
  thermal: amlogic: Make amlogic_thermal_disable() return void
  thermal/thermal_of: Allow rebooting after critical temp
  reboot: Introduce thermal_zone_device_critical_reboot()
  thermal/core: Prepare for introduction of thermal reboot
  dt-bindings: thermal-zones: Document critical-action
  ...
2024-01-02 13:45:36 +01:00
Fabio Estevam 79fa723ba8 reboot: Introduce thermal_zone_device_critical_reboot()
Introduce thermal_zone_device_critical_reboot() to trigger an
emergency reboot.

It is a counterpart of thermal_zone_device_critical() with the
difference that it will force a reboot instead of shutdown.

The motivation for doing this is to allow the thermal subystem
to trigger a reboot when the temperature reaches the critical
temperature.

Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20231129124330.519423-3-festevam@gmail.com
2024-01-02 09:33:18 +01:00
Fabio Estevam 5a0e241003 thermal/core: Prepare for introduction of thermal reboot
Add some helper functions to make it easier introducing the support
for thermal reboot.

No functional change.

Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20231129124330.519423-2-festevam@gmail.com
2024-01-02 09:33:18 +01:00
Lukasz Luba a8c959402d thermal: core: Add governor callback for thermal zone change
Add a new callback to the struct thermal_governor. It can be used for
updating governors when there is a change in the thermal zone internals,
e.g. thermal cooling device is bind to the thermal zone.

That makes possible to move some heavy operations like memory allocations
related to the number of cooling instances out of the throttle() callback.

Both callback code paths (throttle() and update_tz()) are protected with
the same thermal zone lock, which guaranties the consistency.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29 18:01:00 +01:00
Rafael J. Wysocki 5a5efdaffd thermal: core: Resume thermal zones asynchronously
The resume of thermal zones in thermal_pm_notify() is carried out
sequentially, which may be a problem if __thermal_zone_device_update()
takes a significant time to run for some thermal zones, because some
other thermal zones may need to wait for them to resume then and if
any other PM notifiers are going to be invoked after the thermal one,
they will need to wait for it either.

To address this, make thermal_pm_notify() switch the poll_queue delayed
work over to a one-shot thermal_zone_device_resume() work function that
will restore the original one during the thermal zone resume and queue
up poll_queue without a delay for each thermal zone.

Link: https://lore.kernel.org/linux-pm/20231120234015.3273143-1-radusolea@google.com/
Reported-by: Radu Solea <radusolea@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28 14:20:16 +01:00
Rafael J. Wysocki 33fcb595dc thermal: core: Initialize poll_queue in thermal_zone_device_init()
In preparation for a subsequent change, move the initialization of the
poll_queue delayed work from thermal_zone_device_register_with_trips()
to thermal_zone_device_init() which is called by the former.

However, because thermal_zone_device_init() is also called by
thermal_pm_notify(), make the latter call cancel_delayed_work() on
poll_queue before invoking the former, so as to allow the work
item to be re-initialized safely.

Also move thermal_zone_device_check() which needs to be defined
before thermal_zone_device_init(), so the latter can pass it to the
INIT_DELAYED_WORK() macro.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28 14:20:16 +01:00
Rafael J. Wysocki 4e814173a8 thermal: core: Fix thermal zone suspend-resume synchronization
There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:

 1. The resume code runs in a PM notifier which is invoked after user
    space has been thawed, so it can run concurrently with user space
    which can trigger a thermal zone device removal.  If that happens,
    the thermal zone resume code may use a stale pointer to the next
    list element and crash, because it does not hold thermal_list_lock
    while walking thermal_tz_list.

 2. The thermal zone resume code calls thermal_zone_device_init()
    outside the zone lock, so user space or an update triggered by
    the platform firmware may see an inconsistent state of a
    thermal zone leading to unexpected behavior.

 3. Clearing the in_suspend global variable in thermal_pm_notify()
    allows __thermal_zone_device_update() to continue for all thermal
    zones and it may as well run before the thermal_tz_list walk (or
    at any point during the list walk for that matter) and attempt to
    operate on a thermal zone that has not been resumed yet.  It may
    also race destructively with thermal_zone_device_init().

To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.

Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye <bo.ye@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28 14:20:15 +01:00
Rafael J. Wysocki 04e6ccfc93 thermal: core: Fix NULL pointer dereference in zone registration error path
If device_register() in thermal_zone_device_register_with_trips()
returns an error, the tz variable is set to NULL and subsequently
dereferenced in kfree(tz->tzp).

Commit adc8749b15 ("thermal/drivers/core: Use put_device() if
device_register() fails") added the tz = NULL assignment in question to
avoid a possible double-free after dropping the reference to the zone
device.  However, after commit 4649620d94 ("thermal: core: Make
thermal_zone_device_unregister() return after freeing the zone"), that
assignment has become redundant, because dropping the reference to the
zone device does not cause the zone object to be freed any more.

Drop it to address the NULL pointer dereference.

Fixes: 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2023-12-15 18:24:24 +01:00
Daniel Lezcano 404f62cd64 thermal/core: Check get_temp ops is present when registering a tz
Initially the check against the get_temp ops in the
thermal_zone_device_update() was put in there in order to catch
drivers not providing this method.

Instead of checking again and again the function if the ops exists in
the update function, let's do the check at registration time, so it is
checked one time and for all.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-13 14:35:32 +01:00
Rafael J. Wysocki b38aa87f67 thermal: core: Rework thermal zone availability check
In order to avoid running __thermal_zone_device_update() for thermal
zones going away, the thermal zone lock is held around device_del()
in thermal_zone_device_unregister() and thermal_zone_device_update()
passes the given thermal zone device to device_is_registered().
This allows thermal_zone_device_update() to skip the
__thermal_zone_device_update() if device_del() has already run for
the thermal zone at hand.

However, instead of looking at driver core internals, the thermal
subsystem may as well rely on its own data structures for this
purpose.  Namely, if the thermal zone is not present in
thermal_tz_list, it can be regarded as unavailable, which in fact is
already the case in thermal_zone_device_unregister().  Accordingly,
the device_is_registered() check in thermal_zone_device_update() can
be replaced with checking whether or not the node list_head in struct
thermal_zone_device is empty, in which case it is not there in
thermal_tz_list.

To make this work, though, it is necessary to initialize tz->node
in thermal_zone_device_register_with_trips() before registering the
thermal zone device and it needs to be added to thermal_tz_list and
deleted from it under its zone lock.

After the above modifications, the zone lock does not need to be
held around device_del() in thermal_zone_device_unregister() any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-12 13:01:03 +01:00
Rafael J. Wysocki c3ffdfff97 thermal: Drop redundant and confusing device_is_registered() checks
Multiple places in the thermal subsystem (most importantly, sysfs
attribute callback functions) check if the given thermal zone device is
still registered in order to return early in case the device_del() in
thermal_zone_device_unregister() has run already.

However, after thermal_zone_device_unregister() has been made wait for
all of the zone-related activity to complete before returning, it is
not necessary to do that any more, because all of the code holding a
reference to the thermal zone device object will be waited for even if
it does not do anything special to enforce this.

Accordingly, drop all of the device_is_registered() checks that are now
redundant and get rid of the zone locking that is not necessary any more
after dropping them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-12 13:00:28 +01:00
Rafael J. Wysocki 4649620d94 thermal: core: Make thermal_zone_device_unregister() return after freeing the zone
Make thermal_zone_device_unregister() wait until all of the references
to the given thermal zone object have been dropped and free it before
returning.

This guarantees that when thermal_zone_device_unregister() returns,
there is no leftover activity regarding the thermal zone in question
which is required by some of its callers (for instance, modular driver
code that wants to know when it is safe to let the module go away).

Subsequently, this will allow some confusing device_is_registered()
checks to be dropped from the thermal sysfs and core code.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-11 20:49:53 +01:00
Rafael J. Wysocki 44844db913 thermal: core: Add trip thresholds for trip crossing detection
The trip crossing detection in handle_thermal_trip() does not work
correctly in the cases when a trip point is crossed on the way up and
then the zone temperature stays above its low temperature (that is, its
temperature decreased by its hysteresis).  The trip temperature may
be passed by the zone temperature subsequently in that case, even
multiple times, but that does not count as the trip crossing as long as
the zone temperature does not fall below the trip's low temperature or,
in other words, until the trip is crossed on the way down.

|-----------low--------high------------|
             |<--------->|
             |    hyst   |
             |           |
             |          -|--> crossed on the way up
             |
         <---|-- crossed on the way down

However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
every time the trip temperature is passed by the zone temperature on
the way up regardless of whether or not the trip has been crossed on
the way down yet.  Moreover, it will not call thermal_notify_tz_trip_down()
if the last zone temperature was between the trip's temperature and its
low temperature, so some "trip crossed on the way down" events may not
be reported.

To address this issue, introduce trip thresholds equal to either the
temperature of the given trip, or its low temperature, such that if
the trip's threshold is passed by the zone temperature on the way up,
its value will be set to the trip's low temperature and
thermal_notify_tz_trip_up() will be called, and if the trip's threshold
is passed by the zone temperature on the way down, its value will be set
to the trip's temperature (high) and thermal_notify_tz_trip_down() will
be called.  Accordingly, if the threshold is passed on the way up, it
cannot be passed on the way up again until its passed on the way down
and if it is passed on the way down, it cannot be passed on the way down
again until it is passed on the way up which guarantees correct
triggering of trip crossing notifications.

If the last temperature of the zone is invalid, the trip's threshold
will be set depending of the zone's current temperature: If that
temperature is above the trip's temperature, its threshold will be
set to its low temperature or otherwise its threshold will be set to
its (high) temperature.  Because the zone temperature is initially
set to invalid and tz->last_temperature is only updated by
update_temperature(), this is sufficient to set the correct initial
threshold values for all trips.

Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-20 16:59:55 +01:00
Rafael J. Wysocki 8c35b1f472 thermal: core: Pass trip pointer to governor throttle callback
Modify the governor .throttle() callback definition so that it takes a
trip pointer instead of a trip index as its second argument, adjust the
governors accordingly and update the core code invoking .throttle().

This causes the governors to become independent of the representation
of the list of trips in the thermal zone structure.

This change is not expected to alter the general functionality.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-10-20 19:26:37 +02:00
Rafael J. Wysocki a26b452e83 Merge branch 'acpi-thermal'
The ACPI thermal driver changes include some thermal core modifications
that are depended on by subsequent thermal core changes, so merge them.

* acpi-thermal: (26 commits)
  thermal: trip: Drop lockdep assertion from thermal_zone_trip_id()
  thermal: trip: Remove lockdep assertion from for_each_thermal_trip()
  thermal: core: Drop thermal_zone_device_exec()
  ACPI: thermal: Use thermal_zone_for_each_trip() for updating trips
  ACPI: thermal: Combine passive and active trip update functions
  ACPI: thermal: Move get_active_temp()
  ACPI: thermal: Fix up function header formatting in two places
  ACPI: thermal: Drop list of device ACPI handles from struct acpi_thermal
  ACPI: thermal: Rename structure fields holding temperature in deci-Kelvin
  ACPI: thermal: Drop critical_valid and hot_valid trip flags
  ACPI: thermal: Do not use trip indices for cooling device binding
  ACPI: thermal: Mark uninitialized active trips as invalid
  ACPI: thermal: Merge trip initialization functions
  ACPI: thermal: Collapse trip devices update function wrappers
  ACPI: thermal: Collapse trip devices update functions
  ACPI: thermal: Add device list to struct acpi_thermal_trip
  ACPI: thermal: Fix a small leak in acpi_thermal_add()
  ACPI: thermal: Drop valid flag from struct acpi_thermal_trip
  ACPI: thermal: Drop redundant trip point flags
  ACPI: thermal: Untangle initialization and updates of active trips
  ...
2023-10-11 17:56:51 +02:00
Dan Carpenter c99626092e thermal: core: prevent potential string overflow
The dev->id value comes from ida_alloc() so it's a number between zero
and INT_MAX.  If it's too high then these sprintf()s will overflow.

Fixes: 203d3d4aa4 ("the generic thermal sysfs driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-11 16:15:09 +02:00
Rafael J. Wysocki 4963e34ce7 thermal: core: Drop thermal_zone_device_exec()
Because thermal_zone_device_exec() has no users any more and there are
no plans to use it anywhere, revert commit 9a99a996d1 ("thermal: core:
Introduce thermal_zone_device_exec()") that introduced it.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-10-05 13:32:55 +02:00
Rafael J. Wysocki d069ed6b75 thermal: core: Allow trip pointers to be used for cooling device binding
Add new helper functions, thermal_bind_cdev_to_trip() and
thermal_unbind_cdev_from_trip(), to allow a trip pointer to be used for
binding a cooling device to a trip point and unbinding it, respectively,
and redefine the existing helpers, thermal_zone_bind_cooling_device()
and thermal_zone_unbind_cooling_device(), as wrappers around the new
ones, respectively.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-28 12:57:10 +02:00
Rafael J. Wysocki 2c7b4bfade thermal: core: Store trip pointer in struct thermal_instance
Replace the integer trip number stored in struct thermal_instance with
a pointer to the relevant trip and adjust the code using the structure
in question accordingly.

The main reason for making this change is to allow the trip point to
cooling device binding code more straightforward, as illustrated by
subsequent modifications of the ACPI thermal driver, but it also helps
to clarify the overall design and allows the governor code overhead to
be reduced (through subsequent modifications).

The only case in which it adds complexity is trip_point_show() that
needs to walk the trips[] table to find the index of the given trip
point, but this is not a critical path and the interface that
trip_point_show() belongs to is problematic anyway (for instance, it
doesn't cover the case when the same cooling devices is associated
with multiple trip points).

This is a preliminary change and the affected code will be refined by
a series of subsequent modifications of thermal governors, the core and
the ACPI thermal driver.

The general functionality is not expected to be affected by this change.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-28 12:55:29 +02:00
Rafael J. Wysocki 9502108876 thermal: core: Drop trips_disabled bitmask
After recent changes, thermal_zone_get_trip() cannot fail, as invoked
from thermal_zone_device_register_with_trips(), so the only role of
the trips_disabled bitmask is struct thermal_zone_device is to make
handle_thermal_trip() skip trip points whose temperature was initially
zero.  However, since the unit of temperature in the thermal core is
millicelsius, zero may very well be a valid temperature value at least
in some usage scenarios and the trip temperature may as well change
later.  Thus there is no reason to permanently disable trip points
with initial temperature equal to zero.

Accordingly, drop the trips_disabled bitmask along with the code
related to it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-09-25 11:46:19 +02:00
Rafael J. Wysocki fb2c10245f thermal: core: Fix disabled trip point check in handle_thermal_trip()
Commit bc840ea5f9 ("thermal: core: Do not handle trip points with
invalid temperature") added a check for invalid temperature to the
disabled trip point check in handle_thermal_trip(), but that check was
added at a point when the trip structure has not been initialized yet.

This may cause handle_thermal_trip() to skip a valid trip point in some
cases, so fix it by moving the check to a suitable place, after
__thermal_zone_get_trip() has been called to populate the trip
structure.

Fixes: bc840ea5f9 ("thermal: core: Do not handle trip points with invalid temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-14 21:51:49 +02:00
Rafael J. Wysocki edd220b33f thermal: core: Drop thermal_zone_device_register()
There are no more users of thermal_zone_device_register(), so drop it
from the core.

Note that thermal_zone_device_register_with_trips() may be renamed to
thermal_zone_device_register() in the future, but only after a grace
period allowing all of the possible work in progress that may be using
the latter to adjust.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-05 21:42:18 +02:00
Rafael J. Wysocki d332db8fc1 thermal: core: Add function for registering tripless thermal zones
Multiple callers of thermal_zone_device_register() don't pass any trips
to it and they might use a shortened argument list for that, so add
a special function with fewer arguments for this purpose.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-09-05 21:42:18 +02:00
Rafael J. Wysocki 35d8dbbb25 thermal: core: Drop unused .get_trip_*() callbacks
After recent changes in the ACPI thermal driver and in the Intel DTS
IOSF thermal driver, all thermal zone drivers are expected to use trip
tables for initialization and none of them should implement
.get_trip_type(), .get_trip_temp() or .get_trip_hyst() callbacks, so
drop these callbacks entirely from the core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-29 20:46:31 +02:00
Rafael J. Wysocki 0c2ec0f165 Merge branch 'acpi-thermal'
Merge ACPI thermal driver changes for 6.6-rc1:

 - Drop non-functional nocrt parameter from ACPI thermal (Mario
   Limonciello).

 - Clean up the ACPI thermal driver, rework the handling of firmware
   notifications in it and make it provide a table of generic trip point
   structures to the core during initialization (Rafael Wysocki).

* acpi-thermal:
  ACPI: thermal: Eliminate code duplication from acpi_thermal_notify()
  ACPI: thermal: Drop unnecessary thermal zone callbacks
  ACPI: thermal: Rework thermal_get_trend()
  ACPI: thermal: Use trip point table to register thermal zones
  thermal: core: Rework and rename __for_each_thermal_trip()
  ACPI: thermal: Introduce struct acpi_thermal_trip
  ACPI: thermal: Carry out trip point updates under zone lock
  ACPI: thermal: Clean up acpi_thermal_register_thermal_zone()
  thermal: core: Add priv pointer to struct thermal_trip
  thermal: core: Introduce thermal_zone_device_exec()
  thermal: core: Do not handle trip points with invalid temperature
  ACPI: thermal: Drop redundant local variable from acpi_thermal_resume()
  ACPI: thermal: Do not attach private data to ACPI handles
  ACPI: thermal: Drop enabled flag from struct acpi_thermal_active
  ACPI: thermal: Drop nocrt parameter
2023-08-25 20:44:26 +02:00
Rafael J. Wysocki 9a99a996d1 thermal: core: Introduce thermal_zone_device_exec()
Introduce a new helper function, thermal_zone_device_exec(), that can
be used by drivers to run a given callback routine under the zone lock.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-08-17 11:23:32 +02:00
Rafael J. Wysocki bc840ea5f9 thermal: core: Do not handle trip points with invalid temperature
Trip points with temperature set to THERMAL_TEMP_INVALID are as good as
disabled, so make handle_thermal_trip() ignore them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-08-10 20:57:35 +02:00
Ahmad Fatoum 80ddce5f2d thermal: core: constify params in thermal_zone_device_register
Since commit 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal zone
parameters structure"), thermal_zone_device_register() allocates a copy
of the tzp argument and callers need not explicitly manage its lifetime.

This means the function no longer cares about the parameter being
mutable, so constify it.

No functional change.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-24 09:51:31 +02:00
Daniel Lezcano 7cefbaf081 thermal: core: Encapsulate tz->device field
There are still some drivers needing to play with the thermal zone
device internals. That is not the best but until we can figure out if
the information is really needed, let's encapsulate the field used in
the thermal zone device structure, so we can move forward relocating
the thermal zone device structure definition in the thermal framework
private headers.

Some drivers are accessing tz->device, that implies they need to have
the knowledge of the thermal_zone_device structure but we want to
self-encapsulate this structure and reduce the scope of the structure
to the thermal core only.

By adding this wrapper, these drivers won't need the thermal zone
device structure definition and are no longer an obstacle to its
relocation to the private thermal core headers.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:20:12 +02:00
Daniel Lezcano 3d439b1a2a thermal/core: Alloc-copy-free the thermal zone parameters structure
The caller of the function thermal_zone_device_register_with_trips()
can pass a thermal_zone_params structure parameter.

This one is used by the thermal core code until the thermal zone is
destroyed. That forces the caller, so the driver, to keep the pointer
valid until it unregisters the thermal zone if we want to make the
thermal zone device structure private the core code.

As the thermal zone device structure would be private, the driver can
not access to thermal zone device structure to retrieve the tzp field
after it passed it to register the thermal zone.

So instead of forcing the users of the function to deal with the tzp
structure life cycle, make the usage easier by allocating our own
thermal zone params, copying the parameter content and by freeing at
unregister time. The user can then create the parameters on the stack,
pass it to the registering function and forget about it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230404075138.2914680-3-daniel.lezcano@linaro.org
2023-04-07 18:36:28 +02:00
Zhang Rui ded2d383b1 thermal/core: Remove thermal_bind_params structure
Remove struct thermal_bind_params because no one is using it for thermal
binding now.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230330104526.3196-1-rui.zhang@intel.com
2023-04-07 11:18:22 +02:00
Rafael J. Wysocki 75f74a9071 - Add more thermal zone device encapsulation: prevent setting
structure field directly, access the sensor device instead the
   thermal zone's device for trace, relocate the traces in
   drivers/thermal (Daniel Lezcano)
 
 - Use the generic trip point for the i.MX and remove the get_trip_temp
   ops (Daniel Lezcano)
 
 - Use the devm_platform_ioremap_resource() in the Hisilicon driver
   (Yang Li)
 
 - Remove R-Car H3 ES1.* handling as public has only access to the ES2
   version and the upstream support for the ES1 has been shutdown (Wolfram Sang)
 
 - Add a delay after initializing the bank in order to let the time to
   the hardware to initialze itself before reading the temperature
   (Amjad Ouled-Ameur)
 
 - Add MT8365 support (Amjad Ouled-Ameur)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmQof0cACgkQqDIjiipP
 6E/tXQgArKKlM52mo3pg880JsiWOWGrS7pJN0x9MR0nqUm83sLTDf21fPoYmn+EJ
 wrzClIX1iHCDVCWCVxao7OIT1mxez9L2NAHseXDSDQJcZ0fflTE8wZ8xeLr6q5GN
 /ifHfCqiC98yejPcKIf2TqdGgqpCzyQ++sZoc3H6/jwysSkFlBc+YgKx+XasQR6k
 5swQ3E81zx0ouB+t1GDieXB6YRsjZzR2KQbbExoHexPue1DTIuuumz8M1Fgz4a4b
 gXRHbrGp3vmLORIAOZiVDyjzC7jwy7oN552g16yZLGDUdLaJ03gRRx7fvNzDUEMW
 mBzxak4WnNWEatCh691X6W5MdPO/uQ==
 =naJV
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v6.4-rc1-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal control material for 6.4-rc1 from Daniel Lezcano:

"- Add more thermal zone device encapsulation: prevent setting
   structure field directly, access the sensor device instead the
   thermal zone's device for trace, relocate the traces in
   drivers/thermal (Daniel Lezcano)

 - Use the generic trip point for the i.MX and remove the get_trip_temp
   ops (Daniel Lezcano)

 - Use the devm_platform_ioremap_resource() in the Hisilicon driver
   (Yang Li)

 - Remove R-Car H3 ES1.* handling as public has only access to the ES2
   version and the upstream support for the ES1 has been shutdown (Wolfram
   Sang)

 - Add a delay after initializing the bank in order to let the time to
   the hardware to initialze itself before reading the temperature
   (Amjad Ouled-Ameur)

 - Add MT8365 support (Amjad Ouled-Ameur)"

* tag 'thermal-v6.4-rc1-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/ti: Use fixed update interval
  thermal/drivers/stm: Don't set no_hwmon to false
  thermal/drivers/db8500: Use driver dev instead of tz->device
  thermal/core: Relocate the traces definition in thermal directory
  thermal/drivers/hisi: Use devm_platform_ioremap_resource()
  thermal/drivers/imx: Use the thermal framework for the trip point
  thermal/drivers/imx: Remove get_trip_temp ops
  thermal/drivers/rcar_gen3_thermal: Remove R-Car H3 ES1.* handling
  thermal/drivers/mediatek: Add delay after thermal banks initialization
  thermal/drivers/mediatek: Add support for MT8365 SoC
  thermal/drivers/mediatek: Control buffer enablement tweaks
  dt-bindings: thermal: mediatek: Add binding documentation for MT8365 SoC
2023-04-03 20:43:32 +02:00
Rafael J. Wysocki cd246fa969 thermal: core: Clean up thermal_list_lock locking
Once thermal_list_lock has been acquired in
__thermal_cooling_device_register(), it is not necessary to drop it
and take it again until all of the thermal zones have been updated,
so change the code accordingly.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-03 20:40:21 +02:00
Daniel Lezcano 32a7a02117 thermal/core: Relocate the traces definition in thermal directory
The traces are exported but only local to the thermal core code. On
the other side, the traces take the thermal zone device structure as
argument, thus they have to rely on the exported thermal.h header
file. As we want to move the structure to the private thermal core
header, first we have to relocate those traces to the same place as
many drivers do.

Cc: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307133735.90772-2-daniel.lezcano@linaro.org
2023-04-01 20:51:45 +02:00
Rafael J. Wysocki ce07727aff Merge back thermal control material for 6.4-rc1. 2023-03-27 13:46:13 +02:00
Rafael J. Wysocki 6babf38d89 Merge branch 'thermal-acpi'
Merge a fix for a recent thermal-related regression in the ACPI
processor driver.

* thermal-acpi:
  ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  thermal: core: Introduce thermal_cooling_device_update()
  thermal: core: Introduce thermal_cooling_device_present()
  ACPI: processor: Reorder acpi_processor_driver_init()
2023-03-24 17:11:27 +01:00
Ido Schimmel f1b80a3878 thermal: core: Restore behavior regarding invalid trip points
Commit 7c3d5c20dc ("thermal/core: Add a generic thermal_zone_get_trip()
function") stopped marking trip points with a zero temperature as
disabled, behavior that was originally introduced in commit 81ad4276b5
("Thermal: Ignore invalid trip points").

When using the mlxsw driver we see that when such trip points are not
disabled, the thermal subsystem repeatedly tries to set the state of the
associated cooling devices to the maximum state.

Address this by restoring the original behavior and mark trip points
with a zero temperature as disabled.

Fixes: 7c3d5c20dc ("thermal/core: Add a generic thermal_zone_get_trip() function")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-22 19:59:08 +01:00
Rafael J. Wysocki 790930f442 thermal: core: Introduce thermal_cooling_device_update()
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its ->get_max_state()
callback return value.

That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state.  Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).

The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.

Note that the actual user of the new function will be added separately.

Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-03-22 15:20:38 +01:00
Rafael J. Wysocki c43198af05 thermal: core: Introduce thermal_cooling_device_present()
Introduce a helper function, thermal_cooling_device_present(), for
checking if the given cooling device is in the list of registered
cooling devices to avoid some code duplication in a subsequent
patch.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-03-22 15:20:38 +01:00
Daniel Lezcano 3034f859b9 thermal: Add a thermal zone id accessor
In order to get the thermal zone id but without directly accessing the
thermal zone device structure, add an accessor.

Use the accessor in the hwmon_scmi and acpi_thermal.

No functional change intented.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-03 20:45:02 +01:00
Daniel Lezcano 072e35c988 thermal/core: Add thermal_zone_device structure 'type' accessor
The thermal zone device structure is exposed via the exported
thermal.h header. This structure should stay private the thermal core
code. In order to encapsulate the structure, let's add an accessor to
get the 'type' of the thermal zone.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-03 20:45:02 +01:00
Daniel Lezcano a6ff3c0021 thermal/core: Add a thermal zone 'devdata' accessor
The thermal zone device structure is exposed to the different drivers
and obviously they access the internals while that should be
restricted to the core thermal code.

In order to self-encapsulate the thermal core code, we need to prevent
the drivers accessing directly the thermal zone structure and provide
accessor functions to deal with.

Provide an accessor to the 'devdata' structure and make use of it in
the different drivers.

No functional changes intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-03 20:45:02 +01:00
ye xingchen 5bbafd4362 thermal: core: Use sysfs_emit_at() instead of scnprintf()
Follow the advice in Documentation/filesystems/sysfs.rst that show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.

Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09 20:39:48 +01:00
Rafael J. Wysocki 9e0a9be24b thermal: Fail object registration if thermal class is not registered
If thermal_class is not registered with the driver core, there is no way
to expose the interfaces used by the thermal control framework, so
prevent thermal zones and cooling devices from being registered in
that case by returning an error from object registration functions.

For this purpose, use a thermal_class pointer that will be NULL if the
class is not registered.  To avoid wasting memory in that case, allocate
the thermal class object dynamically and if it fails to register, free
it and clear the thermal_class pointer to NULL.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-25 16:51:19 +01:00
Daniel Lezcano 5b8de18ee9 thermal/core: Move the thermal trip code to a dedicated file
The thermal_core.c files contains a lot of functions handling
different thermal components like the governors, the trip points, the
cooling device, the OF cooling device, etc ...

This organization does not help to migrate to a more sane code where
there is a better self-encapsulation as all the components' internals
can be directly accessed from a single file.

For the sake of clarity, let's move the thermal trip points code in a
dedicated thermal_trip.c file and add a function to browse all the
trip points like we do with the thermal zones, the govenors and the
cooling devices.

The same can be done for the cooling devices and the governor code but
that will come later as the current work in the thermal framework is
to fix the trip point handling and use a generic trip point structure.

No functional changes intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-25 16:40:39 +01:00