Some *thermal_cooling_device_register() calls pass a string literal as
the 'type' parameter, so kstrdup_const() can be used instead of
kstrdup() to avoid a memory allocation in such cases.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The mitigation episodes are recorded. A mitigation episode happens
when the first trip point is crossed the way up and then the way
down. During this episode other trip points can be crossed also and
are accounted for this mitigation episode. The interesting information
is the average temperature at the trip point, the undershot and the
overshot. The standard deviation of the mitigated temperature will be
added later.
The thermal debugfs directory structure tries to stay consistent with
the sysfs one but in a very simplified way:
thermal/
`-- thermal_zones
|-- 0
| `-- mitigations
`-- 1
`-- mitigations
The content of the mitigations file has the following format:
,-Mitigation at 349988258us, duration=130136ms
| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |
| 0 | passive | 65000 | 2000 | 130136 | 68227 | 62500 | 75625 |
| 1 | passive | 75000 | 2000 | 104209 | 74857 | 71666 | 77500 |
,-Mitigation at 272451637us, duration=75000ms
| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |
| 0 | passive | 65000 | 2000 | 75000 | 68561 | 62500 | 75000 |
| 1 | passive | 75000 | 2000 | 60714 | 74820 | 70555 | 77500 |
,-Mitigation at 238184119us, duration=27316ms
| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |
| 0 | passive | 65000 | 2000 | 27316 | 73377 | 62500 | 75000 |
| 1 | passive | 75000 | 2000 | 19468 | 75284 | 69444 | 77500 |
,-Mitigation at 39863713us, duration=136196ms
| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |
| 0 | passive | 65000 | 2000 | 136196 | 73922 | 62500 | 75000 |
| 1 | passive | 75000 | 2000 | 91721 | 74386 | 69444 | 78125 |
More information for a better understanding of the thermal behavior
will be added after. The idea is to give detailed statistics
information about the undershots and overshots, the temperature speed,
etc... As all the information in a single file is too much, the idea
would be to create a directory named with the mitigation timestamp
where all data could be added.
Please note this code is immune against trip ordering but not against
a trip temperature change while a mitigation is happening. However,
this situation should be extremely rare, perhaps not happening and we
might question ourselves if something should be done in the core
framework for other components first.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: White space fixups, rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The thermal framework does not have any debug information except a
sysfs stat which is a bit controversial. This one allocates big chunks
of memory for every cooling devices with a high number of states and
could represent on some systems in production several megabytes of
memory for just a portion of it. As the sysfs is limited to a page
size, the output is not exploitable with large data array and gets
truncated.
The patch provides the same information than sysfs except the
transitions are dynamically allocated, thus they won't show more
events than the ones which actually occurred. There is no longer a
size limitation and it opens the field for more debugging information
where the debugfs is designed for, not sysfs.
The thermal debugfs directory structure tries to stay consistent with
the sysfs one but in a very simplified way:
thermal/
-- cooling_devices
|-- 0
| |-- clear
| |-- time_in_state_ms
| |-- total_trans
| `-- trans_table
|-- 1
| |-- clear
| |-- time_in_state_ms
| |-- total_trans
| `-- trans_table
|-- 2
| |-- clear
| |-- time_in_state_ms
| |-- total_trans
| `-- trans_table
|-- 3
| |-- clear
| |-- time_in_state_ms
| |-- total_trans
| `-- trans_table
`-- 4
|-- clear
|-- time_in_state_ms
|-- total_trans
`-- trans_table
The content of the files in the cooling devices directory is the same
as the sysfs one except for the trans_table which has the following
format:
Transition Hits
1->0 246
0->1 246
2->1 632
1->2 632
3->2 98
2->3 98
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: White space fixups, rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are several rountines in the thermal netlink API that take a
thermal zone ID or a thermal zone type as their arguments, but from
their callers perspective it would be more convenient to pass a thermal
zone pointer to them and let them extract the necessary data from the
given thermal zone object by themselves.
Modify the code accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Instead of requiring the callers of thermal_notify_tz_trip_up/down() to
provide specific values needed to populate struct param in them, make
them extract those values from objects passed by the callers via const
pointers.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
- Fixed DT bindings issue on Loongson (Binbin Zhou)
- Fixed returning NULL instead of -ENODEV on Loogsoo (Binbin Zhou)
- Added the DT binding for the tsens on SM8650 platform (Neil Armstrong)
- Added a reboot on critical option feature (Fabio Estevam)
- Made usage of DEFINE_SIMPLE_DEV_PM_OPS on AmLogic (Uwe Kleine-König)
- Added the D1/T113s THS controller support on Sun8i (Maxim Kiselev)
- Fixed example in the DT binding for QCom SPMI (Johan Hovold)
- Fixed compilation warning for the tmon utility (Florian Eckert)
- Added interrupt based configuration on Exynos along with a set of
related cleanups (Mateusz Majewski)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmWT0yUACgkQqDIjiipP
6E+YjggAsrsQNrsUgiW0/M0i75kfcEBgLfXscyPFsUYwpByb+haWzAJSLrvCBko8
zPFvNE0or6KTQJCtseWWQDNQKWilAKARurEg7vuahfWo5LmfauPGMxsw+iHM9TgW
8Ptkc1biy3TNr1zVCpQrCZK9GdLGsG7JRgxHi4Hfr/Hb/FZN9Mm0Yk1pRpFU+pn8
Ff5UScMcU7NWhhxlQavLOoMmyAmh2k/jvfCSlmXGj7kxRrX8YIC02dTDjFm5cpyP
Toy2POWkcZr4Xr+kWOh8pxojZVwKU2pN7cYyLJn8+OO+rpUAf0ol4PW0mly7uMgH
1HCZ0x0hiNJQ8N6SwN+Eptq9TpgCBw==
=W2kc
-----END PGP SIGNATURE-----
Merge tag 'thermal-v6.8-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into thermal
Merge thermal control material for 6.8-rc1 from Daniel Lezcano:
"- Converted Mediatek Thermal to the json-schema (Rafał Miłecki)
- Fixed DT bindings issue on Loongson (Binbin Zhou)
- Fixed returning NULL instead of -ENODEV on Loogsoo (Binbin Zhou)
- Added the DT binding for the tsens on SM8650 platform (Neil Armstrong)
- Added a reboot on critical option feature (Fabio Estevam)
- Made usage of DEFINE_SIMPLE_DEV_PM_OPS on AmLogic (Uwe Kleine-König)
- Added the D1/T113s THS controller support on Sun8i (Maxim Kiselev)
- Fixed example in the DT binding for QCom SPMI (Johan Hovold)
- Fixed compilation warning for the tmon utility (Florian Eckert)
- Added interrupt based configuration on Exynos along with a set of
related cleanups (Mateusz Majewski)"
* tag 'thermal-v6.8-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (24 commits)
thermal/drivers/exynos: Use set_trips ops
thermal/drivers/exynos: Use BIT wherever possible
thermal/drivers/exynos: Split initialization of TMU and the thermal zone
thermal/drivers/exynos: Stop using the threshold mechanism on Exynos 4210
thermal/drivers/exynos: Simplify regulator (de)initialization
thermal/drivers/exynos: Handle devm_regulator_get_optional return value correctly
thermal/drivers/exynos: Wwitch from workqueue-driven interrupt handling to threaded interrupts
thermal/drivers/exynos: Drop id field
thermal/drivers/exynos: Remove an unnecessary field description
tools/thermal/tmon: Fix compilation warning for wrong format
dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Clean up examples
dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Fix example node names
thermal/drivers/sun8i: Add D1/T113s THS controller support
dt-bindings: thermal: sun8i: Add binding for D1/T113s THS controller
thermal: amlogic: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functions
thermal: amlogic: Make amlogic_thermal_disable() return void
thermal/thermal_of: Allow rebooting after critical temp
reboot: Introduce thermal_zone_device_critical_reboot()
thermal/core: Prepare for introduction of thermal reboot
dt-bindings: thermal-zones: Document critical-action
...
Introduce thermal_zone_device_critical_reboot() to trigger an
emergency reboot.
It is a counterpart of thermal_zone_device_critical() with the
difference that it will force a reboot instead of shutdown.
The motivation for doing this is to allow the thermal subystem
to trigger a reboot when the temperature reaches the critical
temperature.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20231129124330.519423-3-festevam@gmail.com
Add some helper functions to make it easier introducing the support
for thermal reboot.
No functional change.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20231129124330.519423-2-festevam@gmail.com
Add a new callback to the struct thermal_governor. It can be used for
updating governors when there is a change in the thermal zone internals,
e.g. thermal cooling device is bind to the thermal zone.
That makes possible to move some heavy operations like memory allocations
related to the number of cooling instances out of the throttle() callback.
Both callback code paths (throttle() and update_tz()) are protected with
the same thermal zone lock, which guaranties the consistency.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The resume of thermal zones in thermal_pm_notify() is carried out
sequentially, which may be a problem if __thermal_zone_device_update()
takes a significant time to run for some thermal zones, because some
other thermal zones may need to wait for them to resume then and if
any other PM notifiers are going to be invoked after the thermal one,
they will need to wait for it either.
To address this, make thermal_pm_notify() switch the poll_queue delayed
work over to a one-shot thermal_zone_device_resume() work function that
will restore the original one during the thermal zone resume and queue
up poll_queue without a delay for each thermal zone.
Link: https://lore.kernel.org/linux-pm/20231120234015.3273143-1-radusolea@google.com/
Reported-by: Radu Solea <radusolea@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In preparation for a subsequent change, move the initialization of the
poll_queue delayed work from thermal_zone_device_register_with_trips()
to thermal_zone_device_init() which is called by the former.
However, because thermal_zone_device_init() is also called by
thermal_pm_notify(), make the latter call cancel_delayed_work() on
poll_queue before invoking the former, so as to allow the work
item to be re-initialized safely.
Also move thermal_zone_device_check() which needs to be defined
before thermal_zone_device_init(), so the latter can pass it to the
INIT_DELAYED_WORK() macro.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are 3 synchronization issues with thermal zone suspend-resume
during system-wide transitions:
1. The resume code runs in a PM notifier which is invoked after user
space has been thawed, so it can run concurrently with user space
which can trigger a thermal zone device removal. If that happens,
the thermal zone resume code may use a stale pointer to the next
list element and crash, because it does not hold thermal_list_lock
while walking thermal_tz_list.
2. The thermal zone resume code calls thermal_zone_device_init()
outside the zone lock, so user space or an update triggered by
the platform firmware may see an inconsistent state of a
thermal zone leading to unexpected behavior.
3. Clearing the in_suspend global variable in thermal_pm_notify()
allows __thermal_zone_device_update() to continue for all thermal
zones and it may as well run before the thermal_tz_list walk (or
at any point during the list walk for that matter) and attempt to
operate on a thermal zone that has not been resumed yet. It may
also race destructively with thermal_zone_device_init().
To address these issues, add thermal_list_lock locking to
thermal_pm_notify(), especially arount the thermal_tz_list,
make it call thermal_zone_device_init() back-to-back with
__thermal_zone_device_update() under the zone lock and replace
in_suspend with per-zone bool "suspend" indicators set and unset
under the given zone's lock.
Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/
Reported-by: Bo Ye <bo.ye@mediatek.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If device_register() in thermal_zone_device_register_with_trips()
returns an error, the tz variable is set to NULL and subsequently
dereferenced in kfree(tz->tzp).
Commit adc8749b15 ("thermal/drivers/core: Use put_device() if
device_register() fails") added the tz = NULL assignment in question to
avoid a possible double-free after dropping the reference to the zone
device. However, after commit 4649620d94 ("thermal: core: Make
thermal_zone_device_unregister() return after freeing the zone"), that
assignment has become redundant, because dropping the reference to the
zone device does not cause the zone object to be freed any more.
Drop it to address the NULL pointer dereference.
Fixes: 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Initially the check against the get_temp ops in the
thermal_zone_device_update() was put in there in order to catch
drivers not providing this method.
Instead of checking again and again the function if the ops exists in
the update function, let's do the check at registration time, so it is
checked one time and for all.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to avoid running __thermal_zone_device_update() for thermal
zones going away, the thermal zone lock is held around device_del()
in thermal_zone_device_unregister() and thermal_zone_device_update()
passes the given thermal zone device to device_is_registered().
This allows thermal_zone_device_update() to skip the
__thermal_zone_device_update() if device_del() has already run for
the thermal zone at hand.
However, instead of looking at driver core internals, the thermal
subsystem may as well rely on its own data structures for this
purpose. Namely, if the thermal zone is not present in
thermal_tz_list, it can be regarded as unavailable, which in fact is
already the case in thermal_zone_device_unregister(). Accordingly,
the device_is_registered() check in thermal_zone_device_update() can
be replaced with checking whether or not the node list_head in struct
thermal_zone_device is empty, in which case it is not there in
thermal_tz_list.
To make this work, though, it is necessary to initialize tz->node
in thermal_zone_device_register_with_trips() before registering the
thermal zone device and it needs to be added to thermal_tz_list and
deleted from it under its zone lock.
After the above modifications, the zone lock does not need to be
held around device_del() in thermal_zone_device_unregister() any more.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Multiple places in the thermal subsystem (most importantly, sysfs
attribute callback functions) check if the given thermal zone device is
still registered in order to return early in case the device_del() in
thermal_zone_device_unregister() has run already.
However, after thermal_zone_device_unregister() has been made wait for
all of the zone-related activity to complete before returning, it is
not necessary to do that any more, because all of the code holding a
reference to the thermal zone device object will be waited for even if
it does not do anything special to enforce this.
Accordingly, drop all of the device_is_registered() checks that are now
redundant and get rid of the zone locking that is not necessary any more
after dropping them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Make thermal_zone_device_unregister() wait until all of the references
to the given thermal zone object have been dropped and free it before
returning.
This guarantees that when thermal_zone_device_unregister() returns,
there is no leftover activity regarding the thermal zone in question
which is required by some of its callers (for instance, modular driver
code that wants to know when it is safe to let the module go away).
Subsequently, this will allow some confusing device_is_registered()
checks to be dropped from the thermal sysfs and core code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The trip crossing detection in handle_thermal_trip() does not work
correctly in the cases when a trip point is crossed on the way up and
then the zone temperature stays above its low temperature (that is, its
temperature decreased by its hysteresis). The trip temperature may
be passed by the zone temperature subsequently in that case, even
multiple times, but that does not count as the trip crossing as long as
the zone temperature does not fall below the trip's low temperature or,
in other words, until the trip is crossed on the way down.
|-----------low--------high------------|
|<--------->|
| hyst |
| |
| -|--> crossed on the way up
|
<---|-- crossed on the way down
However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
every time the trip temperature is passed by the zone temperature on
the way up regardless of whether or not the trip has been crossed on
the way down yet. Moreover, it will not call thermal_notify_tz_trip_down()
if the last zone temperature was between the trip's temperature and its
low temperature, so some "trip crossed on the way down" events may not
be reported.
To address this issue, introduce trip thresholds equal to either the
temperature of the given trip, or its low temperature, such that if
the trip's threshold is passed by the zone temperature on the way up,
its value will be set to the trip's low temperature and
thermal_notify_tz_trip_up() will be called, and if the trip's threshold
is passed by the zone temperature on the way down, its value will be set
to the trip's temperature (high) and thermal_notify_tz_trip_down() will
be called. Accordingly, if the threshold is passed on the way up, it
cannot be passed on the way up again until its passed on the way down
and if it is passed on the way down, it cannot be passed on the way down
again until it is passed on the way up which guarantees correct
triggering of trip crossing notifications.
If the last temperature of the zone is invalid, the trip's threshold
will be set depending of the zone's current temperature: If that
temperature is above the trip's temperature, its threshold will be
set to its low temperature or otherwise its threshold will be set to
its (high) temperature. Because the zone temperature is initially
set to invalid and tz->last_temperature is only updated by
update_temperature(), this is sufficient to set the correct initial
threshold values for all trips.
Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify the governor .throttle() callback definition so that it takes a
trip pointer instead of a trip index as its second argument, adjust the
governors accordingly and update the core code invoking .throttle().
This causes the governors to become independent of the representation
of the list of trips in the thermal zone structure.
This change is not expected to alter the general functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The ACPI thermal driver changes include some thermal core modifications
that are depended on by subsequent thermal core changes, so merge them.
* acpi-thermal: (26 commits)
thermal: trip: Drop lockdep assertion from thermal_zone_trip_id()
thermal: trip: Remove lockdep assertion from for_each_thermal_trip()
thermal: core: Drop thermal_zone_device_exec()
ACPI: thermal: Use thermal_zone_for_each_trip() for updating trips
ACPI: thermal: Combine passive and active trip update functions
ACPI: thermal: Move get_active_temp()
ACPI: thermal: Fix up function header formatting in two places
ACPI: thermal: Drop list of device ACPI handles from struct acpi_thermal
ACPI: thermal: Rename structure fields holding temperature in deci-Kelvin
ACPI: thermal: Drop critical_valid and hot_valid trip flags
ACPI: thermal: Do not use trip indices for cooling device binding
ACPI: thermal: Mark uninitialized active trips as invalid
ACPI: thermal: Merge trip initialization functions
ACPI: thermal: Collapse trip devices update function wrappers
ACPI: thermal: Collapse trip devices update functions
ACPI: thermal: Add device list to struct acpi_thermal_trip
ACPI: thermal: Fix a small leak in acpi_thermal_add()
ACPI: thermal: Drop valid flag from struct acpi_thermal_trip
ACPI: thermal: Drop redundant trip point flags
ACPI: thermal: Untangle initialization and updates of active trips
...
The dev->id value comes from ida_alloc() so it's a number between zero
and INT_MAX. If it's too high then these sprintf()s will overflow.
Fixes: 203d3d4aa4 ("the generic thermal sysfs driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Because thermal_zone_device_exec() has no users any more and there are
no plans to use it anywhere, revert commit 9a99a996d1 ("thermal: core:
Introduce thermal_zone_device_exec()") that introduced it.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Add new helper functions, thermal_bind_cdev_to_trip() and
thermal_unbind_cdev_from_trip(), to allow a trip pointer to be used for
binding a cooling device to a trip point and unbinding it, respectively,
and redefine the existing helpers, thermal_zone_bind_cooling_device()
and thermal_zone_unbind_cooling_device(), as wrappers around the new
ones, respectively.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Replace the integer trip number stored in struct thermal_instance with
a pointer to the relevant trip and adjust the code using the structure
in question accordingly.
The main reason for making this change is to allow the trip point to
cooling device binding code more straightforward, as illustrated by
subsequent modifications of the ACPI thermal driver, but it also helps
to clarify the overall design and allows the governor code overhead to
be reduced (through subsequent modifications).
The only case in which it adds complexity is trip_point_show() that
needs to walk the trips[] table to find the index of the given trip
point, but this is not a critical path and the interface that
trip_point_show() belongs to is problematic anyway (for instance, it
doesn't cover the case when the same cooling devices is associated
with multiple trip points).
This is a preliminary change and the affected code will be refined by
a series of subsequent modifications of thermal governors, the core and
the ACPI thermal driver.
The general functionality is not expected to be affected by this change.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
After recent changes, thermal_zone_get_trip() cannot fail, as invoked
from thermal_zone_device_register_with_trips(), so the only role of
the trips_disabled bitmask is struct thermal_zone_device is to make
handle_thermal_trip() skip trip points whose temperature was initially
zero. However, since the unit of temperature in the thermal core is
millicelsius, zero may very well be a valid temperature value at least
in some usage scenarios and the trip temperature may as well change
later. Thus there is no reason to permanently disable trip points
with initial temperature equal to zero.
Accordingly, drop the trips_disabled bitmask along with the code
related to it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Commit bc840ea5f9 ("thermal: core: Do not handle trip points with
invalid temperature") added a check for invalid temperature to the
disabled trip point check in handle_thermal_trip(), but that check was
added at a point when the trip structure has not been initialized yet.
This may cause handle_thermal_trip() to skip a valid trip point in some
cases, so fix it by moving the check to a suitable place, after
__thermal_zone_get_trip() has been called to populate the trip
structure.
Fixes: bc840ea5f9 ("thermal: core: Do not handle trip points with invalid temperature")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are no more users of thermal_zone_device_register(), so drop it
from the core.
Note that thermal_zone_device_register_with_trips() may be renamed to
thermal_zone_device_register() in the future, but only after a grace
period allowing all of the possible work in progress that may be using
the latter to adjust.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Multiple callers of thermal_zone_device_register() don't pass any trips
to it and they might use a shortened argument list for that, so add
a special function with fewer arguments for this purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After recent changes in the ACPI thermal driver and in the Intel DTS
IOSF thermal driver, all thermal zone drivers are expected to use trip
tables for initialization and none of them should implement
.get_trip_type(), .get_trip_temp() or .get_trip_hyst() callbacks, so
drop these callbacks entirely from the core.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge ACPI thermal driver changes for 6.6-rc1:
- Drop non-functional nocrt parameter from ACPI thermal (Mario
Limonciello).
- Clean up the ACPI thermal driver, rework the handling of firmware
notifications in it and make it provide a table of generic trip point
structures to the core during initialization (Rafael Wysocki).
* acpi-thermal:
ACPI: thermal: Eliminate code duplication from acpi_thermal_notify()
ACPI: thermal: Drop unnecessary thermal zone callbacks
ACPI: thermal: Rework thermal_get_trend()
ACPI: thermal: Use trip point table to register thermal zones
thermal: core: Rework and rename __for_each_thermal_trip()
ACPI: thermal: Introduce struct acpi_thermal_trip
ACPI: thermal: Carry out trip point updates under zone lock
ACPI: thermal: Clean up acpi_thermal_register_thermal_zone()
thermal: core: Add priv pointer to struct thermal_trip
thermal: core: Introduce thermal_zone_device_exec()
thermal: core: Do not handle trip points with invalid temperature
ACPI: thermal: Drop redundant local variable from acpi_thermal_resume()
ACPI: thermal: Do not attach private data to ACPI handles
ACPI: thermal: Drop enabled flag from struct acpi_thermal_active
ACPI: thermal: Drop nocrt parameter
Introduce a new helper function, thermal_zone_device_exec(), that can
be used by drivers to run a given callback routine under the zone lock.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Trip points with temperature set to THERMAL_TEMP_INVALID are as good as
disabled, so make handle_thermal_trip() ignore them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Since commit 3d439b1a2a ("thermal/core: Alloc-copy-free the thermal zone
parameters structure"), thermal_zone_device_register() allocates a copy
of the tzp argument and callers need not explicitly manage its lifetime.
This means the function no longer cares about the parameter being
mutable, so constify it.
No functional change.
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are still some drivers needing to play with the thermal zone
device internals. That is not the best but until we can figure out if
the information is really needed, let's encapsulate the field used in
the thermal zone device structure, so we can move forward relocating
the thermal zone device structure definition in the thermal framework
private headers.
Some drivers are accessing tz->device, that implies they need to have
the knowledge of the thermal_zone_device structure but we want to
self-encapsulate this structure and reduce the scope of the structure
to the thermal core only.
By adding this wrapper, these drivers won't need the thermal zone
device structure definition and are no longer an obstacle to its
relocation to the private thermal core headers.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The caller of the function thermal_zone_device_register_with_trips()
can pass a thermal_zone_params structure parameter.
This one is used by the thermal core code until the thermal zone is
destroyed. That forces the caller, so the driver, to keep the pointer
valid until it unregisters the thermal zone if we want to make the
thermal zone device structure private the core code.
As the thermal zone device structure would be private, the driver can
not access to thermal zone device structure to retrieve the tzp field
after it passed it to register the thermal zone.
So instead of forcing the users of the function to deal with the tzp
structure life cycle, make the usage easier by allocating our own
thermal zone params, copying the parameter content and by freeing at
unregister time. The user can then create the parameters on the stack,
pass it to the registering function and forget about it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20230404075138.2914680-3-daniel.lezcano@linaro.org
structure field directly, access the sensor device instead the
thermal zone's device for trace, relocate the traces in
drivers/thermal (Daniel Lezcano)
- Use the generic trip point for the i.MX and remove the get_trip_temp
ops (Daniel Lezcano)
- Use the devm_platform_ioremap_resource() in the Hisilicon driver
(Yang Li)
- Remove R-Car H3 ES1.* handling as public has only access to the ES2
version and the upstream support for the ES1 has been shutdown (Wolfram Sang)
- Add a delay after initializing the bank in order to let the time to
the hardware to initialze itself before reading the temperature
(Amjad Ouled-Ameur)
- Add MT8365 support (Amjad Ouled-Ameur)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmQof0cACgkQqDIjiipP
6E/tXQgArKKlM52mo3pg880JsiWOWGrS7pJN0x9MR0nqUm83sLTDf21fPoYmn+EJ
wrzClIX1iHCDVCWCVxao7OIT1mxez9L2NAHseXDSDQJcZ0fflTE8wZ8xeLr6q5GN
/ifHfCqiC98yejPcKIf2TqdGgqpCzyQ++sZoc3H6/jwysSkFlBc+YgKx+XasQR6k
5swQ3E81zx0ouB+t1GDieXB6YRsjZzR2KQbbExoHexPue1DTIuuumz8M1Fgz4a4b
gXRHbrGp3vmLORIAOZiVDyjzC7jwy7oN552g16yZLGDUdLaJ03gRRx7fvNzDUEMW
mBzxak4WnNWEatCh691X6W5MdPO/uQ==
=naJV
-----END PGP SIGNATURE-----
Merge tag 'thermal-v6.4-rc1-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal control material for 6.4-rc1 from Daniel Lezcano:
"- Add more thermal zone device encapsulation: prevent setting
structure field directly, access the sensor device instead the
thermal zone's device for trace, relocate the traces in
drivers/thermal (Daniel Lezcano)
- Use the generic trip point for the i.MX and remove the get_trip_temp
ops (Daniel Lezcano)
- Use the devm_platform_ioremap_resource() in the Hisilicon driver
(Yang Li)
- Remove R-Car H3 ES1.* handling as public has only access to the ES2
version and the upstream support for the ES1 has been shutdown (Wolfram
Sang)
- Add a delay after initializing the bank in order to let the time to
the hardware to initialze itself before reading the temperature
(Amjad Ouled-Ameur)
- Add MT8365 support (Amjad Ouled-Ameur)"
* tag 'thermal-v6.4-rc1-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal/drivers/ti: Use fixed update interval
thermal/drivers/stm: Don't set no_hwmon to false
thermal/drivers/db8500: Use driver dev instead of tz->device
thermal/core: Relocate the traces definition in thermal directory
thermal/drivers/hisi: Use devm_platform_ioremap_resource()
thermal/drivers/imx: Use the thermal framework for the trip point
thermal/drivers/imx: Remove get_trip_temp ops
thermal/drivers/rcar_gen3_thermal: Remove R-Car H3 ES1.* handling
thermal/drivers/mediatek: Add delay after thermal banks initialization
thermal/drivers/mediatek: Add support for MT8365 SoC
thermal/drivers/mediatek: Control buffer enablement tweaks
dt-bindings: thermal: mediatek: Add binding documentation for MT8365 SoC
Once thermal_list_lock has been acquired in
__thermal_cooling_device_register(), it is not necessary to drop it
and take it again until all of the thermal zones have been updated,
so change the code accordingly.
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The traces are exported but only local to the thermal core code. On
the other side, the traces take the thermal zone device structure as
argument, thus they have to rely on the exported thermal.h header
file. As we want to move the structure to the private thermal core
header, first we have to relocate those traces to the same place as
many drivers do.
Cc: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230307133735.90772-2-daniel.lezcano@linaro.org
Commit 7c3d5c20dc ("thermal/core: Add a generic thermal_zone_get_trip()
function") stopped marking trip points with a zero temperature as
disabled, behavior that was originally introduced in commit 81ad4276b5
("Thermal: Ignore invalid trip points").
When using the mlxsw driver we see that when such trip points are not
disabled, the thermal subsystem repeatedly tries to set the state of the
associated cooling devices to the maximum state.
Address this by restoring the original behavior and mark trip points
with a zero temperature as disabled.
Fixes: 7c3d5c20dc ("thermal/core: Add a generic thermal_zone_get_trip() function")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its ->get_max_state()
callback return value.
That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state. Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).
The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.
Note that the actual user of the new function will be added separately.
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Introduce a helper function, thermal_cooling_device_present(), for
checking if the given cooling device is in the list of registered
cooling devices to avoid some code duplication in a subsequent
patch.
No expected functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
In order to get the thermal zone id but without directly accessing the
thermal zone device structure, add an accessor.
Use the accessor in the hwmon_scmi and acpi_thermal.
No functional change intented.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The thermal zone device structure is exposed via the exported
thermal.h header. This structure should stay private the thermal core
code. In order to encapsulate the structure, let's add an accessor to
get the 'type' of the thermal zone.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The thermal zone device structure is exposed to the different drivers
and obviously they access the internals while that should be
restricted to the core thermal code.
In order to self-encapsulate the thermal core code, we need to prevent
the drivers accessing directly the thermal zone structure and provide
accessor functions to deal with.
Provide an accessor to the 'devdata' structure and make use of it in
the different drivers.
No functional changes intended.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Follow the advice in Documentation/filesystems/sysfs.rst that show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If thermal_class is not registered with the driver core, there is no way
to expose the interfaces used by the thermal control framework, so
prevent thermal zones and cooling devices from being registered in
that case by returning an error from object registration functions.
For this purpose, use a thermal_class pointer that will be NULL if the
class is not registered. To avoid wasting memory in that case, allocate
the thermal class object dynamically and if it fails to register, free
it and clear the thermal_class pointer to NULL.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The thermal_core.c files contains a lot of functions handling
different thermal components like the governors, the trip points, the
cooling device, the OF cooling device, etc ...
This organization does not help to migrate to a more sane code where
there is a better self-encapsulation as all the components' internals
can be directly accessed from a single file.
For the sake of clarity, let's move the thermal trip points code in a
dedicated thermal_trip.c file and add a function to browse all the
trip points like we do with the thermal zones, the govenors and the
cooling devices.
The same can be done for the cooling devices and the governor code but
that will come later as the current work in the thermal framework is
to fix the trip point handling and use a generic trip point structure.
No functional changes intended.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As per documentation for the ida_destroy() function: "If the IDA is
already empty, there is no need to call this function."
The thermal framework is in the init sequence, so the ida was not yet
used and consequently it is empty in case of error.
There is no need to call ida_destroy(), let's remove the calls.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The thermal subsystem initialization miss an netlink unregistering
function in the error. Add it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Lets not open code device_unregister() unnecessarily.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
thermal_release() already frees cdev, let it do rest of the cleanup as
well in order to simplify the error paths in
__thermal_cooling_device_register().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
put_device() shouldn't be called before a prior call to
device_register(). __thermal_cooling_device_register() doesn't follow
that properly and needs fixing. Also
thermal_cooling_device_destroy_sysfs() is getting called unnecessarily
on few error paths.
Fix all this by placing the calls at the right place.
Based on initial work done by Caleb Connolly.
Fixes: 4748f9687c ("thermal: core: fix some possible name leaks in error paths")
Fixes: c408b3d1d9 ("thermal: Validate new state in cur_state_store()")
Reported-by: Caleb Connolly <caleb.connolly@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Frank Rowand <frowand.list@gmail.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Caleb Connolly <caleb.connolly@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The set_trip_temp() callback is used when changing the trip temperature
through sysfs. As it is called with the thermal-zone-device lock held
it must not use thermal_zone_get_trip() directly or it will deadlock.
Fixes: 78c3e2429be8 ("thermal/drivers/qcom: Use generic thermal_zone_get_trip() function")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20221214131617.2447-2-johan+linaro@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
The thermal zone ops defines a set_trip callback where we can invoke
the backend driver to set an interrupt for the next trip point
temperature being crossed the way up or down, or setting the low level
with the hysteresis.
The ops is only called from the thermal sysfs code where the userspace
has the ability to modify a trip point characteristic.
With the effort of encapsulating the thermal framework core code,
let's create a thermal_zone_set_trip() which is the writable side of
the thermal_zone_get_trip() and put there all the ops encapsulation.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20221003092602.1323944-4-daniel.lezcano@linaro.org
The thermal_zone_device_ops structure defines a set of ops family,
get_trip_temp(), get_trip_hyst(), get_trip_type(). Each of them is
returning a property of a trip point.
The result is the code is calling the ops everywhere to get a trip
point which is supposed to be defined in the backend driver. It is a
non-sense as a thermal trip can be generic and used by the backend
driver to declare its trip points.
Part of the thermal framework has been changed and all the OF thermal
drivers are using the same definition for the trip point and use a
thermal zone registration variant to pass those trip points which are
part of the thermal zone device structure.
Consequently, we can use a generic function to get the trip points
when they are stored in the thermal zone device structure.
This approach can be generalized to all the drivers and we can get rid
of the ops->get_trip_*. That will result to a much more simpler code
and make possible to rework how the thermal trip are handled in the
thermal core framework as discussed previously.
This change adds a function thermal_zone_get_trip() where we get the
thermal trip point structure which contains all the properties (type,
temp, hyst) instead of doing multiple calls to ops->get_trip_*.
That opens the door for trip point extension with more attributes. For
instance, replacing the trip points disabled bitmask with a 'disabled'
field in the structure.
Here we replace all the calls to ops->get_trip_* in the thermal core
code with a call to the thermal_zone_get_trip() function.
The thermal zone ops defines a callback to retrieve the critical
temperature. As the trip handling is being reworked, all the trip
points will be the same whatever the driver and consequently finding
the critical trip temperature will be just a loop to search for a
critical trip point type.
Provide such a generic function, so we encapsulate the ops
get_crit_temp() which can be removed when all the backend drivers are
using the generic trip points handling.
While at it, add the thermal_zone_get_num_trips() to encapsulate the
code more and reduce the grip with the thermal framework internals.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221003092602.1323944-2-daniel.lezcano@linaro.org
In some error paths before device_register(), the names allocated
by dev_set_name() are not freed. Move dev_set_name() front to
device_register(), so the name can be freed while calling
put_device().
Fixes: 1dd7128b83 ("thermal/core: Fix null pointer dereference in thermal_release()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thermal device operations may be called after thermal zone device removal.
After thermal zone device removal, thermal zone device operations must
no longer be called. To prevent such calls from happening, ensure that
the thermal device is registered before executing any thermal device
operations.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Protect access to thermal operations against thermal zone removal by
acquiring the thermal zone device mutex. After acquiring the mutex, check
if the thermal zone device is registered and abort the operation if not.
With this change, we can call __thermal_zone_device_update() instead of
thermal_zone_device_update() from trip_point_temp_store() and from
emul_temp_store(). Similar, we can call __thermal_zone_set_trips() instead
of thermal_zone_set_trips() from trip_point_hyst_store().
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In thermal_zone_device_set_mode(), the thermal zone mutex is released only
to be reacquired in the subsequent call to thermal_zone_device_update().
Introduce __thermal_zone_device_update(), which is similar to
thermal_zone_device_update() but has to be called with the thermal device
mutex held. Call the new function from thermal_zone_device_set_mode()
to avoid the extra thermal device mutex release/acquire sequence in that
function.
With the new function in place, re-implement thermal_zone_device_update()
as wrapper around __thermal_zone_device_update() to acquire and release
the thermal device mutex.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thermal device attributes may still be opened after unregistering
the thermal zone and deleting the thermal device.
Currently there is no protection against accessing thermal device
operations after unregistering a thermal zone. To enable adding
such protection, protect the device delete operation with the
thermal zone device mutex. This requires splitting the call to
device_unregister() into its components, device_del() and put_device().
Only the first call can be executed under mutex protection, since
put_device() may result in releasing the thermal zone device memory.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Accesses to thermal zones, and with it the thermal zone device mutex,
are still possible after the thermal zone device has been unregistered.
For example, thermal_zone_get_temp() can be called from temp_show()
in thermal_sysfs.c if the sysfs attribute was opened before the thermal
device was unregistered.
Move the call to mutex_destroy from thermal_zone_device_unregister()
to thermal_release() to ensure that it is only destroyed after it is
guaranteed to be no longer accessed.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Return an error pointer if ->get_max_state() fails. The current code
returns NULL which will cause an oops in the callers.
Fixes: c408b3d1d9 ("thermal: Validate new state in cur_state_store()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In cur_state_store(), the new state of the cooling device is received
from user-space and is not validated by the thermal core but the same is
left for the individual drivers to take care of. Apart from duplicating
the code it leaves possibility for introducing bugs where a driver may
not do it right.
Lets make the thermal core check the new state itself and store the max
value in the cooling device structure.
Link: https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Drop the valid pointer check for type in
thermal_zone_device_register_with_trips() as we already have it confirmed
for != NULL from the previous if block.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20220909181322.10933-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
On one of the Chrome system, if we define more than 12 trip points,
probe for thermal sensor fails with
"int3403 thermal: probe of INTC1046:03 failed with error -22"
and throws an error as
"thermal_sys: Error: Incorrect number of thermal trips".
The thermal_zone_device_register() interface needs maximum
number of trip points supported in a zone as an argument.
This number can't exceed THERMAL_MAX_TRIPS, which is currently
set to 12. To address this issue, THERMAL_MAX_TRIPS value
has to be increased.
This interface also has an argument to specify a mask of trips
which are writable. This mask is defined as an int.
This mask sets the ceiling for increasing maximum number of
supported trips. With the current implementation, maximum number
of trips can be supported is 31.
Also, THERMAL_MAX_TRIPS macro is used in one place only.
So, remove THERMAL_MAX_TRIPS macro and compare num_trips
directly with using a macro BITS_PER_TYPE(int)-1.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function thermal_zone_device_is_enabled() must be called with the
thermal zone lock held. In the resume path, it is called without.
As the thermal_zone_device_is_enabled() is also checked in
thermal_zone_device_update(), do the check in resume() function is
pointless, except for saving an extra initialization which does not
hurt if it is done in all the cases.
Fixes: ca48ad71717dd ("thermal/core: Move the mutex inside the thermal_zone_device_update() function")
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
All the different calls inside the thermal_zone_device_update()
function take the mutex.
The previous changes move the mutex out of the different functions,
like the throttling ops. Now that the mutexes are all at the same
level in the call stack for the thermal_zone_device_update() function,
they can be moved inside this one.
That has the benefit of:
1. Simplify the code by not having a plethora of places where the lock is taken
2. Probably closes more race windows because releasing the lock from
one line to another can give the opportunity to the thermal zone to change
its state in the meantime. For example, the thermal zone can be
enabled right after checking it is disabled.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-5-daniel.lezcano@linaro.org
All the governors throttling ops are taking/releasing the lock at the
beginning and the end of the function.
We can move the mutex to the throttling call site instead.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-4-daniel.lezcano@linaro.org
The should_stop_polling() function wraps the function
thermal_zone_device_is_enabled().
The monitor_thermal_zone() function checks if the thermal zone is
enabled via the should_stop_polling() function.
However, the instant after checking the thermal zone is enabled, this
one can be disabled, so even if that reduces the race window, it does
not prevent that and the monitoring can be set again with the thermal
zone disabled.
For this reason, the function should_stop_polling() is replaced by a
direct check of the thermal zone mode with the mutex locks held, that
prevents the situation described above.
As the semantic is clear with the thermal_zone_is_enabled() function,
we can remove the should_stop_polling() function and replace the check
with the former function.
While at it, reorder the checks to improve the readability of the
monitor_thermal_zone() function.
In the future, the thermal_zone_device_disable() and the
thermal_zone_device_enable() functions should unset / set the polling
timer directly instead of relying on the next
thermal_zone_device_update() call to do that. That will make a
synchronous thermal zone mode change but the locking scheme should be
double checked for that which out of the scope of this change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-2-daniel.lezcano@linaro.org
The current code calls monitor_thermal_zone() inside the
handle_thermal_trip() function. But this one is called in a loop for
each trip point which means the monitoring is rearmed several times
for nothing (assuming there could be several passive and active trip
points).
Move the monitor_thermal_zone() function out of the
handle_thermal_trip() function and after the thermal trip loop, so the
timer will be disabled or rearmed one time.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-1-daniel.lezcano@linaro.org
This transient change allows to use old and new OF together until all
the drivers are converted to use the new OF API.
This will go away when the old OF code will be removed.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220804224349.1926752-3-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
global variable, fix comments and rework the trace information
(Lukasz Luba)
- Add the include/dt-bindings/thermal.h under the area covered by the
thermal maintainer in the MAINTAINERS file (Lukas Bulwahn)
- Improve the error output by giving the sensor identification when a
thermal zone failed to initialize, the DT bindings by changing the
positive logic and adding the r8a779f0 support on the rcar3 (Wolfram
Sang)
- Convert the QCom tsens DT binding to the dtsformat format (Krzysztof
Kozlowski)
- Remove the pointless get_trend() function in the QCom, Ux500 and
tegra thermal drivers, along with the unused DROP_FULL and
RAISE_FULL trends definitions. Simplify the code by using clamp()
macros (Daniel Lezcano)
- Fix ref_table memory leak at probe time on the k3_j72xx bandgap
(Bryan Brattlof)
- Fix array underflow in prep_lookup_table (Dan Carpenter)
- Add static annotation to the k3_j72xx_bandgap_j7* data structure
(Jin Xiaoyun)
- Fix typos in comments detected on sun8i by Coccinelle (Julia Lawall)
- Fix typos in comments on rzg2l (Biju Das)
- Remove as unnecessary call to dev_err() as the error is already
printed by the failing function on u8500 (Yang Li)
- Register the thermal zones as hwmon sensors for the Qcom thermal
sensors (Dmitry Baryshkov)
- Fix 'tmon' tool compilation issue by adding phtread.h include
(Markus Mayer)
- Fix typo in the comments for the 'tmon' tool (Slark Xiao)
- Consolidate the thermal core code by beginning to move the thermal
trip structure from the thermal OF code as a generic structure to be
used by the different sensors when registering a thermal zone
(Daniel Lezcano)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmLkDEgACgkQqDIjiipP
6E9PPAf/fZRYgzqgv68lYy2hnJBZEha7z76KyKKxbPATy65VQHzHBqWyPgOnZWx8
xm26tlDJMFEGql/Sy5QetvnFdDqvY33Q0FBhDbmCdCp7vxxirDNKxXhGnxUggCIt
PrloMzC9zjgdNaFTclf/ceCFNwHPnY8l5kxGHhVDn/l5vvGFB869HKMT+13FMCQM
cKVNZY0F3BgmY0ouAMbXT2jwNm/FIYfXC9CFaQo9XhiTAvqU1h4BI08S8JdXsve0
VVBi8MB0sBolWIQ/GVlC1IWj1FhxgMfvcfZAOlyia7I4kQz7K5wAHxiHnhA+GHsZ
NdxVeGTIdmjIInvRxnsT7yR2HcitkA==
=sDfh
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal control changes for 5.20-rc1 from Daniel Lezcano:
"- Make per cpufreq / devfreq cooling device ops instead of using a
global variable, fix comments and rework the trace information
(Lukasz Luba)
- Add the include/dt-bindings/thermal.h under the area covered by the
thermal maintainer in the MAINTAINERS file (Lukas Bulwahn)
- Improve the error output by giving the sensor identification when a
thermal zone failed to initialize, the DT bindings by changing the
positive logic and adding the r8a779f0 support on the rcar3 (Wolfram
Sang)
- Convert the QCom tsens DT binding to the dtsformat format (Krzysztof
Kozlowski)
- Remove the pointless get_trend() function in the QCom, Ux500 and
tegra thermal drivers, along with the unused DROP_FULL and
RAISE_FULL trends definitions. Simplify the code by using clamp()
macros (Daniel Lezcano)
- Fix ref_table memory leak at probe time on the k3_j72xx bandgap
(Bryan Brattlof)
- Fix array underflow in prep_lookup_table (Dan Carpenter)
- Add static annotation to the k3_j72xx_bandgap_j7* data structure
(Jin Xiaoyun)
- Fix typos in comments detected on sun8i by Coccinelle (Julia Lawall)
- Fix typos in comments on rzg2l (Biju Das)
- Remove as unnecessary call to dev_err() as the error is already
printed by the failing function on u8500 (Yang Li)
- Register the thermal zones as hwmon sensors for the Qcom thermal
sensors (Dmitry Baryshkov)
- Fix 'tmon' tool compilation issue by adding phtread.h include
(Markus Mayer)
- Fix typo in the comments for the 'tmon' tool (Slark Xiao)
- Consolidate the thermal core code by beginning to move the thermal
trip structure from the thermal OF code as a generic structure to be
used by the different sensors when registering a thermal zone
(Daniel Lezcano)"
* tag 'thermal-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (36 commits)
thermal/of: Initialize trip points separately
thermal/of: Use thermal trips stored in the thermal zone
thermal/core: Add thermal_trip in thermal_zone
thermal/core: Rename 'trips' to 'num_trips'
thermal/core: Move thermal_set_delay_jiffies to static
thermal/core: Remove unneeded EXPORT_SYMBOLS
thermal/of: Move thermal_trip structure to thermal.h
thermal/of: Remove the device node pointer for thermal_trip
thermal/of: Replace device node match with device node search
thermal/core: Remove duplicate information when an error occurs
thermal/core: Avoid calling ->get_trip_temp() unnecessarily
thermal/tools/tmon: Fix typo 'the the' in comment
thermal/tools/tmon: Include pthread and time headers in tmon.h
thermal/ti-soc-thermal: Fix comment typo
thermal/drivers/qcom/spmi-adc-tm5: Register thermal zones as hwmon sensors
thermal/drivers/qcom/temp-alarm: Register thermal zones as hwmon sensors
thermal/drivers/u8500: Remove unnecessary print function dev_err()
thermal/drivers/rzg2l: Fix comments
thermal/drivers/sun8i: Fix typo in comment
thermal/drivers/k3_j72xx_bandgap: Make k3_j72xx_bandgap_j721e_data and k3_j72xx_bandgap_j7200_data static
...
The thermal trip points are properties of a thermal zone and the
different sub systems should be able to save them in the thermal zone
structure instead of having their own definition.
Give the opportunity to the drivers to create a thermal zone with
thermal trips which will be accessible directly from the thermal core
framework.
As we added the thermal trip points structure in the thermal zone,
let's extend the thermal zone register function to have the thermal
trip structures as a parameter and store it in the 'trips' field of
the thermal zone structure.
The thermal zone contains the trip point, we can store them directly
when registering the thermal zone. That will allow another step
forward to remove the duplicate thermal zone structure we find in the
thermal_of code.
Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220722200007.1839356-9-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
In order to use thermal trips defined in the thermal structure, rename
the 'trips' field to 'num_trips' to have the 'trips' field containing the
thermal trip points.
Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220722200007.1839356-8-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The function 'thermal_set_delay_jiffies' is only used in
thermal_core.c but it is defined and implemented in a separate
file. Move the function to thermal_core.c and make it static.
Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20220722200007.1839356-7-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The pr_err already tells it is an error, it is pointless to add the
'Error:' string in the messages. Remove them.
Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20220722200007.1839356-2-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As the trip temperature is already available when calling the function
handle_critical_trips(), pass it as a parameter instead of having this
function calling the ops again to retrieve the same data.
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220718145038.1114379-2-daniel.lezcano@linaro.org
Use ida_alloc()/ida_free() instead of deprecated
ida_simple_get()/ida_simple_remove() as recommended.
Signed-off-by: keliu <liuke94@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During the suspend is in process, thermal_zone_device_update bails out
thermal zone re-evaluation for any sensor trip violation without
setting next valid trip to that sensor. It assumes during resume
it will re-evaluate same thermal zone and update trip. But when it is
in suspend temperature goes down and on resume path while updating
thermal zone if temperature is less than previously violated trip,
thermal zone set trip function evaluates the same previous high and
previous low trip as new high and low trip. Since there is no change
in high/low trip, it bails out from thermal zone set trip API without
setting any trip. It leads to a case where sensor high trip or low
trip is disabled forever even though thermal zone has a valid high
or low trip.
During thermal zone device init, reset thermal zone previous high
and low trip. It resolves above mentioned scenario.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When device_register() return failed, program will goto out_kfree_type
to release 'cdev->device' by put_device(). That will call thermal_release()
to free 'cdev'. But the follow-up processes access 'cdev' continually.
That trggers the UAF bug.
====================================================================
BUG: KASAN: use-after-free in __thermal_cooling_device_register+0x75b/0xa90
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
dump_stack_lvl+0xe2/0x152
print_address_description.constprop.0+0x21/0x140
? __thermal_cooling_device_register+0x75b/0xa90
kasan_report.cold+0x7f/0x11b
? __thermal_cooling_device_register+0x75b/0xa90
__thermal_cooling_device_register+0x75b/0xa90
? memset+0x20/0x40
? __sanitizer_cov_trace_pc+0x1d/0x50
? __devres_alloc_node+0x130/0x180
devm_thermal_of_cooling_device_register+0x67/0xf0
max6650_probe.cold+0x557/0x6aa
......
Freed by task 258:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0x109/0x140
kfree+0x117/0x4c0
thermal_release+0xa0/0x110
device_release+0xa7/0x240
kobject_put+0x1ce/0x540
put_device+0x20/0x30
__thermal_cooling_device_register+0x731/0xa90
devm_thermal_of_cooling_device_register+0x67/0xf0
max6650_probe.cold+0x557/0x6aa [max6650]
Do not use 'cdev' again after put_device() to fix the problem like doing
in thermal_zone_device_register().
[dlezcano]: as requested by Rafael, change the affectation into two statements.
Fixes: 5848376181 ("thermal/drivers/core: Use a char pointer for the cooling device name")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20211015024504.947520-1-william.xuanziyang@huawei.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
If both dev_set_name() and device_register() failed, then null pointer
dereference occurs in thermal_release() which will use strncmp() to
compare the name.
So fix it by adding dev_set_name() return value check.
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Link: https://lore.kernel.org/r/20211015083230.67658-1-songyuanzheng@huawei.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The slope of the temperature increase or decrease can be high and when
the temperature crosses the trip point, there could be a significant
difference between the trip temperature and the measured temperatures.
That forces the userspace to read the temperature back right after
receiving a trip violation notification.
In order to be efficient, give the temperature which resulted in the
trip violation.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lore.kernel.org/r/20211001223323.1836640-1-daniel.lezcano@linaro.org
After printing the list of thermal governors, then this function prints
a newline character. The problem is that "size" has not been updated
after printing the last governor. This means that it can write one
character (the NUL terminator) beyond the end of the buffer.
Get rid of the "size" variable and just use "PAGE_SIZE - count" directly.
Fixes: 1b4f48494e ("thermal: core: group functions related to governor handling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210916131342.GB25094@kili
- Add missing MODULE_DEVICE_TABLE for the Spreadtrum sensor (Chunyan
Zhang)
- Export additionnal attributes for the int340x thermal processor
(Srinivas Pandruvada)
- Add SC7280 compatible for the tsens driver (Rajeshwari Ravindra
Kamble)
- Fix kernel documentation for thermal_zone_device_unregister() and
use devm_platform_get_and_ioremap_resource() (Yang Yingliang)
- Fix coefficient calculations for the rcar_gen3 sensor driver (Niklas
Söderlund)
- Fix shadowing variable rcar_gen3_ths_tj_1 (Geert Uytterhoeven)
- Add missing of_node_put() for the iMX and Spreadtrum sensors
(Krzysztof Kozlowski)
- Add tegra3 thermal sensor DT bindings (Dmitry Osipenko)
- Stop the thermal zone monitoring when unregistering it to prevent a
temperature update without the 'get_temp' callback (Dmitry Osipenko)
- Add rk3568 DT bindings, convert bindings to yaml schemas and add the
corresponding compatible in the Rockchip sensor (Ezequiel Garcia)
- Add the sc8180x compatible for the Qualcomm tsensor (Bjorn Andersson)
- Use the find_first_zero_bit() function instead of custom code (Andy
Shevchenko)
- Fix the kernel doc for the device cooling device (Yang Li)
- Reorg the processor thermal int340x to set the scene for the PCI
mmio driver (Srinivas Pandruvada)
- Add PCI MMIO driver for the int340x processor thermal driver
(Srinivas Pandruvada)
- Add hwmon sensors for the mediatek sensor (Frank Wunderlich)
- Fix warning for return value reported by Smatch for the int340x
thermal processor (Srinivas Pandruvada)
- Fix wrong register access and decoding for the int340x thermal
processor (Srinivas Pandruvada)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmDkbdAACgkQqDIjiipP
6E8pTwgAgc4RMdSBdNMVWP1Lc3Gprmg8uXMhma9ZlmJD+l3h4b4P+Zm7HjU+SPHI
emvvqHgiWGw/ta/Fuhx9XnqTUjZznG3gMYohnfKe7jPgVTxmud+Yf0/E3dRrDNWl
WNolS8rv4dLf1mjR+vGZ63KasK0Rz5Z4YxDV4kOvh+/VUhqg3XpZ1OTKOh/B9IWX
pUTedq7hIZ44ZMINcwObvLNTeaoEhPNpgA4Rs07vQPYugY0V61HszqR/Mu+l8Rgp
LiE8NUBANJ+8+wydHe/vP6lvOthh7YGSx4091rUe+X17qgfBDKCTsOyECnZ/UX+r
aB1MaTAnr1H0dfM49yeoFcMSPc1rGA==
=q0nG
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Add rk3568 sensor support (Finley Xiao)
- Add missing MODULE_DEVICE_TABLE for the Spreadtrum sensor (Chunyan
Zhang)
- Export additionnal attributes for the int340x thermal processor
(Srinivas Pandruvada)
- Add SC7280 compatible for the tsens driver (Rajeshwari Ravindra
Kamble)
- Fix kernel documentation for thermal_zone_device_unregister() and use
devm_platform_get_and_ioremap_resource() (Yang Yingliang)
- Fix coefficient calculations for the rcar_gen3 sensor driver (Niklas
Söderlund)
- Fix shadowing variable rcar_gen3_ths_tj_1 (Geert Uytterhoeven)
- Add missing of_node_put() for the iMX and Spreadtrum sensors
(Krzysztof Kozlowski)
- Add tegra3 thermal sensor DT bindings (Dmitry Osipenko)
- Stop the thermal zone monitoring when unregistering it to prevent a
temperature update without the 'get_temp' callback (Dmitry Osipenko)
- Add rk3568 DT bindings, convert bindings to yaml schemas and add the
corresponding compatible in the Rockchip sensor (Ezequiel Garcia)
- Add the sc8180x compatible for the Qualcomm tsensor (Bjorn Andersson)
- Use the find_first_zero_bit() function instead of custom code (Andy
Shevchenko)
- Fix the kernel doc for the device cooling device (Yang Li)
- Reorg the processor thermal int340x to set the scene for the PCI mmio
driver (Srinivas Pandruvada)
- Add PCI MMIO driver for the int340x processor thermal driver
(Srinivas Pandruvada)
- Add hwmon sensors for the mediatek sensor (Frank Wunderlich)
- Fix warning for return value reported by Smatch for the int340x
thermal processor (Srinivas Pandruvada)
- Fix wrong register access and decoding for the int340x thermal
processor (Srinivas Pandruvada)
* tag 'thermal-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (23 commits)
thermal/drivers/int340x/processor_thermal: Fix tcc setting
thermal/drivers/int340x/processor_thermal: Fix warning for return value
thermal/drivers/mediatek: Add sensors-support
thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver
thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
thermal: devfreq_cooling: Fix kernel-doc
thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit()
dt-bindings: thermal: tsens: Add sc8180x compatible
dt-bindings: rockchip-thermal: Support the RK3568 SoC compatible
dt-bindings: thermal: convert rockchip-thermal to json-schema
thermal/core/thermal_of: Stop zone device before unregistering it
dt-bindings: thermal: Add binding for Tegra30 thermal sensor
thermal/drivers/sprd: Add missing of_node_put for loop iteration
thermal/drivers/imx_sc: Add missing of_node_put for loop iteration
thermal/drivers/rcar_gen3_thermal: Do not shadow rcar_gen3_ths_tj_1
thermal/drivers/rcar_gen3_thermal: Fix coefficient calculations
thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()
thermal/core: Correct function name thermal_zone_device_unregister()
dt-bindings: thermal: tsens: Add compatible string to TSENS binding for SC7280
thermal/drivers/int340x: processor_thermal: Export additional attributes
...
Fix the following make W=1 kernel build warning:
drivers/thermal/thermal_core.c:1376: warning: expecting prototype for thermal_device_unregister(). Prototype was for thermal_zone_device_unregister() instead
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210517051020.3463536-1-yangyingliang@huawei.com
thermal_notify_framework just updates for a single trip point where as
thermal_zone_device_update does other bookkeeping like updating the
temperature of the thermal zone and setting the next trip point. The only
driver that was using thermal_notify_framework was updated in the previous
patch to use thermal_zone_device_update instead. Since there are no users
for thermal_notify_framework remove it.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210122023406.3500424-3-thara.gopinath@linaro.org
Fix the following error:
smatch warnings:
drivers/thermal/thermal_core.c:1020 __thermal_cooling_device_register() warn: possible memory leak of 'cdev'
by freeing the cdev when exiting the function in the error path.
Fixes: 5848376181 ("thermal/drivers/core: Use a char pointer for the cooling device name")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210319202257.890848-1-daniel.lezcano@linaro.org
We want to have any kind of name for the cooling devices as we do no
longer want to rely on auto-numbering. Let's replace the cooling
device's fixed array by a char pointer to be allocated dynamically
when registering the cooling device, so we don't limit the length of
the name.
Rework the error path at the same time as we have to rollback the
allocations in case of error.
Tested with a dummy device having the name:
"Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
A village on the island of Anglesey (Wales), known to have the longest
name in Europe.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20210314111333.16551-1-daniel.lezcano@linaro.org
The function thermal_zone_device_reset() is called in the
thermal_zone_device_register() which allocates and initialize the
structure. The passive field is already zero-ed by the allocation, the
function is useless.
Call directly thermal_zone_device_init() instead and
thermal_zone_device_reset().
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201222181110.1231977-1-daniel.lezcano@linaro.org
The code does no longer use the ms unit based fields to set the
delays as they are replaced by the jiffies.
Remove them and replace their user to use the jiffies version instead.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Peter Kästle <peter@piie.net>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20201216220337.839878-3-daniel.lezcano@linaro.org
The delays are also stored in jiffies based unit. Use them instead of
the ms.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20201216220337.839878-2-daniel.lezcano@linaro.org
The delays are stored in ms units and when the polling function is
called this delay is converted into jiffies at each call.
Instead of doing the conversion again and again, compute the jiffies
at init time and use the value directly when setting the polling.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20201216220337.839878-1-daniel.lezcano@linaro.org
The last site calling the thermal_zone_bind_cooling_device() function
with the THERMAL_TRIPS_NONE parameter was removed.
We can get rid of this test as no user of this function is calling
this function with this parameter.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20201214233811.485669-5-daniel.lezcano@linaro.org
The functions thermal_zone_device_rebind_exception and
thermal_zone_device_unbind_exception are not used from anywhere.
Remove that code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20201214233811.485669-2-daniel.lezcano@linaro.org
Currently there is no way to the sensors to directly call an ops in
interrupt mode without calling thermal_zone_device_update assuming all
the trip points are defined.
A sensor may want to do something special if a trip point is hot or
critical.
This patch adds the critical and hot ops to the thermal zone device,
so a sensor can directly invoke them or let the thermal framework to
call the sensor specific ones.
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20201210121514.25760-2-daniel.lezcano@linaro.org
The actual code is silently ignoring a thermal zone update when a
driver is requesting it without a get_temp ops set.
That looks not correct, as the caller should not have called this
function if the thermal zone is unable to read the temperature.
That makes the code less robust as the check won't detect the driver
is inconsistently using the thermal API and that does not help to
improve the framework as these circumvolutions hide the problem at the
source.
In order to detect the situation when it happens, let's add a warning
when the update is requested without the get_temp() ops set.
Any warning emitted will have to be fixed at the source of the
problem: the caller must not call thermal_zone_device_update if there
is not get_temp callback set.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Cc: Amit Kucheria <amitk@kernel.org>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20201210121514.25760-1-daniel.lezcano@linaro.org
The trip points are checked one by one with multiple condition
branches where one condition is enough to disable the trip point.
Merge all these conditions in a single 'OR' statement.
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201027013743.62392-1-bernard@vivo.com
[dlezcano] Changed patch description
Since the power actor section has one function power_actor_set_power()
move it into Intelligent Power Allocation (IPA). There is no other user
of that helper function. It would also allow to remove the check of
cdev_is_power_actor() because the code which calls it in IPA already does
the needed check. Make the function static since only IPA use it.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201015112441.4056-5-lukasz.luba@arm.com
Since the Intelligent Power Allocation (IPA) uses different way to get
minimum and maximum power for a given cooling device, the helper functions
are not needed. There is no other code which uses them, so remove the
helper functions.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201015112441.4056-4-lukasz.luba@arm.com
The upper and lower limits of thermal throttle state in the
DT do not apply to the Intelligent Power Allocation (IPA) governor.
Add the clamping for cooling device upper and lower limits in the
power_actor_set_power() used by IPA.
Signed-off-by: Michael Kao <michael.kao@mediatek.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201007024332.30322-1-michael.kao@mediatek.com
1. devfreq_cooling.c: The variable *tz is not used in
devfreq_cooling_get_requested_power(), devfreq_cooling_state2power()
and devfreq_cooling_power2state().
2. cpufreq_cooling.c: After 84fe2cab48, the variable *tz is not used
anymore in cpufreq_get_requested_power(), cpufreq_state2power() and
cpufreq_power2state().
Remove the variable *tz.
Signed-off-by: zhuguangqing <zhuguangqing@xiaomi.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200914071101.13575-1-zhuguangqing83@gmail.com
The mutex poweroff_lock is initialized statically. It is
unnecessary to initialize by mutex_init().
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200916062139.191233-1-miaoqinglang@huawei.com
The user-after-free bug in thermal_zone_device_unregister() is reported by
KASAN. It happens because struct thermal_zone_device is released during of
device_unregister() invocation, and hence the "tz" variable shouldn't be
touched by thermal_notify_tz_delete(tz->id).
Fixes: 55cdf0a283 ("thermal: core: Add notifications call in the framework")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200817235854.26816-1-digetx@gmail.com
Now the calls to enable/disable a thermal zone are centralized in a
call to a function, we can add in these the corresponding netlink
notifications.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20200727231033.26512-1-daniel.lezcano@linaro.org
When a thermal zone is looked up by an ID and no zone is found matching
that ID, the thermal_zone_get_by_id() function will return a pointer to
the thermal zone list head which isn't actually a valid thermal zone.
This can lead to a subsequent crash because a valid pointer is returned
to the called, but dereferencing that pointer as struct thermal_zone is
not safe.
Fixes: 329b064fbd ("thermal: core: Get thermal zone by id")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200724170105.2705467-1-thierry.reding@gmail.com
The generic netlink is initialized at subsys_initcall, so far after
the thermal init routine and the thermal generic netlink family
initialization.
On ŝome platforms, that leads to a memory corruption.
The fix was sent to netdev@ to move the genetlink framework
initialization at core_initcall.
Move the thermal core initialization to postcore level which is very
close to core level.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20200717164217.18819-2-daniel.lezcano@linaro.org
The initcalls like to play joke. In our case, the thermal-netlink
initcall is called after the thermal-core initcall but this one sends
a notification before the former is initialized. No issue was spotted,
but it could lead to a memory corruption, so instead of relying on the
core_initcall for the thermal-netlink, let's initialize directly from
the thermal-core init routine, so we have full control of the init
ordering.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20200717164217.18819-1-daniel.lezcano@linaro.org
The generic netlink protocol is implemented but the different
notification functions are not yet connected to the core code.
These changes add the notification calls in the different
corresponding places.
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20200706105538.2159-4-daniel.lezcano@linaro.org
The next patch will introduce the generic netlink protocol to handle
events, sampling and command from the thermal framework. In order to
deal with the thermal zone, it uses its unique identifier to
characterize it in the message. Passing an integer is more efficient
than passing an entire string.
This change provides a function returning back a thermal zone pointer
corresponding to the identifier passed as parameter.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20200706105538.2159-2-daniel.lezcano@linaro.org
The cdev, tz and governor list, as well as their respective locks are
statically defined in the thermal_core.c file.
In order to give a sane access to these list, like browsing all the
thermal zones or all the cooling devices, let's define a set of
helpers where we pass a callback as a parameter to be called for each
thermal entity.
We keep the self-encapsulation and ensure the locks are correctly
taken when looking at the list.
Acked-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200706105538.2159-1-daniel.lezcano@linaro.org
set_mode() is only called when tzd's mode is about to change. Actual
setting is performed in thermal_core, in thermal_zone_device_set_mode().
The meaning of set_mode() callback is actually to notify the driver about
the mode being changed and giving the driver a chance to oppose such
change.
To better reflect the purpose of the method rename it to change_mode()
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
[for acerhdf]
Acked-by: Peter Kaestle <peter@piie.net>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200629122925.21729-12-andrzej.p@collabora.com
Polling DISABLED devices is not desired, as all such "disabled" devices
are meant to be handled by userspace. This patch introduces and uses
should_stop_polling() to decide whether the device should be polled or not.
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200629122925.21729-10-andrzej.p@collabora.com
Use thermal_zone_device_{en|dis}able() and thermal_zone_device_is_enabled().
Consequently, all set_mode() implementations in drivers:
- can stop modifying tzd's "mode" member,
- shall stop taking tzd's lock, as it is taken in the helpers
- shall stop calling thermal_zone_device_update() as it is called in the
helpers
- can assume they are called when the mode truly changes, so checks to
verify that can be dropped
Not providing set_mode() by a driver no longer prevents the core from
being able to set tzd's mode, so the relevant check in mode_store() is
removed.
Other comments:
- acpi/thermal.c: tz->thermal_zone->mode will be updated only after we
return from set_mode(), so use function parameter in thermal_set_mode()
instead, no need to call acpi_thermal_check() in set_mode()
- thermal/imx_thermal.c: regmap writes and mode assignment are done in
thermal_zone_device_{en|dis}able() and set_mode() callback
- thermal/intel/intel_quark_dts_thermal.c: soc_dts_{en|dis}able() are a
part of set_mode() callback, so they don't need to modify tzd->mode, and
don't need to fall back to the opposite mode if unsuccessful, as the return
value will be propagated to thermal_zone_device_{en|dis}able() and
ultimately tzd's member will not be changed in thermal_zone_device_set_mode().
- thermal/of-thermal.c: no need to set zone->mode to DISABLED in
of_parse_thermal_zones() as a tzd is kzalloc'ed so mode is DISABLED anyway
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
[for acerhdf]
Acked-by: Peter Kaestle <peter@piie.net>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200629122925.21729-8-andrzej.p@collabora.com
Prepare for making the drivers not access tzd's private members.
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
[staticize thermal_zone_device_set_mode()]
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200629122925.21729-7-andrzej.p@collabora.com
get_mode() is now redundant, as the state is stored in struct
thermal_zone_device.
Consequently the "mode" attribute in sysfs can always be visible, because
it is always possible to get the mode from struct tzd.
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
[for acerhdf]
Acked-by: Peter Kaestle <peter@piie.net>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200629122925.21729-6-andrzej.p@collabora.com
The thermal framework can no longer be compiled as a module as of
commit 554b3529fe ("thermal/drivers/core: Remove the module Kconfig's
option"). Remove the MODULE_* tags.
Rui is mentioned in the copyright line at the top of the file and the
license is mentioned in the SPDX tags. So no loss of information.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/74339a09a55f8f3d86c4074fc2bf853a302d6186.1589199124.git.amit.kucheria@linaro.org
The last temperature and the current temperature are show via a
dev_debug. The line before, those temperature are also traced.
It is pointless to duplicate the traces for the temperatures,
remove the dev_dbg traces.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20200331165449.30355-2-daniel.lezcano@linaro.org
Now that the thermal framework is built-in, in order to facilitate
thermal mitigation as early as possible in the boot cycle, move the
thermal framework initialization to core_initcall.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/f8ff0ab4a8e9c2eca5a26fb2256365b26cb326ce.1571656015.git.amit.kucheria@linaro.org
When registering a thermal zone device, we currently return -EINVAL in
four cases. This makes it a little hard to debug the real cause of the
failure.
Print some error messages to make it easier for developer to figure out
what happened.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Never directly free @dev after calling device_register(), even if it
returned an error! Always use put_device() to give up the reference
initialized. Clean up the rollback block also.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Now that the governor table is in place and the macro allows to browse the
table, declare the governor so the entry is added in the governor table
in the init section.
The [un]register_thermal_governors function does no longer need to use the
exported [un]register thermal governor's specific function which in turn
call the [un]register_thermal_governor. The governors are fully
self-encapsulated.
The cyclic dependency is no longer needed, remove it.
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Pull thermal management updates from Zhang Rui:
- Remove the 'module' Kconfig option for thermal subsystem framework
because the thermal framework are required to be ready as early as
possible to avoid overheat at boot time (Daniel Lezcano)
- Fix a bug that thermal framework pokes disabled thermal zones upon
resume (Wei Wang)
- A couple of cleanups and trivial fixes on int340x thermal drivers
(Srinivas Pandruvada, Zhang Rui, Sumeet Pawnikar)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
drivers: thermal: processor_thermal: Downgrade error message
mlxsw: Remove obsolete dependency on THERMAL=m
hwmon/drivers/core: Simplify complex dependency
thermal/drivers/core: Fix typo in the option name
thermal/drivers/core: Remove depends on THERMAL in Kconfig
thermal/drivers/core: Remove module unload code
thermal/drivers/core: Remove the module Kconfig's option
thermal: core: skip update disabled thermal zones after suspend
thermal: make device_register's type argument const
thermal: intel: int340x: processor_thermal_device: simplify to get driver data
thermal/int3403_thermal: favor _TMP instead of PTYP
thermal_of_cooling_device_register() and thermal_cooling_device_register()
are typically called from driver probe functions, and
thermal_cooling_device_unregister() is called from remove functions. This
makes both a perfect candidate for device managed functions.
Introduce devm_thermal_of_cooling_device_register(). This function can
also be used to replace thermal_cooling_device_register() by passing a NULL
pointer as device node. The new function requires both struct device *
and struct device_node * as parameters since the struct device_node *
parameter is not always identical to dev->of_node.
Don't introduce a device managed remove function since it is not needed
at this point.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Now the thermal core is no longer compiled as a module. Remove the
unloading module code and move the unregister function to the __init
section.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
It is unnecessary to update disabled thermal zones post suspend and
sometimes leads error/warning in bad behaved thermal drivers.
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
...because it can be, the buffer is strlcpy'd into a local buffer in a
thermal struct member.
Signed-off-by: Jean-Francois Dagenais <jeff.dagenais@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
commit ff140fea84 ("Thermal: handle thermal zone device properly
during system sleep") added PM hook to call thermal zone reset during
sleep. However resetting thermal zone will also clear the passive state
and thus cancel the polling queue which leads the passive cooling device
state not being cleared properly after sleep.
thermal_pm_notify => thermal_zone_device_reset set passive to 0
thermal_zone_trip_update will skip update passive as `old_target ==
instance->target'.
monitor_thermal_zone => thermal_zone_device_set_polling will cancel
tz->poll_queue, so the cooling device state will not be changed
afterwards.
Reported-by: Kame Wang <kamewang@google.com>
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
For SMP systems, thermal worker should use power_efficient_wq in power
saving mode, that will make scheduler more flexible on selecting an active
core for running work handler to avoid keeping work handler always
running on a single core, that will save some power.
Even if 'power_efficient_wq' relevant configs are disabled
'system_freezable_power_efficient_wq' is identical to system_freezable_wq,
behavior is unchanged.
Signed-off-by: Jeson Gao <jeson.gao@unisoc.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This patch fixes use-after-free that was detected by KASAN. The bug is
triggered on a CPUFreq driver module unload by freeing 'cdev' on device
unregister and then using the freed structure during of the cdev's sysfs
data destruction. The solution is to unregister the sysfs at first, then
destroy sysfs data and finally release the cooling device.
Cc: <stable@vger.kernel.org> # v4.17+
Fixes: 8ea229511e ("thermal: Add cooling device's statistics in sysfs")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The naming isn't consistent across all sysfs callbacks in the thermal
core, some have a short name like type_show() and others have long names
like thermal_cooling_device_weight_show(). This patch tries to make it
consistent by shortening the name of sysfs callbacks.
Some of the sysfs files are named similarly for both thermal zone and
cooling device (like: type) and to avoid name clash between their
show/store routines, the cooling device specific sysfs callbacks are
prefixed with "cdev_".
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This extends the sysfs interface for thermal cooling devices and exposes
some pretty useful statistics. These statistics have proven to be quite
useful specially while doing benchmarks related to the task scheduler,
where we want to make sure that nothing has disrupted the test,
specially the cooling device which may have put constraints on the CPUs.
The information exposed here tells us to what extent the CPUs were
constrained by the thermal framework.
The write-only "reset" file is used to reset the statistics.
The read-only "time_in_state_ms" file shows the time (in msec) spent by the
device in the respective cooling states, and it prints one line per
cooling state.
The read-only "total_trans" file shows single positive integer value
showing the total number of cooling state transitions the device has
gone through since the time the cooling device is registered or the time
when statistics were reset last.
The read-only "trans_table" file shows a two dimensional matrix, where
an entry <i,j> (row i, column j) represents the number of transitions
from State_i to State_j.
This is how the directory structure looks like for a single cooling
device:
$ ls -R /sys/class/thermal/cooling_device0/
/sys/class/thermal/cooling_device0/:
cur_state max_state power stats subsystem type uevent
/sys/class/thermal/cooling_device0/power:
autosuspend_delay_ms runtime_active_time runtime_suspended_time
control runtime_status
/sys/class/thermal/cooling_device0/stats:
reset time_in_state_ms total_trans trans_table
This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and
ARM 64-bit Hisilicon hikey960 board running Android.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reorder error handling code in order to fix some resources leaks in some
cases:
- 'tz' would leak if 'thermal_zone_create_device_groups()' fails
- memory allocated by 'thermal_zone_create_device_groups()' would leak
if 'device_register()' fails
With this patch, we now have 2 error handling paths: one before
'device_register()', and one after it.
This is needed because some resources are released in 'thermal_release()'.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Simplify code by using the new 'thermal_zone_destroy_device_groups()'
helper function.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>