Commit Graph

102 Commits

Author SHA1 Message Date
Asad Kamal 8de7948da7 drm/amd/pm: Update uclk/sclk limit report format
Use OD (pp_od_clk_voltage) interface to report current limits,
default or those set by user, for SCLK and UCLK on aldebaran.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:10:56 -04:00
Ma Jun b2207dc698 drm/amdgpu/pm: Add support for MACO flag checking
Add support for MACO flag checking.
MACO mode only works if BACO is supported.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:07:59 -04:00
Ma Jun 1b19959427 drm/amdgpu/pm: Change the member function name in pp_hwmgr_func and pptable_funcs
Use a unified and more explicit name get_bamaco_support
to replace is_baco_support and get_asic_baco_capability

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:07:50 -04:00
Dave Airlie af165fb00a amd-drm-next-6.9-2024-03-01:
amdgpu:
 - GC 11.5.1 updates
 - Misc display cleanups
 - NBIO 7.9 updates
 - Backlight fixes
 - DMUB fixes
 - MPO fixes
 - atomfirmware table updates
 - SR-IOV fixes
 - VCN 4.x updates
 - use RMW accessors for pci config registers
 - PSR fixes
 - Suspend/resume fixes
 - RAS fixes
 - ABM fixes
 - Misc code cleanups
 - SI DPM fix
 - Revert freesync video
 
 amdkfd:
 - Misc cleanups
 - Error handling fixes
 
 radeon:
 - use RMW accessors for pci config registers
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZeI9fgAKCRC93/aFa7yZ
 2DmCAQCVUsteinfHeJiyd7+pWnZdJVKzy5fRayvfOV8InHapawD/RAMlJhxy08IO
 B8tw7HY+YqRqW12T3klwRiBTk3Iw9Ac=
 =vBW4
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-03-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-03-01:

amdgpu:
- GC 11.5.1 updates
- Misc display cleanups
- NBIO 7.9 updates
- Backlight fixes
- DMUB fixes
- MPO fixes
- atomfirmware table updates
- SR-IOV fixes
- VCN 4.x updates
- use RMW accessors for pci config registers
- PSR fixes
- Suspend/resume fixes
- RAS fixes
- ABM fixes
- Misc code cleanups
- SI DPM fix
- Revert freesync video

amdkfd:
- Misc cleanups
- Error handling fixes

radeon:
- use RMW accessors for pci config registers

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240301204857.13960-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-03-08 11:21:13 +10:00
Asad Kamal b485b899e5 drm/amd/pm: Fix esm reg mask use to get pcie speed
Fix mask used for esm ctrl register to get pcie link
speed on smu_v11_0_3, smu_v13_0_2 & smu_v13_0_6

Fixes: 511a95552e ("drm/amd/pm: Add SMU 13.0.6 support")
Fixes: c05d1c4015 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)")
Fixes: f1c3785931 ("drm/amd/powerplay: add Arcturus support for gpu metrics export")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-28 17:10:00 -05:00
Asad Kamal a3c4581806 drm/amd/pm: Skip reporting pcie width/speed on vfs
Skip reporting pcie link width/speed on vfs for
smu_v13_0_6 & smu_v13_0_2

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-28 17:09:53 -05:00
Linus Torvalds ed8d84530a This cycle, I2C removes the currently unused CLASS_DDC support
(controllers set the flag, but there is no client to use it). Also,
 CLASS_SPD support gets simplified to prepare removal in the future.
 Class based instantiation is not recommended these days anyhow.
 Furthermore, I2C core now creates a debugfs directory per I2C adapter.
 Current bus driver users were converted to use it. Then, there are also
 quite some driver updates. Standing out are patches for the wmt-driver
 which is refactored to support more variants. This is the rebased pull
 request where a large series for the designware driver was dropped.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmWph0UPHHdzYUBrZXJu
 ZWwub3JnAAoJEBQN5MwUoCm2kbIQAJotSmX0mM+nNPReYCMMiloxoxUwgpiErNwY
 WDrYQSezthAJ1LDsGOEeLcE4f4I+UcUHBO1BoERtOZg3cGtE0Ii5N845sp100S9O
 ktyaKS5utoErymThWFFrnZX60/8yKXUMzZmNzy96560gPcxbFyyyVhKfBSPzK9T+
 O8CGu7GRNqgWHlvH3yqGeCbreWYrYVSrluEpBu6807cp3zDxrU+autOnsewm5+md
 ka3DdqrbxJSblYK8fJKESAUgkRmZgYKbgl0iiCuqX+ib6I4OA3Z68ny7dl0fY3Ws
 vwt7d88SaBKDdJmUZyb/sm4aJsW69GN+ECZolxrn4TIw45k4tes2s6Ma5+TV3E9h
 Fd1RuqduFEqQ7cj31UPe2x8rgj5Fo5nbjCWxdZv+/3zF8+cHwi8iwkp2PScsPCsa
 fmCdehUE5DrgobsRNANe6XJzxY5wp2VNpGEWKeaQz2Z0/d9T1YFS7a8aewvhXoPC
 isZboi6GQh2XoE8UgGJa29VUuaIkUW513DwCGw8mz1yKN+kHGcsRXXjkjaZoQn3U
 MMvh/zkI2Hpy/m2R8PWeIq5XhLJvmlZ19JJzUHJIjXh9Fn9EVtXhlUleh6mzMfeM
 n8NOg7Eukep2sBgmaufkUKz2Jtogs59YDSXZEvqJjIkPM2Wi0hA18Qj+pilES1ff
 3ckk3mxY
 =8D3Q
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
 "This removes the currently unused CLASS_DDC support (controllers set
  the flag, but there is no client to use it).

  Also, CLASS_SPD support gets simplified to prepare removal in the
  future. Class based instantiation is not recommended these days
  anyhow.

  Furthermore, I2C core now creates a debugfs directory per I2C adapter.
  Current bus driver users were converted to use it.

  Finally, quite some driver updates. Standing out are patches for the
  wmt-driver which is refactored to support more variants.

  This is the rebased pull request where a large series for the
  designware driver was dropped"

* tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
  MAINTAINERS: use proper email for my I2C work
  i2c: stm32f7: add support for stm32mp25 soc
  i2c: stm32f7: perform I2C_ISR read once at beginning of event isr
  dt-bindings: i2c: document st,stm32mp25-i2c compatible
  i2c: stm32f7: simplify status messages in case of errors
  i2c: stm32f7: perform most of irq job in threaded handler
  i2c: stm32f7: use dev_err_probe upon calls of devm_request_irq
  i2c: i801: Add lis3lv02d for Dell XPS 15 7590
  i2c: i801: Add lis3lv02d for Dell Precision 3540
  i2c: wmt: Reduce redundant: REG_CR setting
  i2c: wmt: Reduce redundant: function parameter
  i2c: wmt: Reduce redundant: clock mode setting
  i2c: wmt: Reduce redundant: wait event complete
  i2c: wmt: Reduce redundant: bus busy check
  i2c: mux: reg: Remove class-based device auto-detection support
  i2c: make i2c_bus_type const
  dt-bindings: at24: add ROHM BR24G04
  eeprom: at24: use of_match_ptr()
  i2c: cpm: Remove linux,i2c-index conversion from be32
  i2c: imx: Make SDA actually optional for bus recovering
  ...
2024-01-18 17:29:01 -08:00
Heiner Kallweit f21682b362 drm/amd/pm: Remove I2C_CLASS_SPD support
I2C_CLASS_SPD was used to expose the EEPROM content to user space,
via the legacy eeprom driver. Now that this driver has been removed,
we can remove I2C_CLASS_SPD support. at24 driver with explicit
instantiation should be used instead.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2024-01-18 21:10:41 +01:00
Dinghao Liu 7a88f23e76 drm/amd/pm: fix a memleak in aldebaran_tables_init
When kzalloc() for smu_table->ecc_table fails, we should free
the previously allocated resources to prevent memleak.

Fixes: edd7942085 ("drm/amd/pm: add message smu to get ecc_table v2")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 18:09:14 -05:00
Dinghao Liu b0e5c88d8a drm/amd/pm: fix a memleak in aldebaran_tables_init
When kzalloc() for smu_table->ecc_table fails, we should free
the previously allocated resources to prevent memleak.

Fixes: edd7942085 ("drm/amd/pm: add message smu to get ecc_table v2")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:22 -05:00
Yifan Zhang 5bd8e05fe2 drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
is_mode1_reset_supported may be called before smu init, when smu_context
is unitialized in driver load/unload test. Call smu_cmn_get_smc_version
explicitly in is_mode1_reset_supported.

v2: apply to aldebaran in case is_mode1_reset_supported will be
uncommented (Candice Li)

Fixes: 710d9caec7 ("drm/amd/pm: drop most smu_cmn_get_smc_version in smu")
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 19:02:58 -04:00
Ma Jun 1958946858 drm/amd/pm: Support for getting power1_cap_min value
Support for getting power1_cap_min value on smu13 and smu11.
For other Asics, we still use 0 as the default value.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Yifan Zhang 710d9caec7 drm/amd/pm: drop most smu_cmn_get_smc_version in smu
smu_check_fw_version is called in smu hw init, thus smu if version
and version are garenteed to be stored in smu context. No need to
call smu_cmn_get_smc_version again after system boot up.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:27:17 -04:00
Lijo Lazar 8a2b51392a drm/amdgpu: Refactor FRU product information
Keep FRU related information together in a separate structure.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:52:08 -04:00
Le Ma c01c8523cb drm/amd/pm: integrate plpd allow/disallow into select_xgmi_plpd_policy in ppt level
The allow_xgmi_power_down(true/false) will be generally replaced by:
  - allow: select_xgmi_plpd_policy(XGMI_PLPD_DEFAULT)
  - disallow: select_xgmi_plpd_policy(XGMI_PLPD_DISALLOW)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:36:17 -04:00
Darren Powell 14bf1c475f amdgpu/pm: Optimize emit_clock_levels for aldebaran - part 3
split switch statement into two and consolidate the common
   code for printing most of the types of clock speeds

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:34:05 -04:00
Darren Powell d62846f778 amdgpu/pm: Optimize emit_clock_levels for aldebaran - part 2
Use variables to remove ternary expression in print statement
 and improve readability. This will help to optimize the code
 duplication in the switch statement
 Also Changed:
  replaced single_dpm_table->count as iterator in for loops
    with safer clocks_num_levels value
  replaced dpm_table.value usage with local var clocks_mhz

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:34:00 -04:00
Darren Powell 2ecf214b45 amdgpu/pm: Optimize emit_clock_levels for aldebaran - part 1
Use variables to remove the multiple nested ternary expressions
 and improve readability. This will help to optimize the code
 duplication in the switch statement
 Also Changed:
  Modify function aldebaran_get_clk_table to void function as it
    always returns 0
  Use const string "attempt_string" to cut down on repetition

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:33:54 -04:00
Darren Powell bc22d9e0ee amdgpu/pm: Replace print_clock_levels with emit_clock_levels for aldebaran
Replace print_clock_levels with emit_clock_levels for aldebaran
  * replace .print_clk_levels with .emit_clk_levels in aldebaran_ppt_funcs
  * added extra parameter int *offset
  * removed var size, uses arg *offset instead
  * removed call to smu_cmn_get_sysfs_buf
  * errors are returned to caller
  * returns 0 on success
additional incidental changes
  * changed type of vars i, now to remove comparing mismatch types
  * renamed var s/now/cur_value/
  * switch statement default now returns -EINVAL
  * RAS Recovery returns -EBUSY

Based on
  commit b06b48d7dd ("amdgpu/pm: Implement emit_clk_levels for navi10")

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:33:15 -04:00
Colin Ian King ddf1639b54 drm/amd: Fix spelling mistake "throtting" -> "throttling"
There is a spelling mistake in variable throtting_events, rename
it to throttling_events.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30 15:26:50 -04:00
Mario Limonciello 9366c2e87d drm/amd: Rename AMDGPU_PP_SENSOR_GPU_POWER
Use the clearer name `AMDGPU_PP_SENSOR_GPU_AVG_POWER` instead.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15 18:08:30 -04:00
Mario Limonciello 4c64f2e420 drm/amd: Fix the return for average power on aldebaran
Aldebaran can only return average socket power for the first die.
The other dies return 0.  Instead of returning a bad value, return
-EOPNOTSUPP so that the attribute will be hidden.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15 18:08:29 -04:00
Mario Limonciello 05228211e8 drm/amd: Drop unnecessary helper for aldebaran
aldebaran_get_gpu_power() is only called by one place and just calls
aldebaran_get_smu_metrics_data(), so drop the helper.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15 18:08:29 -04:00
Mario Limonciello 47f1724db4 drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`
Some GPUs have been overloading average power values and input power
values. To disambiguate these, introduce a new
`AMDGPU_PP_SENSOR_GPU_INPUT_POWER` and the GPUs that share input
power update to use this instead of average power.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2746
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15 18:08:29 -04:00
Ran Sun 8c3d5b404d drm/amd/pm: Clean up errors in aldebaran_ppt.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: space required after that ',' (ctx:VxV)
ERROR: spaces required around that '=' (ctx:VxW)
ERROR: else should follow close brace '}'

Signed-off-by: Ran Sun <sunran001@208suo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-09 09:40:08 -04:00
Yang Wang d934e537c1 drm/amd/pm: fix smu i2c data read risk
the smu driver_table is used for all types of smu
tables data transcation (e.g: PPtable, Metrics, i2c, Ecc..).

it is necessary to hold this lock to avoiding data tampering
during the i2c read operation.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-07-12 12:06:16 -04:00
Lijo Lazar 9661bf6876 drm/amd/pm: Keep interface version in PMFW header
Use the interface version directly from PMFW interface header file rather
than keeping another definition in common smu13 file.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:54:37 -04:00
Evan Quan 3059cd8c5f drm/amd/pm: disable cstate feature for gpu reset scenario
Suggested by PMFW team and same as what did for gfxoff feature.
This can address some Mode1Reset failures observed on SMU13.0.0.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-18 22:12:20 -04:00
Darren Powell ceb180361e amdgpu/pm: Fix possible array out-of-bounds if SCLK levels != 2
[v2]
simplified fix after Lijo's feedback
 removed clocks.num_levels from calculation of loop count
   removed unsafe accesses to shim table freq_values
 retained corner case output only min,now if
   clocks.num_levels == 1 && now > min

 [v1]
added a check to populate and use SCLK shim table freq_values only
   if using dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL or
                         AMD_DPM_FORCED_LEVEL_PERF_DETERMINISM
removed clocks.num_levels from calculation of shim table size
removed unsafe accesses to shim table freq_values
   output gfx_table values if using other dpm levels
added check for freq_match when using freq_values for when now == min_clk

== Test ==
LOGFILE=aldebaran-sclk.test.log
AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1`
AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'`
HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON}

lspci -nn | grep "VGA\|Display"  > $LOGFILE
FILES="pp_od_clk_voltage
pp_dpm_sclk"

for f in $FILES
do
  echo === $f === >> $LOGFILE
  cat $HWMON_DIR/device/$f >> $LOGFILE
done
cat $LOGFILE

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-22 16:55:22 -04:00
Darren Powell 543faf57ee amdgpu/pm: Fix incorrect variable for size of clocks array
[v2]
No Changes, added RB
 [v1]
Size of pp_clock_levels_with_latency is PP_MAX_CLOCK_LEVELS, not MAX_NUM_CLOCKS.
Both are currently defined as 16, modifying in case one value is modified in future
Changed code in both arcturus and aldabaran.

Also removed unneeded var count, and used min_t function

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-22 16:55:12 -04:00
Alex Deucher da1db031cd drm/amdgpu/swsmu: add SMU mailbox registers in SMU context
So we can eventaully use them in the common smu code for
accessing the SMU mailboxes without needing a lot of
per asic logic in the common code.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:45:00 -04:00
Stanley.Yang cbd3e8440e drm/amdgpu: print umc correctable error address
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:44:15 -04:00
Stanley.Yang 2f6247dad2 drm/amdgpu/pm: support mca_ceumc_addr in ecctable
SMU add a new variable mca_ceumc_addr to record
umc correctable error address in EccInfo table,
driver side add EccInfo_V2_t to support this feature

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:43:36 -04:00
Lijo Lazar b0f4d663fc drm/amd/pm: Fix missing thermal throttler status
On aldebaran, when thermal throttling happens due to excessive GPU
temperature, the reason for throttling event is missed in warning
message. This patch fixes it.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-26 14:56:32 -04:00
Kent Russell 4a93d938a4 drm/amdgpu: Use metrics data function to get unique_id for Aldebaran
This is abstracted well enough in the get_metrics_data function, so use
the function

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-31 23:05:54 -04:00
Stanley.Yang d510eccfa5 drm/amd/pm: add send bad channel info function
support message SMU update bad channel info to update HBM bad channel
info in OOB table

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15 14:25:16 -04:00
Evan Quan 2d282665d2 drm/amd/pm: update the data type for retrieving enabled ppfeatures
Use uint64_t instead of an array of uint32_t. This can avoid
some non-necessary intermediate uint32_t -> uint64_t conversions.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Luben Tuikov 00d6936dbd drm/amdgpu: Set FRU bus for Aldebaran and Vega 20
The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20
ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU
is sought through that bus "pointer" and not through the RAS bus "pointer".

Cc: Roy Sun <Roy.Sun@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 2f60dd5076 ("drm/amd: Expose the FRU SMU I2C bus")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 17:59:53 -05:00
Alex Deucher e281d5940a drm/amdgpu/swsmu/i2c: return an error if the SMU is not running
Return an error if someone tries to use the i2c bus when the
SMU is not running.  Otherwise we can end up sending commands
to the SMU which will either get ignored or could cause other
issues depending on what state the GPU and SMU are in.

Cc: Luben.Tuikov@amd.com
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-27 15:50:03 -05:00
Luben Tuikov 2f60dd5076 drm/amd: Expose the FRU SMU I2C bus
Expose both SMU I2C buses. Some boards use the same bus for both the RAS
and FRU EEPROMs and others use different buses.  This enables the
additional I2C bus and sets the right buses to use for RAS and FRU EEPROM
access.

Cc: Roy Sun <Roy.Sun@amd.com>
Co-developed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-27 15:49:48 -05:00
Evan Quan 56383e8f4d drm/amd/pm: drop unneeded smu->sensor_lock
As all those related APIs are already well protected by
adev->pm.mutex and smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:32 -05:00
Evan Quan da11407f06 drm/amd/pm: drop unneeded smu->metrics_lock
As all those related APIs are already well protected by
adev->pm.mutex and smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:32 -05:00
Evan Quan e0638c7abc drm/amd/pm: drop unneeded lock protection smu->mutex
As all those APIs are already protected either by adev->pm.mutex
or smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:32 -05:00
Evan Quan 837d542a09 drm/amd/pm: relocate the power related headers
Instead of centralizing all headers in the same folder. Separate them into
different folders and place them among those source files those who really
need them.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:14 -05:00
Evan Quan ebfc253335 drm/amd/pm: do not expose the smu_context structure used internally in power
This can cover the power implementation details. And as what did for
powerplay framework, we hook the smu_context to adev->powerplay.pp_handle.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14 17:51:14 -05:00
Tao Zhou 15084a8e16 drm/amd/pm: only send GmiPwrDnControl msg on master die (v3)
PMFW only returns 0 on master die and sends NACK back on other dies for
the message.

v2: only send GmiPwrDnControl msg on master die instead of all
dies.
v3: remove the pointer check for get_socket_id and get_die_id as they
should be present on Aldebaran.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11 15:44:28 -05:00
Kent Russell de0af8a65e drm/amdgpu: Only overwrite serial if field is empty
On Aldebaran, the serial may be obtained from the FRU. Only overwrite
the serial with the unique_id if the serial is empty. This will support
printing serial numbers for mGPU devices where there are 2 unique_ids
for the 2 GPUs, but only one serial number for the board

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-30 08:54:43 -05:00
sashank saye 4da8b63944 drm/amdgpu: Send Message to SMU on aldebaran passthrough for sbr handling
For Aldebaran chip passthrough case we need to intimate SMU
about special handling for SBR.On older chips we send
LightSBR to SMU, enabling the same for Aldebaran. Slight
difference, compared to previous chips, is on Aldebaran, SMU
would do a heavy reset on SBR. Hence, the word Heavy
instead of Light SBR is used for SMU to differentiate.

Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: sashank saye <sashank.saye@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-28 16:03:19 -05:00
Lijo Lazar 3c27abee3f drm/amd/pm: Fix xgmi link control on aldebaran
Fix the message argument.
	0: Allow power down
	1: Disallow power down

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-16 13:43:46 -05:00
Lijo Lazar 91e16017b6 drm/amd/pm: Skip power state allocation
Power states are not valid for arcturus and aldebaran, no need to
allocate memory.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:09:42 -05:00