Commit Graph

15900 Commits

Author SHA1 Message Date
Sunil Khatri eac3b274aa drm/amdgpu: add print support for sdma_v_4_4_2 ip_dump
Add print support for ip dump for sdma_v_4_4_2 in
devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:37 -04:00
Alex Deucher 9c7e69d2e1 drm/amdgpu/gfx9: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:11 -04:00
Alex Deucher 7e60ecc2b7 drm/amdgpu/gfx8: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:08 -04:00
Alex Deucher 12fb3e9c88 drm/amdgpu/gfx7: enable wave kill for compute queues
It should work the same for compute as well as gfx.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:42:03 -04:00
Philip Yang fb91065851 drm/amdkfd: Refactor queue wptr_bo GART mapping
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.

Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
parameter from pqm_create_queue.

Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART
mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue,
the same location with queue ctx_bo GART mapping.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:35:04 -04:00
Sunil Khatri db54a725d5 drm/amdgpu: Add sdma_v4_4_2 ip dump for devcoredump
Add ip dump for sdma_v4_4_2 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:58 -04:00
Sunil Khatri a11b36ba9c drm/amdgpu: add print support for sdma_v_4_0 ip_dump
Add print support for ip dump for sdma_v_4_0 in
devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:49 -04:00
Philip Yang c86ad39140 drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer
Pass pointer reference to amdgpu_bo_unref to clear the correct pointer,
otherwise amdgpu_bo_unref clear the local variable, the original pointer
not set to NULL, this could cause use-after-free bug.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:44 -04:00
Philip Yang f9e292cbba drm/amdkfd: kfd_bo_mapped_dev support partition
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter
instead of adev, to support spatial partition. This is only used by CRIU
checkpoint restore now. No functional change.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:36 -04:00
Jane Jian caaf576292 drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF
For VCN/JPEG 4.0.3, use only the local addressing scheme.

- Mask bit higher than AID0 range

v2
remain the case for mmhub use master XCC

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:28 -04:00
Lijo Lazar 49cfaebe48 drm/amdgpu: Add empty HDP flush function to VCN v4.0.3
VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch
will do the HDP flush.

This change is necessary for VCN v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:18 -04:00
Lijo Lazar 585e3fdb36 drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3
JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead,
mmsch fw will do the flush.

This change is necessary for JPEG v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:34:12 -04:00
Pierre-Eric Pelloux-Prayer fec5f8e8c6 drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit
Before this commit, only submits with both a BO_HANDLES chunk and a
'bo_list_handle' would be rejected (by amdgpu_cs_parser_bos).

But if UMD sent multiple BO_HANDLES, what would happen is:
* only the last one would be really used
* all the others would leak memory as amdgpu_cs_p1_bo_handles would
  overwrite the previous p->bo_list value

This commit rejects submissions with multiple BO_HANDLES chunks to
match the implementation of the parser.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:57 -04:00
Sunil Khatri 80237bfc03 drm/amdgpu: Add sdma_v4_0 ip dump for devcoredump
Add ip dump for sdma_v4_0 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:52 -04:00
Sunil Khatri abf839f5eb drm/amdgpu: add print support for sdma_v_7_0 ip_dump
Add print support for ip dump for sdma_v_7_0 in
devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:45 -04:00
Ma Ke 6472de66c0 drm/amd/amdgpu: Fix uninitialized variable warnings
Return 0 to avoid returning an uninitialized variable r.

Cc: stable@vger.kernel.org
Fixes: 230dd6bb61 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:38 -04:00
Ma Ke 93381e6b61 drm/amdgpu: fix a possible null pointer dereference
In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode()
is assigned to mode, which will lead to a NULL pointer dereference on
failure of drm_cvt_mode(). Add a check to avoid npd.

Cc: stable@vger.kernel.org
Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:25 -04:00
David Belanger 666f14cab2 drm/amdgpu: Fix atomics on GFX12
If PCIe supports atomics, configure register to prevent DF from
breaking atomics in separate load/store operations.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:17 -04:00
Sunil Khatri 4df9e2200f drm/amdgpu: Add sdma_v7_0 ip dump for devcoredump
Add ip dump for sdma_v7_0 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:33:10 -04:00
Alex Deucher f2ac526349 drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().

Enable this workaround while we are root causing the issue with
the HW team.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440
Tested-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:58 -04:00
YiPeng Chai a7e8467fbe drm/amdgpu: Remove unused code
Remove unused code.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:20 -04:00
YiPeng Chai 56631dee29 drm/amdgpu: optimize logging deferred error info
1. Use pa_pfn as the radix-tree key index to log
   deferred error info.
2. Use local array to store a row of bad pages.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:32:14 -04:00
YiPeng Chai 27cdf8c3ca drm/amdgpu: optimize umc v12 address conversion function
Split into 3 parts:
1. Convert soc physical address via ras ta.
2. Expand bad pages from soc physical address.
3. Dump bad address info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:31:59 -04:00
Sunil Khatri e84f798a93 drm/amdgpu: add print support for sdma_v_5_0 ip_dump
Add support for ip dump for sdma_v_5_0 in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri 0f1a93704a drm/amdgpu: Add sdma_v5_0 ip dump for devcoredump
Add ip dump for sdma_v5_0 for devcoredump for all
instances of sdma.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri ccb54d7d91 drm/amdgpu: add print support for sdma_v_6_0 ip_dump
Add print support for ip dump for sdma_v_6_0 in
devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri 1eba165aa4 drm/amdgpu: Add sdma_v6_0 ip dump for devcoredump
Add ip dump for sdma_v6_0 for devcoredump for all
instances of sdma.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri 00bb3223bf drm/amdgpu: fix the print message in devcoredump
Fix the memory type logged for gtt memory size
which is wrongly logged as visible vram size.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri 43796955a8 drm/amdgpu: fix the extra space between two functions
fix extra line space between two functions.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:09 -04:00
Sunil Khatri 08bed7e4ff drm/amdgpu: add print support for sdma_v_5_2 ip_dump
Add support for ip dump for sdma_v_5_2 in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:08 -04:00
Sunil Khatri f763c3b543 drm/amdgpu: Add sdma_v5_2 ip dump for devcoredump
Add ip dump for sdma_v5_2 for devcoredump for all
instances of sdma.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23 17:07:08 -04:00
YiPeng Chai b3fb79cda5 drm/amdgpu: add mutex to protect ras shared memory
Add mutex to protect ras shared memory.

v2:
  Add TA_RAS_COMMAND__TRIGGER_ERROR command call
  status check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:30 -04:00
Boyuan Zhang 7d75ef3736 drm/amdgpu/vcn: not pause dpg for unified queue
For unified queue, DPG pause for encoding is done inside VCN firmware,
so there is no need to pause dpg based on ring type in kernel.

For VCN3 and below, pausing DPG for encoding in kernel is still needed.

v2: add more comments
v3: update commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:04 -04:00
Boyuan Zhang ecfa23c8df drm/amdgpu/vcn: identify unified queue in sw init
Determine whether VCN using unified queue in sw_init, instead of calling
functions later on.

v2: fix coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:43:48 -04:00
Aurabindo Pillai 28814be882 drm/amd: Bump KMS_DRIVER_MINOR version
Increase the KMS minor version to indicate GFX12 DCC support since this
contains a major change in how DCC is managed across IPs like GFX, DCN
etc. This will be used mainly by userspace like Mesa to figure out
DCC support on GFX12 hardware.

v2: fix version number (Alex)

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-15 17:41:23 -04:00
Alex Deucher 1cff1010be drm/amdgpu/mes12: add missing opcode string
Fixes the indexing of the string array.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher 478cb8badf drm/amdgpu/mes11: update opcode strings
Add new packet.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Mario Limonciello 9d8c094dda
drm/amd: Add power_saving_policy drm property to eDP connectors
When the `power_saving_policy` property is set to bit mask
"Require color accuracy" ABM should be disabled immediately and
any requests by sysfs to update will return an -EBUSY error.

When the `power_saving_policy` property is set to bit mask
"Require low latency" PSR should be disabled.

When the property is restored to an empty bit mask ABM and PSR
can be enabled again.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703051722.328-3-mario.limonciello@amd.com
2024-07-10 17:00:07 -04:00
Alex Deucher 8030f6533e drm/amdgpu: remove exp hw support check for gfx12
Enable it by default.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 13:45:57 -04:00
YiPeng Chai e23300dfff drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
The problem case is as follows:
1. GPU A triggers a gpu ras reset, and GPU A drives
   GPU B to also perform a gpu ras reset.
2. After gpu B ras reset started, gpu B queried a DE
   data. Since the DE data was queried in the ras reset
   thread instead of the page retirement thread, bad
   page retirement work would not be triggered. Then
   even if all gpu resets are completed, the bad pages
   will be cached in RAM until GPU B's bad page retirement
   work is triggered again and then saved to eeprom.

This patch can save the bad pages to eeprom in time after gpu
ras reset is completed.

v2:
  1. Add the above description to code comments.
  2. Reuse existing function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:41 -04:00
YiPeng Chai c04706914d drm/amdgpu: flush all cached ras bad pages to eeprom
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.

v2:
  Put the same code into a function and reuse the function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:35 -04:00
Sunil Khatri c39385710c drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX12.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:22 -04:00
Sunil Khatri a85cc86cce drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX11.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:04 -04:00
Alex Deucher 7d570f56f1 drm/amdgpu/job: Replace DRM_INFO/ERROR logging
Use the dev_info/err variants so we get per device logging
in multi-GPU cases.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:55 -04:00
Sunil Khatri 948f2828a6 drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX10.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:48 -04:00
Lijo Lazar d02ddefc7e drm/amdgpu: Initialize VF partition mode
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions.
Fetch the partition information during init and initialize partition
nodes. There is no support to switch partition mode in VF mode, hence
disable the same.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:28 -04:00
Gavin Wan 5d64af40e3 drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.
sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have
sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For
Even numbered vfs, the sdma2 & sdma3 (irq srouce id
CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:19 -04:00
Thomas Hellström 4c44f89c5d drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves
To address the problem with hitches moving when bulk move
sublists are lru-bumped, register the list cursors with the
ttm_lru_bulk_move structure when traversing its list, and
when lru-bumping the list, move the cursor hitch to the tail.
This also means it's mandatory for drivers to call
ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when
initializing and finalizing the bulk move structure, so add
those calls to the amdgpu- and xe driver.

Compared to v1 this is slightly more code but less fragile
and hopefully easier to understand.

Changes in previous series:
- Completely rework the functionality
- Avoid a NULL pointer dereference assigning manager->mem_type
- Remove some leftover code causing build problems
v2:
- For hitch bulk tail moves, store the mem_type in the cursor
  instead of with the manager.
v3:
- Remove leftover mem_type member from change in v2.
v6:
- Add some lockdep asserts (Matthew Brost)
- Avoid NULL pointer dereference (Matthew Brost)
- No need to check bo->resource before dereferencing
  bo->bulk_move (Matthew Brost)

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240705153206.68526-5-thomas.hellstrom@linux.intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-07-09 12:39:33 +02:00
Zhigang Luo ee98fb71ba drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1
to avoid reading wrong WPTR from doorbell in sriov vf, set
CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:27 -04:00
Yang Wang 59f488be76 drm/amdgpu: add ras event state device attribute support
add amdgpu ras 'event_state' sysfs device attribute support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:13 -04:00
Yang Wang 12b435a40c drm/amdgpu: add ras POSION_CONSUMPTION event id support
add amdgpu ras POSION_CONSUMPTION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:37 -04:00
Yang Wang 5b9de2596f drm/amdgpu: add ras POSION_CREATION event id support
add amdgpu ras POSION_CREATION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:18 -04:00
Yang Wang 75ac6a2506 drm/amdgpu: refine amdgpu ras event id core code
v1:
- use unified event id to manage ras events
- add a new function amdgpu_ras_query_error_status_with_event() to accept
  event type as parameter.

v2:
add a warn log to show the location of function failure
when calling amdgpu_ras_mark_event(). (Tao Zhou)

v3:
change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL.

v4:
rename amdgpu_ras_get_recovery_event() to
amdgpu_ras_get_fatal_error_event().

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:11 -04:00
Christian König 320debca1b drm/amdgpu: reject gang submit on reserved VMIDs
A gang submit won't work if the VMID is reserved and we can't flush out
VM changes from multiple engines at the same time.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:53 -04:00
Saleemkhan Jamadar b6ad109166 drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2
DPG mode is enabled for vcn and jpeg on VCN v4_0_5

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:40 -04:00
Yang Wang 332210c13a drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG
remove redundant semicolons in RAS_EVENT_LOG to avoid
code format check warning.

Fixes: b712d7c201 ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:43 -04:00
Frank Min 54837bd2be drm/amdgpu: restore dcc bo tilling configs while moving
While moving buffer which has dcc tiling config, it is needed to restore
its original dcc tiling.

1. extend copy flag to cover tiling bits
2. add logic to restore original dcc tiling config

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:27 -04:00
Sunil Khatri c8714ac982 drm/amdgpu: add gfx queue support for gfx12 ipdump
Add support of all the CP GFX queues for gfx12 ipdump
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:22 -04:00
Sunil Khatri 495e6173a4 drm/amdgpu: add cp queue registers for gfx12 ipdump
Add gfx12 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:15 -04:00
Sunil Khatri f0c6b79bfc drm/amdgpu: enable redirection of irq's for IH v7.0
Enable redirection of irq for pagefaults for specific
clients to avoid overflow without dropping interrupts.

So here we redirect the interrupts to another IH ring
i.e ring1 where only these interrupts are processed.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:09 -04:00
Sunil Khatri 906219ec94 drm:amdgpu: enable IH ring1 for IH v7.0
We need IH ring1 for handling the pagefault
interrupts which over flow in default
ring for specific usecases.

Enable ring1 allows software to redirect
high interrupts to ring1 from default IH
ring.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:02 -04:00
Yifan Zha 33f23fc315 drm/amdgpu: Set no_hw_access when VF request full GPU fails
[Why]
If VF request full GPU access and the request failed,
the VF driver can get stuck accessing registers for an extended period during
the unload of KMS.

[How]
Set no_hw_access flag when VF request for full GPU access fails
This prevents further hardware access attempts, avoiding the prolonged
stuck state.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:56 -04:00
Sunil Khatri 2262acad0a drm/amdgpu: add print support for gfx12 ipdump
Add support of gfx12 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:49 -04:00
Sunil Khatri fbbbb62112 drm/amdgpu: add gfx12 register support in ipdump
Add general registers of gfx12 in ipdump for
devcoredump support.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:43 -04:00
Frank Min ffcc5745ed drm/amdgpu: update gfxhub client id for gfx12
update gfxhub client id for gfx12

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:36 -04:00
Tim Huang 064d92436b drm/amd/pm: avoid to load smu firmware for APUs
Certain call paths still load the SMU firmware for APUs,
which needs to be skipped.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:30 -04:00
YiPeng Chai 78347b651a drm/amdgpu: sysfs node disable query error count during gpu reset
Sysfs node disable query error count during gpu reset.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:14 -04:00
Daniel Vetter 6be146cf57 amd-drm-next-6.11-2024-07-03:
amdgpu:
 - Use vmalloc for dc_state
 - Replay fixes
 - Freesync fixes
 - DCN 4.0.1 fixes
 - DML fixes
 - DCC updates
 - Misc code cleanups and bug fixes
 - 8K display fixes
 - DCN 3.5 fixes
 - Restructure DIO code
 - DML1 fixes
 - DML2 fixes
 - GFX11 fix
 - GFX12 updates
 - GFX12 modifiers fixes
 - RAS fixes
 - IP dump fixes
 - Add some updated IP version checks
 _ Silence UBSAN warning
 
 radeon:
 - GPUVM fix
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZoW9RgAKCRC93/aFa7yZ
 2AfjAQDJH95ijhdClk4S7t0ofjam7UPhVkJiUl8u73BNbuul/QEAk7RadnWw8SwC
 wvSXMCyHY3XszMJ1AdWG+4uwq5axPQE=
 =aWkn
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-07-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-07-03:

amdgpu:
- Use vmalloc for dc_state
- Replay fixes
- Freesync fixes
- DCN 4.0.1 fixes
- DML fixes
- DCC updates
- Misc code cleanups and bug fixes
- 8K display fixes
- DCN 3.5 fixes
- Restructure DIO code
- DML1 fixes
- DML2 fixes
- GFX11 fix
- GFX12 updates
- GFX12 modifiers fixes
- RAS fixes
- IP dump fixes
- Add some updated IP version checks
_ Silence UBSAN warning

radeon:
- GPUVM fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703211314.2041893-1-alexander.deucher@amd.com
2024-07-05 12:02:12 +02:00
Daniel Vetter 71e9f407fd amd-drm-next-6.11-2024-06-28:
amdgpu:
 - JPEG 5.x fixes
 - More FW loading cleanups
 - Misc code cleanups
 - GC 12.x fixes
 - ASPM fix
 - DCN 4.0.1 updates
 - SR-IOV fixes
 - HDCP fix
 - USB4 fixes
 - Silence UBSAN warnings
 - MES submission fixes
 - Update documentation for new products
 - DCC updates
 - Initial ISP 4.x plumbing
 - RAS fixes
 - Misc small fixes
 
 amdkfd:
 - Fix missing unlock in error path for adding queues
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZn8qfAAKCRC93/aFa7yZ
 2H9QAQCZZsu8vps+HVUY5vihNXNL9TdUbrrroQBEyk+DsJzE9QD8CJW8/oiP8/fa
 /EsAWXNZUH+mPxRaVWTxp0j2P941pww=
 =uZbG
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-28:

amdgpu:
- JPEG 5.x fixes
- More FW loading cleanups
- Misc code cleanups
- GC 12.x fixes
- ASPM fix
- DCN 4.0.1 updates
- SR-IOV fixes
- HDCP fix
- USB4 fixes
- Silence UBSAN warnings
- MES submission fixes
- Update documentation for new products
- DCC updates
- Initial ISP 4.x plumbing
- RAS fixes
- Misc small fixes

amdkfd:
- Fix missing unlock in error path for adding queues

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628213135.427214-1-alexander.deucher@amd.com
2024-07-05 11:39:23 +02:00
Daniel Vetter 86634fa4e6 Linux 6.10-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmaB0NweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkvwH/36UJRk/o6wvXnyH
 E6QjCSWo2226APyWks22NjtC3I/8Iqdvkneuh6wG0qL2sXAB078EMjUq5R81bF8H
 wWFBJwetjYTp8GEyLioMEb2wCH/J3R29dLFC4UYTplafXRGP6//xcpJaKmTxcgdR
 31IzvTPXbApZ7L3k1U6rA2bK9PNKcFCOvZlrNMUCuwMrabymHsDfOUt1DqXyg2xp
 zjqiWYBwlklozmgawSWt/mdEgkWuTcAbg+KyqDVQF59s9aj/OOwZ0j+HACq5V8CM
 quTPIAYL6CC9p7uxa69lGr/sgC0Is/BZLPX7RTZAwCgarGvnX+1HUsjDcaFCtrVg
 O6fPUV8=
 =pgUx
 -----END PGP SIGNATURE-----

Merge v6.10-rc6 into drm-next

The exynos-next pull is based on a newer -rc than drm-next. hence
backmerge first to make sure the unrelated conflicts we accumulated
don't end up randomly in the exynos merge pull, but are separated out.

Conflicts are all benign: Adjacent changes in amdgpu and fbdev-dma
code, and cherry-pick conflict in xe.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-07-05 10:47:28 +02:00
Sunil Khatri ea67deb03c drm/amdgpu: fix out of bounds access in gfx11 during ip dump
During ip dump in gfx11 the index variable is reused but is
not reinitialized to 0 and this causes the index calculation
to be wrong and access out of bound access.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:16 -04:00
Tim Huang 94845ea057 drm/amdgpu: add firmware for PSP IP v14.0.4
This patch is to add firmware for PSP 14.0.4.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:10 -04:00
Tim Huang b709f949f0 drm/amdgpu: enable mode2 reset for SMU IP v14.0.4
Set the default reset method to mode2 for SMU 14.0.4.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:02 -04:00
Tim Huang 38a16bfe6f drm/amdgpu: add SMU IP v14.0.4 discovery support
This patch is to add SMU 14.0.4 support

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:57 -04:00
Tim Huang 9cd2ad14d8 drm/amdgpu: add PSP IP v14.0.4 discovery support
This patch is to add PSP 14.0.4 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:36 -04:00
Tim Huang c7c3f786b9 drm/amdgpu: add PSP IP v14.0.4 support
This patch is to add PSP 14.0.4 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:30 -04:00
Tim Huang 614a9f5ed5 drm/amdgpu: add firmware for VPE IP v6.1.3
This patch is to add firmware for VPE 6.1.3.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:24 -04:00
Tim Huang ca15cd559f drm/amdgpu: add VPE IP v6.1.3 discovery support
This patch is to add VPE 6.1.3 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:19 -04:00
Tim Huang f3e2a425c6 drm/amdgpu: add VPE IP v6.1.3 support
This patch is to add VPE 6.1.3 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:13 -04:00
Tim Huang 410bb279a8 drm/amdgpu: Add NBIO IP v7.11.3 support
Enable setting soc21 common clockgating for NBIO 7.11.3.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:06 -04:00
Tim Huang 5aea871694 drm/amdgpu: add NBIO IP v7.11.3 discovery support
This patch is to add NBIO 7.11.3 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:06:00 -04:00
Tim Huang 6857669a22 drm/amdgpu: add firmware for SDMA IP v6.1.2
This patch is to add firmware for SDMA 6.1.2.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:53 -04:00
Tim Huang dfeccf4d54 drm/amdgpu: add SDMA IP v6.1.2 discovery support
This patch is to add SDMA 6.1.2 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:41 -04:00
Tim Huang 4448b1ff4d drm/amdgpu: add firmware for GC IP v11.5.2
This patch is to add firmware for GC 11.5.2.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:35 -04:00
Tim Huang 23c1ea0241 drm/amdgpu: add GC IP v11.5.2 to GC 11.5.0 family
This patch is to add GC 11.5.2 to GC 11.5.0 family.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:23 -04:00
Tim Huang 43e4cc2299 drm/amdgpu: add GC IP v11.5.2 soc21 support
Add CG and PG flags for GFX IP v11.5.2 and
PG flags for VCN IP v4.0.5.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Li Ma <li.ma@amd.com>
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:15 -04:00
Tim Huang 98392782df drm/amdgpu: add tmz support for GC IP v11.5.2
Add tmz support for GC 11.5.2.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:09 -04:00
Tim Huang 02cf3ed627 drm/amdgpu: add GFXHUB IP v11.5.2 support
This patch is to add GFXHUB 11.5.2 support.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:05:03 -04:00
Tim Huang b6a343df46 drm/amdgpu: initialize GC IP v11.5.2
Initialize GC 11.5.2 and set gfx hw configuration.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:04:57 -04:00
Sunil Khatri 425c4a6f8b drm/amdgpu: fix out of bounds access in gfx10 during ip dump
During ip dump in gfx10 the index variable is reused but is
not reinitialized to 0 and this causes the index calculation
to be wrong and access out of bound access.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:04:49 -04:00
Marek Olšák f340f2bad1 drm/amdgpu: rewrite convert_tiling_flags_to_modifier_gfx12
There were multiple bugs, like checking SWIZZLE_MODE before checking
GFX12_SWIZZLE_MODE, which has undefined behavior.

The function had no effect before (it always returned -EINVAL).

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Hawking Zhang d3dbccacfd drm/amdgpu: Fix hbm stack id in boot error report
To align with firmware, hbm id field 0x1 refers to
hbm stack 0, 0x2 refers to hbm statck 1.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Marek Olšák 0d3157d04d drm/amdgpu: add amdgpu_framebuffer::gfx12_dcc
amdgpu_framebuffer doesn't have tiling_flags, so we need this.

amdgpu_display_get_fb_info never gets NULL parameters, so checking for NULL
was useless.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Marek Olšák 8dd1426e2c drm/amdgpu: handle gfx12 in amdgpu_display_verify_sizes
It verified GFX9-11 swizzle modes on GFX12, which has undefined behavior.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Hawking Zhang c2fad73174 drm/amdgpu: Correct register used to clear fault status
Driver should write to fault_cntl registers to do
one-shot address/status clear.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:47 -04:00
Marek Olšák fd536d2e12 drm/amdgpu: don't use amdgpu_lookup_format_info on gfx12
It only uses fields for GFX9-11 related to the separate DCC buffer,
which doesn't exist in GFX12.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák 30fb9cad6f drm/amdgpu/gfx12: remove GDS leftovers
GDS doesn't exist in gfx12. The incomplete packet allows userspace to hang
the hw from the kernel.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák e5f6bfe402 drm/amdgpu/gfx12: remove superfluous cache flags
If any INV flags are needed, they should be executed via ACQUIRE_MEM
before INDIRECT_BUFFER.

GLM flags are also removed because the hw ignores them.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák b16ec6300f drm/amdgpu/gfx11: remove superfluous cache flags
If any INV flags are needed, they should be executed via ACQUIRE_MEM
before INDIRECT_BUFFER.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:46 -04:00
Marek Olšák 11317d2963 drm/amdgpu: check for LINEAR_ALIGNED correctly in check_tiling_flags_gfx6
Fix incorrect check.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01 16:10:36 -04:00
Mario Limonciello 15eb8573ad drm/amd: Don't initialize ISP hardware without FW
Although designs may contain an ISP IP block, the camera might be a USB
camera. Because of this the ISP firmware is considered optional from
amdgpu.  However if the firmware doesn't get loaded the hardware should
not be initialized.

Adjust the return code for early init to ensure the IP block doesn't go
through the other init and fini sequences. Also decrease the message
about firmware load failure to debug so it's not as alarming to users.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Yang Wang 71fe449484 drm/amdgpu: refine isp firmware loading
refine isp firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi 75be61aa77 drm/amd/amdgpu: Enable MMHUB prefetch for ISP v4.1.0 and 4.1.1
Remove temporary WA to disable ISP prefetch as MMHUB SAW is initialized
to support ISP HW access GART memory using the TLSi path with prefetch
enabled.

Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi 7c2d3112b2 drm/amd/amdgpu: Fix 'snprintf' output truncation warning
snprintf can truncate the output fw filename if the isp ucode_prefix
exceeds 29 characters. Knowing ISP ucode_prefix is in the format
isp_x_x_x, limiting the size of ucode_prefix[] to 10 characters
to fix the warning.

Fixes the below warning:

   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c: In function 'isp_early_init':
   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:58: warning: 'snprintf'
   output may be truncated before the last format character
   [-Wformat-truncation=]
     192 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
         |                                                          ^
   In function 'isp_load_fw_by_psp',
       inlined from 'isp_early_init' at drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:218:8:
   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c:192:9: note: 'snprintf' output between 12 and 41 bytes into a destination of size 40
     192 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi 062666ffbc drm/amd/amdgpu: Disable MMHUB prefetch for ISP v4.1.1
Disable MMHUB prefetch for ISP v4.1.1 as a temporary WA until
the GART supports MMHUB TLSi and SAW for ISP HW to access
GART memory using the TLSi path.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi 05bafe95e5 drm/amd/amdgpu: Add ISP4.1.0 and ISP4.1.1 modules
Add independent IP centric modules for ISP4.1.0 and ISP4.1.1 hw blocks.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi 0253d718a0 drm/amd/amdgpu: Map ISP interrupts as generic IRQs
Map ISP IH interrupts to Linux generic IRQ for ISP driver to
handle the interrupts using MFD IORESOURCE_IRQ resource.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:40 -04:00
Alex Deucher 8930b90be6 drm/amdgpu: fix Kconfig for ISP v2
Add new config option and set proper dependencies for ISP.

v2: add missed guards, drop separate Kconfig

Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Pratap Nirujogi <pratap.nirujogi@amd.com>
2024-06-27 17:34:40 -04:00
Pratap Nirujogi d232584ae3 drm/amd/amdgpu: Enable ISP in amdgpu_discovery
Enable ISP for ISP V4.1.0 and V4.1.1 in amdgpu_discovery.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:39 -04:00
Pratap Nirujogi 8fcbfd53ea drm/amd/amdgpu: Add ISP driver support
Add the isp driver in amdgpu to support ISP device on the APUs that
supports ISP IP block. ISP hw block is used for camera front-end, pre
and post processing operations.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:39 -04:00
Pratap Nirujogi 772e4d56da drm/amd/amdgpu: Add ISP support to amdgpu_discovery
ISP hw block is supported in some of the AMD GPU versions, add support
to discover ISP IP in amdgpu_discovery.

v2: squash in documentation update (Alex)

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:39 -04:00
Sonny Jiang d4b8386c86 drm/amdgpu/jpeg5: Add support for DPG mode
Add DPG support for JPEG 5.0

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:33 -04:00
Frank Min 291af3f598 drm/amdgpu: tolerate allocating GTT bo with dcc flag
Do not return failure for allocating GTT bo with dcc flag on gfx12.
This will improve compatibility for UMD.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:34:17 -04:00
Jane Jian b17eecc08f drm/amdgpu: normalize registers as local xcc to read/write in gfx_v9_4_3
[WHY]
sriov has the higher bit violation when flushing tlb

[HOW]
normalize the registers to keep lower 16-bit(dword aligned) to aviod higher bit violation
RLCG will mask xcd out and always assume it's accessing its own xcd

v2
add check in wait mem that only do the normalization on regspace

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Yiqing Yao <YiQing.Yao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:33:27 -04:00
YiPeng Chai f852c9795c drm/amdgpu: add gpu reset check and exception handling
Add gpu reset check and exception handling for
page retirement.

v2:
  Clear poison consumption messages cached in fifo after
non mode-1 reset.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:32:12 -04:00
YiPeng Chai e278849cb2 drm/amdgpu: refine poison consumption interrupt handler
1. The poison fifo is only used for poison consumption
   requests.
2. Merge reset requests when poison fifo caches multiple
   poison consumption messages

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:32:06 -04:00
YiPeng Chai 5f08275cfd drm/amdgpu: refine poison creation interrupt handler
In order to apply to the case where a large number
of ras poison interrupts:
1. Change to use variable to record poison creation
   requests to avoid fifo full.
2. Prioritize handling poison creation requests
   instead of following the order of requests
   received by the driver.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:31:43 -04:00
Vignesh Chander cbda2758d8 drm/amdgpu: process RAS fatal error MB notification
For RAS error scenario, VF guest driver will check mailbox
and set fed flag to avoid unnecessary HW accesses.
additionally, poll for reset completion message first
to avoid accidentally spamming multiple reset requests to host.

v2: add another mailbox check for handling case where kfd detects
timeout first

v3: set host_flr bit and use wait_for_reset

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:31:37 -04:00
YiPeng Chai 78146c1dcd drm/amdgpu: add variable to record the deferred error number read by driver
Add variable to record the deferred error
number read by driver.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:31:20 -04:00
Vignesh Chander 29b6985de5 drm/amdgpu: Use dev_ prints for virtualization as it supports multi adapter
So we can get clearer per device logging.

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:30:39 -04:00
Danijel Slivka afbf7955ff drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts
Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.

How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW.

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:30:27 -04:00
Kenneth Feng bf826ba9b4 Revert "drm/amd/amdgpu: add module parameter for jpeg"
This reverts commit d3620eeae8.
Revert this due to a final solution:
commit ed3165d660 ("drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback")

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:40 -04:00
Lijo Lazar fe86c4d1a2 drm/amdgpu: Don't show false warning for reg list
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give
TEE_ERR_CANCEL response on second time load. Avoid printing warn
message for it.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Julia Zhang 79ea35c7d8 drm/amdgpu: avoid using null object of framebuffer
Instead of using state->fb->obj[0] directly, get object from framebuffer
by calling drm_gem_fb_get_obj() and return error code when object is
null to avoid using null object of framebuffer.

Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com>
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Hawking Zhang bdbdc7cecd drm/amdgpu: Fix smatch static checker warning
adev->gfx.imu.funcs could be NULL

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Bob Zhou 9ff2e14cf0 drm/amdgpu: add missing error handling in function amdgpu_gmc_flush_gpu_tlb_pasid
Fix the unchecked return value warning reported by Coverity,
so add error handling.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:39 -04:00
Hawking Zhang 9da0f77367 drm/amdgpu: Fix register access violation
fault_status is read only register. fault_cntl
is not accessible from guest environment.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Frank Min 748bd8ebae drm/amdgpu: access ltr through pci cfg space
Access ltr through pci cfg space instead of mmio while programing
aspm on gfx12

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Yang Wang 6bab222b8b drm/amdgpu: refine gfx12 firmware loading
refine gfx12 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:10:35 -04:00
Frank Min 541fe90ee6 drm/amdgpu: update MTYPE mapping for gfx12
gfx12 only support MTYPE UC and NC, so update it accordingly.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:47 -04:00
Pierre-Eric Pelloux-Prayer c71c9aafd5 amdgpu: don't dereference a NULL resource in sysfs code
dma_resv_trylock being successful doesn't guarantee that bo->tbo.base.resv
is not NULL, so check its validity before using it.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:46 -04:00
Lijo Lazar e779af8e8b drm/amdgpu: Fix pci state save during mode-1 reset
Cache the PCI state before bus master is disabled. The saved state is
later used for other cases like restoring config space after mode-2
reset.

Fixes: 5c03e5843e ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:46 -04:00
Yang Wang dab70d9f65 drm/amdgpu: refine gfx11 firmware loading
refine gfx11 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:46 -04:00
Sonny Jiang ed3165d660 drm/amdgpu/jpeg5: reprogram doorbell setting after power up for each playback
Doorbell needs to be configured after power up during each playback

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:41 -04:00
Alex Deucher 83edf00d89 drm/amdgpu/atomfirmware: fix parsing of vram_info
v3.x changed the how vram width was encoded.  The previous
implementation actually worked correctly for most boards.
Fix the implementation to work correctly everywhere.

This fixes the vram width reported in the kernel log on
some boards.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27 17:09:41 -04:00
Dave Airlie 365aa9f573 amd-drm-next-6.11-2024-06-22:
amdgpu:
 - HPD fixes
 - PSR fixes
 - DCC updates
 - DCN 4.0.1 fixes
 - FAMS fixes
 - Misc code cleanups
 - SR-IOV fixes
 - GPUVM TLB flush cleanups
 - Make VCN less verbose
 - ACPI backlight fixes
 - MES fixes
 - Firmware loading cleanups
 - Replay fixes
 - LTTPR fixes
 - Trap handler fixes
 - Cursor and overlay fixes
 - Primary plane zpos fixes
 - DML 2.1 fixes
 - RAS updates
 - USB4 fixes
 - MALL fixes
 - Reserved VMID fix
 - Silence UBSAN warnings
 
 amdkfd:
 - Misc code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZnbsMgAKCRC93/aFa7yZ
 2GTUAQDGl46/pxpFRmZ8n3Hy6OCnmvoFqI9Re1uAc2RbQNNOAgEAi72PwS2iquU/
 69ectl+oi8P/yNMxP4rO1KgOP3AMsg8=
 =bvKZ
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-22:

amdgpu:
- HPD fixes
- PSR fixes
- DCC updates
- DCN 4.0.1 fixes
- FAMS fixes
- Misc code cleanups
- SR-IOV fixes
- GPUVM TLB flush cleanups
- Make VCN less verbose
- ACPI backlight fixes
- MES fixes
- Firmware loading cleanups
- Replay fixes
- LTTPR fixes
- Trap handler fixes
- Cursor and overlay fixes
- Primary plane zpos fixes
- DML 2.1 fixes
- RAS updates
- USB4 fixes
- MALL fixes
- Reserved VMID fix
- Silence UBSAN warnings

amdkfd:
- Misc code cleanups

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240622152523.2267072-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-06-27 17:18:49 +10:00
Lijo Lazar 48880f9686 drm/amdgpu: Don't show false warning for reg list
If reg list is already loaded on PSP 13.0.2 SOCs, psp will give
TEE_ERR_CANCEL response on second time load. Avoid printing warn
message for it.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-25 14:22:56 -04:00
Julia Zhang bcfa48ff78 drm/amdgpu: avoid using null object of framebuffer
Instead of using state->fb->obj[0] directly, get object from framebuffer
by calling drm_gem_fb_get_obj() and return error code when object is
null to avoid using null object of framebuffer.

Reported-by: Fusheng Huang <fusheng.huang@ecarxgroup.com>
Signed-off-by: Julia Zhang <Julia.Zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-06-25 14:22:08 -04:00
Lijo Lazar 74fa02c4a5 drm/amdgpu: Fix pci state save during mode-1 reset
Cache the PCI state before bus master is disabled. The saved state is
later used for other cases like restoring config space after mode-2
reset.

Fixes: 5c03e5843e ("drm/amdgpu:add smu mode1/2 support for aldebaran")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-25 14:13:12 -04:00
Alex Deucher f6f49dda49 drm/amdgpu/atomfirmware: fix parsing of vram_info
v3.x changed the how vram width was encoded.  The previous
implementation actually worked correctly for most boards.
Fix the implementation to work correctly everywhere.

This fixes the vram width reported in the kernel log on
some boards.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-06-25 14:11:46 -04:00
Dave Airlie f680df51ca drm-misc-next for 6.11:
UAPI Changes:
   - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION
 
 Core Changes:
   - connector: Create a set of helpers to help with HDMI support
   - fbdev: Create memory manager optimized fbdev emulation
   - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer
 
 Driver Changes:
   - Remove driver owner assignments
   - Allow more drivers to compile with COMPILE_TEST
   - Conversions to drm_edid
   - ivpu: hardware scheduler support, profiling support, improvements
     to the platform support layer
   - mgag200: general reworks and improvements
   - nouveau: Add NVreg_RegistryDwords command line option
   - rockchip: Conversion to the hdmi helpers
   - sun4i: Conversion to the hdmi helpers
   - vc4: Conversion to the hdmi helpers
   - v3d: Perf counters improvements
   - zynqmp: IRQ and debugfs improvements
   - bridge:
     - Remove redundant checks on bridge->encoder
   - panels:
     - Switch panels from register table initialization to proper code
     - Now that the panel code tracks the panel state, remove every
       ad-hoc implementation in the panel drivers
     - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
       13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
       nv110wum-l60, IVO t109nw41
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZlhUKAAKCRAnX84Zoj2+
 dgHoAYDTpShgXFXnlnMtqZr+ZuShcjcwiqzwM4qNWdtyji9MONtJJU3ZQnGlnXbI
 ZU+oZP0Bf0PyT0/8bf+rmZBJ1UdAxt2IQaLkP1tTHOad4E+KlcL5n1opzMi160mB
 EZSm9f7aNw==
 =bZPt
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-05-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.11:

UAPI Changes:
  - Deprecate DRM date and return a 0 date in DRM_IOCTL_VERSION

Core Changes:
  - connector: Create a set of helpers to help with HDMI support
  - fbdev: Create memory manager optimized fbdev emulation
  - panic: Allow to select fonts, improve drm_fb_dma_get_scanout_buffer

Driver Changes:
  - Remove driver owner assignments
  - Allow more drivers to compile with COMPILE_TEST
  - Conversions to drm_edid
  - ivpu: hardware scheduler support, profiling support, improvements
    to the platform support layer
  - mgag200: general reworks and improvements
  - nouveau: Add NVreg_RegistryDwords command line option
  - rockchip: Conversion to the hdmi helpers
  - sun4i: Conversion to the hdmi helpers
  - vc4: Conversion to the hdmi helpers
  - v3d: Perf counters improvements
  - zynqmp: IRQ and debugfs improvements
  - bridge:
    - Remove redundant checks on bridge->encoder
  - panels:
    - Switch panels from register table initialization to proper code
    - Now that the panel code tracks the panel state, remove every
      ad-hoc implementation in the panel drivers
    - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
      13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE
      nv110wum-l60, IVO t109nw41

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530-hilarious-flat-magpie-5fa186@houat
2024-06-21 10:30:31 +10:00
Likun Gao ed5a4484f0 drm/amdgpu: init TA fw for psp v14
Add support to init TA firmware for psp v14.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 18:25:58 -04:00
Christian König e356d321d0 drm/amdgpu: cleanup MES11 command submission
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.

Use a status query packet for the fence update instead since those
should always succeed we can use the fence of the original packet to
signal the state of the operation.

While at it cleanup the coding style.

Fixes: eef016ba89 ("drm/amdgpu/mes11: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 18:25:58 -04:00
Christian König 8bd82363e2 drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
This reverts commit b8c415e3bf ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39a ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-19 14:17:25 -04:00
Harish Kasiviswanathan 49c9ffabde drm/amdgpu: Indicate CU havest info to CP
To achieve full occupancy CP hardware needs to know if CUs in SE are
symmetrically or asymmetrically harvested

v2: Reset is_symmetric_cus for each loop

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 14:17:25 -04:00
Yunxiang Li 84801d4f1e drm/amdgpu: fix locking scope when flushing tlb
Which method is used to flush tlb does not depend on whether a reset is
in progress or not. We should skip flush altogether if the GPU will get
reset. So put both path under reset_domain read lock.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-19 14:17:25 -04:00
Likun Gao 1ecef55893 drm/amdgpu: init TA fw for psp v14
Add support to init TA firmware for psp v14.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:52:43 -04:00
Yang Wang 017d0b67bf drm/amdgpu: refine gfx6 firmware loading
refine gfx6 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:52:36 -04:00
Yang Wang a4fcb5f733 Revert "drm/amdgpu: change aca bank error lock type to spinlock"
This reverts commit f6bce954f4.

Revert this patch to modify lock type back to 'mutex' to avoid kernel
calltrace issue.

[  602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[  602.668939] Call Trace:
[  602.668940]  <TASK>
[  602.668941]  dump_stack_lvl+0x4c/0x70
[  602.668945]  dump_stack+0x14/0x20
[  602.668946]  __schedule_bug+0x5a/0x70
[  602.668950]  __schedule+0x940/0xb30
[  602.668952]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668955]  ? hrtimer_reprogram+0x77/0xb0
[  602.668957]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668959]  ? hrtimer_start_range_ns+0x126/0x370
[  602.668961]  schedule+0x39/0xe0
[  602.668962]  schedule_hrtimeout_range_clock+0xb1/0x140
[  602.668964]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  602.668966]  schedule_hrtimeout_range+0x17/0x20
[  602.668967]  usleep_range_state+0x69/0x90
[  602.668970]  psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[  602.669066]  psp_ras_invoke+0x75/0x1a0 [amdgpu]
[  602.669156]  psp_ras_query_address+0x9c/0x120 [amdgpu]
[  602.669245]  umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[  602.669337]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669339]  ? stack_depot_save+0x12/0x20
[  602.669342]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669343]  ? set_track_prepare+0x52/0x70
[  602.669346]  ? kmemleak_alloc+0x4f/0x90
[  602.669348]  ? __kmalloc_node+0x34b/0x450
[  602.669352]  amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[  602.669438]  mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[  602.669554]  mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[  602.669655]  amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[  602.669743]  ? kmemleak_free+0x36/0x60
[  602.669745]  ? kvfree+0x32/0x40
[  602.669747]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669749]  ? kfree+0x15d/0x2a0
[  602.669752]  amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[  602.669839]  amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[  602.669924]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669925]  ? __call_rcu_common.constprop.0+0xa6/0x2b0
[  602.669929]  amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[  602.670014]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670017]  amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[  602.670103]  amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[  602.670187]  ? srso_alias_return_thunk+0x5/0

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: YiPeng Chai <yipeng.chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:51:23 -04:00
Yang Wang 8c9ee18019 Revert "drm/amdgpu: change bank cache lock type to spinlock"
This reverts commit 258ed689bc

revert this patch to modify lock type back to 'mutex' to avoid kernel
calltrace issue.

[  602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[  602.668939] Call Trace:
[  602.668940]  <TASK>
[  602.668941]  dump_stack_lvl+0x4c/0x70
[  602.668945]  dump_stack+0x14/0x20
[  602.668946]  __schedule_bug+0x5a/0x70
[  602.668950]  __schedule+0x940/0xb30
[  602.668952]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668955]  ? hrtimer_reprogram+0x77/0xb0
[  602.668957]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.668959]  ? hrtimer_start_range_ns+0x126/0x370
[  602.668961]  schedule+0x39/0xe0
[  602.668962]  schedule_hrtimeout_range_clock+0xb1/0x140
[  602.668964]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  602.668966]  schedule_hrtimeout_range+0x17/0x20
[  602.668967]  usleep_range_state+0x69/0x90
[  602.668970]  psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[  602.669066]  psp_ras_invoke+0x75/0x1a0 [amdgpu]
[  602.669156]  psp_ras_query_address+0x9c/0x120 [amdgpu]
[  602.669245]  umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[  602.669337]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669339]  ? stack_depot_save+0x12/0x20
[  602.669342]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669343]  ? set_track_prepare+0x52/0x70
[  602.669346]  ? kmemleak_alloc+0x4f/0x90
[  602.669348]  ? __kmalloc_node+0x34b/0x450
[  602.669352]  amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[  602.669438]  mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[  602.669554]  mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[  602.669655]  amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[  602.669743]  ? kmemleak_free+0x36/0x60
[  602.669745]  ? kvfree+0x32/0x40
[  602.669747]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669749]  ? kfree+0x15d/0x2a0
[  602.669752]  amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[  602.669839]  amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[  602.669924]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.669925]  ? __call_rcu_common.constprop.0+0xa6/0x2b0
[  602.669929]  amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[  602.670014]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670017]  amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[  602.670103]  amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[  602.670187]  ? srso_alias_return_thunk+0x5/0xfbef5
[  602.670189]  ? __schedule+0x37d/0xb30
[  602.670191]  process_one_work+0x176/0x350
[  602.670194]  worker_thread+0x2f7/0x420
[  602.670197]  ?

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:50:31 -04:00
Alex Deucher 19797687e6 drm/amdgpu: remove amdgpu_mes_fence_wait_polling()
No longer used so remove it.

Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:50:24 -04:00
Alex Deucher fffe347e14 drm/amdgpu: cleanup MES12 command submission
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.

Use a status query packet for the fence update instead since those
should always succeed we can use the fence of the original packet to
signal the state of the operation.

While at it cleanup the coding style.

Fixes: ade887c633 ("drm/amdgpu/mes12: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:49:59 -04:00
Yang Wang 3af2c80ae2 drm/amdgpu: refine gfx10 firmware loading
refine gfx10 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:49:53 -04:00
Yang Wang 23fc94795b drm/amdgpu: refine gfx9 firmware loading
refine gfx9 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:49:28 -04:00
Christian König de32462541 drm/amdgpu: cleanup MES11 command submission
The approach of having a separate WB slot for each submission doesn't
really work well and for example breaks GPU reset.

Use a status query packet for the fence update instead since those
should always succeed we can use the fence of the original packet to
signal the state of the operation.

While at it cleanup the coding style.

Fixes: eef016ba89 ("drm/amdgpu/mes11: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:48:25 -04:00
Christian König a6328c9c3d drm/amdgpu: fix using the reserved VMID with gang submit
We need to ensure that even when using a reserved VMID that the gang
members can still run in parallel.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:48:00 -04:00
Victor Lu b32563859d drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOV
SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked
for VF access.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:47:52 -04:00
Mukul Joshi 4d14a74054 Revert "drm/amdgpu: Add missing locking for MES API calls"
This reverts commit 3612702852.

This is causing a BUG message during suspend.

[   61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
[   61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14
[   61.603553] preempt_count: 1, expected: 0
[   61.603555] RCU nest depth: 0, expected: 0
[   61.603557] Preemption disabled at:
[   61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu]
[   61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G        W          6.8.0+ #7
[   61.603795] Workqueue: events_unbound async_run_entry_fn
[   61.603801] Call Trace:
[   61.603803]  <TASK>
[   61.603806]  dump_stack_lvl+0x37/0x50
[   61.603811]  ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu]
[   61.604007]  dump_stack+0x10/0x20
[   61.604010]  __might_resched+0x16f/0x1d0
[   61.604016]  __might_sleep+0x43/0x70
[   61.604020]  mutex_lock+0x1f/0x60
[   61.604024]  amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu]
[   61.604226]  gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu]
[   61.604422]  ? srso_alias_return_thunk+0x5/0xfbef5
[   61.604429]  amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu]
[   61.604621]  gfx_v11_0_hw_fini+0xda/0x100 [amdgpu]
[   61.604814]  gfx_v11_0_suspend+0xe/0x20 [amdgpu]
[   61.605008]  amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu]
[   61.605175]  amdgpu_device_suspend+0xec/0x180 [amdgpu]

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-19 12:46:39 -04:00
Yang Wang 3a3be8bb97 drm/amdgpu: refine gfx8 firmware loading
refine gfx8 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:27 -04:00
Tao Zhou 09a3d8202d drm/amdgpu: set RAS fed status for more cases
Indicate fatal error for each RAS block and NBIO.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:26 -04:00
Tao Zhou 7e4371676e drm/amdgpu: create amdgpu_ras_in_recovery to simplify code
Reduce redundant code and user doesn't need to pay attention to RAS
details.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:26 -04:00
Tao Zhou 5f7697bbc1 drm/amdgpu: trigger mode1 reset for RAS RMA status
Check RMA status in bad page retirement flow.

v2: fix coding bugs in v1.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:18:26 -04:00
Yang Wang 9d26e0cfc2 drm/amdgpu: refine gfx7 firmware loading
refine gfx7 firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:15 -04:00
Christian König 030631e97b drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
This reverts commit b8c415e3bf ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39a ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").

Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.

The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.

When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.

The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.

v2: improve the commit message

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-14 16:17:13 -04:00
Yang Wang c37b8f7868 drm/amdgpu: refine imu firmware loading
refine imu firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang cd093c24ee drm/amdgpu: refine gmc firmware loading
refine gmc firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang 8d7ff60f36 drm/amdgpu: refine vpe firmware loading
refine vpe firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang b441e9ac9d drm/amdgpu: refine vcn firmware loading
refine vcn firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang 9817f06173 drm/amdgpu: move aca/mca init functions into ras_init() stage
adjust the function position to better match aca/mca fini code in ras_fini().

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Bob Zhou be6a69b21a drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating()
To fix potential overflowed constant warning, modify the variables to u32
for getting the return value of RREG32_SOC15().

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Harish Kasiviswanathan 199d69d5f9 drm/amdgpu: Indicate CU havest info to CP
To achieve full occupancy CP hardware needs to know if CUs in SE are
symmetrically or asymmetrically harvested

v2: Reset is_symmetric_cus for each loop

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Lijo Lazar 3a86fdc422 drm/amdgpu: Skip coredump during resets for debug
Skip scheduling coredump when gpu reset is intentionally triggered
through debugfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang 3618fa26c8 drm/amdgpu: refine sdma firmware loading
refine sdma firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:12 -04:00
Yang Wang 8cae4b578e drm/amdgpu: refine psp firmware loading
refine psp firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
Yang Wang bf349b036d drm/amdgpu: refine mes firmware loading
v1:
refine mes firmware loading

v2:
use dev_info instead of DRM_INFO

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
Mukul Joshi 3612702852 drm/amdgpu: Add missing locking for MES API calls
Add missing locking at a few places when calling MES APIs to ensure
exclusive access to MES queue.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
Mario Limonciello 2fe87f54ab drm/amd/display: Set default brightness according to ACPI
Currently, amdgpu will always set up the brightness at 100% when it
loads.  However this is jarring when the BIOS has it previously
programmed to a much lower value.

The ACPI ATIF method includes two members for "ac_level" and "dc_level".
These represent the default values that should be used if the system is
brought up in AC and DC respectively.

Use these values to set up the default brightness when the backlight
device is registered.

v2: squash in ACPI fix

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:17:11 -04:00
David (Ming Qiang) Wu ee3942d9ab drm/amdgpu: drop some kernel messages in VCN code
Similar to commit 813e7d4cd0 where some kernel log
messages are dropped. With this commit, more log
messages in older version of VCN/JPEG code are dropped.

Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:16:51 -04:00
Yang Wang a777c9d70a drm/amdgpu: refine gpu_info firmware loading
refine gpu_info firmware loading

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yang Wang 1bfe5e7746 drm/amdgpu: enhance amdgpu_ucode_request() function flexibility
v1:
Adding formatting string feature to improve function flexibility.

v2:
modify macro name to ADMGPU_UCODE_MAX_NAME.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Bob Zhou 37f432481d drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15()
To fix potential overflowed constant warning reported by Coverity,
modify the variables to uint32_t.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li 18f2525d31 drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
We need to take the reset domain lock before flush hdp. We can't put the
lock inside amdgpu_device_flush_hdp itself because it is used during
reset where we already take the write side lock.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li 9c33e5fd4f drm/amdgpu: fix locking scope when flushing tlb
Which method is used to flush tlb does not depend on whether a reset is
in progress or not. We should skip flush altogether if the GPU will get
reset. So put both path under reset_domain read lock.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-06-14 16:15:59 -04:00
Yunxiang Li ba531117a8 drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable
Here since we are in reset and takes the reset_domain write side lock
already. We can't use the flush tlb helper which tries to take the read
side.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li c1f9d82b92 drm/amdgpu: use helper in amdgpu_gart_unbind
When amdgpu_gart_invalidate_tlb helper is introduced this part was left
out of the conversion. Avoid the code duplication here.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:59 -04:00
Yunxiang Li 4b0e76e4c1 drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover
At this point the gart is not set up, there's no point to invalidate tlb
here and it could even be harmful.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Yunxiang Li 5c0a1cdd17 drm/amdgpu: fix sriov host flr handler
We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.

In the current state since we take tens of seconds to stop everything,
it is very likely that host would give up waiting and reset the GPU
before we send ready, so it would be the same as before. But this gets
rid of the hack with reset_domain locking and also let us tell how slow
ready to reset actually is from the host. The ready to reset speed can
be improved later.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Yunxiang Li b3948ad1ac drm/amdgpu: add skip_hw_access checks for sriov
Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Eric Huang bac640ddb5 drm/amdgpu: add reset source in various cases
To fullfill the reset event description.

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Eric Huang 7bed1df814 drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc
amdgpu_job_ring may return NULL, which causes kernel NULL
pointer error, using another way to print ring name instead
of ring->name.

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 16:15:58 -04:00
Jesse Zhang 27b500b77b drm/amdgpu: remove dead code in atom_get_src_int
Since the range of align is 0~7, the expression is: align = (attr >> 3) & 7.
In the case of ATOM_ARG_IMM, the code cannot reach the default case.
So there is no need for "break".

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:34:10 -04:00
Frank Min faa64f633c drm/amdgpu: add sdma 7.0 support for copy dcc buffer
1. Add dcc buffer flag for copy buffer
2. Add sdma 7.0 support copy dcc buffer

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:22:14 -04:00
Likun Gao 7c85e97083 drm/amdgpu: support for DCC feature
Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit
when needed.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:21:51 -04:00
Alex Deucher 6b83b94a94 drm/amdgpu: add additional VM bits
Add additional VM PTE bits.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14 15:20:56 -04:00
Maxime Ripard 14731a640e
Merge drm/drm-fixes into drm-misc-fixes
Roll -rc3 and current drm/fixes in.

This will also unstuck our for-next branch.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-06-14 09:55:46 +02:00
Dave Airlie 1ddaaa2440 amd-drm-next-6.11-2024-06-07:
amdgpu:
 - DCN 4.0.x support
 - DCN 3.5 updates
 - GC 12.0 support
 - DP MST fixes
 - Cursor fixes
 - MES11 updates
 - MMHUB 4.1 support
 - DML2 Updates
 - DCN 3.1.5 fixes
 - IPS fixes
 - Various code cleanups
 - GMC 12.0 support
 - SDMA 7.0 support
 - SMU 13 updates
 - SR-IOV fixes
 - VCN 5.x fixes
 - MES12 support
 - SMU 14.x updates
 - Devcoredump improvements
 - Fixes for HDP flush on platforms with >4k pages
 - GC 9.4.3 fixes
 - RAS ACA updates
 - Silence UBSAN flex array warnings
 - MMHUB 3.3 updates
 
 amdkfd:
 - Contiguous VRAM allocations
 - GC 12.0 support
 - SDMA 7.0 support
 - SR-IOV fixes
 
 radeon:
 - Backlight workaround for iMac
 - Silence UBSAN flex array warnings
 
 UAPI:
 - GFX12 modifier and DCC support
   Proposed Mesa changes:
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510
 - KFD GFX ALU exceptions
   Proposed ROCdebugger changes:
   08c760622b
   944fe1c141
 - KFD Contiguous VRAM allocation flag
   Proposed ROCr/HIP changes:
   f7b4a26991
   26e8530d05
   1d48f2a1ab
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZmNlVQAKCRC93/aFa7yZ
 2MLSAP9lflyG//+wC9WWX5OjvqFWO6qkhVm1w55xy6TwN9NkqQEA76TqmcNZ6rk1
 4o9RaYpMJQU275FvK1NvwUbl4PPQYAs=
 =cFxt
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-07:

amdgpu:
- DCN 4.0.x support
- DCN 3.5 updates
- GC 12.0 support
- DP MST fixes
- Cursor fixes
- MES11 updates
- MMHUB 4.1 support
- DML2 Updates
- DCN 3.1.5 fixes
- IPS fixes
- Various code cleanups
- GMC 12.0 support
- SDMA 7.0 support
- SMU 13 updates
- SR-IOV fixes
- VCN 5.x fixes
- MES12 support
- SMU 14.x updates
- Devcoredump improvements
- Fixes for HDP flush on platforms with >4k pages
- GC 9.4.3 fixes
- RAS ACA updates
- Silence UBSAN flex array warnings
- MMHUB 3.3 updates

amdkfd:
- Contiguous VRAM allocations
- GC 12.0 support
- SDMA 7.0 support
- SR-IOV fixes

radeon:
- Backlight workaround for iMac
- Silence UBSAN flex array warnings

UAPI:
- GFX12 modifier and DCC support
  Proposed Mesa changes:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510
- KFD GFX ALU exceptions
  Proposed ROCdebugger changes:
  08c760622b
  944fe1c141
- KFD Contiguous VRAM allocation flag
  Proposed ROCr/HIP changes:
  f7b4a26991
  26e8530d05
  1d48f2a1ab

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607195900.902537-1-alexander.deucher@amd.com
2024-06-11 14:01:55 +10:00
Arunpravin Paneer Selvam 31849bf07e drm/amdgpu: Fix the BO release clear memory warning
This happens when the amdgpu_bo_release_notify running
before amdgpu_ttm_set_buffer_funcs_status set the buffer
funcs to enabled.

check the buffer funcs enablement before calling the fill
buffer memory.

v2:(Christian)
  - Apply it only for GEM buffers and since GEM buffers are only
    allocated/freed while the driver is loaded we never run into
    the issue to clear with buffer funcs disabled.

v3:(Mario)
  - drop the stable tag as this will presumably go into a
    -fixes PR for 6.10

Log snip:
*ERROR* Trying to clear memory with ring turned off.
RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu]

Fixes: a68c7eaa7a ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Richard Gong <richard.gong@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240610180401.9540-1-Arunpravin.PaneerSelvam@amd.com
2024-06-10 23:46:33 +05:30
Tao Zhou b95fa494d6 drm/amdgpu: add RAS is_rma flag
Set the flag to true if bad page number reaches threshold.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Lin.Cao c8ad1bbbc2 drm/amdgpu: fix failure mapping legacy queue when FLR
Flag "mes.ring.shced.ready" will be set as true after mes hw init and set
as false when mes hw fini to avoid duplicate initialization. But hw fini
will not be called when function level reset, which will cause mes hw
init be skipped during FLR, which will leads to mapping legacy queue
fail. Set this flag as false when post reset will fix this issue.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Eric Huang dbe2c4c8ab drm/amdkfd: add reset cause in gpu pre-reset smi event
reset cause is requested by customer as additional
info for gpu reset smi event.

v2: integerate reset sources suggested by Lijo Lazar

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Frank Min 978f5428c9 drm/amdgpu: Set PTE_IS_PTE bit for gfx12
Set PTE_IS_PTE bit while PRT is enabled on gfx12.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:14 -04:00
Eric Huang 2656e1ce78 drm/amdgpu: add reset sources in gpu reset context
reset source or reset cause is very useful info
for reset context, it will be used by events API.

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:13 -04:00
Shane Xiao 45bd39fb3b drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_VG10
This patch changes the implementation of AMDGPU_PTE_MTYPE_VG10,
clear the bits before setting the new one.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:25:13 -04:00
Alex Deucher a9ebd10482 Revert "drm/amdgpu/gfx11: enable gfx pipe1 hardware support"
This reverts commit 6670142d25.

Pierre-Eric reported problems with this on his navi33.  Revert
for now until we understand what is going wrong.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Pierre-eric.Pelloux-prayer@amd.com
2024-06-05 11:03:45 -04:00
Shane Xiao 3928290102 drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_NV10
This patch changes the implementation of AMDGPU_PTE_MTYPE_NV10,
clear the bits before setting the new one.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:43 -04:00
Yifan Zhang 301dfbfc84 drm/amdgpu: disable lane0 L1TLB and enable lane1 L1TLB
This patch to disable lane0 L1TLB and enable lane1 L1TLB.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:37 -04:00
Yifan Zhang b7e2170b87 drm/amdgpu: init SAW registers for mmhub v3.3
This patch to configure mmhub3.3 SAW registers

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:30 -04:00
Sunil Khatri a1a049bd59 drm/amdgpu: fix comments and error message for ipdump
Fix comments and error messages to rightly represent
the information.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:09 -04:00
Sunil Khatri 33837d62a4 drm/amdgpu: rename ip_dump_cp_queues to compute queues
Rename the variable ip_dump_cp_queues to ip_dump_compute_queue
as it represent compute queues.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:02:03 -04:00
Sunil Khatri 34b8d94b6c drm/amdgpu: add cp queue registers for gfx9 ipdump
Add gfx9 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:01:57 -04:00
Sunil Khatri 173ef9182a drm/amdgpu: add print support for gfx9 ipdump
Add support of gfx9 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:01:50 -04:00
Sunil Khatri 514dc965b2 drm/amdgpu: add gfx9 register support in ipdump
Add general registers of gfx9 in ipdump for
devcoredump support.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 11:01:28 -04:00
Srinivasan Shanmugam 745f7170db drm/amdgpu: Fix type mismatch in amdgpu_gfx_kiq_init_ring
This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring
function triggered by the snprintf function expecting unsigned char
arguments due to the '%hhu' format specifier, but receiving int and u32
arguments.

The issue occurred because the arguments xcc_id, ring->me, ring->pipe,
and ring->queue were of type int and u32, not unsigned char. This led to
a type mismatch when these arguments were passed to snprintf.

To resolve this, the snprintf line was modified to cast these arguments
to unsigned char. This ensures that the arguments are of the correct
type for the '%hhu' format specifier and resolves the warning.

Fixes the below:
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format
>> specifies type 'unsigned char' but the argument has type 'int'
>> [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                    ^~~~~~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format
>> specifies type 'unsigned char' but the argument has type 'u32' (aka
>> 'unsigned int') [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                            ^~~~~~~~
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                                      ^~~~~~~~~~
   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat]
                    xcc_id, ring->me, ring->pipe, ring->queue);
                                                  ^~~~~~~~~~~
   4 warnings generated.

Fixes: 0ea5544555 ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405250446.XeaWe66u-lkp@intel.com/
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:43 -04:00
Jesse Zhang c09d2eff81 drm/amdgu: fix Unintentional integer overflow for mall size
Potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc with type unsigned int (32 bits, unsigned)
is evaluated using 32-bit arithmetic,and then used in a context that expects an expression of type u64 (64 bits, unsigned).

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:36 -04:00
Hawking Zhang a474161e84 drm/amdgpu: Update programming for boot error reporting
AMDGPU_RAS_GPU_ERR_BOOT_STATUS field is no longer valid.
The polling sequence is also simplifed according to
the latest firmware change.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:26 -04:00
Alex Deucher 7d3b9668e6 drm/amdgpu/soc24: use common nbio callback to set remap offset
This fixes HDP flushes on systems with non-4K pages.

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:58:10 -04:00
Hawking Zhang 473af28d3e drm/amdgpu: Estimate RAS reservation when report capacity v2
Add estimate of how much vram we need to reserve for RAS
when caculating the total available vram.

v2: apply the change to MP0 v13_0_2 and v13_0_14

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:53 -04:00
Tao Zhou 76bec2a031 drm/amdgpu: use u32 for buf size in __amdgpu_eeprom_xfer
And also make sure the value of msg[1].len should be in the range of u16.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:47 -04:00
David (Ming Qiang) Wu 813e7d4cd0 drm/amdgpu: drop some kernel messages in VCN code
We have messages when the VCN fails to initialize and
there is no need to report on success.
Also PSP loading FWs is the default for production.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:34 -04:00
Shane Xiao eba791dc17 drm/amdgpu: Update the impelmentation of AMDGPU_PTE_MTYPE_GFX12
This patch changes the implementation of AMDGPU_PTE_MTYPE_GFX12,
clear the bits before setting the new one.
This fixed the potential issue that GFX12 setting memory to NC.

v2: Clear mtype field before setting the new one (Alex)
v3: Fix typo (Felix)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: longlyao <Longlong.Yao@amd.com>
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05 10:57:24 -04:00
Dave Airlie bb61cf46b6 amd-drm-fixes-6.10-2024-05-30:
amdgpu:
 - RAS fix
 - Fix colorspace property for MST connectors
 - Fix for PCIe DPM
 - Silence UBSAN warning
 - GPUVM robustness fix
 - Partition fix
 - Drop deprecated I2C_CLASS_SPD
 
 amdkfd:
 - Revert unused changes for certain 11.0.3 devices
 - Simplify APU VRAM handling
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZljezAAKCRC93/aFa7yZ
 2PZpAP43w/Q7aNgEpzdViO6QckU2GtOaRskavnzew4yxd5Y4xAD/U4vEQ3V7Rc0Q
 XRv4c+rJAVXIlfcS4guLvHSTyqzAPQY=
 =WQzB
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.10-2024-05-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.10-2024-05-30:

amdgpu:
- RAS fix
- Fix colorspace property for MST connectors
- Fix for PCIe DPM
- Silence UBSAN warning
- GPUVM robustness fix
- Partition fix
- Drop deprecated I2C_CLASS_SPD

amdkfd:
- Revert unused changes for certain 11.0.3 devices
- Simplify APU VRAM handling

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240530202316.2246826-1-alexander.deucher@amd.com
2024-05-31 08:38:15 +10:00
Rajneesh Bhardwaj a9bc5a19e4 drm/amdgpu: Make CPX mode auto default in NPS4
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is
setup in NPS4 memory partition mode. This change is only applicable for
dGPU, for APU, continue to use TPX mode.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:06:48 -04:00
Alex Deucher 1f327dfc84 drm/amdkfd: simplify APU VRAM handling
With commit 89773b8559
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified.  Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:06:15 -04:00
Jesse Zhang a0cf36546c drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent.
To make the code more robust, check the pointer parent.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:03:20 -04:00
Alex Deucher ba46b3bda2 drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
Use current speed/width on devices which don't support
dynamic PCIe switching.

Fixes: 466a7d1153 ("drm/amd: Use the first non-dGPU PCI device for BW limits")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 17:01:49 -04:00
Alex Deucher 6670142d25 drm/amdgpu/gfx11: enable gfx pipe1 hardware support
Enable gfx pipe1 hardware support.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Rajneesh Bhardwaj ef5c0f897e drm/amdgpu: Make CPX mode auto default in NPS4
On GFXIP9.4.3, make CPX mode as the default compute mode if the node is
setup in NPS4 memory partition mode. This change is only applicable for
dGPU, for APU, continue to use TPX mode.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Alex Deucher 2e216b1e6b drm/amdgpu/gfx11: handle priority setup for gfx pipe1
Set up pipe1 as a high priority queue.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Alex Deucher eab57bf22f drm/amdgpu/gfx11: select HDP ref/mask according to gfx ring pipe
Use correct ref/mask for differnent gfx ring pipe. Ported from
ZhenGuo's patch for gfx10.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Victor Skvortsov e864180ee4 drm/amdgpu: Add lock around VF RLCG interface
flush_gpu_tlb may be called from another thread while
device_gpu_recover is running.

Both of these threads access registers through the VF
RLCG interface during VF Full Access. Add a lock around this interface
to prevent race conditions between these threads.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Alex Deucher 7978c4d414 drm/amdkfd: simplify APU VRAM handling
With commit 89773b8559
("drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs")
big and small APU "VRAM" handling in KFD was unified.  Since AMD_IS_APU
is set for both big and small APUs, we can simplify the checks in
the code.

v2: clean up a few more places (Lang)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:48:30 -04:00
Yang Wang 3c603b1fa8 drm/amdgpu: fix typo in amdgpu_ras_aca_sysfs_read() function
fix typo "info.ue_count" in amdgpu_ras_aca_sysfs_read() function.

Fixes: 865d339763 ("drm/amdgpu: add aca deferred error type support")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:48 -04:00
Jesse Zhang 511a623fb4 drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent
The pointer parent may be NULLed by the function amdgpu_vm_pt_parent.
To make the code more robust, check the pointer parent.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:41 -04:00
David (Ming Qiang) Wu ab47fa8358 drm/amd/amdgpu: add AMD_PG_SUPPORT_VCN_DPG flag
AMD_PG_SUPPORT_VCN_DPG is needed for secure parts
and should/can be enabled by now.

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Sonny Jiang <sonjiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:35 -04:00
Alex Deucher f2a1fbdd1f drm/amdgpu: drop MES 10.1 support v3
It was an enablement vehicle for MES 11 and was never
productized.  Remove it.

v2: drop additional checks in the GFX10 code.
v3: drop mes_api_def.h

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:09:01 -04:00
Alex Deucher 87dfeb47a5 drm/amdgpu: Adjust logic in amdgpu_device_partner_bandwidth()
Use current speed/width on devices which don't support
dynamic PCIe switching.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3289
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-29 14:08:44 -04:00
Maxime Ripard 375c4d1583
Merge drm/drm-next into drm-misc-next
Let's start the new release cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-27 11:08:31 +02:00
Linus Torvalds 56fb6f9285 drm fixes for 6.10-rc1
nouveau:
 - fix bo metadata uAPI for vm bind
 
 panthor:
 - Fixes for panthor's heap logical block.
 - Reset on unrecoverable fault
 - Fix VM references.
 - Reset fix.
 
 xlnx:
 - xlnx compile and doc fixes.
 
 amdgpu:
 - Handle vbios table integrated info v2.3
 
 amdkfd:
 - Handle duplicate BOs in reserve_bo_and_cond_vms
 - Handle memory limitations on small APUs
 
 dp/mst:
 - MST null deref fix.
 
 bridge:
 - Don't let next bridge create connector in adv7511 to make probe work.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZQ9rAACgkQDHTzWXnE
 hr7kew//e+OwUJcock4rqpMsXrVwgse4bJ1H7gnHLmN/AQKsXWcaHKixJu+WkkQI
 62gyhOEiSH+J4Je6zi5JbIt9AKgIKyZeJqg/QeSMqUFJzYkPKhrhj0aX4AefuNJB
 dxCI1+UnkNdQ/LM+Dwb4sq757FX0/yPqhzBiytWEJJVHP4Th5SMRFmSs8YsT1Z4F
 bn220QFvPv+KrVV2eKEf6plJqYhs92nT8cVkNw8x9LhnivLRhyh7we3Ew6R2zouV
 Izm1nYx1CtTqaiHo3+UOeqRLzhqpmukzmJyML9zk/qt+s+YMj9yI/ie02EnTPqg6
 NQCqqmoDy1BpUa37N486Q7MFTiS076HMaQpYiDmVz3PIWWPvKZDqcC5vonoh99ra
 b88X4J4w3e747xZP7VDsnAOFsaLgcRah4JNtv8H3rcurbz0FrcV/0g/W/kt+Nenc
 k2m2s2hv56yWXd6B7bd3tfUe5dOwxc+CgiNddM5X3WPenEvnH+DKBOoFojSRl2o7
 LpRL7WhY6fceRUqVrHzfIPLYgdfz/Ho9LYST2jGIdKDnwV0Hw0sNfU9C0KOzm0J6
 rjgGEeMHgVdZPzSeXSmaO82mBRvR7Ke6da7FIazbocledOapsB5qxjhLvkRyULAp
 mlGZPRPJdNVM8slNLxEqrT/ugorIo5owtQdEnJmDJXFdozcLTMY=
 =pexB
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Some fixes for the end of the merge window, mostly amdgpu and panthor,
  with one nouveau uAPI change that fixes a bad decision we made a few
  months back.

  nouveau:
   - fix bo metadata uAPI for vm bind

  panthor:
   - Fixes for panthor's heap logical block.
   - Reset on unrecoverable fault
   - Fix VM references.
   - Reset fix.

  xlnx:
   - xlnx compile and doc fixes.

  amdgpu:
   - Handle vbios table integrated info v2.3

  amdkfd:
   - Handle duplicate BOs in reserve_bo_and_cond_vms
   - Handle memory limitations on small APUs

  dp/mst:
   - MST null deref fix.

  bridge:
   - Don't let next bridge create connector in adv7511 to make probe
     work"

* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu/atomfirmware: add intergrated info v2.3 table
  drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
  drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
  drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
  drm/bridge: adv7511: Attach next bridge without creating connector
  drm/buddy: Fix the warn on's during force merge
  drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
  drm/panthor: Call panthor_sched_post_reset() even if the reset failed
  drm/panthor: Reset the FW VM to NULL on unplug
  drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
  drm/panthor: Force an immediate reset on unrecoverable faults
  drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
  drm/panthor: Fix an off-by-one in the heap context retrieval logic
  drm/panthor: Relax the constraints on the tiler chunk size
  drm/panthor: Make sure the tiler initial/max chunks are consistent
  drm/panthor: Fix tiler OOM handling to allow incremental rendering
  drm: xlnx: zynqmp_dpsub: Fix compilation error
  drm: xlnx: zynqmp_dpsub: Fix few function comments
2024-05-24 17:28:02 -07:00
Hawking Zhang ec58991054 drm/amdgpu: correct hbm field in boot status
hbm filed takes bit 13 and bit 14 in boot status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 18:13:23 -04:00
Sunil Khatri 498906d376 drm/amdgpu: add gfx queue support for gfx11 ipdump
Add support of all the CP GFX queues for gfx11 ipdump
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:14:08 -04:00
Sunil Khatri 368c33ac8a drm/amdgpu: add cp queue registers for gfx11 ipdump
Add gfx11 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:14:00 -04:00
Sunil Khatri 015a04a59e drm/amdgpu: add print support for gfx11 ipdump
Add support of gfx11 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:48 -04:00
Sunil Khatri b5812822d9 drm/amdgpu: add gfx11 registers support in ipdump
Add general registers of gfx11 in ipdump for
devcoredump support.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:41 -04:00
Sunil Khatri 836bc350a5 drm/amdgpu: add more device info to the devcoredump
Adding more device information:
a. PCI info
b. VRAM and GTT info
c. GDC config

Also correct the print layout and section information for
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:28 -04:00
Sunil Khatri 29b1fc665c drm/amdgpu: add prints in IP State dump
add prints before and after ip state is dumped.
It avoids user to think of system being
stuck/hung as dump could take some time after a
gpu hang.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:20 -04:00
Sunil Khatri 8444453dce drm/amdgpu: add gfx queue support of gfx10 in ipdump
Add gfx queue register for all instances in devcoredump
for gfx10.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:13 -04:00
Sunil Khatri 0f83227bc8 drm/amdgpu: Add cp queues support fro gfx10 in ipdump
Add support to dump registers of all instances of
cp queue registers of gfx10 to devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:13:06 -04:00
Sunil Khatri 74feef5667 drm/amdgpu: rename the ip_dump to ip_dump_core
Rename the memory pointer from ip_dump to ip_dump_core
to make it specific to core registers and rest other
registers to be dumped in their respective memories.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:12:58 -04:00
Lijo Lazar 621a4e9efb drm/amdgpu: Add CRC16 selection in config
KFD uses crc16 for gpu_id generation.

Fixes: 3ed181b8ff ("drm/amdkfd: Ensure gpu_id is unique")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405211405.TidTWIBX-lkp@intel.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:52 -04:00
Victor Zhao e21e0b7824 drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw
the inst passed to amdgpu_virt_rlcg_reg_rw should be physical instance.
Fix the miss matched code.

Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:38 -04:00
Jane Jian f889f9c68b drm/amdgpu - optimize rlc spm cntl
v1
- driver MMIO read the register to check whether write is required
- if write is required, sriov full time to use rlcg, otherwise use KIQ

v2
- include gfx v11 sriov runtime case

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:31 -04:00
Hawking Zhang cf85764e2b drm/amdgpu: correct hbm field in boot status
hbm filed takes bit 13 and bit 14 in boot status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:12 -04:00
Frank Min 8d7b149675 drm/amdgpu: program device_cntl2 through pci cfg space
device_cntl2 is accessible from pci config space, so program it through pci cfg
space instead of mmio.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:11:05 -04:00
Li Ma 19f0edd897 drm/amdgpu/atomfirmware: add intergrated info v2.3 table
[Why]
The vram width value is 0.
Because the integratedsysteminfo table in VBIOS has updated to 2.3.

[How]
Driver needs a new intergrated info v2.3 table too.
Then the vram width value will be correct.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:10:48 -04:00
Srinivasan Shanmugam 0ea5544555 drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring
This commit fixes a format truncation issue arosed by the snprintf
function potentially writing more characters into the ring->name buffer
than it can hold, in the amdgpu_gfx_kiq_init_ring function

The issue occurred because the '%d' format specifier could write between
1 and 10 bytes into a region of size between 0 and 8, depending on the
values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf
function could output between 12 and 41 bytes into a destination of size
16, leading to potential truncation.

To resolve this, the snprintf line was modified to use the '%hhu' format
specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu'
specifier is used for unsigned char variables and ensures that these
values are printed as unsigned decimal integers.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=]
  332 |         snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
      |                                                             ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647]
  332 |         snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
      |                                                  ^~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16
  332 |         snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  333 |                  xcc_id, ring->me, ring->pipe, ring->queue);
      |                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 345a36c4f1 ("drm/amdgpu: prefer snprintf over sprintf")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:10:03 -04:00
Jesse Zhang 806e8c5579 drm/amdgpu: fix invadate operation for pg_flags
Since the type of pg_flags is u32, adev->pg_flags >> 16 >> 16 is 0
regardless of the values of its operands.

So removing the operations upper_32_bits and lower_32_bits.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:50 -04:00
Jack Xiao fa10408116 drm/amdgpu/mes12: mes hw_fini fix for mode1 reset
Port mes11 hw_fini to mes12, fix for mode1 reset.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:42 -04:00
Jesse Zhang 64da71ea76 drm/amdgpu: fix invadate operation for umsch
Since the type of data_size is uint32_t, adev->umsch_mm.data_size - 1 >> 16 >> 16 is 0
regardless of the values of its operands

So removing the operations upper_32_bits and lower_32_bits.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:35 -04:00
Jesse Zhang 030ffd4d43 drm/admgpu: fix dereferencing null pointer context
When user space sets an invalid ta type, the pointer context will be empty.
So it need to check the pointer context before using it

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:09:15 -04:00
Yang Wang 9262f411dc drm/amdgpu: skip to create ras xxx_err_count node when ACA is enabled
skip to create 'xxx_err_count' node when ACA is enabled.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-23 15:08:25 -04:00
Jani Nikula 42505ab120 drm/amdgpu: remove amdgpu_connector_edid() and stop using edid_blob_ptr
amdgpu_connector_edid() copies the EDID from edid_blob_ptr as a side
effect if amdgpu_connector->edid isn't initialized. However, everywhere
that the returned EDID is used, the EDID should have been set
beforehands.

Only the drm EDID code should look at the EDID property, anyway, so stop
using it.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pan
Cc: amd-gfx@lists.freedesktop.org
Reviewed-by: Robert Foss <rfoss@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1463862965d76e9458551598fd4d287a08d3d264.1715353572.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-05-23 14:37:24 +03:00
Steven Rostedt (Google) 2c92ca849f tracing/treewide: Remove second parameter of __assign_str()
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.

This means that with:

  __string(field, mystring)

Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.

There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:

  git grep -l __assign_str | while read a ; do
      sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
      mv /tmp/test-file $a;
  done

I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.

Note, the same updates will need to be done for:

  __assign_str_len()
  __assign_rel_str()
  __assign_rel_str_len()

I tested this with both an allmodconfig and an allyesconfig (build only for both).

[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/

Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>	# xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-22 20:14:47 -04:00
Li Ma e64e8f7c17 drm/amdgpu/atomfirmware: add intergrated info v2.3 table
[Why]
The vram width value is 0.
Because the integratedsysteminfo table in VBIOS has updated to 2.3.

[How]
Driver needs a new intergrated info v2.3 table too.
Then the vram width value will be correct.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-05-22 12:01:39 -04:00
Linus Torvalds f0bae243b2 pci-v6.10-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmZLzNIUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwr/Q//STe2XGKI8bAKqP2wbbkzm+ISnK4A
 Lqf3FEAIXunxDRspszfXKKV2p4vaIkmOFiwIdtp/kWvd0DQn5+ATXJ/iQtp8aFX/
 R+6BQ7EZc2G7fN5fbQuK54+CvmWEpkKEMbXYbd6ivQ14Cijdb3Nbu+w+DYFjS+6C
 k2a9lS1bTW7Xcy0fyiO1w6GQiWqtmOH8U3OlQtIrI0EVkDG9OG1LsLuc92/FgkOo
 REN+sU+hX1K5fHrvm2CtjYDn/9/B6bJ/It22H1dPgUL9nKvKC67fYzosMtUCOX1M
 6XSPjZIuXOmQGeZXHhpSlVwaidxoUjYO98I7nMquxKdCy6yct3geK7ULG/xeQCgD
 ML7MGQB4+sTiSWalXUQaziKqF1FIDEvU3HMGXFWnoBL5l56eRp8KS1EI9Eqk9pU3
 pk9fJaCkcFnkzPtMFzqPOm5q9zUZ6bGbfYb0hs72TUKplmVDhFo2T1YsW2AOyHZ7
 mjuDzUYZX0H7uM1tntA56IgZX+oNOrLvhBt5L5M/BQeCsZFBBUfIcAEaYoL9LwXO
 AYgIG3jdqzHHyAUzutJF+XHKinJLMHm0XVYbFmO6saPhFzrUJSNHqT7NzW1DGGTl
 OnO8e1WNMX1EcnKvnc6fXyGmM3SgVwy45FsbG/zRnhn4uBKqKtjrh6uX/myA22LK
 CSeqSUK9XmXxFNA=
 =xjoS
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Skip E820 checks for MCFG ECAM regions for new (2016+) machines,
     since there's no requirement to describe them in E820 and some
     platforms require ECAM to work (Bjorn Helgaas)

   - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien
     Le Moal)

   - Remove last user and pci_enable_device_io() (Heiner Kallweit)

   - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen)

   - Skip waiting for devices that have been disconnected while
     suspended (Ilpo Järvinen)

   - Clear Secondary Status errors after enumeration since Master Aborts
     and Unsupported Request errors are an expected part of enumeration
     (Vidya Sagar)

  MSI:

   - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas)

  Error handling:

   - Mask Genesys GL975x SD host controller Replay Timer Timeout
     correctable errors caused by a hardware defect; the errors cause
     interrupts that prevent system suspend (Kai-Heng Feng)

   - Fix EDR-related _DSM support, which previously evaluated revision 5
     but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan)

  ASPM:

   - Simplify link state definitions and mask calculation (Ilpo
     Järvinen)

  Power management:

   - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS
     apparently doesn't know how to put them back in D0 (Mario
     Limonciello)

  CXL:

   - Support resetting CXL devices; special handling required because
     CXL Ports mask Secondary Bus Reset by default (Dave Jiang)

  DOE:

   - Support DOE Discovery Version 2 (Alexey Kardashevskiy)

  Endpoint framework:

   - Set endpoint BAR to be 64-bit if the driver says that's all the
     device supports, in addition to doing so if the size is >2GB
     (Niklas Cassel)

   - Simplify endpoint BAR allocation and setting interfaces (Niklas
     Cassel)

  Cadence PCIe controller driver:

   - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof
     Kozlowski)

  Cadence PCIe endpoint driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

  Freescale Layerscape PCIe controller driver:

   - Convert DT binding to YAML (Frank Li)

  MediaTek MT7621 PCIe controller driver:

   - Add DT binding missing 'reg' property for child Root Ports
     (Krzysztof Kozlowski)

   - Fix theoretical string truncation in PHY name (Sergio Paracuellos)

  NVIDIA Tegra194 PCIe controller driver:

   - Return success for endpoint probe instead of falling through to the
     failure path (Vidya Sagar)

  Renesas R-Car PCIe controller driver:

   - Add DT binding missing IOMMU properties (Geert Uytterhoeven)

   - Add DT binding R-Car V4H compatible for host and endpoint mode
     (Yoshihiro Shimoda)

  Rockchip PCIe controller driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

   - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski)

   - Set the Subsystem Vendor ID, which was previously zero because it
     was masked incorrectly (Rick Wertenbroek)

  Synopsys DesignWare PCIe controller driver:

   - Restructure DBI register access to accommodate devices where this
     requires Refclk to be active (Manivannan Sadhasivam)

   - Remove the deinit() callback, which was only need by the
     pcie-rcar-gen4, and do it directly in that driver (Manivannan
     Sadhasivam)

   - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean
     up things like eDMA (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel
     to dw_pcie_ep_init() (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to
     reflect the actual functionality (Manivannan Sadhasivam)

   - Call dw_pcie_ep_init_registers() directly from all the glue
     drivers, not just those that require active Refclk from the host
     (Manivannan Sadhasivam)

   - Remove the "core_init_notifier" flag, which was an obscure way for
     glue drivers to indicate that they depend on Refclk from the host
     (Manivannan Sadhasivam)

  TI J721E PCIe driver:

   - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli)

   - Add DT binding J722S SoC support (Siddharth Vadapalli)

  TI Keystone PCIe controller driver:

   - Add DT binding missing num-viewport, phys and phy-name properties
     (Jan Kiszka)

  Miscellaneous:

   - Constify and annotate with __ro_after_init (Heiner Kallweit)

   - Convert DT bindings to YAML (Krzysztof Kozlowski)

   - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming
     Zhou)"

* tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
  PCI: Do not wait for disconnected devices when resuming
  x86/pci: Skip early E820 check for ECAM region
  PCI: Remove unused pci_enable_device_io()
  ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
  PCI: Update pci_find_capability() stub return types
  PCI: Remove PCI_IRQ_LEGACY
  scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
  scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios
  Revert "genirq/msi: Provide constants for PCI/IMS support"
  Revert "x86/apic/msi: Enable PCI/IMS"
  Revert "iommu/vt-d: Enable PCI/IMS"
  Revert "iommu/amd: Enable PCI/IMS"
  Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
  ...
2024-05-21 10:09:28 -07:00
Lang Yu eb853413d0 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
memory allocation requirements.

We can't even run a Basic MNIST Example with a default 512MB carveout.
https://github.com/pytorch/examples/tree/main/mnist. Error Log:

"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
and 22.17 MiB is reserved by PyTorch but unallocated"

Though we can change BIOS settings to enlarge carveout size,
which is inflexible and may bring complaint. On the other hand,
the memory resource can't be effectively used between host and device.

The solution is MI300A approach, i.e., let VRAM allocations go to GTT.
Then device and host can flexibly and effectively share memory resource.

v2: Report local_mem_size_private as 0. (Felix)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 17:17:53 -04:00
Lang Yu 2a705f3e49 drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
Two attachments use the same VM, root PD would be locked twice.

[   57.910418] Call Trace:
[   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
[   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
[   57.793923]  ? idr_get_next_ul+0xbe/0x100
[   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
[   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
[   57.794141]  ? process_scheduled_works+0x29c/0x580
[   57.794147]  process_scheduled_works+0x303/0x580
[   57.794157]  ? __pfx_worker_thread+0x10/0x10
[   57.794160]  worker_thread+0x1a2/0x370
[   57.794165]  ? __pfx_worker_thread+0x10/0x10
[   57.794167]  kthread+0x11b/0x150
[   57.794172]  ? __pfx_kthread+0x10/0x10
[   57.794177]  ret_from_fork+0x3d/0x60
[   57.794181]  ? __pfx_kthread+0x10/0x10
[   57.794184]  ret_from_fork_asm+0x1b/0x30

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 17:17:53 -04:00
Tvrtko Ursulin 9c1a429217 drm/amdgpu: Fix amdgpu_vm_is_bo_always_valid kerneldoc
Align kerneldoc with the function argument name.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 26e20235ce ("drm/amdgpu: Add amdgpu_bo_is_vm_bo helper")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:26 -04:00
Dr. David Alan Gilbert 191ef65b4e drm/amdgpu: remove unused struct 'hqd_registers'
'hqd_registers' used to be used in a member of the 'bonaire_mqd'
struct. 'bonaire_mqd' was removed by
commit 486d807cd9 ("drm/amdgpu: remove duplicate definition of cik_mqd")
It's now unused.

Remove 'hqd_registers' as well.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:26 -04:00
Tao Zhou 2aadb520bf drm/amdgpu: update type of buf size to u32 for eeprom functions
Avoid overflow issue.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:26 -04:00
Victor Skvortsov 5434bc03f5 drm/amdgpu: Queue KFD reset workitem in VF FED
The guest recovery sequence is buggy in Fatal Error when both
FLR & KFD reset workitems are queued at the same time. In addition,
FLR guest recovery sequence is out of order when PF/VF communication
breaks due to a GPU fatal error

As a temporary work around, perform a KFD style reset (Initiate reset
request from the guest) inside the pf2vf thread on FED.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:25 -04:00
Victor Skvortsov 3a19a8af64 drm/amdgpu: Extend KIQ reg polling wait for VF
Runtime KIQ interface to read/write registers in VF may take longer
than expected for BM environment. Extend the timeout.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-20 16:20:25 -04:00
Linus Torvalds ff9a79307f Kbuild updates for v6.10
- Avoid 'constexpr', which is a keyword in C23
 
  - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
    'dt_binding_check'
 
  - Fix weak references to avoid GOT entries in position-independent
    code generation
 
  - Convert the last use of 'optional' property in arch/sh/Kconfig
 
  - Remove support for the 'optional' property in Kconfig
 
  - Remove support for Clang's ThinLTO caching, which does not work with
    the .incbin directive
 
  - Change the semantics of $(src) so it always points to the source
    directory, which fixes Makefile inconsistencies between upstream and
    downstream
 
  - Fix 'make tar-pkg' for RISC-V to produce a consistent package
 
  - Provide reasonable default coverage for objtool, sanitizers, and
    profilers
 
  - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
 
  - Remove the last use of tristate choice in drivers/rapidio/Kconfig
 
  - Various cleanups and fixes in Kconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
 2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
 msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
 GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
 inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
 lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
 JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
 Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
 y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
 orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
 aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
 ZCSnZ31h7woGfNho
 =D5B/
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Avoid 'constexpr', which is a keyword in C23

 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of
   'dt_binding_check'

 - Fix weak references to avoid GOT entries in position-independent code
   generation

 - Convert the last use of 'optional' property in arch/sh/Kconfig

 - Remove support for the 'optional' property in Kconfig

 - Remove support for Clang's ThinLTO caching, which does not work with
   the .incbin directive

 - Change the semantics of $(src) so it always points to the source
   directory, which fixes Makefile inconsistencies between upstream and
   downstream

 - Fix 'make tar-pkg' for RISC-V to produce a consistent package

 - Provide reasonable default coverage for objtool, sanitizers, and
   profilers

 - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.

 - Remove the last use of tristate choice in drivers/rapidio/Kconfig

 - Various cleanups and fixes in Kconfig

* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
  kconfig: use sym_get_choice_menu() in sym_check_prop()
  rapidio: remove choice for enumeration
  kconfig: lxdialog: remove initialization with A_NORMAL
  kconfig: m/nconf: merge two item_add_str() calls
  kconfig: m/nconf: remove dead code to display value of bool choice
  kconfig: m/nconf: remove dead code to display children of choice members
  kconfig: gconf: show checkbox for choice correctly
  kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
  Makefile: remove redundant tool coverage variables
  kbuild: provide reasonable defaults for tool coverage
  modules: Drop the .export_symbol section from the final modules
  kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
  kconfig: use sym_get_choice_menu() in conf_write_defconfig()
  kconfig: add sym_get_choice_menu() helper
  kconfig: turn defaults and additional prompt for choice members into error
  kconfig: turn missing prompt for choice members into error
  kconfig: turn conf_choice() into void function
  kconfig: use linked list in sym_set_changed()
  kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
  kconfig: gconf: remove debug code
  ...
2024-05-18 12:39:20 -07:00
Yang Wang 062a7ce676 drm/amdgpu: fix ACA no query result after gpu reset
fix ACA no query result after gpu reset.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Lijo Lazar 546e6309d1 drm/amd/pm: Remove legacy interface for xgmi plpd
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI
PLPD mode. Also, xgmi_plpd_policy sysfs node is not used by any client.
Remove that as well.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Yang Wang 258ed689bc drm/amdgpu: change bank cache lock type to spinlock
modify the lock type to 'spinlock' to avoid schedule issue
in interrupt context.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:39 -04:00
Yang Wang f6bce954f4 drm/amdgpu: change aca bank error lock type to spinlock
modify the lock type to 'spinlock' to avoid schedule issue
in interrupt context.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tvrtko Ursulin 50bff04d02 drm/amdgpu: Describe all object placements in debugfs
Accurately show all placements when describing objects in debugfs, instead
of bunching them up under the 'CPU' placement.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tvrtko Ursulin 8fb0efb101 drm/amdgpu: Reduce mem_type to domain double indirection
All apart from AMDGPU_GEM_DOMAIN_GTT memory domains map 1:1 to TTM
placements. And the former be either AMDGPU_PL_PREEMPT or TTM_PL_TT,
depending on AMDGPU_GEM_CREATE_PREEMPTIBLE.

Simplify a few places in the code which convert the TTM placement into
a domain by checking against the current placement directly.

In the conversion AMDGPU_PL_PREEMPT either does not have to be handled
because amdgpu_mem_type_to_domain() cannot return that value anyway.

v2:
 * Remove AMDGPU_PL_PREEMPT handling.

v3:
 * Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com> # v1
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> # v2
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tvrtko Ursulin 26e20235ce drm/amdgpu: Add amdgpu_bo_is_vm_bo helper
Help code readability by replacing a bunch of:

bo->tbo.base.resv == vm->root.bo->tbo.base.resv

With:

amdgpu_vm_is_bo_always_valid(vm, bo)

No functional changes.

v2:
 * Rename helper and move to amdgpu_vm. (Christian)

v3:
 * Use Christian's kerneldoc.

v4:
 * Fixed logic inversion in amdgpu_vm_bo_get_memory.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Srinivasan Shanmugam e060c7ba7e drm/amdgpu: Remove duplicate check for *is_queue_unmap in sdma_v7_0_ring_set_wptr
This commit removes a duplicate check for *is_queue_unmap in the
sdma_v7_0_ring_set_wptr function. The check at line 171 was considered
dead code because at this point in the code, we already know that
*is_queue_unmap is false due to the check at line 161.

By removing this unnecessary check, improves the readability of the
code

Fixes the below:
	drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c:171 sdma_v7_0_ring_set_wptr()
	warn: duplicate check '*is_queue_unmap' (previous on line 161)

drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
    140 static void sdma_v7_0_ring_set_wptr(struct amdgpu_ring *ring)
    141 {
    142         struct amdgpu_device *adev = ring->adev;
    143         uint32_t *wptr_saved;
    144         uint32_t *is_queue_unmap;
    145         uint64_t aggregated_db_index;
    146         uint32_t mqd_size = adev->mqds[AMDGPU_HW_IP_DMA].mqd_size;
    147
    148         DRM_DEBUG("Setting write pointer\n");
    149
    150         if (ring->is_mes_queue) {
    151                 wptr_saved = (uint32_t *)(ring->mqd_ptr + mqd_size);
    152                 is_queue_unmap = (uint32_t *)(ring->mqd_ptr + mqd_size +
                        ^^^^^^^^^^^^^^^^ Set here

    153                                               sizeof(uint32_t));
    154                 aggregated_db_index =
    155                         amdgpu_mes_get_aggregated_doorbell_index(adev,
    156                                                          ring->hw_prio);
    157
    158                 atomic64_set((atomic64_t *)ring->wptr_cpu_addr,
    159                              ring->wptr << 2);
    160                 *wptr_saved = ring->wptr << 2;
    161                 if (*is_queue_unmap) {
                            ^^^^^^^^^^^^^^^ Checked here

    162                         WDOORBELL64(aggregated_db_index, ring->wptr << 2);
    163                         DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n",
    164                                         ring->doorbell_index, ring->wptr << 2);
    165                         WDOORBELL64(ring->doorbell_index, ring->wptr << 2);
    166                 } else {
    167                         DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n",
    168                                         ring->doorbell_index, ring->wptr << 2);
    169                         WDOORBELL64(ring->doorbell_index, ring->wptr << 2);
    170
--> 171                         if (*is_queue_unmap)
                                    ^^^^^^^^^^^^^^^ This is dead code.  We know it's false.

    172                                 WDOORBELL64(aggregated_db_index,
    173                                             ring->wptr << 2);
    174                 }
    175         } else {
    176                 if (ring->use_doorbell) {
    177                         DRM_DEBUG("Using doorbell -- "
    178                                   "wptr_offs == 0x%08x "

Fixes: b412351e91 ("drm/amdgpu: Add sdma v7_0 ip block support (v7)")
Cc: Likun Gao <Likun.Gao@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Tim Van Patten 1446226d32 drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1
The following commit updated gmc->noretry from 0 to 1 for GC HW IP
9.3.0:

    commit 5f3854f1f4 ("drm/amdgpu: add more cases to noretry=1")

This causes the device to hang when a page fault occurs, until the
device is rebooted. Instead, revert back to gmc->noretry=0 so the device
is still responsive.

Fixes: 5f3854f1f4 ("drm/amdgpu: add more cases to noretry=1")
Signed-off-by: Tim Van Patten <timvp@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Ruijing Dong a1a9143c96 drm/amdgpu/vcn: update vcn5 enc/dec capabilities
Update the capabilities for supporting 8k encoding.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Jiapeng Chong 72e6ea95c4 drm/amdgpu: Remove duplicate amdgpu_umsch_mm.h header
./drivers/gpu/drm/amd/amdgpu/amdgpu.h: amdgpu_umsch_mm.h is included more than once.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9063
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:38 -04:00
Kenneth Feng d3620eeae8 drm/amd/amdgpu: add module parameter for jpeg
add module parameter for jpeg. this is a temporary workaround for jpeg unit test fail
on vcn 5.0 now. will be removed later.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Alex Deucher 736f911204 drm/amdgpu: fix documentation errors in gmc v12.0
Fix up parameter descriptions.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Sreekant Somasekharan 0cce5f285d drm/amdkfd: Check correct memory types for is_system variable
To catch GPU mapping of system memory, TTM_PL_TT and AMDGPU_PL_PREEMPT
must be checked.

Fixes: 628e1ace23 ("drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC")
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Alex Deucher b72fa761fc drm/amdgpu: fix documentation errors in sdma v7.0
Fix up parameter descriptions.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Yang Wang b712d7c201 drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()
create a new helper function to avoid compiler 'side-effect'
check about RAS_EVENT_LOG() macro.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Frank Min 9a55c77978 drm/amdgpu: fix getting vram info for gfx12
gfx12 query video mem channel/type/width from umc_info of atom list, so
fix it accordingly.

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Frank Min b80160a53a drm/amdgpu/mes: use mc address for wptr in add queue packet
use mc address for wptr in add queue packet

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Likun Gao c14d5b5095 drm/amdgpu: enable gfxoff for gc v12.0.0
Enable GFXOFF for GC v12.0.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:37 -04:00
Likun Gao d8cd2d617a drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.0
Enable mmhub and athub cg on gc 12.0.0

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Likun Gao 7be73af53b drm/amdgpu: switch default mes to uni mes
Switch the default mes to uni mes for gfx v12.
V2: remove uni_mes set for gfx v11.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Yang Wang b2aa3d4b30 drm/amdgpu: add debug flag to enable RAS ACA
Use debug_mask=0x10 (BIT.4) param to help enable RAS ACA.
(RAS ACA is disabled by default.)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Likun Gao 2ac72cbc7e drm/amdgpu: enable some cg feature for gc 12.0.0
Enable 3D cgcg, 3D cgls, sram fgcg, perfcounter mgcg,
repeater fgcg for gc v12.0.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Ma Jun 4c11d30c95 drm/amdgpu: Fix the null pointer dereference to ras_manager
Check ras_manager before using it

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Xiaogang Chen f326d7cc74 drm/kfd: Correct pinned buffer handling at kfd restore and validate process
This reverts commit 8a774fe912 ("drm/amdgpu: avoid restore process run into dead loop")
since buffer got pinned is not related whether it needs mapping
And skip buffer validation at kfd driver if the buffer has been pinned.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Lijo Lazar b194d21b9b drm/amdgpu: Use NPS ranges from discovery table
Add GMC API to fetch NPS range information from discovery table. Use NPS
range information in GMC 9.4.3 SOCs when available, otherwise fallback
to software method.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Friedrich Vock 0cdb3f9740 drm/amdgpu: Check if NBIO funcs are NULL in amdgpu_device_baco_exit
The special case for VM passthrough doesn't check adev->nbio.funcs
before dereferencing it. If GPUs that don't have an NBIO block are
passed through, this leads to a NULL pointer dereference on startup.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Fixes: 1bece222ea ("drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Lijo Lazar ce798376ef drm/amdgpu: Fix memory range calculation
Consider the 16M reserved region also before range calculation for GMC
9.4.3 SOCs.

Fixes: a433f1f594 ("drm/amdgpu: Initialize memory ranges for GC 9.4.3")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:40:36 -04:00
Likun Gao 73fbc3e000 drm/amdgpu: enable gfx cgcg&cgls for gfx v12_0_0
Enable GFX CGCG and CGLS for gfx version 12.0.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:11:02 -04:00
Sonny Jiang 75125e6b4c drm/amdgpu/jpeg5: enable power gating
Enable PG on JPEG5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:10:56 -04:00
Likun Gao ef57158462 drm/amdgpu: support imu for gc 12_0_0
Support IMU for ASIC with GC 12.0.0

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:10:31 -04:00
Ma Jun 01b3297336 drm/amdgpu: Remove dead code in amdgpu_ras_add_mca_err_addr
Remove dead code in amdgpu_ras_add_mca_err_addr

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:09:55 -04:00
Ma Jun d1a6bfff94 drm/amdgpu: Fix null pointer dereference to bo
Check bo before using it

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:09:47 -04:00
shaoyunl 4488cd671c drm/amdgpu: enable unmapped doorbell handling basic mode on mes 12
This reverts commit fcc5df722d.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-17 17:05:38 -04:00
Linus Torvalds db5d28c0bf drm for 6.10-rc1
new drivers:
 - panthor: ARM Mali/Immortalis CSF-based GPU driver
 
 core:
 - add a CONFIG_DRM_WERROR option
 - make more headers self-contained
 - grab resv lock in pin/unpin
 - fix vmap resv locking
 - EDID/eDP panel matching
 - Kconfig cleanups
 - DT sound bindings
 - Add SIZE_HINTS property for cursor planes
 - Add struct drm_edid_product_id and helpers.
 - Use drm device based logging in more drm functions.
 - drop seq_file.h from a bunch of places
 - use drm_edid driver conversions
 
 dp:
 - DP Tunnel documentation
 - MST read sideband cap
 - Adaptive sync SDP prep work
 
 ttm:
 - improve placement for TTM BOs in idle/busy handling
 
 panic:
 - Fixes for drm-panic, and option to test it.
 - Add drm panic to simpledrm, mgag200, imx, ast
 
 bridge:
 - improve init ordering
 - adv7511: allow GPIO pin sharing
 - tc358775: add tc358675 support
 
 panel:
 - AUO B120XAN01.0
 - Samsung s6e3fa7
 - BOE NT116WHM-N44
 - CMN N116BCA-EA1,
 - CrystalClear CMT430B19N00
 - Startek KD050HDFIA020-C020A
 - powertip PH128800T006-ZHC01
 - Innolux G121X1-L03
 - LG sw43408
 - Khadas TS050 V2
 - EDO RM69380 OLED
 - CSOT MNB601LS1-1
 
 amdgpu:
 - HDCP/ODM/RAS fixes
 - Devcoredump improvements
 - Expose VCN activity via sysfs
 - SMY 13.0.x updates
 - Enable fast updates on DCN 3.1.4
 - Add dclk and vclk reporting on additional devices
 - Add ACA RAS infrastructure
 - Implement TLB flush fence
 - EEPROM handling fixes
 - SMUIO 14.0.2 support
 - SMU 14.0.1 Updates
 - SMU 14.0.2 support
 - Sync page table freeing with TLB flushes
 - DML2 refactor
 - DC debug improvements
 - DCN 3.5.x Updates
 - GPU reset fixes
 - HDP fix for second GFX pipe on GC 10.x
 - Enable secondary GFX pipe on GC 10.3
 - Refactor and clean up BACO/BOCO/BAMACO handling
 - Remove invalid TTM resource start check
 - UAF fix in VA IOCTL
 - GPUVM page fault redirection to secondary IH rings for IH 6.x
 - Initial support for mapping kernel queues via MES
 - Fix VRAM memory accounting
 
 amdkfd:
 - MQD handling cleanup
 - Preemption handling fixes for XCDs
 - TLB flush fix for GC 9.4.2
 - Properly clean up workqueue during module unload
 - Fix memory leak process create failure
 - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace
 - Fix eviction fence handling
 - Fix leak in GPU memory allocation failure case
 - DMABuf import handling fix
 - Enable SQ watchpoint for gfx10
 
 i915:
 - Adding new DG2 PCI ID
 - add context hints for GT frequency
 - enable only one CCS for compute workloads
 - new workarounds
 - Fix UAF on destroy against retire race and remove two earlier partial fixes
 - Limit the reserved VM space to only the platforms that need it
 - Fix gt reset with GuC submission is disable
 - Add and use gt_to_guc() wrapper
 
 i915/xe display:
 - Lunar Lake display enabling, including cdclk and other refactors
 - BIOS/VBT/opregion related refactor
 - Digital port related refactor/clean-up
 - Fix 2s boot time regression on DP panel replay init
 - Remove duplication on audio enable/disable on SDVO and g4x+ DP
 - Disable AuxCCS framebuffers if built for Xe
 - Make crtc disable more atomic
 - Increase DP idle pattern wait timeout to 2ms
 - Start using container_of_const() for some extra const safety
 - Fix Jasper Lake boot freeze
 - Enable MST mode for 128b/132b single-stream sideband
 - Enable Adaptive Sync SDP Support for DP
 - Fix MTL supported DP rates - removal of UHBR13.5
 - PLL refactoring
 - Limit eDP MSO pipe only for display version 20
 - More display refactor towards independence from i915 dev_priv
 - Convert i915/xe fbdev to DRM client
 - More initial work to make display code more independent from i915
 
 xe:
 - improved error capture
 - clean up some uAPI leftovers
 - devcoredump update
 - Add BMG mocs table
 - Handle GSCCS ER interrupt
 - Implement xe2- and GuC workarounds
 - struct xe_device cleanup
 - Hwmon updates
 - Add LRC parsing for more GPU instruction
 - Increase VM_BIND number of per-ioctl Ops
 - drm/xe: Add XE_BO_GGTT_INVALIDATE flag
 - Initial development for SR-IOV support
 - Add new PCI IDs to DG2 platform
 - Move userptr over to start using hmm_range_fault
 
 msm:
 - Switched to generating register header files during build process
   instead of shipping pre-generated headers
 - Merged DPU and MDP4 format databases.
 - DP:
 - Stop using compat string to distinguish DP and eDP cases
 - Added support for X Elite platform (X1E80100)
 - Reworked DP aux/audio support
 - Added SM6350 DP to the bindings
 - GPU:
 - a7xx perfcntr reg fixes
 - MAINTAINERS updates
 - a750 devcoredump support
 
 radeon:
 - Silence UBSAN warnings related to flexible arrays
 
 nouveau:
 - move some uAPI objects to uapi headers
 
 omapdrm:
 - console fix
 
 ast:
 - add i2c polling
 
 qaic:
 - add debugfs entries
 
 exynos:
 - fix platform_driver .owner
 - drop cleanup code
 
 mediatek:
 - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe()
 - Add GAMMA 12-bit LUT support for MT8188
 - Rename mtk_drm_* to mtk_*
 - Drop driver owner initialization
 - Correct calculation formula of PHY Timing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZEUU0ACgkQDHTzWXnE
 hr5qMBAAjUFF0w3YOQMsn0LEAm628kMRHpoVeSXmIfO9z9lTyad30EtiS4ggFgj7
 Q/oQ6hHCd5jdsvGSJDgtTTAsTQX+aCkXrgf/18ENbqR5mM3MdefUAPR/zawZ7HR4
 8+b2h6p7gHBw8wDjuIvQ5e9InHcnIkKWJc82qnJG5Urgxa05SDh3mu3cosPTJiBw
 a851vlWaYcxC0yAUwJlWaXDdN8yzdFaSQNboZBS/CMLXF/WE6Ht257uxJmaouc0Y
 Z0kBybok5x0TPQEXF9IV+kuSW3EYpYcwRi0BFFM9sJjkEBdH3rYRZwuYP1LR+7VZ
 HKsmIkie8YzCm2VwTquYzUvHgF+swZX4RRch9XJlGz7UvBLc0eBO/2n4X6fNd8Kl
 QGNNqEfsnUQrAHKvGsOUgoGjSCmEo8voGcMZ3JPIAdJ/GcnJwpMvNxtF6XB08hEu
 rDxuU6o7WkM4dJbtiaFEHNh0Fmjj6aXdBL23UD9pcqPT1fc9cT3xnUd5RJIRuRwV
 /tpb2WfkFAoxCkKFiunaC4rE8oG6ME6wr/trYjvoYuhCI5hCVaXRBGzJEtC30IP6
 lG2YZ8r0jHjktbgjZ0Cz/hY424H4sxSN9SJAnXXFDzcfjBJ/nOgo5nMD1jKajAD5
 SYfqWaD5Y+YygtyLJPMfZQI2XMOpCzteXD8uaNXXFJfpV7Apeyg=
 =ocVM
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
 "This is the main pull request for the drm subsystems for 6.10.

  In drivers the main thing is a new driver for ARM Mali firmware based
  GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and
  scattered changes to everything else.

  In the core a bunch of headers and Kconfig was refactored, along with
  the addition of a new panic handler which is meant to provide a user
  friendly message when a panic happens and graphical display is
  enabled.

  New drivers:
   - panthor: ARM Mali/Immortalis CSF-based GPU driver

  Core:
   - add a CONFIG_DRM_WERROR option
   - make more headers self-contained
   - grab resv lock in pin/unpin
   - fix vmap resv locking
   - EDID/eDP panel matching
   - Kconfig cleanups
   - DT sound bindings
   - Add SIZE_HINTS property for cursor planes
   - Add struct drm_edid_product_id and helpers.
   - Use drm device based logging in more drm functions.
   - drop seq_file.h from a bunch of places
   - use drm_edid driver conversions

  dp:
   - DP Tunnel documentation
   - MST read sideband cap
   - Adaptive sync SDP prep work

  ttm:
   - improve placement for TTM BOs in idle/busy handling

  panic:
   - Fixes for drm-panic, and option to test it.
   - Add drm panic to simpledrm, mgag200, imx, ast

  bridge:
   - improve init ordering
   - adv7511: allow GPIO pin sharing
   - tc358775: add tc358675 support

  panel:
   - AUO B120XAN01.0
   - Samsung s6e3fa7
   - BOE NT116WHM-N44
   - CMN N116BCA-EA1,
   - CrystalClear CMT430B19N00
   - Startek KD050HDFIA020-C020A
   - powertip PH128800T006-ZHC01
   - Innolux G121X1-L03
   - LG sw43408
   - Khadas TS050 V2
   - EDO RM69380 OLED
   - CSOT MNB601LS1-1

  amdgpu:
   - HDCP/ODM/RAS fixes
   - Devcoredump improvements
   - Expose VCN activity via sysfs
   - SMY 13.0.x updates
   - Enable fast updates on DCN 3.1.4
   - Add dclk and vclk reporting on additional devices
   - Add ACA RAS infrastructure
   - Implement TLB flush fence
   - EEPROM handling fixes
   - SMUIO 14.0.2 support
   - SMU 14.0.1 Updates
   - SMU 14.0.2 support
   - Sync page table freeing with TLB flushes
   - DML2 refactor
   - DC debug improvements
   - DCN 3.5.x Updates
   - GPU reset fixes
   - HDP fix for second GFX pipe on GC 10.x
   - Enable secondary GFX pipe on GC 10.3
   - Refactor and clean up BACO/BOCO/BAMACO handling
   - Remove invalid TTM resource start check
   - UAF fix in VA IOCTL
   - GPUVM page fault redirection to secondary IH rings for IH 6.x
   - Initial support for mapping kernel queues via MES
   - Fix VRAM memory accounting

  amdkfd:
   - MQD handling cleanup
   - Preemption handling fixes for XCDs
   - TLB flush fix for GC 9.4.2
   - Properly clean up workqueue during module unload
   - Fix memory leak process create failure
   - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace
   - Fix eviction fence handling
   - Fix leak in GPU memory allocation failure case
   - DMABuf import handling fix
   - Enable SQ watchpoint for gfx10

  i915:
   - Adding new DG2 PCI ID
   - add context hints for GT frequency
   - enable only one CCS for compute workloads
   - new workarounds
   - Fix UAF on destroy against retire race and remove two earlier partial fixes
   - Limit the reserved VM space to only the platforms that need it
   - Fix gt reset with GuC submission is disable
   - Add and use gt_to_guc() wrapper

  i915/xe display:
   - Lunar Lake display enabling, including cdclk and other refactors
   - BIOS/VBT/opregion related refactor
   - Digital port related refactor/clean-up
   - Fix 2s boot time regression on DP panel replay init
   - Remove duplication on audio enable/disable on SDVO and g4x+ DP
   - Disable AuxCCS framebuffers if built for Xe
   - Make crtc disable more atomic
   - Increase DP idle pattern wait timeout to 2ms
   - Start using container_of_const() for some extra const safety
   - Fix Jasper Lake boot freeze
   - Enable MST mode for 128b/132b single-stream sideband
   - Enable Adaptive Sync SDP Support for DP
   - Fix MTL supported DP rates - removal of UHBR13.5
   - PLL refactoring
   - Limit eDP MSO pipe only for display version 20
   - More display refactor towards independence from i915 dev_priv
   - Convert i915/xe fbdev to DRM client
   - More initial work to make display code more independent from i915

  xe:
   - improved error capture
   - clean up some uAPI leftovers
   - devcoredump update
   - Add BMG mocs table
   - Handle GSCCS ER interrupt
   - Implement xe2- and GuC workarounds
   - struct xe_device cleanup
   - Hwmon updates
   - Add LRC parsing for more GPU instruction
   - Increase VM_BIND number of per-ioctl Ops
   - drm/xe: Add XE_BO_GGTT_INVALIDATE flag
   - Initial development for SR-IOV support
   - Add new PCI IDs to DG2 platform
   - Move userptr over to start using hmm_range_fault

  msm:
   - Switched to generating register header files during build process
     instead of shipping pre-generated headers
   - Merged DPU and MDP4 format databases.
   - DP:
     - Stop using compat string to distinguish DP and eDP cases
     - Added support for X Elite platform (X1E80100)
     - Reworked DP aux/audio support
     - Added SM6350 DP to the bindings
   - GPU:
     - a7xx perfcntr reg fixes
     - MAINTAINERS updates
     - a750 devcoredump support

  radeon:
   - Silence UBSAN warnings related to flexible arrays

  nouveau:
   - move some uAPI objects to uapi headers

  omapdrm:
   - console fix

  ast:
   - add i2c polling

  qaic:
   - add debugfs entries

  exynos:
   - fix platform_driver .owner
   - drop cleanup code

  mediatek:
   - Use devm_platform_get_and_ioremap_resource() in mtk_hdmi_ddc_probe()
   - Add GAMMA 12-bit LUT support for MT8188
   - Rename mtk_drm_* to mtk_*
   - Drop driver owner initialization
   - Correct calculation formula of PHY Timing"

* tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel: (1477 commits)
  drm/xe/ads: Use flexible-array
  drm/xe: Use ordered WQ for G2H handler
  drm/msm/gen_header: allow skipping the validation
  drm/msm/a6xx: Cleanup indexed regs const'ness
  drm/msm: Add devcoredump support for a750
  drm/msm: Adjust a7xx GBIF debugbus dumping
  drm/msm: Update a6xx registers XML
  drm/msm: Fix imported a750 snapshot header for upstream
  drm/msm: Import a750 snapshot registers from kgsl
  MAINTAINERS: Add Konrad Dybcio as a reviewer for the Adreno driver
  MAINTAINERS: Add a separate entry for Qualcomm Adreno GPU drivers
  drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails
  drm/msm/adreno: fix CP cycles stat retrieval on a7xx
  drm/msm/a7xx: allow writing to CP_BV counter selection registers
  drm: zynqmp_dpsub: Always register bridge
  Revert "drm/bridge: ti-sn65dsi83: Fix enable error path"
  drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer()
  drm/fbdev-generic: Do not set physical framebuffer address
  drm/panthor: Fix the FW reset logic
  drm/panthor: Make sure we handle 'unknown group state' case properly
  ...
2024-05-15 09:43:42 -07:00
Frank Min b61467778e drm/amdgpu: fix mqd corruption for gfx12
1. restore mqd from backup while resuming
2. use copy_toio and copy_fromio while mqd in vram

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Frank Min ef168e6de9 drm/amdgpu: add initial value for gfx12 AGP aperture
add initial value for gfx12 AGP aperture

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Jesse Zhang e6ae021adb drm/amdgpu: fix the warning bad bit shift operation for aca_error_type type
Filter invalid aca error types before performing a shift operation.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Jesse Zhang d190b459b2 drm/amdgpu: the warning dereferencing obj for nbio_v7_4
if ras_manager obj null, don't print NBIO err data

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Jesse Zhang 68de5d31b1 drm/amdgpu: remove structurally dead code
This code cannot be reached: return "UNKNOWN";.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:02 -04:00
Jesse Zhang 269435aef4 drm/amdgu: remove unused code
The same code is executed when the condition err is true or false,
because the code in the if-then branch and after the if statement is identical

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:12:01 -04:00
Jesse Zhang e2bff63ba6 drm/amdgpu: remove structurally dead code for amd_gmc
This code cannot be reached: return sysfs_emit(buf, "UNK....)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:53 -04:00
Jesse Zhang f0574a56fb drm/amd: fix the warning unchecking return vaule for sdma_v7
check ring allocate success before emit preempt ib

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:53 -04:00
Jesse Zhang b55bf19eb9 drm/amdgpu: clear the warning unsigned compared against 0 for xcp_id
This greater-than-or-equal-to-zero comparison of an unsigned value is always true. fpriv->xcp_id >= 0U

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:53 -04:00
Jesse Zhang 1940708ccf drm/amdgpu: fix the waring dereferencing hive
Check the amdgpu_hive_info *hive that maybe is NULL.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:53 -04:00
Jesse Zhang b1f7810b05 drm/amdgpu: fix dereference after null check
check the pointer hive before use.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:52 -04:00
Jesse Zhang 1a00f2ac82 drm/amdgpu: Fix the warning division or modulo by zero
Checks the partition mode and returns an error for an invalid mode.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 16:11:47 -04:00
Saleemkhan Jamadar 98a2e3a0d1 drm/amdgpu/umsch: add support to capture fw debug log
Added support to capture unsch fw debug logs in debugfs.
To enable set amdgpu_umschfw_log =1 in boot args.

v1 - rename variable to umsch_mm_fwlog (Veera)

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 15:45:41 -04:00
David (Ming Qiang) Wu 4b0497d25d drm/amd/amdgpu: update jpeg 5 capability
Based on the documentation the maximum resolustion should
be 16384x16384.

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 15:45:34 -04:00
David (Ming Qiang) Wu a166ec28db drm/amdgpu/vcn: set VCN5 power gating state to GATE on suspend
On suspend, we need to set power gating state to GATE when
VCN5 is busy, otherwise we will get following error on resume:

[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_unified_0 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v5_0_0> failed -110
amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -110
PM: failed to resume async: error -110

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 15:45:21 -04:00
David (Ming Qiang) Wu 10fe1a79cd drm/amdgpu/vcn: remove irq disabling in vcn 5 suspend
We do not directly enable/disable VCN IRQ in vcn 5.0.0.
And we do not handle the IRQ state as well. So the calls to
disable IRQ and set state are removed. This effectively gets
rid of the warining of
      "WARN_ON(!amdgpu_irq_enabled(adev, src, type))"
in amdgpu_irq_put().

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 15:45:12 -04:00
Jack Xiao 745e0a90be drm/amdgpu/mes: fix mes12 to map legacy queue
Adjust mes12 initialization sequence to fix mapping
legacy queue.

v2: use dev_err.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 15:44:20 -04:00
Philip Yang 9095e55440 drm/amdkfd: Remove arbitrary timeout for hmm_range_fault
On system with khugepaged enabled and user cases with THP buffer, the
hmm_range_fault may takes > 15 seconds to return -EBUSY, the arbitrary
timeout value is not accurate, cause memory allocation failure.

Remove the arbitrary timeout value, return EAGAIN to application if
hmm_range_fault return EBUSY, then userspace libdrm and Thunk will call
ioctl again.

Change EAGAIN to debug message as this is not error.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-13 15:44:02 -04:00
Michel Dänzer 8d2c930735 drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible
It incorrectly claimed a resource isn't CPU visible if it's located at
the very end of CPU visible VRAM.

Fixes: a6ff969fe9 ("drm/amdgpu: fix visible VRAM handling during faults")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-05-10 13:05:13 -04:00
Masahiro Yamada b1992c3772 kbuild: use $(src) instead of $(srctree)/$(src) for source directory
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:

    src := $(obj)

When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.

This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.

To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.

Going forward, the variables used in Makefiles will have the following
meanings:

  $(obj)     - directory in the object tree
  $(src)     - directory in the source tree  (changed by this commit)
  $(objtree) - the top of the kernel object tree
  $(srctree) - the top of the kernel source tree

Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced
with $(src).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-05-10 04:34:52 +09:00
Michel Dänzer c4dcb47d46 drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible
It incorrectly claimed a resource isn't CPU visible if it's located at
the very end of CPU visible VRAM.

Fixes: a6ff969fe9 ("drm/amdgpu: fix visible VRAM handling during faults")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3343
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reported-and-Tested-by: Jeremy Day <jsday@noreason.ca>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-05-08 18:45:29 -04:00
Srinivasan Shanmugam e35ba81bb3 drm/amdgpu: Fix buffer size to prevent truncation in gfx_v12_0_init_microcode
This commit fixes multiple potential truncations when writing the
strings _pfp.bin, _me.bin, _rlc.bin, and _mec.bin into the fw_name
buffer in the gfx_v12_0_init_microcode function in the gfx_v12_0.c file

The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf
function does not exceed the size of the fw_name buffer.

Thus fixing the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c: In function ‘gfx_v12_0_early_init’:
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:421:54: warning: ‘_pfp.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  421 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix);
      |                                                      ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:421:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
  421 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:428:54: warning: ‘_me.bin’ directive output may be truncated writing 7 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  428 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix);
      |                                                      ^~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:428:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 40
  428 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:436:62: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  436 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
      |                                                              ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:436:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
  436 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:448:54: warning: ‘_mec.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  448 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix);
      |                                                      ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c:448:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
  448 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Srinivasan Shanmugam ffd574459d drm/amdgpu: Fix truncation by resizing ucode_prefix in imu_v12_0_init_microcode
This commit fixes potential truncation when writing the string _imu.bin
into the fw_name buffer in the imu_v12_0_init_microcode function in the
imu_v12_0.c file

The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf
function does not exceed the size of the fw_name buffer.

Thus fixing the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c: In function ‘imu_v12_0_init_microcode’:
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:51:54: warning: ‘_imu.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
   51 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix);
      |                                                      ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/imu_v12_0.c:51:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
   51 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_imu.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Tim Huang 51dfc0a4d6 drm/amdgpu: fix mc_data out-of-bounds read warning
Clear warning that read mc_data[i-1] may out-of-bounds.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Tim Huang 8944acd0f9 drm/amdgpu: fix ucode out-of-bounds read warning
Clear warning that read ucode[] may out-of-bounds.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Ma Jun 0991e49d2b drm/amdgpu: Fix uninitialized variable warning in amdgpu_info_ioctl
Check the return value of amdgpu_xcp_get_inst_details, otherwise we
may use an uninitialized variable inst_mask

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Ma Jun d768394fa9 drm/amdgpu: Fix out-of-bounds read of df_v1_7_channel_number
Check the fb_channel_number range to avoid the array out-of-bounds
read error

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Alex Deucher cdca89bce4 drm/amdgpu/soc21: use common nbio callback to set remap offset
This fixes HDP flushes on systems with non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Alex Deucher 1dd8b24acc drm/amdgpu/nv: use common nbio callback to set remap offset
This fixes HDP flushes on systems with non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:07 -04:00
Alex Deucher c866201cdc drm/amdgpu/soc15: use common nbio callback to set remap offset
This fixes HDP flushes on systems with non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher 30f45a8ea4 drm/amdgpu: add set_reg_remap callback for NBIF 6.3.1
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher 3345f7ec0d drm/amdgpu: add set_reg_remap callback for NBIO 7.7
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher ffd3d6e780 drm/amdgpu: add set_reg_remap callback for NBIO 4.3
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher 454847c9f4 drm/amdgpu: add set_reg_remap callback for NBIO 2.3
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher cacbbfbd24 drm/amdgpu: add set_reg_remap callback for NBIO 7.2
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher 42ad8ac6bd drm/amdgpu: add set_reg_remap callback for NBIO 7.11
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher f9a2274b33 drm/amdgpu: add set_reg_remap callback for NBIO 7.9
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher 9d0e2915c4 drm/amdgpu: add set_reg_remap callback for NBIO 7.4
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher b2648640b9 drm/amdgpu: add set_reg_remap callback for NBIO 7.0
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher cab62e4839 drm/amdgpu: add set_reg_remap callback for NBIO 6.1
This will be used to consolidate the register remap offset
configuration and fix  HDP flushes on systems non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
Alex Deucher 0617cdde84 drm/amdgpu: add nbio set_reg_remap helper
Will be used to consolidate reg remap settings and fix HDP
flushes on systems with non-4K pages.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:06 -04:00
YiPeng Chai 2b3b9d2150 drm/amdgpu: change log level
Change log level.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Likun Gao a735b4a4ad drm/amdgpu: fix spl component for psp v14
Fix the coding error when load spl component for psp v14.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Tim Huang 9e5da94259 drm/amdgpu: fix uninitialized variable warning for jpeg_v4
Clear warning that using uninitialized variable r.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Yang Wang 329cec8f18 drm/amdgpu: fix RAS unload driver issue in SRIOV
Fix null pointer issue when unload driver in SRIOV mode.

Adjust the function position to ensure that the amdgpu_mca/aca_xxx_init()
related functions can be initialized properly.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Yang Wang 85a24a3ea0 drm/amdgpu: ignoring unsupported ras blocks when MCA bank dispatches
This patch is used to solve the problem of incorrect parsing of error counts.
When the UE trigger gpu is reset, the driver will attempt to parse all possible ras blocks.
For ras blocks that are not supported by the current ASIC, the driver should ignore this error.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Tim Huang 8f184f8e7a drm/amdgpu: fix uninitialized variable warning for amdgpu_xgmi
Clear warning that using uninitialized variable current_node.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Tim Huang 3aa6b72045 drm/amdgpu: fix uninitialized variable warning for sdma_v7
Clear warning that using uninitialized variable index.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Ma Jun be1684930f drm/amdgpu: Fix out-of-bounds write warning
Check the ring type value to fix the out-of-bounds
write warning

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:05 -04:00
Ma Jun 7e39d7ec35 drm/amdgpu: Fix the uninitialized variable warning
Check the user input and phy_id value range to fix
"Using uninitialized value phy_id"

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:04 -04:00
Jack Xiao 56fd1f8868 drm/amdgpu/mes11: fix kiq ring ready flag
kiq ring test has overwitten ready flag,
need disable after gfx hw init.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:04 -04:00
Lang Yu 89773b8559 drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
Small APUs(i.e., consumer, embedded products) usually have a small
carveout device memory which can't satisfy most compute workloads
memory allocation requirements.

We can't even run a Basic MNIST Example with a default 512MB carveout.
https://github.com/pytorch/examples/tree/main/mnist. Error Log:

"torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate
84.00 MiB. GPU 0 has a total capacity of 512.00 MiB of which 0 bytes
is free. Of the allocated memory 103.83 MiB is allocated by PyTorch,
and 22.17 MiB is reserved by PyTorch but unallocated"

Though we can change BIOS settings to enlarge carveout size,
which is inflexible and may bring complaint. On the other hand,
the memory resource can't be effectively used between host and device.

The solution is MI300A approach, i.e., let VRAM allocations go to GTT.
Then device and host can flexibly and effectively share memory resource.

v2: Report local_mem_size_private as 0. (Felix)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:04 -04:00
Sunil Khatri 3b3c9e865e drm/amdgpu: add se registers to ip dump for gfx10
add the registers of SE block of gfx for ip dump
for gfx10 IP.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:04 -04:00
Sunil Khatri b4e394e843 drm/amdgpu: add CP headers registers to gfx10 dump
add registers in the ip dump for CP headers in gfx10

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 15:17:04 -04:00
Ville Syrjälä d26238c680 drm/amdgpu: Use drm_crtc_vblank_crtc()
Replace the open coded drm_crtc_vblank_crtc() with the real
thing.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240408190611.24914-2-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-08 21:55:30 +03:00
Sunil Khatri b0923d5d80 drm/amdgpu: remove ip dump reg_count variable
reg_count is not used and the register count is
directly derived from the array size and hence
removed.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-03 09:06:47 -04:00
Asad Kamal 6cd2b87264 drm/amd/amdgpu: Check tbo resource pointer
Validate tbo resource pointer, skip if NULL

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:18 -04:00
Kenneth Feng 3474e02ed5 drm/amd/pm: support mode1 reset on smu_v14_0_3
support mode1 reset on smu_v14_0_3

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Alex Deucher ade887c633 drm/amdgpu/mes12: Use a separate fence per transaction
We can't use a shared fence location because each transaction
should be considered independently.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Alex Deucher 94b51a3d01 drm/amdgpu/mes12: increase mes submission timeout
MES internally has a timeout allowance of 2 seconds.
Increase driver timeout to 3 seconds to be safe.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Alex Deucher b1d852920b drm/amdgpu/mes12: print MES opcodes rather than numbers
Makes it easier to review the logs when there are MES
errors.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng 174fdc07c0 drm/amd/amdgpu: enable mmhub and athub cg on gc 12.0.1
enable mmhub and athub cg on gc 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng dd8707295d drm/amd/amdgpu: enable gfxoff on gc 12.0.1
Enable gfxoff on gc 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Likun Gao b9f5d0f978 drm/amdgpu: support cg state get for gfx v12
Support to get clockgating state for gfx v12.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng 598a3b753a drm/amd/amdgpu: enable sram fgcg on gc 12.0.1
enable sram fgcg on gc 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng 6f6bb3909c drm/amd/amdgpu: enable perfcounter mgcg and repeater fgcg
enable perfcounter mgcg and repeater fgcg on gc 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng 0b6662eb2a drm/amd/amdgpu: enable 3D cgcg and 3D cgls
enable 3D cgcg and 3D cgls on gc 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng af472f68c7 drm/amd/amdgpu: enable mgcg on gfx 12.0.1
enable mgcg on gfx 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
Kenneth Feng 81b09cedb3 drm/amd/amdgpu: enable cgcg and cgls
enable cgcg and cgls on gc 12.0.1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:15 -04:00
David (Ming Qiang) Wu 856d1ed4b2 drm/amdgpu/vcn5: Add VCN5 capabilities
Add VCN5 encode and decode capabilities support

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Sonny Jiang 117f851393 drm/amdgpu/vcn5: enable DPG mode support
Enable DPG mode

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Sonny Jiang f19cfce87d drm/amdgpu/jpeg5: enable power gating
Enable PG on JPEG5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
David (Ming Qiang) Wu f8f8e95c5f amdgpu/vcn: enable AMD_PG_SUPPORT_VCN
turn on AMD_PG_SUPPORT_VCN flag for power saving

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
David Belanger da43e93d1b drm/amdgpu: Fix physical address mask
Mask should be 44-bit.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Likun Gao 0a75dc9831 drm/amdgpu/discovery: add mes v12_0 ip block
Add mes v12_0 ip block.

v2: squash in update (Alex)
v3: rebase on unified mes changes (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Likun Gao 5e676d7180 drm/amdgpu/discovery: add gfx v12_0 ip block
Add gfx v12_0 ip block.

v2: Squash in update (Alex)
v3: add exp flag (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Jack Xiao 03f4b8c3ca drm/amdgpu/mes12: disable logging output
Random page fault was oberserved, temporarily disable
mes log buffer output.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Jack Xiao 3dc434ad26 drm/amdgpu: add module parameter 'amdgpu_uni_mes'
Add module parameter 'amdgpu_uni_mes' to enable/disable unified
mes fw support.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Jack Xiao ad5c0a79df drm/amdgpu/mes12: add legacy setting hw resource interface
For unified mes fw, add the legacy interface to set hardware
resources.

v2: remove warning (Alex)

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
shaoyunl fcc5df722d drm/amdgpu: Disable unmapped doorbell handling basic mode on mes 12
The new mechanism for unmapped doorbell handling requires both driver side and
MES fw side change. The FW side changes are still not released.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Jack Xiao 663bbfaf68 drm/amdgpu/gfx: enable mes to map legacy queue support
Enable mes to map legacy queue support.

v2: drop unused gfx_v12_0_kiq_enable_kgq() (Alex)

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:14 -04:00
Jack Xiao 4c2439f908 drm/amdgpu/mes12: add mes mapping legacy queue support
Add mes12 map legacy queue packet submission.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Jack Xiao 6ce03bd3a4 drm/amdgpu/mes12: enable uni_mes fw on mes pipe0
Enable the unified mes firmware on mes pipe0.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Jack Xiao d2e2c9be78 drm/amdgpu/mes12: add uni_mes fw loading support
Add the unified mes firmware loading support.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Jack Xiao 15ddc4e693 drm/amdgpu/mes: add uni_mes fw loading support
Add the unified mes firmware loading support.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Sreekant Somasekharan 628e1ace23 drm/amdkfd: mark GFX12 system and peer GPU memory mappings as MTYPE_NC
Due to a HW bug, the system memory mappings and peer GPU mappings
on GFX12 need to be marked as MTYPE_NC.

Cc: Joe Greathouse <joseph.greathouse@amd.com>
Cc: David Belanger <david.belanger@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Jonathan Kim 984b265ff6 drm/amdkfd: fix support for trap on wave start and end for gfx12
Similar to GFX11, GFX12 supports trapping on wave start and end.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
Jonathan Kim fda3f378c4 drm/amdkfd: always enable ttmp setup for gfx12
Similar to GFX11, always enable the setup of trap temporaries on GFX12.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:13 -04:00
David Belanger 90e4fc8369 drm/amdkfd: Added gfx_v12_kfd2kgd interface for GFX12.
Initial implementation, based on GFX11.

v2: Removed functions not needed by cp scheduler.
v3: Fixed typos.
v4: squash in warning fix (Alex)

Signed-off-by: David Belanger <david.belanger@amd.com>
Acked-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
shaoyunl 2f983d3ca5 drm/amdgpu: Enable event log on MES 12
Enable event log through the HW specific FW API

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
shaoyunl 19e69a5d28 drm/amdgpu: Enable unmapped doorbell handling basic mode on mes 12
Enable basic mode handling for doorbell ring on unmapped CP queue.
In this mode, MES can start schedule the queue mapping based on HW
interrupt instead of timer.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Harish Kasiviswanthan <Harish.Kasiviswanthan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
Hawking Zhang a2211e475c drm/amdgpu: Switch to smuio func to get gpu clk counter
Switch to smuio callback to query gpu clock counter

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
Likun Gao e781af6663 drm/amdgpu: init gfxhub setting to align with mmhub
Align gfxhub settings with mmhub when program rlc ram.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:12 -04:00
Likun Gao b32edc2340 drm/amdgpu: skip dpm check to init imu fw
Skip dpm check to init imu firmware for imu v12.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao 044feb8e2a drm/amdgpu: fix active rb and cu number for gfx12
Correct the algorithm of active CU and RB to bypass
the disabled SA for gfx12.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao 56159fffaa drm/amdgpu: use new method to program rlc ram
Program rlc ram with golden setting data instead.
The old method (program_imu_rlc_ram_old) should be
retired in the future.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Kenneth Feng 043869be5a drm/amd/amdgpu: add cgcg&cgls interface for gfx 12.0
add cgcg&cgls interface for gfx 12.0

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Tom St Denis 60917ce8f8 drm/amd/amdgpu: update GFX12 wave data registers
Update the registers for gfx12.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao 53efeba35d drm/amdgpu: set different fw data addr for mec pipe
For MEC fw data, different pipe should programed into
different address.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Kenneth Feng 1e740df77f drm/amd/amdgpu: workaround for the imu fw loading
workaournd for the imu fw loading on gfx 12.0 without psp

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao f5b4c3236f drm/amd: Move fw init from sw_init to early_init for imu v12
Move microcode loading from sw_init to early_init to align with
the perious version of imu init sequence.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao 2502af906b drm/amdgpu: support S&R fw load for gfx v12
Support Save & Restore related fw load with backdoor RLC
autoload type on gfx v12.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Jack Xiao 36b2ce4775 drm/amdgpu/gfx12: recalculate available compute rings to use
Recalculate the number of compute rings to use based on
the gfx hardware configuration. As needed reserve half of
compute rings for mes, kgd can't use up all compute rings.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao 29d36a9cfd drm/amdgpu: skip imu related function if dpm=0
Only execute IMU related functions if dpm>0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Kenneth Feng 32d1637689 drm/amd/amdgpu: imu fw loading support
support imu related function for gfx v12.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:11 -04:00
Likun Gao af204b76a7 drm/amdgpu: set cp fw address set for gfx v12
Split PFF/ME/MEC firmware address setting function
from related load microcode funtion, as it's also
needed for rlc autolad.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao 52cb80c12e drm/amdgpu: Add gfx v12_0 ip block support (v6)
Initial support for GFX 12.

v1: Add gfx v12_0 ip block support. (Likun)
v2: Switch to gfx.kiq array.
    Move the vmhub from ring callback to ring. (Hawking)
v3: Update various callback function impl. (Hawking)
v4: Warning fixes (Alex)
v5: squash in imu fix, csb, rlc autoload implementations (Alex)
v6: Rebase (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Jack Xiao 4632bec9fa drm/amdgpu/mes12: update data cache boundary
Enlarge the data cache boundary.

v2: use the fix data cache boundary.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Jonathan Kim 46c4766610 drm/amdgpu: fix trap enablement for gfx12
Fix request to MES to set SQ_SHADER_TBA_HI.trap_en for GFX12.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
shaoyunl d817c470cb drm/amdgpu: Enable MES to handle doorbell ring on unmapped queue
On MES12, HW can monitor up to 2048 doorbells that not be
mapped currently and trigger the interrupt to MES when these unmapped
doorbell been ringed.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Jack Xiao 745f46b6a9 drm/amdgpu: enable mes v12 self test
1. fix available compute queue to use
2. enable mes v12 self test

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao 6628f7762b drm/amdgpu: set mes fw address for mes v12
Split the function of mes fimrware address setting
from mes firmware load for mes v12, as it's also
needed for rlc autoload.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Jack Xiao 785f0f9fe7 drm/amdgpu: Add mes v12_0 ip block support (v4)
v1: Add mes v12_0 ip block support. (Jack)
v2: Switch to gfx.kiq array. (Hawking)
v3: Switch to AMDGPU_GFXHUB(0). (Hawking)
v4: Rebase (Alex)

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao 69d4c44e51 drm/amdgpu: init mes ucode name for gfx v12
Keep gfx v12 mes fw name to gc_12_x_x_mes.bin
and gc_12_x_x_mes1.bin.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao 00c9035633 drm/amdgpu: add rlc TOC header file for soc24
Add RLC autoload TOC header file for soc24 ASIC.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao e3a911bb38 drm/amdgpu: add new TOC structure
Add new RLC_TABLE_OF_CONTENT structure definition.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao d8fd91f905 drm/amdgpu: add gfx12 clearstate header
Add gfx12 clearstate register arrays.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:10 -04:00
Likun Gao 5638b1cfa7 drm/amdgpu/discovery: Set GC family for GC 12.0 IP
Set GC family for GC 12.0 IPs.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:18:09 -04:00
Sonny Jiang e56b042118 drm/amdgpu: IB test encode test package change for VCN5
VCN5 session info package interface changed

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 16:17:55 -04:00
Hawking Zhang 5f571c61b9 drm/amdgpu: Add gfx v9_4_4 ip block
Add gfx v9_4_4 ip block support

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:49:16 -04:00
Hawking Zhang a6bcffa596 drm/amdgpu: Add smu v13_0_14 ip block
Add smu v13_0_14 ip block support

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:49:11 -04:00
Hawking Zhang 1dbd59f3f4 drm/amdgpu: Add psp v13_0_14 ip block
Add psp v13_0_14 ip block support.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:49:05 -04:00
Hawking Zhang 3d1bb1a2e0 drm/amdgpu: Add sdma v4_4_5 ip block
Add sdma v4_4_5 ip block support

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:48:57 -04:00
Aurabindo Pillai c45211adfa drm/amd: Override DCN410 IP version
Override DCN IP version to 4.0.1 from 4.1.0 temporarily until change is
made in DC codebase to use 4.1.0

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:48:46 -04:00
Lang Yu 2d6f49ee84 drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
Observed on gfx8 ASIC where KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM is used.
Two attachments use the same VM, root PD would be locked twice.

[   57.910418] Call Trace:
[   57.793726]  ? reserve_bo_and_cond_vms+0x111/0x1c0 [amdgpu]
[   57.793820]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x6c/0x1c0 [amdgpu]
[   57.793923]  ? idr_get_next_ul+0xbe/0x100
[   57.793933]  kfd_process_device_free_bos+0x7e/0xf0 [amdgpu]
[   57.794041]  kfd_process_wq_release+0x2ae/0x3c0 [amdgpu]
[   57.794141]  ? process_scheduled_works+0x29c/0x580
[   57.794147]  process_scheduled_works+0x303/0x580
[   57.794157]  ? __pfx_worker_thread+0x10/0x10
[   57.794160]  worker_thread+0x1a2/0x370
[   57.794165]  ? __pfx_worker_thread+0x10/0x10
[   57.794167]  kthread+0x11b/0x150
[   57.794172]  ? __pfx_kthread+0x10/0x10
[   57.794177]  ret_from_fork+0x3d/0x60
[   57.794181]  ? __pfx_kthread+0x10/0x10
[   57.794184]  ret_from_fork_asm+0x1b/0x30

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:44:10 -04:00
Yunxiang Li 4752cac300 drm/amdgpu: Move ras resume into SRIOV function
This is part of the reset, move it into the reset function.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:44:02 -04:00
Peyton Lee 3f19cffde9 drm/amdgpu/vpe: fix vpe dpm clk ratio setup failed
Some version of BIOS does not enable all clock levels,
resulting in high level clock frequency of 0.
The number of valid CLKs must be confirmed in advance.

Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:43:48 -04:00
Yunxiang Li 6e4aa08fa9 drm/amdgpu: Fix amdgpu_device_reset_sriov retry logic
The retry loop for SRIOV reset have refcount and memory leak issue.
Depending on which function call fails it can potentially call
amdgpu_amdkfd_pre/post_reset different number of times and causes
kfd_locked count to be wrong. This will block all future attempts at
opening /dev/kfd. The retry loop also leakes resources by calling
amdgpu_virt_init_data_exchange multiple times without calling the
corresponding fini function.

Align with the bare-metal reset path which doesn't have these issues.
This means taking the amdgpu_amdkfd_pre/post_reset functions out of the
reset loop and calling amdgpu_device_pre_asic_reset each retry which
properly free the resources from previous try by calling
amdgpu_virt_fini_data_exchange.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:41:05 -04:00
Aurabindo Pillai a5b843269a drm/amd: Enable DCN410 init
Enable initializing Display Manager for DCN410 IP

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:40:57 -04:00
Yunxiang Li 25c01191c2 drm/amdgpu: Add reset_context flag for host FLR
There are other reset sources that pass NULL as the job pointer, such as
amdgpu_amdkfd_reset_work. Therefore, using the job pointer to check if
the FLR comes from the host does not work.

Add a flag in reset_context to explicitly mark host triggered reset, and
set this flag when we receive host reset notification.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:40:50 -04:00
Yunxiang Li f4322b9f8a drm/amdgpu: Fix two reset triggered in a row
Some times a hang GPU causes multiple reset sources to schedule resets.
The second source will be able to trigger an unnecessary reset if they
schedule after we call amdgpu_device_stop_pending_resets.

Move amdgpu_device_stop_pending_resets to after the reset is done. Since
at this point the GPU is supposedly in a good state, any reset scheduled
after this point would be a legitimate reset.

Remove unnecessary and incorrect checks for amdgpu_in_reset that was
kinda serving this purpose.

Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:40:44 -04:00
Zhigang Luo f5007c67fc drm/amdgpu: update vf to pf message retry from 2 to 5
increase retry times to wait host has enough time to complete reset.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:40:38 -04:00
Zhigang Luo 3bcc0ee147 drm/amdgpu: avoid reading vf2pf info size from FB
VF can't access FB when host is doing mode1 reset. Using sizeof to get
vf2pf info size, instead of reading it from vf2pf header stored in FB.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02 15:40:32 -04:00
Geert Uytterhoeven 05b8b6dd22
Revert "drm: Switch DRM_DISPLAY_HELPER to depends on"
This reverts commit e075e496f5, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.

The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in.  Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1ba76cc4d96a8afefff5d1bc42fb1e1329c5da68.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02 17:58:23 +02:00
Geert Uytterhoeven 7fe302ae19
Revert "drm: Switch DRM_DISPLAY_DP_HELPER to depends on"
This reverts commit 0323287de8, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.

The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in.  Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/89ac456805746b6d0c888f10c5120b11aacd3319.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02 17:58:21 +02:00
Geert Uytterhoeven 9573446953
Revert "drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on"
This reverts commit 3166e7e6d9, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.

The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in.  Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/a40e70a0abd3d841c23c107d452a43fdd70ef37a.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02 17:58:21 +02:00
Geert Uytterhoeven d7c128cb77
Revert "drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on"
This reverts commit f6d2dc03fa, as helper
code should always be selected by the driver that needs it, for the
convenience of the final user configuring a kernel.

The user who configures a kernel should not need to know which helpers
are needed for the driver he is interested in.  Making a driver depend
on helper code means that the user needs to know which helpers to enable
first, which is very user-unfriendly.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/bd288a5943dab8609f2d1f2bf413595a61df727a.1713780345.git.geert+renesas@glider.be
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-05-02 17:58:20 +02:00
Thomas Zimmermann aae4682e5d drm/fbdev-generic: Convert to fbdev-ttm
Only TTM-based drivers use fbdev-generic. Rename it to fbdev-ttm and
change the symbol infix from _generic_ to _ttm_. Link the source file
into TTM helpers, so that it is only build if TTM-based drivers have
been selected. Select DRM_TTM_HELPER for loongson.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419083331.7761-43-tzimmermann@suse.de
2024-05-02 11:33:32 +02:00
Shashank Sharma 705d0480e6 drm/amdgpu: fix doorbell regression
This patch adds a missed handling of PL domain doorbell while
handling VRAM faults.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Fixes: a6ff969fe9 ("drm/amdgpu: fix visible VRAM handling during faults")
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 21:59:16 -04:00
Christian König d3a9331a65 drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.

Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.

Also rework the statistic handling so that we don't update the eviction
counter before the move.

v2: add missing NULL check

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 94aeb41173 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-04-30 21:39:57 -04:00
Mukul Joshi f06446ef23 drm/amdgpu: Fix VRAM memory accounting
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 21:22:50 -04:00
Tim Huang 0fa4c25db8 drm/amdgpu: fix uninitialized scalar variable warning
Clear warning that field bp is uninitialized when
calling amdgpu_virt_ras_add_bps.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:04:15 -04:00
Likun Gao f45ed399d7 drm/amdgpu/discovery: add sdma v7_0 ip block
Add sdma v7_0 ip block.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:03:52 -04:00
Likun Gao 4badb9999b drm/amdgpu: provide more ucode name shown via id
Provide some lost ucode name shown via firmware ID.

v2: fix whitespace (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:03:47 -04:00
Likun Gao 807d90b5ef drm/amdgpu: support SDMA v3 struct fw front door load
Add support for new SDMA firmware struct (V3) with PSP
front door load type.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:03:42 -04:00
Jack Xiao 5251b56e38 drm/amdgpu/sdma7: set sdma hang watchdog
Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers
to improve SDMA reliability.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:03:36 -04:00
Likun Gao b412351e91 drm/amdgpu: Add sdma v7_0 ip block support (v7)
v1: Add sdma v7_0 ip block support. (Likun)
v2: Move vmhub from ring_funcs to ring. (Hawking)
v3: Switch to AMDGPU_GFXHUB(0). (Hawking)
v4: Move microcode init into early_init. (Likun)
v5: Fix warnings (Alex)
v6: Squash in various fixes (Alex)
v7: Rebase (Alex)
v8: Rebase (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:03:32 -04:00
Likun Gao 9989a924aa drm/amdgpu: Add sdma fw v3 structure
Add sdma firmware struct version 3 to support
sdma v7_0 firmware.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:03:12 -04:00
Dan Carpenter 6769a23697 drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq()
The "instance" variable needs to be signed for the error handling to work.

Fixes: 8b2faf1a4f ("drm/amdgpu: add error handle to avoid out-of-bounds")
Reviewed-by: Bob Zhou <bob.zhou@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:01:53 -04:00
Likun Gao e8a31b4e81 drm/amdgpu: Add new members for sdma v7_0 fw
Add new members in sdma instance structure
for sdma v7_0 firmware.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:01:49 -04:00
Likun Gao 1b838189ed drm/amdgpu/discovery: Add gmc v12_0 ip block
Add gmc v12_0 ip block.

v2: Squash in updates (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:01:39 -04:00
Shashank Sharma 04790139c5 drm/amdgpu: fix doorbell regression
This patch adds a missed handling of PL domain doorbell while
handling VRAM faults.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Fixes: a6ff969fe9 ("drm/amdgpu: fix visible VRAM handling during faults")
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:01:04 -04:00
Hawking Zhang 980a0a9452 drm/amdgpu: support gfx v12 specific pte/pde fields
Add gfx v12 pte/pde support to gmc common helper.

v2: squash in fixes (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:01:01 -04:00
Hawking Zhang f3c3dd1207 drm/amdgpu: Set pte_is_pte flag in gmc v12 gart
pte_is_pte is new flag introduced in gmc v12 that
needs to be set by default for pte.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:00:51 -04:00
Hawking Zhang 075b44aa21 drm/amdgpu: Add gmc v12_0 ip block support (v7)
Add initial support for GMC v12.

v1: Add gmc v12_0 ip block support.
v2: Switch to gfx.kiq array.
v3: Switch to vmhubs_mask.
v4: Switch to AMDGPU_MMHUB0(0) and AMDGPU_GFXHUB(0)
v5: Rebase (Alex)
v6: Squash in fixes for AGP handling, gfxhub init order,
    vmhub index (Alex)
v7: Rebase (Alex)
v8: squash in ecc fix (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:00:39 -04:00
Hawking Zhang 2d1d875656 drm/amdgpu: Add gfx v12 pte/pde format change
Add gfx v12 pte/pde format change.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:00:35 -04:00
Likun Gao 47677629f6 drm/amdgpu: Add gfxhub v12_0 ip block support (v3)
Add initial gfxhub v12 support.

v1: Add gfxhub v12_0 ip block support (Likun)
v2: Switch to AMDGPU_GFXHUB(0) (Hawking)
v3: Squash in keep default error response mode (Hawking)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:00:30 -04:00
Jack Xiao 27694eace5 drm/amdgpu/mes11: increase waiting time for engine ready
mes schq engine require more waiting time for engine ready
before packet submission.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:00:23 -04:00
Jack Xiao f9d8c5c785 drm/amdgpu/gfx: enable mes to map legacy queue support
Enable mes to map legacy queue support.

v2: kiq_set_resources is required.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 10:00:01 -04:00
Philip Yang d53ce02352 drm/amdkfd: Evict BO itself for contiguous allocation
If the BO pages pinned for RDMA is not contiguous on VRAM, evict it to
system memory first to free the VRAM space, then allocate contiguous
VRAM space, and then move it from system memory back to VRAM.

v6: user context should use interruptible call (Felix)

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:59:51 -04:00
YiPeng Chai 3ca73073f4 drm/amdgpu: Remove redundant function call
Remove redundant function call.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:59:14 -04:00
YiPeng Chai 2c0410fbee rm/amdgpu: Remove unused code
Remove unused code.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:59:08 -04:00
Tim Huang ebbc2ada5c drm/amdgpu: fix overflowed array index read warning
Clear overflowed array index read warning by cast operation.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:59:03 -04:00
Tim Huang 22a5daaec0 drm/amdgpu: fix potential resource leak warning
Clear resource leak warning that when the prepare fails,
the allocated amdgpu job object will never be released.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:53 -04:00
Yang Wang 5eccab32c1 drm/amdgpu: avoid dump mca bank log muti times during ras ISR
because the ue valid mca count will only be cleared after gpu reset,
so only dump mca log on the first time to get mca bank after receive RAS interrupt.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:47 -04:00
Yang Wang 76ad30f51a drm/amdgpu: add MCA smu cache support
v1:
because SMU CE valid mca bank will be cleared after reading,
this patch adds mca cache at the driver level to ensure that the mca bank is not lost.

v2:
refine amdgpu_mca_init/fini/reset() function name.

v3:
add mca_cache.lock support
only add CE bank to mca bank cache.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:41 -04:00
Yang Wang 8fb20d9551 drm/amdgpu: add amdgpu MCA bank dispatch function support
- Refine mca driver code.
- Centralize mca bank dispatch code logic.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:34 -04:00
Hawking Zhang 8e9f1575d1 drm/amdgpu: Add mmhub v4_1_0 ip block support (v4)
Add initial support for MMHUB 4.1.0.

v1: Add mmhub v4_1_0 ip block support.
v2: Switch to AMDGPU_MMHUB0(0).
v3: squash in fix for ip version check (Alex)
v4: squash in vm_contexts_disable fix (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:25 -04:00
Philip Yang 7005b169da drm/amdgpu: Evict BOs from same process for contiguous allocation
When TTM failed to alloc VRAM, TTM try evict BOs from VRAM to system
memory then retry the allocation, this skips the KFD BOs from the same
process because KFD require all BOs are resident for user queues.

If TTM with TTM_PL_FLAG_CONTIGUOUS flag to alloc contiguous VRAM, allow
TTM evict KFD BOs from the same process, this will evict the user queues
first, and restore the queues later after contiguous VRAM allocation.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:15 -04:00
Philip Yang b2dba064c9 drm/amdgpu: Handle sg size limit for contiguous allocation
Define macro AMDGPU_MAX_SG_SEGMENT_SIZE 2GB, because struct scatterlist
length is unsigned int, and some users of it cast to a signed int, so
every segment of sg table is limited to size 2GB maximum.

For contiguous VRAM allocation, don't limit the max buddy block size in
order to get contiguous VRAM memory. To workaround the sg table segment
size limit, allocate multiple segments if contiguous size is bigger than
AMDGPU_MAX_SG_SEGMENT_SIZE.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:58:08 -04:00
Yang Wang 7e0357bef4 drm/amdgpu: remove unused MCA driver codes
- remove unused callback functions.
- make part of mca functions static and refine the function order.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:51:44 -04:00
Likun Gao d22c075676 drm/amdgpu/discovery: Add common soc24 ip block
Add common soc24 ip block.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:51:06 -04:00
Jack Xiao ff518e13eb drm/amdgpu/mes11: adjust mes initialization sequence
Adjust mes queue initialization before kgq/kcq initialization
to enable mes mapping legacy queue.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:50:47 -04:00
Jack Xiao 486eb6b5a8 drm/amdgpu/mes11: add mes mapping legacy queue support
Add mes11 map legacy queue packet submission.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:50:40 -04:00
Christian König ffda708148 drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.

Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.

Also rework the statistic handling so that we don't update the eviction
counter before the move.

v2: add missing NULL check

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 94aeb41173 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-04-30 09:49:51 -04:00
Hawking Zhang 98b912c50e drm/amdgpu: Add soc24 common ip block (v2)
Add initial soc24 support.

v1: Add soc24 common ip block.
v2: Switch to new select_se_sh/enter_safe_mode
    interface.
v3: squash in correct ext rev id, etc. (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:46:51 -04:00
Philip Yang 155ce502e9 drm/amdgpu: Support contiguous VRAM allocation
RDMA device with limited scatter-gather ability requires contiguous VRAM
buffer allocation for RDMA peer direct support.

Add a new KFD alloc memory flag and store as bo alloc flag
AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS. When pin this bo to export for RDMA
peerdirect access, this will set TTM_PL_FLAG_CONTIFUOUS flag, and ask
VRAM buddy allocator to get contiguous VRAM.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:44:46 -04:00
Ma Jun c0d6bd3cd2 drm/amdgpu: Fix uninitialized variable warning in amdgpu_afmt_acr
Assign value to clock to fix the warning below:
"Using uninitialized value res. Field res.clock is uninitialized"

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30 09:44:34 -04:00
Dave Airlie 4a56c0ed5a amd-drm-next-6.10-2024-04-26:
amdgpu:
 - Misc code cleanups and refactors
 - Support setting reset method at runtime
 - Report OD status
 - SMU 14.0.1 fixes
 - SDMA 4.4.2 fixes
 - VPE fixes
 - MES fixes
 - Update BO eviction priorities
 - UMSCH fixes
 - Reset fixes
 - Freesync fixes
 - GFXIP 9.4.3 fixes
 - SDMA 5.2 fixes
 - MES UAF fix
 - RAS updates
 - Devcoredump updates for dumping IP state
 - DSC fixes
 - JPEG fix
 - Fix VRAM memory accounting
 - VCN 5.0 fixes
 - MES fixes
 - UMC 12.0 updates
 - Modify contiguous flags handling
 - Initial support for mapping kernel queues via MES
 
 amdkfd:
 - Fix rescheduling of restore worker
 - VRAM accounting for SVM migrations
 - mGPU fix
 - Enable SQ watchpoint for gfx10
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZiwmAQAKCRC93/aFa7yZ
 2Hp0AP9SOxHrKDPErqCQbenkZXNoPFtLJWAhhKZ9zEb7YgjufwD/Zpk31mO0FpVu
 kUYSAQ4OFXinVVLtmVH3fE/vK+1ccwM=
 =oFW7
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.10-2024-04-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.10-2024-04-26:

amdgpu:
- Misc code cleanups and refactors
- Support setting reset method at runtime
- Report OD status
- SMU 14.0.1 fixes
- SDMA 4.4.2 fixes
- VPE fixes
- MES fixes
- Update BO eviction priorities
- UMSCH fixes
- Reset fixes
- Freesync fixes
- GFXIP 9.4.3 fixes
- SDMA 5.2 fixes
- MES UAF fix
- RAS updates
- Devcoredump updates for dumping IP state
- DSC fixes
- JPEG fix
- Fix VRAM memory accounting
- VCN 5.0 fixes
- MES fixes
- UMC 12.0 updates
- Modify contiguous flags handling
- Initial support for mapping kernel queues via MES

amdkfd:
- Fix rescheduling of restore worker
- VRAM accounting for SVM migrations
- mGPU fix
- Enable SQ watchpoint for gfx10

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240426221245.1613332-1-alexander.deucher@amd.com
2024-04-30 14:43:00 +10:00
Daniel Vetter b84bc94852 Linux 6.9-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmYutdweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGG5oH/3Ggwz5N+gdEK3np
 qxUfpgJTWo+fJ6xhRLGy84TnEZ9s9UnK0Su6UuVyOb0F/2Y8hesJ6iwB16yQFKNe
 Nore/VvuBZ+utshz5N20yNyPugNOP74GGbyOm+d+iJwIKnmSE8jSjWyMwNFHJCZM
 BfoBxZrpwU/YD/0PD1KkI44jhPX1H/EcEmtNiklLnuYvJydTWiRFeku+CSgcOiRz
 6fIFrcZMREZrAytMQSwteBAvI3vWblC0S39ZgJmtZt+oi+s1ksIUNG8Mm5uMAiyF
 LcGep2tWV06x9uB9XVvrk0qco/kOaUgYuQrHQwNu9LzsaUdYGKqBBMk41SoNXFkB
 hBXN26U=
 =N2E2
 -----END PGP SIGNATURE-----

Merge v6.9-rc6 into drm-next

Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas
has some fun with i915-gem conflicts.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-04-29 20:22:39 +02:00
Aurabindo Pillai 96557f785a drm/amd: GFX12 changes for converting tiling flags to modifiers
GFX12 swizzle mode and GCC formats changed and is much simpler. Use a
seperate function for the same. Changes:

* Swizzle mode is now 3 bits only
* DCC enablement doesn't come from tiling_flags, it is always set in modifiers
* DCC max compressed block size of 128B

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:58 -04:00
Jesse Zhang ea686fef54 drm/amdgpu: fix the warning about the expression (int)size - len
Converting size from size_t to int may overflow.
v2: keep reverse xmas tree order (Christian)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:45 -04:00
Jack Xiao 029c2b0389 drm/amdgpu/mes: add mes mapping legacy queue support
Add mes mapping legacy queue framework support.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:45 -04:00
Tim Huang 9a5f15d2a2 drm/amdgpu: fix uninitialized scalar variable warning
Clear warning that uses uninitialized value fw_size.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:45 -04:00
Sunil Khatri 2a8f7464d3 drm/amdgpu: skip ip dump if devcoredump flag is set
Do not dump the ip registers during driver reload
in passthrough environment.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:44 -04:00
Arunpravin Paneer Selvam e362b7c8f8 drm/amdgpu: Modify the contiguous flags behaviour
Now we have two flags for contiguous VRAM buffer allocation.
If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate the contiguous buffer. Say if the
  allocation fails, we fallback to allocate the individual pages.

When we setTTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
  and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously.
  the allocation fails, we return -ENOSPC.

v2:
  - keep the mem_flags and bo->flags check as is(Christian)
  - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
    amdgpu_bo_pin_restricted function placement range iteration
    loop(Christian)
  - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block
    (Christian)
  - Keep the kernel BO allocation as is(Christain)
  - If BO pin vram allocation failed, we need to return -ENOSPC as
    RDMA cannot work with scattered VRAM pages(Philip)

v3(Christian):
  - keep contiguous flag handling outside of pages_per_block
    calculation
  - remove the hacky implementation in contiguous flag error
    handling code

v4(Christian):
  - use any variable and return value for non-contiguous
    fallback

v5: rebase to amd-staging-drm-next branch

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:44 -04:00
Lancelot SIX bd31e5026d drm/amdkfd: Enable SQ watchpoint for gfx10
There are new control registers introduced in gfx10 used to configure
hardware watchpoints triggered by SMEM instructions:
SQ_WATCH{0,1,2,3}_{CNTL_ADDR_HI,ADDR_L}.

Those registers work in a similar way as the TCP_WATCH* registers
currently used for gfx9 and above.

This patch adds support to program the SQ_WATCH registers for gfx10.

The SQ_WATCH?_CNTL.MASK field has one bit more than
TCP_WATCH?_CNTL.MASK, so SQ watchpoints can have a finer granularity
than TCP_WATCH watchpoints.  In this patch, we keep the capabilities
advertised to the debugger unchanged
(HSA_DBG_WATCH_ADDR_MASK_*_BIT_GFX10) as this reflects what both TCP and
SQ watchpoints can do and both watchpoints are configured together.

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:44 -04:00
Srinivasan Shanmugam acce6479e3 drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode()
The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating
about potential truncation of output when using the snprintf function.
The issue was due to the size of the buffer 'ucode_prefix' being too
small to accommodate the maximum possible length of the string being
written into it.

The string being written is "amdgpu/%s_mec.bin" or "amdgpu/%s_rlc.bin",
where %s is replaced by the value of 'chip_name'. The length of this
string without the %s is 16 characters. The warning message indicated
that 'chip_name' could be up to 29 characters long, resulting in a total
of 45 characters, which exceeds the buffer size of 30 characters.

To resolve this issue, the size of the 'ucode_prefix' buffer has been
reduced from 30 to 15. This ensures that the maximum possible length of
the string being written into the buffer will not exceed its size, thus
preventing potential buffer overflow and truncation issues.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: In function ‘gfx_v9_4_3_early_init’:
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
  379 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
      |                                                    ^~
......
  439 |         r = gfx_v9_4_3_init_rlc_microcode(adev, ucode_prefix);
      |                                                 ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
  379 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
  413 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
      |                                                    ^~
......
  443 |         r = gfx_v9_4_3_init_cp_compute_microcode(adev, ucode_prefix);
      |                                                        ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
  413 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 8630112969 ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0")
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
YiPeng Chai 6f3b69139c drm/amdgpu: Fix ras mode2 reset failure in ras aca mode
Fix ras mode2 reset failure in ras aca mode.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
Bob Zhou 506c245f3f drm/amdgpu: fix double free err_addr pointer warnings
In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages
will be run many times so that double free err_addr in some special case.
So set the err_addr to NULL to avoid the warnings.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
Jesse Zhang 7bfd16d0ec drm/amdgpu: initialize the last_jump_jiffies in atom_exec_context
The parameter "last_jump_jiffies" should be initialized
before being used in the function atom_op_jump.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
Jesse Zhang 2d10c3dbde drm/amdgpu: add check before free wb entry
Check if ring is not a mes queue before freeing the wb entry,
because we only allocate a wb entry when it's not a mes queue.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
Bob Zhou cd48b97ce7 drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte
After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be
conducted to put wrong value.
So return and check the i2c transfer result.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
Bob Zhou 8b2faf1a4f drm/amdgpu: add error handle to avoid out-of-bounds
if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should
be stop to avoid out-of-bounds read, so directly return -EINVAL.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
Ma Jun 2e55bcf3d7 drm/amdgpu: Initialize timestamp for some legacy SOCs
Initialize the interrupt timestamp for some legacy SOCs
to fix the coverity issue "Uninitialized scalar variable"

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
YiPeng Chai 48fa90718b drm/amdgpu: Use new interface to reserve bad page
Use new interface to reserve bad page.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:43 -04:00
YiPeng Chai bcc0934885 drm/amdgpu: Fix address translation defect
retired_page is page frame and should be expanded
to the full address when querying status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:42 -04:00
YiPeng Chai e023874081 drm/amdgpu: support ACA logging ecc errors
support ACA logging ecc errors.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:42 -04:00
YiPeng Chai 370fbff4cc drm/amdgpu: add poison consumption handler
Add poison consumption handler.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:42 -04:00
YiPeng Chai bfa579b38b drm/amdgpu: prepare to handle pasid poison consumption
Prepare to handle pasid poison consumption.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:42 -04:00
YiPeng Chai 314c38cde6 drm/amdgpu: retire bad pages for umc v12_0
Retire bad pages for umc v12_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:42 -04:00
YiPeng Chai e74313be5a drm/amdgpu: add condition check for amdgpu_umc_fill_error_record
Add condition check for amdgpu_umc_fill_error_record.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai 2cf8e50ec3 drm/amdgpu: Add delay work to retire bad pages
Add delay work to retire bad pages.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai f27defca68 drm/amdgpu: umc v12_0 logs ecc errors
1. umc v12_0 logs ecc errors.
2. Reserve newly detected ecc error pages.
3. Add tag for bad pages, so that they can
   be retired later.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai b2aa6b108d drm/amdgpu: umc v12_0 converts error address
Umc v12_0 converts error address.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai 95b4063de4 drm/amdgpu: add interface to update umc v12_0 ecc status
Add interface to update umc v12_0 ecc status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai a734adfbcd drm/amdgpu: add poison creation handler
Add poison creation handler.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai f493dd64ee drm/amdgpu: prepare for logging ecc errors
Prepare for logging ecc errors.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
YiPeng Chai 98b5bc878d drm/amdgpu: add message fifo to handle RAS poison events
Add message fifo to handle RAS poison events.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
Jesse Zhang 88a9a467c5 drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001.
V2: To really improve the handling we would actually
   need to have a separate value of 0xffffffff.(Christian)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
Alex Deucher eef016ba89 drm/amdgpu/mes11: Use a separate fence per transaction
We can't use a shared fence location because each transaction
should be considered independently.

Reviewed-by: Shaoyun.liu <shaoyunl@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:41 -04:00
Alex Deucher 497d7cee24 drm/amdgpu: add a spinlock to wb allocation
As we use wb slots more dynamically, we need to lock
access to avoid racing on allocation or free.

Reviewed-by: Shaoyun.liu <shaoyunl@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:40 -04:00
Sonny Jiang 754c366e41 drm/amdgpu: update fw_share for VCN5
kmd_fw_shared changed in VCN5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:40 -04:00
Mukul Joshi 8e1d190595 drm/amdgpu: Fix VRAM memory accounting
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:40 -04:00
Sathishkumar S c551316e15 drm/amdgpu: update jpeg max decode resolution
jpeg ip version v2.1 and higher supports 16kx16k resolution decode

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:40 -04:00
Sunil Khatri af8644121e drm/amdgpu: add ip dump for each ip in devcoredump
Add ip dump for each ip of the asic in the
devcoredump for all the ips where a callback
is registered for register dump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:40 -04:00
Sunil Khatri e043a35dc2 drm/amdgpu: dump ip state before reset for each ip
Invoke the dump_ip_state function for each ip before
the asic resets and save the register values for
debugging via devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Sunil Khatri c8732c80de drm/amdgpu: add support for gfx v10 print
Add support to print ip information to be
used to print registers in devcoredump
buffer.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Sunil Khatri 40356542c3 drm/amdgpu: add protype for print ip state
Add the protoype for print ip state to be used
to print the registers in devcoredump during
a gpu reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Sunil Khatri c395dbb68b drm/amdgpu: add support of gfx10 register dump
Adding gfx10 gc registers to be used for register
dump via devcoredump during a gpu reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Sunil Khatri e21d253bd7 drm/amdgpu: add prototype for ip dump
Add the prototype to dump ip registers
for all ips of different asics and set
them to NULL for now. Based on the
requirement add a function pointer for
each of them.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
YiPeng Chai af730e0820 drm/amdgpu: Add interface to reserve bad page
Add interface to reserve bad page.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Ma Jun 60c448439f drm/amdgpu: Fix uninitialized variable warnings
return 0 to avoid returning an uninitialized variable r

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Jack Xiao f88da7fbf6 drm/amdgpu/mes: fix use-after-free issue
Delete fence fallback timer to fix the ramdom
use-after-free issue.

v2: move to amdgpu_mes.c

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Alex Deucher e0a9bbeea0 drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3
This avoids a potential conflict with firmwares with the newer
HDP flush mechanism.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:39 -04:00
Rajneesh Bhardwaj a16b951586 drm/amdgpu: Update CGCG settings for GFXIP 9.4.3
Tune coarse grain clock gating idle threshold and rlc idle timeout to
achieve better kernel launch latency.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:38 -04:00
Frank Min ea9238a81b drm/amdgpu: replace tmz flag into buffer flag
Replace tmz flag into buffer flag to make it easier to understand
and extend

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:38 -04:00
Le Ma 92ed1e9cd5 drm/amdgpu: init microcode chip name from ip versions
To adapt to different gc versions in gfx_v9_4_3.c file.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:38 -04:00
Prike Liang 26de73bc0a drm/amdgpu: Fix the ring buffer size for queue VM flush
Here are the corrections needed for the queue ring buffer size
calculation for the following cases:
- Remove the KIQ VM flush ring usage.
- Add the invalidate TLBs packet for gfx10 and gfx11 queue.
- There's no VM flush and PFP sync, so remove the gfx9 real
  ring and compute ring buffer usage.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:38 -04:00
Lang Yu a522ec528c drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend
umsch test needs full GPU functionality(e.g., VM update, TLB flush,
possibly buffer moving under memory pressure) which may be not ready
under these states. Just skip it to avoid potential issues.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-26 17:22:38 -04:00
Damien Le Moal 6927b01680 drm/amdgpu: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
Use the macro PCI_IRQ_INTX instead of the deprecated PCI_IRQ_LEGACY macro.

Link: https://lore.kernel.org/r/20240325070944.3600338-11-dlemoal@kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-25 12:53:31 -05:00
Jack Xiao 9482552820 drm/amdgpu/mes: fix use-after-free issue
Delete fence fallback timer to fix the ramdom
use-after-free issue.

v2: move to amdgpu_mes.c

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 23:23:46 -04:00
Alex Deucher 9792b7cc18 drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3
This avoids a potential conflict with firmwares with the newer
HDP flush mechanism.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-23 23:23:46 -04:00
Prike Liang fe93b0927b drm/amdgpu: Fix the ring buffer size for queue VM flush
Here are the corrections needed for the queue ring buffer size
calculation for the following cases:
- Remove the KIQ VM flush ring usage.
- Add the invalidate TLBs packet for gfx10 and gfx11 queue.
- There's no VM flush and PFP sync, so remove the gfx9 real
  ring and compute ring buffer usage.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 23:23:46 -04:00
Lang Yu 661d71ee5a drm/amdgpu/umsch: don't execute umsch test when GPU is in reset/suspend
umsch test needs full GPU functionality(e.g., VM update, TLB flush,
possibly buffer moving under memory pressure) which may be not ready
under these states. Just skip it to avoid potential issues.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-23 23:23:28 -04:00
Felix Kuehling b0b13d5321 drm/amdgpu: Update BO eviction priorities
Make SVM BOs more likely to get evicted than other BOs. These BOs
opportunistically use available VRAM, but can fall back relatively
seamlessly to system memory. It also avoids SVM migrations evicting
other, more important BOs as they will evict other SVM allocations
first.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 23:23:28 -04:00
Peyton Lee d59198d2d0 drm/amdgpu/vpe: fix vpe dpm setup failed
The vpe dpm settings should be done before firmware is loaded.
Otherwise, the frequency cannot be successfully raised.

Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 23:23:02 -04:00
Lijo Lazar aebd3eb9d3 drm/amdgpu: Assign correct bits for SDMA HDP flush
HDP Flush request bit can be kept unique per AID, and doesn't need to be
unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-23 23:22:54 -04:00
Lang Yu 9c783a1121 drm/amdkfd: make sure VM is ready for updating operations
When page table BOs were evicted but not validated before
updating page tables, VM is still in evicting state,
amdgpu_vm_update_range returns -EBUSY and
restore_process_worker runs into a dead loop.

v2: Split the BO validation and page table update into two
separate loops in amdgpu_amdkfd_restore_process_bos. (Felix)
  1.Validate BOs
  2.Validate VM (and DMABuf attachments)
  3.Update page tables for the BOs validated above

Fixes: 50661eb1a2 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-23 23:22:23 -04:00
Mukul Joshi 25e9227c6a drm/amdgpu: Fix leak when GPU memory allocation fails
Free the sync object if the memory allocation fails for any
reason.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-23 23:17:30 -04:00
Felix Kuehling e76691f45a drm/amdgpu: Update BO eviction priorities
Make SVM BOs more likely to get evicted than other BOs. These BOs
opportunistically use available VRAM, but can fall back relatively
seamlessly to system memory. It also avoids SVM migrations evicting
other, more important BOs as they will evict other SVM allocations
first.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 12:08:31 -04:00
Pierre-Eric Pelloux-Prayer 6e042cee74 drm/amdgpu/vcn: fix unitialized variable warnings
Avoid returning an uninitialized value if we never enter the loop.
This case should never be hit in practice, but returning 0 doesn't
hurt.

The same fix is applied to the 4 places using the same pattern.

v2: - fixed typos in commit message (Alex)
    - use "return 0;" before the done label instead of initializing
      r to 0

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 12:08:30 -04:00
Alex Deucher 3f0664110a drm/amdgpu/mes11: print MES opcodes rather than numbers
Makes it easier to review the logs when there are MES
errors.

v2: use dbg for emitted, add helpers for fetching strings
v3: fix missing commas (Harish)
v4: drop command prefixes (Felix)
v5: squash in bounds fix (Jun)

Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed by Shaoyun.liu <Shaoyun.liu@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 12:08:30 -04:00
Peyton Lee 2476c6bd95 drm/amdgpu/vpe: fix vpe dpm setup failed
The vpe dpm settings should be done before firmware is loaded.
Otherwise, the frequency cannot be successfully raised.

Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 12:08:30 -04:00
Lijo Lazar 7b19f1f346 drm/amdgpu: Assign correct bits for SDMA HDP flush
HDP Flush request bit can be kept unique per AID, and doesn't need to be
unique SOC-wide. Assign only bits 10-13 for SDMA v4.4.2.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 12:08:30 -04:00
Stanley.Yang 939c475181 drm/amdgpu: Support setting reset_method at runtime
In order to support more test cases, support user
change reset_method at runtime.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-23 12:08:30 -04:00
Maxime Ripard c058e7a8f8
Merge drm/drm-next into drm-misc-next
Maíra needs a backmerge to apply v3d patches, and Danilo for some
nouveau patches.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-04-23 08:48:56 +02:00
Arunpravin Paneer Selvam a68c7eaa7a drm/amdgpu: Enable clear page functionality
Add clear page support in vram memory region.

v1(Christian):
  - Dont handle clear page as TTM flag since when moving the BO back
    in from GTT again we don't need that.
  - Make a specialized version of amdgpu_fill_buffer() which only
    clears the VRAM areas which are not already cleared
  - Drop the TTM_PL_FLAG_WIPE_ON_RELEASE check in
    amdgpu_object.c

v2:
  - Modify the function name amdgpu_ttm_* (Alex)
  - Drop the delayed parameter (Christian)
  - handle amdgpu_res_cleared(&cursor) just above the size
    calculation (Christian)
  - Use AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE for clearing the buffers
    in the free path to properly wait for fences etc.. (Christian)

v3(Christian):
  - Remove buffer clear code in VRAM manager instead change the
    AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE handling to set
    the DRM_BUDDY_CLEARED flag.
  - Remove ! from amdgpu_res_cleared(&cursor) check.

v4(Christian):
  - vres flag setting move to vram manager file
  - use dma_fence_get_stub in amdgpu_ttm_clear_buffer function
  - make fence a mandatory parameter and drop the if and the get/put dance

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419063538.11957-2-Arunpravin.PaneerSelvam@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-04-22 19:44:16 +02:00
Arunpravin Paneer Selvam 96950929eb drm/buddy: Implement tracking clear page feature
- Add tracking clear page feature.

- Driver should enable the DRM_BUDDY_CLEARED flag if it
  successfully clears the blocks in the free path. On the otherhand,
  DRM buddy marks each block as cleared.

- Track the available cleared pages size

- If driver requests cleared memory we prefer cleared memory
  but fallback to uncleared if we can't find the cleared blocks.
  when driver requests uncleared memory we try to use uncleared but
  fallback to cleared memory if necessary.

- When a block gets freed we clear it and mark the freed block as cleared,
  when there are buddies which are cleared as well we can merge them.
  Otherwise, we prefer to keep the blocks as separated.

- Add a function to support defragmentation.

v1:
  - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as
    cleared. Else, reset the clear flag for each block in the list(Christian)
  - For merging the 2 cleared blocks compare as below,
    drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian)
  - Defragment the memory beginning from min_order
    till the required memory space is available.

v2: (Matthew)
  - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks
    operation within drm buddy.
  - Write a macro block_incompatible() to allocate the required blocks.
  - Update the xe driver for the drm_buddy_free_list change in arguments.
  - add a warning if the two blocks are incompatible on
    defragmentation
  - call full defragmentation in the fini() function
  - place a condition to test if min_order is equal to 0
  - replace the list with safe_reverse() variant as we might
    remove the block from the list.

v3:
  - fix Gitlab user reported lockup issue.
  - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew)
  - modify to pass the root order instead max_order in fini()
    function(Matthew)
  - change bool 1 to true(Matthew)
  - add check if min_block_size is power of 2(Matthew)
  - modify the min_block_size datatype to u64(Matthew)

v4:
  - rename the function drm_buddy_defrag with __force_merge.
  - Include __force_merge directly in drm buddy file and remove
    the defrag use in amdgpu driver.
  - Remove list_empty() check(Matthew)
  - Remove unnecessary space, headers and placement of new variables(Matthew)
  - Add a unit test case(Matthew)

v5:
  - remove force merge support to actual range allocation and not to bail
    out when contains && split(Matthew)
  - add range support to force merge function.

v6:
  - modify the alloc_range() function clear page non merged blocks
    allocation(Matthew)
  - correct the list_insert function name(Matthew).

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419063538.11957-1-Arunpravin.PaneerSelvam@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-04-22 19:44:16 +02:00
Dave Airlie 377b5b397d amd-drm-next-6.10-2024-04-19:
amdgpu:
 - DC resource allocation logic updates
 - DC IPS fixes
 - DC YUV fixes
 - DMCUB fixes
 - DML2 fixes
 - Devcoredump updates
 - USB-C DSC fix
 - Misc display code cleanups
 - PSR fixes
 - MES timeout fix
 - RAS updates
 - UAF fix in VA IOCTL
 - Fix visible VRAM handling during faults
 - Fix IP discovery handling during PCI rescans
 - Misc code cleanups
 - PSP 14 updates
 - More runtime PM code rework
 - SMU 14.0.2 support
 - GPUVM page fault redirection to secondary IH rings for IH 6.x
 - Suspend/resume fixes
 - SR-IOV fixes
 
 amdkfd:
 - Fix eviction fence handling
 - Fix leak in GPU memory allocation failure case
 - DMABuf import handling fix
 
 radeon:
 - Silence UBSAN warnings related to flexible arrays
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZiLyawAKCRC93/aFa7yZ
 2BzfAPoDVZjTunizh6SyCFmQamR3eelnxWeY1xaVzmKBHqLCOAEAo2EyThRGyPCH
 SjD+f+ZlflaXQZtZpiQrOr0rkLvh5Q4=
 =96hT
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.10-2024-04-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.10-2024-04-19:

amdgpu:
- DC resource allocation logic updates
- DC IPS fixes
- DC YUV fixes
- DMCUB fixes
- DML2 fixes
- Devcoredump updates
- USB-C DSC fix
- Misc display code cleanups
- PSR fixes
- MES timeout fix
- RAS updates
- UAF fix in VA IOCTL
- Fix visible VRAM handling during faults
- Fix IP discovery handling during PCI rescans
- Misc code cleanups
- PSP 14 updates
- More runtime PM code rework
- SMU 14.0.2 support
- GPUVM page fault redirection to secondary IH rings for IH 6.x
- Suspend/resume fixes
- SR-IOV fixes

amdkfd:
- Fix eviction fence handling
- Fix leak in GPU memory allocation failure case
- DMABuf import handling fix

radeon:
- Silence UBSAN warnings related to flexible arrays

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419224332.2938259-1-alexander.deucher@amd.com
2024-04-22 12:28:49 +10:00
Lang Yu 81bf14519a drm/amdkfd: make sure VM is ready for updating operations
When page table BOs were evicted but not validated before
updating page tables, VM is still in evicting state,
amdgpu_vm_update_range returns -EBUSY and
restore_process_worker runs into a dead loop.

v2: Split the BO validation and page table update into two
separate loops in amdgpu_amdkfd_restore_process_bos. (Felix)
  1.Validate BOs
  2.Validate VM (and DMABuf attachments)
  3.Update page tables for the BOs validated above

Fixes: 50661eb1a2 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:54:49 -04:00
Mukul Joshi e53a1713de drm/amdgpu: Fix leak when GPU memory allocation fails
Free the sync object if the memory allocation fails for any
reason.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:54:49 -04:00
Zhigang Luo 6a009ca1bf drm/amdgpu: remove virt_init_data_exchange from poison consumption handler
Host will initiate an FLR for all poison consumption.
Guest should wait for FLR message to re-init data exchange.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:47:26 -04:00
Lijo Lazar 8954c3fbe7 drm/amdgpu: Change AID detection logic
On GFX 9.4.3 SOCs, only 2 SDMA instances need to be available to be
considered as a valid AID.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:47:19 -04:00
Sunil Khatri 93522c1948 drm/amdgpu: enable redirection of irq's for IH V6.1
Enable redirection of irq for pagefaults for specific
clients to avoid overflow without dropping interrupts.

So here we redirect the interrupts to another IH ring
i.e ring1 where only these interrupts are processed.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:46:56 -04:00
Ahmad Rehman ea137071ad drm/amdgpu: Skip the coredump collection on reset during driver reload
In passthrough environment, the driver triggers the mode-1 reset on
reload. The reset causes the core dump collection which is delayed task
and prevents driver from unloading until it is completed. Since we do
not need to collect data on "reset on reload" case, we can skip core
dump collection.

v2: Use the same flag to avoid calling amdgpu_reset_reg_dumps as well.

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:46:45 -04:00
Sunil Khatri ca0afa2f41 drm/amdgpu: enable redirection of irq's for IH V6.0
Enable redirection of irq for pagefaults for specific
clients to avoid overflow without dropping interrupts.

So here we redirect the interrupts to another IH ring
i.e ring1 where only these interrupts are processed.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:46:37 -04:00
Sunil Khatri 5adcd78fa2 drm:amdgpu: enable IH ring1 for IH v6.1
We need IH ring1 for handling the pagefault
interrupts which over flow in default
ring for specific usecases.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:45:37 -04:00
Sunil Khatri eefc85a277 drm:amdgpu: enable IH RB ring1 for IH v6.0
We need IH ring1 for handling the pagefault
interrupts which are overflowing the default
ring for specific usecases.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-18 23:45:22 -04:00
Christian König a6ff969fe9 drm/amdgpu: fix visible VRAM handling during faults
When we removed the hacky start code check we actually didn't took into
account that *all* VRAM pages needs to be CPU accessible.

Clean up the code and unify the handling into a single helper which
checks if the whole resource is CPU accessible.

The only place where a partial check would make sense is during
eviction, but that is neglitible.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aed01a6804 ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-04-17 11:50:43 -04:00
xinhui pan 6fef2d4c00 drm/amdgpu: validate the parameters of bo mapping operations more clearly
Verify the parameters of
amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place.

Fixes: dc54d3d174 ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2")
Cc: stable@vger.kernel.org
Reported-by: Vlad Stolyarov <hexed@google.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-17 11:50:43 -04:00
Christian König ca7c4507ba drm/amdgpu: remove invalid resource->start check v2
The majority of those where removed in the commit aed01a6804
("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")

But this one was missed because it's working on the resource and not the
BO. Since we also no longer use a fake start address for visible BOs
this will now trigger invalid mapping errors.

v2: also remove the unused variable

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aed01a6804 ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")
CC: stable@vger.kernel.org
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-17 11:02:57 -04:00
Dave Airlie 34633158b8 amd-drm-next-6.10-2024-04-13:
amdgpu:
 - HDCP fixes
 - ODM fixes
 - RAS fixes
 - Devcoredump improvements
 - Misc code cleanups
 - Expose VCN activity via sysfs
 - SMY 13.0.x updates
 - Enable fast updates on DCN 3.1.4
 - Add dclk and vclk reporting on additional devices
 - Add ACA RAS infrastructure
 - Implement TLB flush fence
 - EEPROM handling fixes
 - SMUIO 14.0.2 support
 - SMU 14.0.1 Updates
 - Sync page table freeing with TLB flushes
 - DML2 refactor
 - DC debug improvements
 - SR-IOV fixes
 - Suspend and Resume fixes
 - DCN 3.5.x Updates
 - Z8 fixes
 - UMSCH fixes
 - GPU reset fixes
 - HDP fix for second GFX pipe on GC 10.x
 - Enable secondary GFX pipe on GC 10.3
 - Refactor and clean up BACO/BOCO/BAMACO handling
 - VCN partitioning fix
 - DC DWB fixes
 - VSC SDP fixes
 - DCN 3.1.6 fix
 - GC 11.5 fixes
 - Remove invalid TTM resource start check
 - DCN 1.0 fixes
 
 amdkfd:
 - MQD handling cleanup
 - Preemption handling fixes for XCDs
 - TLB flush fix for GC 9.4.2
 - Properly clean up workqueue during module unload
 - Fix memory leak process create failure
 - Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace
 
 radeon:
 - Misc code cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZhr4EAAKCRC93/aFa7yZ
 2B8jAP9z1JpOnjSQvc2mhHAooXRYO4Mj5HCQ25ZE8N4c8ZZhjAEAqefmEx5/UyLh
 lv2pWILL4o597qhq9nA7hJ6tTICLPAU=
 =HUwY
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.10-2024-04-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.10-2024-04-13:

amdgpu:
- HDCP fixes
- ODM fixes
- RAS fixes
- Devcoredump improvements
- Misc code cleanups
- Expose VCN activity via sysfs
- SMY 13.0.x updates
- Enable fast updates on DCN 3.1.4
- Add dclk and vclk reporting on additional devices
- Add ACA RAS infrastructure
- Implement TLB flush fence
- EEPROM handling fixes
- SMUIO 14.0.2 support
- SMU 14.0.1 Updates
- Sync page table freeing with TLB flushes
- DML2 refactor
- DC debug improvements
- SR-IOV fixes
- Suspend and Resume fixes
- DCN 3.5.x Updates
- Z8 fixes
- UMSCH fixes
- GPU reset fixes
- HDP fix for second GFX pipe on GC 10.x
- Enable secondary GFX pipe on GC 10.3
- Refactor and clean up BACO/BOCO/BAMACO handling
- VCN partitioning fix
- DC DWB fixes
- VSC SDP fixes
- DCN 3.1.6 fix
- GC 11.5 fixes
- Remove invalid TTM resource start check
- DCN 1.0 fixes

amdkfd:
- MQD handling cleanup
- Preemption handling fixes for XCDs
- TLB flush fix for GC 9.4.2
- Properly clean up workqueue during module unload
- Fix memory leak process create failure
- Range check CP bad op exception targets to avoid reporting invalid exceptions to userspace

radeon:
- Misc code cleanups

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240413213708.3427038-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-04-17 15:48:59 +10:00
Kenneth Feng 0c1195ca0d drm/amd/swsmu: support smu block discovery for smu v14
Support for smu ip block add for SMU v14.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:16 -04:00
Hawking Zhang 577cbed318 drm/amdgpu: rename DBG_DRV to HAD_DRV for psp v14
Add a psp bl command enum for HAD_DRV.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:16 -04:00
Ma Jun 1347853271 drm/amdgpu: refactoring the runtime pm mode detection code
refactor the code of runtime pm mode detection to support
amdgpu_runtime_pm =2 and 1 two cases

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:16 -04:00
Hawking Zhang 6c6acc5f33 drm/amdgpu: Load ipkeymgr drv for psp v14
while DBG_DRV is renamed to HAD_DRV for psp v14,
part of its APIs/functionality is moved to a new
component named Ipkeymgr_Drv.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:16 -04:00
Thorsten Blum 12b8b4e685 drm/amdgpu: Add missing space to DRM_WARN() message
s/,please/, please/

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:15 -04:00
Ma Jun 959056982a drm/amdgpu: Fix discovery initialization failure during pci rescan
Waiting for system ready to fix the discovery initialization
failure issue. This failure usually occurs when dGPU is removed
and then rescanned via command line.
It's caused by following two errors:
[1] vram size is 0
[2] wrong binary signature

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:15 -04:00
Christian König 394ae0603a drm/amdgpu: fix visible VRAM handling during faults
When we removed the hacky start code check we actually didn't took into
account that *all* VRAM pages needs to be CPU accessible.

Clean up the code and unify the handling into a single helper which
checks if the whole resource is CPU accessible.

The only place where a partial check would make sense is during
eviction, but that is neglitible.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aed01a6804 ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2024-04-16 22:39:15 -04:00
xinhui pan 98856136c4 drm/amdgpu: validate the parameters of bo mapping operations more clearly
Verify the parameters of
amdgpu_vm_bo_(map/replace_map/clearing_mappings) in one common place.

Fixes: dc54d3d174 ("drm/amdgpu: implement AMDGPU_VA_OP_CLEAR v2")
Cc: stable@vger.kernel.org
Reported-by: Vlad Stolyarov <hexed@google.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:15 -04:00
Yang Wang f23558627f drm/amdgpu: add new aca smu callback func parse_error_code()
add new aca smu callback parse_error_code{} to avoid specific asic check
in amdgpu_aca.c file

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:39:15 -04:00
Jonathan Kim f7c161a4c2 drm/amdgpu: increase mes submission timeout
MES internally has a timeout allowance of 2 seconds.
Increase driver timeout to 3 seconds to be safe.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 22:38:59 -04:00
Sunil Khatri 3c858cf65e drm/amdgpu: add missing vbios version from devcoredump
Add vbios version in the devcoredump along with formatting
the information with proper alignment.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 21:25:28 -04:00
Alex Deucher 8b9130bae0 drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft reset
Need to take the srbm_mutex and while we are here, use the
helper function soc21_grbm_select();

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-16 21:25:23 -04:00
Christian König c8962679af drm/amdgpu: remove invalid resource->start check v2
The majority of those where removed in the commit aed01a6804
("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")

But this one was missed because it's working on the resource and not the
BO. Since we also no longer use a fake start address for visible BOs
this will now trigger invalid mapping errors.

v2: also remove the unused variable

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aed01a6804 ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2")
CC: stable@vger.kernel.org
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-12 00:33:51 -04:00
Jack Xiao a0e002cdac drm/amdgpu/sdma6: set sdma hang watchdog
Set SDMAx_WATCHDOG_CNTL.QUEUE_HANG_COUNT registers
to improve SDMA reliability.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-12 00:33:43 -04:00
Luqmaan Irshad 6b0d78032f drm/amd/amdgpu: Update PF2VF Header
Adding a new field for GPU Capacity to align the header with the host.

Signed-off-by: Luqmaan Irshad <Luqmaan.Irshad@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-12 00:33:11 -04:00
Yifan Zhang 6dba20d23e drm/amdgpu: differentiate external rev id for gfx 11.5.0
This patch to differentiate external rev id for gfx 11.5.0.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-10 00:00:32 -04:00
Tim Huang bbca7f414a drm/amdgpu: fix incorrect number of active RBs for gfx11
The RB bitmap should be global active RB bitmap &
active RB bitmap based on active SA.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-09 23:30:30 -04:00
ZhenGuo Yin e33997e18d drm/amdgpu: clear set_q_mode_offs when VM changed
[Why]
set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE
packet to init shadow memory will be skiped, hence there has a page fault.

[How]
VM flush is needed after GPU reset, clear set_q_mode_offs when
emitting VM flush.

Fixes: 8bc75586ea ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-09 23:27:58 -04:00
Lijo Lazar f7e232de51 drm/amdgpu: Fix VCN allocation in CPX partition
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:27:23 -04:00
Kenneth Feng 3818708e9c drm/amd/pm: fix the high voltage issue after unload
fix the high voltage issue after unload on smu 13.0.10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:26:32 -04:00
Tao Zhou f886b49fea drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
SDMA_CNTL is not set in some cases, driver configures it by itself.

v2: simplify code

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:16:51 -04:00
Yifan Zhang 533eefb9be drm/amdgpu: add smu 14.0.1 discovery support
This patch to add smu 14.0.1 support

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:15:05 -04:00
shaoyunl 5b0cd091d9 drm/amdgpu : Increase the mes log buffer size as per new MES FW version
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:14:40 -04:00
shaoyunl a3a4c0b123 drm/amdgpu : Add mes_log_enable to control mes log feature
The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:14:18 -04:00
Lijo Lazar 8b2be55f4d drm/amdgpu: Reset dGPU if suspend got aborted
For SOC21 ASICs, there is an issue in re-enabling PM features if a
suspend got aborted. In such cases, reset the device during resume
phase. This is a workaround till a proper solution is finalized.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-09 23:12:44 -04:00
Lang Yu 0f1bbcc2ba drm/amdgpu/umsch: reinitialize write pointer in hw init
Otherwise the old one will be used during GPU reset.
That's not expected.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-09 23:12:20 -04:00
Lijo Lazar 4b18a91faf drm/amdgpu: Refine IB schedule error logging
Downgrade to debug information when IBs are skipped. Also, use dev_* to
identify the device.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 23:11:59 -04:00
Alex Deucher 65ff8092e4 drm/amdgpu: always force full reset for SOC21
There are cases where soft reset seems to succeed, but
does not, so always use mode1/2 for now.

Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-09 23:10:06 -04:00
Yifan Zhang 526b184e88 drm/amdgpu: differentiate external rev id for gfx 11.5.0
This patch to differentiate external rev id for gfx 11.5.0.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:20:01 -04:00
Tim Huang d6d6561f93 drm/amdgpu: fix incorrect number of active RBs for gfx11
The RB bitmap should be global active RB bitmap &
active RB bitmap based on active SA.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:14:55 -04:00
Zhigang Luo d1999b4017 amd/amdgpu: improve VF recover time
1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5.
2. set fatel error detected flag.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:14:30 -04:00
Lijo Lazar b41f742d6f drm/amdgpu: Set fatal errror detected flag earlier
In case of fatal errors, set FED status when interrupt is received. Set
the flag on other devices in the hive before RAS recovery work.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:13:36 -04:00
ZhenGuo Yin 05e4014168 drm/amdgpu: clear set_q_mode_offs when VM changed
[Why]
set_q_mode_offs don't get cleared after GPU reset, nexting SET_Q_MODE
packet to init shadow memory will be skiped, hence there has a page fault.

[How]
VM flush is needed after GPU reset, clear set_q_mode_offs when
emitting VM flush.

Fixes: 8bc75586ea ("drm/amdgpu: workaround to avoid SET_Q_MODE packets v2")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:09:21 -04:00
Tao Zhou 4b0cb230bd drm/amdgpu: retire UMC v12 mca_addr_to_pa
RAS TA will handle it, the function is useless.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:09:15 -04:00
chongli2 f6ac084236 drm/amd/amdgpu: support MES command SET_HW_RESOURCE1 in sriov
support MES command SET_HW_RESOURCE1 in sriov

Signed-off-by: chongli2 <chongli2@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:08:53 -04:00
Tao Zhou 9ecef5b2d0 drm/amdgpu: update check condition for XGMI ACA UE
Check more possible ext error codes.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:08:47 -04:00
Lijo Lazar 91bc860116 drm/amdgpu: Fix VCN allocation in CPX partition
VCN need not be shared in CPX mode always for all GFX 9.4.3 SOC SKUs. In
certain configs, VCN instance can be exclusively allocated to a
partition even under CPX mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:08:34 -04:00
Ma Jun fcc0735b00 drm/amdgpu: Add support for BAMACO mode checking
Optimize the code to add support for BAMACO mode checking

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:08:18 -04:00
Hawking Zhang 327eec5427 drm/amdgpu: Bypass asd if display hw is not available
ASD is not needed by headless GPU.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:08:05 -04:00
Ma Jun b2207dc698 drm/amdgpu/pm: Add support for MACO flag checking
Add support for MACO flag checking.
MACO mode only works if BACO is supported.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:07:59 -04:00
Kenneth Feng 7c1d9e10e6 drm/amd/pm: fix the high voltage issue after unload
fix the high voltage issue after unload on smu 13.0.10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:07:30 -04:00
Danijel Slivka 3e2dacca54 drm/amdgpu: use vm_update_mode=0 as default in sriov for gfx10.3 onwards
Apply this rule to all newer asics in sriov case.
For asic with VF MMIO access protection avoid using CPU for VM table updates.
CPU pagetable updates have issues with HDP flush as VF MMIO access protection
blocks write to BIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL register
during sriov runtime.
Moved the check to amdgpu_device_init() to ensure it is done after
amdgpu_device_ip_early_init() where the IP versions are discovered.

Signed-off-by: Danijel Slivka <danijel.slivka@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:02:37 -04:00
Arunpravin Paneer Selvam b7a1a0ef12 drm/amd/amdgpu: add pipe1 hardware support
Enable pipe1 support starting from SIENNA CICHLID asic

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 22:02:24 -04:00
ZhenGuo Yin 0453e5f220 drm/amdgpu: select HDP ref/mask according to gfx ring pipe
Use correct ref/mask for differnent gfx ring pipe.
This should fix the gfx hang issue after enabling gfx pipe1.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2117
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:59:44 -04:00
Tao Zhou 029faefb73 drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
SDMA_CNTL is not set in some cases, driver configures it by itself.

v2: simplify code

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:59:20 -04:00
Yifan Zhang f5a3507c4a drm/amdgpu: add smu 14.0.1 discovery support
This patch to add smu 14.0.1 support

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:51:03 -04:00
shaoyunl 8966c31674 drm/amdgpu : Increase the mes log buffer size as per new MES FW version
From MES version 0x54, the log entry increased and require the log buffer
size to be increased. The 16k is maximum size agreed

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:50:48 -04:00
shaoyunl e58acb7613 drm/amdgpu : Add mes_log_enable to control mes log feature
The MES log might slow down the performance for extra step of log the data,
disable it by default and introduce a parameter can enable it when necessary

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:50:27 -04:00
Yang Wang 81d96e8b5a drm/amdgpu: refine function signature of amdgpu_aca_get_error_data()
refine function signature of amdgpu_aca_get_error_data();

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:50:09 -04:00
Lijo Lazar df3c7dc5c5 drm/amdgpu: Reset dGPU if suspend got aborted
For SOC21 ASICs, there is an issue in re-enabling PM features if a
suspend got aborted. In such cases, reset the device during resume
phase. This is a workaround till a proper solution is finalized.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-04-09 21:49:42 -04:00
Sunil Khatri 6a0e1bafd7 drm/amdgpu: add IP's FW information to devcoredump
Add FW information of all the IP's in the devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:49:34 -04:00
Lang Yu 108ab31be9 drm/amdgpu/umsch: reinitialize write pointer in hw init
Otherwise the old one will be used during GPU reset.
That's not expected.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:49:19 -04:00
Lijo Lazar cd409dbc69 drm/amdgpu: Refine IB schedule error logging
Downgrade to debug information when IBs are skipped. Also, use dev_* to
identify the device.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-09 21:49:09 -04:00
Dave Airlie fee54d08bc Merge tag 'drm-misc-next-2024-03-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Two misc-next in one.

drm-misc-next for v6.10-rc1:

The deal of a lifetime! You get ALL of the previous
drm-misc-next-2024-03-21-1 tag!!

But WAIT, there's MORE!

Cross-subsystem Changes:
- Assorted DT binding updates.

Core Changes:
- Clarify how optional wait_hpd_asserted is.
- Shuffle Kconfig names around.

Driver Changes:
- Assorted build fixes for panthor, imagination,
- Add AUO B120XAN01.0 panels.
- Assorted small fixes to panthor, panfrost.

drm-misc-next for v6.10:
UAPI Changes:
- Move some nouveau magic constants to uapi.

Cross-subsystem Changes:
- Move drm-misc to gitlab and freedesktop hosting.
- Add entries for panfrost.

Core Changes:
- Improve placement for TTM bo's in idle/busy handling.
- Improve drm/bridge init ordering.
- Add CONFIG_DRM_WERROR, and use W=1 for drm.
- Assorted documentation updates.
- Make more (drm and driver) headers self-contained and add header
  guards.
- Grab reservation lock in pin/unpin callbacks.
- Fix reservation lock handling for vmap.
- Add edp and edid panel matching, use it to fix a nearly identical
  panel.

Driver Changes:
- Add drm/panthor driver and assorted fixes.
- Assorted small fixes to xlnx, panel-edp, tidss, ci, nouveau,
  panel and bridge drivers.
- Add Samsung s6e3fa7, BOE NT116WHM-N44, CMN N116BCA-EA1,
  CrystalClear CMT430B19N00, Startek KD050HDFIA020-C020A,
  powertip PH128800T006-ZHC01 panels.
- Fix console for omapdrm.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/bea310a6-6ff6-477e-9363-f9f053cfd12a@linux.intel.com
2024-04-05 13:16:17 +10:00
Thomas Zimmermann 0d21364c6e Merge drm/drm-next into drm-misc-next
Backmerging to get v6.9-rc2 changes into drm-misc-next.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-04-02 09:51:30 +02:00
Maxime Ripard f6d2dc03fa
drm: Switch DRM_DISPLAY_HDMI_HELPER to depends on
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.

Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HDMI_HELPER to depend on it.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-12-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-03-28 11:26:52 +01:00
Maxime Ripard 3166e7e6d9
drm: Switch DRM_DISPLAY_HDCP_HELPER to depends on
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.

Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HDCP_HELPER to depend on it.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-11-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-03-28 11:26:51 +01:00
Maxime Ripard 0323287de8
drm: Switch DRM_DISPLAY_DP_HELPER to depends on
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.

Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_DP_HELPER to depend on it.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-10-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-03-28 11:26:51 +01:00
Maxime Ripard e075e496f5
drm: Switch DRM_DISPLAY_HELPER to depends on
Most of our helpers have relied on being selected so far through
Kconfig, but that creates issues when we have multiple layers of helpers
with some depending on others.

Indeed, select doesn't select a dependency's dependencies, and thus
isn't super intuitive. Depends on however doesn't have that limitation,
so we can just switch all the drivers that were selecting
DRM_DISPLAY_HELPER to depend on it.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240327-kms-kconfig-helpers-v3-8-eafee11b84b3@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-03-28 11:26:49 +01:00
Johannes Weiner 8678b1060a drm/amdgpu: fix deadlock while reading mqd from debugfs
An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):

[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082]        __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082]        ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082]        dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082]        kernel_init+0x15/0x1a0
[ 1318.027242][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082]        dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082]        kernel_init+0x15/0x1a0
[ 1318.031168][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082]        __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082]        lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082]        __might_fault+0x58/0x80
[ 1318.033814][ T1082]        amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082]        full_proxy_read+0x55/0x80
[ 1318.034487][ T1082]        vfs_read+0xa7/0x360
[ 1318.034788][ T1082]        ksys_read+0x70/0xf0
[ 1318.035085][ T1082]        do_syscall_64+0x94/0x180
[ 1318.035375][ T1082]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082]   &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082]  Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082]        CPU0                    CPU1
[ 1318.038101][ T1082]        ----                    ----
[ 1318.038350][ T1082]   lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082]                                lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082]                                lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082]   rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082]  *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082]  #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082]  <TASK>
[ 1318.042175][ T1082]  dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082]  check_noncircular+0x145/0x160
[ 1318.042743][ T1082]  __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082]  lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043580][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043856][ T1082]  __might_fault+0x58/0x80
[ 1318.044131][ T1082]  ? __might_fault+0x40/0x80
[ 1318.044408][ T1082]  amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082]  full_proxy_read+0x55/0x80
[ 1318.045042][ T1082]  vfs_read+0xa7/0x360
[ 1318.045333][ T1082]  ksys_read+0x70/0xf0
[ 1318.045623][ T1082]  do_syscall_64+0x94/0x180
[ 1318.045913][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082]  ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082]  </TASK>

amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.

Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.

Fixes: 445d85e3c1 ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 09:30:34 -04:00
Lang Yu 68a2afbcca drm/amdgpu: enable UMSCH 4.0.6
Share same codes with 4.0.5 and enable collaborate mode for VPE.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 09:30:05 -04:00
Lang Yu 6b154c00cd drm/amdgpu/umsch: update UMSCH 4.0 FW interface
Align with FW changes.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 09:29:42 -04:00
Mario Limonciello ca299b4512 drm/amd: Flush GFXOFF requests in prepare stage
If the system hasn't entered GFXOFF when suspend starts it can cause
hangs accessing GC and RLC during the suspend stage.

Cc: <stable@vger.kernel.org> # 6.1.y: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc: <stable@vger.kernel.org> # 6.1.y: cb11ca3233 ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc: <stable@vger.kernel.org> # 6.1.y: 2ceec37b0e ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc: <stable@vger.kernel.org> # 6.1.y: 3a9626c816 ("drm/amd: Stop evicting resources on APUs in suspend")
Cc: <stable@vger.kernel.org> # 6.6.y: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc: <stable@vger.kernel.org> # 6.6.y: cb11ca3233 ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc: <stable@vger.kernel.org> # 6.6.y: 2ceec37b0e ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc: <stable@vger.kernel.org> # 6.6.y: 3a9626c816 ("drm/amd: Stop evicting resources on APUs in suspend")
Cc: <stable@vger.kernel.org> # 6.1+
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132
Fixes: ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 08:55:54 -04:00
Peyton Lee eed14eb48e drm/amdgpu/vpe: power on vpe when hw_init
To fix mode2 reset failure.
Should power on VPE when hw_init.

Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 08:50:20 -04:00
Alex Deucher d7f1487643 drm/amdgpu: always force full reset for SOC21
There are cases where soft reset seems to succeed, but
does not, so always use mode1/2 for now.

Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 01:45:22 -04:00
Johannes Weiner c25d09bcb7 drm/amdgpu: fix deadlock while reading mqd from debugfs
An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):

[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082]        __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082]        ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082]        dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082]        kernel_init+0x15/0x1a0
[ 1318.027242][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082]        dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082]        do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082]        kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082]        kernel_init+0x15/0x1a0
[ 1318.031168][ T1082]        ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082]        ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082]        __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082]        lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082]        __might_fault+0x58/0x80
[ 1318.033814][ T1082]        amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082]        full_proxy_read+0x55/0x80
[ 1318.034487][ T1082]        vfs_read+0xa7/0x360
[ 1318.034788][ T1082]        ksys_read+0x70/0xf0
[ 1318.035085][ T1082]        do_syscall_64+0x94/0x180
[ 1318.035375][ T1082]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082]   &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082]  Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082]        CPU0                    CPU1
[ 1318.038101][ T1082]        ----                    ----
[ 1318.038350][ T1082]   lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082]                                lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082]                                lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082]   rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082]  *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082]  #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082]  <TASK>
[ 1318.042175][ T1082]  dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082]  check_noncircular+0x145/0x160
[ 1318.042743][ T1082]  __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082]  lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043580][ T1082]  ? __might_fault+0x40/0x80
[ 1318.043856][ T1082]  __might_fault+0x58/0x80
[ 1318.044131][ T1082]  ? __might_fault+0x40/0x80
[ 1318.044408][ T1082]  amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082]  full_proxy_read+0x55/0x80
[ 1318.045042][ T1082]  vfs_read+0xa7/0x360
[ 1318.045333][ T1082]  ksys_read+0x70/0xf0
[ 1318.045623][ T1082]  do_syscall_64+0x94/0x180
[ 1318.045913][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082]  ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082]  ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082]  </TASK>

amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.

Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.

Fixes: 445d85e3c1 ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 01:44:24 -04:00
Lang Yu b9a8aee136 drm/amdgpu: enable UMSCH 4.0.6
Share same codes with 4.0.5 and enable collaborate mode for VPE.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 01:44:17 -04:00
Lang Yu f3e698978c drm/amdgpu/umsch: update UMSCH 4.0 FW interface
Align with FW changes.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-27 01:44:07 -04:00
Mario Limonciello 0355b24bde drm/amd: Flush GFXOFF requests in prepare stage
If the system hasn't entered GFXOFF when suspend starts it can cause
hangs accessing GC and RLC during the suspend stage.

Cc: <stable@vger.kernel.org> # 6.1.y: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc: <stable@vger.kernel.org> # 6.1.y: cb11ca3233 ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc: <stable@vger.kernel.org> # 6.1.y: 2ceec37b0e ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc: <stable@vger.kernel.org> # 6.1.y: 3a9626c816 ("drm/amd: Stop evicting resources on APUs in suspend")
Cc: <stable@vger.kernel.org> # 6.6.y: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Cc: <stable@vger.kernel.org> # 6.6.y: cb11ca3233 ("drm/amd: Add concept of running prepare_suspend() sequence for IP blocks")
Cc: <stable@vger.kernel.org> # 6.6.y: 2ceec37b0e ("drm/amd: Add missing kernel doc for prepare_suspend()")
Cc: <stable@vger.kernel.org> # 6.6.y: 3a9626c816 ("drm/amd: Stop evicting resources on APUs in suspend")
Cc: <stable@vger.kernel.org> # 6.1+
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132
Fixes: ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:57:04 -04:00
Srinivasan Shanmugam eb4f6eca26 drm/amdgpu: Fix truncations in gfx_v11_0_init_microcode()
Reducing the size of ucode_prefix to 25 in the gfx_v11_0_init_microcode
function. This would ensure that the total number of characters being
written into fw_name does not exceed its size of 40.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c: In function ‘gfx_v11_0_early_init’:
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:523:54: warning: ‘_pfp.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  523 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix);
      |                                                      ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:523:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
  523 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:540:54: warning: ‘_me.bin’ directive output may be truncated writing 7 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  540 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix);
      |                                                      ^~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:540:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 40
  540 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:557:70: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  557 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
      |                                                                      ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:557:25: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
  557 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:569:54: warning: ‘_mec.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
  569 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix);
      |                                                      ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:569:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
  569 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", ucode_prefix);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  CC [M]  drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_clockpowergating.o

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:56:57 -04:00
Tao Zhou 8e4617c25d drm/amdgpu: simplify convert_error_address interface for UMC v12
Replace separate parameters with struct ta_ras_query_address_input.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:56:18 -04:00
Srinivasan Shanmugam 539ff12ee5 drm/amdgpu: Fix truncation issues in gfx_v9_0.c
The size of fw_name is increased to ensure that it can accommodate
the maximum possible size of the string being written into it.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c: In function ‘gfx_v9_0_early_init’:
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1255 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name);
      |                                                    ^~
......
 1393 |                 r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix);
      |                                                          ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
 1255 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1261:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1261 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name);
      |                                                    ^~
......
 1393 |                 r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix);
      |                                                          ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1261:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 30
 1261 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1267:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1267 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name);
      |                                                    ^~
......
 1393 |                 r = gfx_v9_0_init_cp_gfx_microcode(adev, ucode_prefix);
      |                                                          ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1267:9: note: ‘snprintf’ output between 15 and 44 bytes into a destination of size 30
 1267 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1303:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1303 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc_am4.bin", chip_name);
      |                                                            ^~
......
 1398 |         r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix);
      |                                               ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1303:17: note: ‘snprintf’ output between 20 and 49 bytes into a destination of size 30
 1303 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc_am4.bin", chip_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1309:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1309 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_kicker_rlc.bin", chip_name);
      |                                                            ^~
......
 1398 |         r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix);
      |                                               ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1309:17: note: ‘snprintf’ output between 23 and 52 bytes into a destination of size 30
 1309 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_kicker_rlc.bin", chip_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1311:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1311 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
      |                                                            ^~
......
 1398 |         r = gfx_v9_0_init_rlc_microcode(adev, ucode_prefix);
      |                                               ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1311:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
 1311 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1344:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1344 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec.bin", chip_name);
      |                                                            ^~
......
 1402 |         r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix);
      |                                                      ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1344:17: note: ‘snprintf’ output between 20 and 49 bytes into a destination of size 30
 1344 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec.bin", chip_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1346:60: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1346 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
      |                                                            ^~
......
 1402 |         r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix);
      |                                                      ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1346:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
 1346 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1356:68: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1356 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec2.bin", chip_name);
      |                                                                    ^~
......
 1402 |         r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix);
      |                                                      ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1356:25: note: ‘snprintf’ output between 21 and 50 bytes into a destination of size 30
 1356 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_sjt_mec2.bin", chip_name);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1358:68: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
 1358 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name);
      |                                                                    ^~
......
 1402 |         r = gfx_v9_0_init_cp_compute_microcode(adev, ucode_prefix);
      |                                                      ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1358:25: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 30
 1358 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2.bin", chip_name);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:56:10 -04:00
Srinivasan Shanmugam 927a8a800e drm/amdgpu: Fix truncation in gfx_v10_0_init_microcode
The total size of the fw_name buffer is 8 (for "amdgpu/") + 30 (for
ucode_prefix) + 5 (for "_pfp") + 5 (for "_wks") + 5 (for ".bin") = 53
characters.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function ‘gfx_v10_0_early_init’:
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3982:58: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 0 and 29 [-Wformat-truncation=]
 3982 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", ucode_prefix, wks);
      |                                                          ^~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3982:9: note: ‘snprintf’ output between 16 and 49 bytes into a destination of size 40
 3982 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_pfp%s.bin", ucode_prefix, wks);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3988:57: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 1 and 30 [-Wformat-truncation=]
 3988 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", ucode_prefix, wks);
      |                                                         ^~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3988:9: note: ‘snprintf’ output between 15 and 48 bytes into a destination of size 40
 3988 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_me%s.bin", ucode_prefix, wks);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3994:57: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 1 and 30 [-Wformat-truncation=]
 3994 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", ucode_prefix, wks);
      |                                                         ^~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:3994:9: note: ‘snprintf’ output between 15 and 48 bytes into a destination of size 40
 3994 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ce%s.bin", ucode_prefix, wks);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4001:62: warning: ‘_rlc.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
 4001 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
      |                                                              ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4001:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
 4001 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4017:58: warning: ‘%s’ directive output may be truncated writing up to 4 bytes into a region of size between 0 and 29 [-Wformat-truncation=]
 4017 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", ucode_prefix, wks);
      |                                                          ^~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4017:9: note: ‘snprintf’ output between 16 and 49 bytes into a destination of size 40
 4017 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec%s.bin", ucode_prefix, wks);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4024:54: warning: ‘_mec2’ directive output may be truncated writing 5 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
 4024 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", ucode_prefix, wks);
      |                                                      ^~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4024:9: note: ‘snprintf’ output between 17 and 50 bytes into a destination of size 40
 4024 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec2%s.bin", ucode_prefix, wks);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:55:54 -04:00
Srinivasan Shanmugam 20fd14460f drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode
The snprintf function is used to write a formatted string into fw_name.
The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced
by the string in ucode_prefix and the second %s is replaced by either
"_2" or "1" depending on the condition pipe == AMDGPU_MES_SCHED_PIPE.

The length of the string "amdgpu/%s_mes%s.bin" is 16 characters plus the
length of ucode_prefix and the length of the string "_2" or "1". The
size of ucode_prefix is 30, so the maximum length of ucode_prefix is 29
characters (since one character is needed for the null terminator).
Therefore, the maximum possible length of the string written into
fw_name is 16 + 29 + 2 = 47 characters.

The size of fw_name is 40, so if the length of the string written into
fw_name is more than 39 characters (since one character is needed for
the null terminator), it will be truncated by the snprintf function, and
thus warnings will be seen.

By increasing the size of fw_name to 50, we ensure that fw_name is
large enough to hold the maximum possible length of the string, so the
snprintf function will not truncate the output.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function ‘amdgpu_mes_init_microcode’:
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:66: warning: ‘%s’ directive output may be truncated writing up to 1 bytes into a region of size between 0 and 29 [-Wformat-truncation=]
 1482 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
      |                                                                  ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1482:17: note: ‘snprintf’ output between 16 and 46 bytes into a destination of size 40
 1482 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1483 |                          ucode_prefix,
      |                          ~~~~~~~~~~~~~
 1484 |                          pipe == AMDGPU_MES_SCHED_PIPE ? "" : "1");
      |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 1 byte into a region of size between 0 and 29 [-Wformat-truncation=]
 1477 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
      |                                                                  ^~
 1478 |                          ucode_prefix,
 1479 |                          pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1");
      |                                                                 ~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 17 and 46 bytes into a destination of size 40
 1477 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1478 |                          ucode_prefix,
      |                          ~~~~~~~~~~~~~
 1479 |                          pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1");
      |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:66: warning: ‘%s’ directive output may be truncated writing 2 bytes into a region of size between 0 and 29 [-Wformat-truncation=]
 1477 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
      |                                                                  ^~
 1478 |                          ucode_prefix,
 1479 |                          pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1");
      |                                                          ~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1477:17: note: ‘snprintf’ output between 18 and 47 bytes into a destination of size 40
 1477 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes%s.bin",
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1478 |                          ucode_prefix,
      |                          ~~~~~~~~~~~~~
 1479 |                          pipe == AMDGPU_MES_SCHED_PIPE ? "_2" : "1");
      |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:62: warning: ‘_mes.bin’ directive output may be truncated writing 8 bytes into a region of size between 4 and 33 [-Wformat-truncation=]
 1489 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin",
      |                                                              ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1489:17: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 40
 1489 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mes.bin",
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1490 |                          ucode_prefix);
      |                          ~~~~~~~~~~~~~

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:55:44 -04:00
Srinivasan Shanmugam 7c2bc34ab9 drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()
Reducing the size of ucode_prefix to 25 in the amdgpu_vcn_early_init
function. This would ensure that the total number of characters being
written into fw_name does not exceed its size of 40.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c: In function ‘amdgpu_vcn_early_init’:
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:66: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  102 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
      |                                                                  ^
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:17: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 40
  102 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:66: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
  102 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
      |                                                                  ^
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:102:17: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 40
  102 |                 snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin", ucode_prefix);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:105:73: warning: ‘.bin’ directive output may be truncated writing 4 bytes into a region of size between 2 and 31 [-Wformat-truncation=]
  105 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_%d.bin", ucode_prefix, i);
      |                                                                         ^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c:105:25: note: ‘snprintf’ output between 14 and 43 bytes into a destination of size 40
  105 |                         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_%d.bin", ucode_prefix, i);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:55:01 -04:00
Tao Zhou 8b3495eafb drm/amdgpu: add socket id parameter for psp query address cmd
And set the socket id.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:54:54 -04:00
Shashank Sharma f88a7dd06a drm/amdgpu: Add a NULL check for freeing root PT
This patch adds a NULL check to fix this crash reported during the
freeing of root PT entry:

 BUG: unable to handle page fault for address: ffffc9002d637aa0
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 RIP: 0010:amdgpu_vm_pt_free+0x66/0xe0 [amdgpu]
 PKRU: 55555554
 Call Trace:
 <TASK>
  amdgpu_vm_pt_free_root+0x60/0xa0 [amdgpu]
  amdgpu_vm_fini+0x2cb/0x5d0 [amdgpu]
  ? amdgpu_ctx_mgr_entity_fini+0x53/0x1c0 [amdgpu]
  amdgpu_driver_postclose_kms+0x191/0x2d0 [amdgpu]
  drm_file_free.part.0+0x1e5/0x260 [drm]

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Christian König <Christian.Koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:51:55 -04:00
Sunil Khatri 9022f01b97 drm/amdgpu: refactor code to split devcoredump code
Refractor devcoredump code into new files since its
functionality is expanded further and better to slit
and devcoredump to have its own file.

v2: Fix the build failure caught by arm compiler
of implicit function declaration with #ifdef

v3: squash in fix for implicit declaration error

Cc: Ivan Lipski <ivan.lipski@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:51:48 -04:00
Peyton Lee 9ddafd1d14 drm/amdgpu/vpe: power on vpe when hw_init
To fix mode2 reset failure.
Should power on VPE when hw_init.

Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:50:20 -04:00
Yang Wang 31fd330b97 drm/amdgpu: add ras event id support for ACA
add ras event id support for ACA.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:48:18 -04:00
Yang Wang bd15bf742f drm/amdgpu: avoid update aca bank multi times during ras isr
Because the UE Valid MCA count will only be cleared after reset,
in order to avoid repeated counting of the error count,
the aca bank is only updated once during ras isr.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:48:11 -04:00
Yang Wang f7bcfb7a56 drm/amdgpu: retrieve umc odecc error count for aca umc v12.0
retrieve umc odecc error count for aca umc v12.0

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:48:03 -04:00
Shashank Sharma b6c4f90b38 drm/amdgpu: sync page table freeing with tlb flush
The idea behind this patch is to delay the freeing of PT entry objects
until the TLB flush is done.

This patch:
- Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep the
  objects that need to be freed after tlb_flush.
- Adds PT entries in this list in amdgpu_vm_ptes_update after finding
  the PT entry.
- Changes functionality of amdgpu_vm_pt_free_dfs from (df_search + free)
  to simply freeing of the BOs, also renames it to
  amdgpu_vm_pt_free_list to reflect this same.
- Exports function amdgpu_vm_pt_free_list to be called directly.
- Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.

V2: rebase
V4: Addressed review comments from Christian
    - add only locked PTEs entries in TLB flush waitlist.
    - do not create a separate function for list flush.
    - do not create a new lock for TLB flush.
    - there is no need to wait on tlb_flush_fence exclusively.

V5: Addressed review comments from Christian
    - change the amdgpu_vm_pt_free_dfs's functionality to simple freeing
      of the objects and rename it.
    - add all the PTE objects in params->tlb_flush_waitlist
    - let amdgpu_vm_pt_free_root handle the freeing of BOs independently
    - call amdgpu_vm_pt_free_list directly

V6: Rebase
V7: Rebase
V8: Added a NULL check to fix this backtrace issue:
[  415.351447] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  415.359245] #PF: supervisor write access in kernel mode
[  415.365081] #PF: error_code(0x0002) - not-present page
[  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0
[  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G           OE     5.18.2-mi300-build-140423-ubuntu-22.04+ #24
[  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 02/21/2024
[  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu]
[  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff <48
> 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63
[  415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287
[  415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8 RCX: 0000000000000000
[  415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30 RDI: ffff888161f80000
[  415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000 R09: ffffc9000401fa00
[  415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000 R12: ffffc9000401fb20
[  415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0 R15: ffff888161f80000
[  415.476312] FS:  00007f132ff89840(0000) GS:ffff889f87c00000(0000) knlGS:0000000000000000
[  415.485350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  415.491767] CR2: 0000000000000008 CR3: 0000000161d46003 CR4: 0000000000770ef0
[  415.499738] PKRU: 55555554
[  415.502750] Call Trace:
[  415.505482]  <TASK>
[  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
[  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
[  415.519814]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu]
[  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu]
[  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]

V9: Addressed review comments from Christian
    - No NULL check reqd for root PT freeing
    - Free PT list regardless of needs_flush
    - Move adding BOs in list in a separate function

V10: Added Christian's RB
V11: squash in list fix

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Christian König <Christian.Koenig@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-22 15:47:17 -04:00
Linus Torvalds 7ee0490121 drm fixes for 6.9-rc1
core:
 - fix rounding in drm_fixp2int_round()
 
 bridge:
 - fix documentation for DRM_BRIDGE_OP_EDID
 
 sun4i:
 - fix 64-bit division on 32-bit architectures
 
 tests:
 - fix dependency on DRM_KMS_HELPER
 
 probe-helper:
 - never return negative values from .get_modes() plus driver fixes
 
 xe:
 - invalidate userptr vma on page pin fault
 - fail early on sysfs file creation error
 - skip VMA pinning on xe_exec if no batches
 
 nouveau:
 - clear bo resource bus after eviction
 - documentation fixes
 - don't check devinit disable on GSP
 
 amdgpu:
 - Freesync fixes
 - UAF IOCTL fixes
 - Fix mmhub client ID mapping
 - IH 7.0 fix
 - DML2 fixes
 - VCN 4.0.6 fix
 - GART bind fix
 - GPU reset fix
 - SR-IOV fix
 - OD table handling fixes
 - Fix TA handling on boards without display hardware
 - DML1 fix
 - ABM fix
 - eDP panel fix
 - DPPCLK fix
 - HDCP fix
 - Revert incorrect error case handling in ioremap
 - VPE fix
 - HDMI fixes
 - SDMA 4.4.2 fix
 - Other misc fixes
 
 amdkfd:
 - Fix duplicate BO handling in process restore
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmX833kACgkQDHTzWXnE
 hr4ZDA/8CC/jW8drOInD52pqPxtFrhLPZ/pD+Vz4BWkcLnzRiM2d5gM3z3/JE7xi
 BGcxmhPgwT/M9oZENRxtSf2uUV/RZ0OMj+Mpwnew1YpANwWBe8pKeiwbHA4A4qzP
 Z46VLK0+BXEjh0btC4RY/Ji6yEuNCNAh0FBWTaLYoakN8M1JAJCUYFrXeWp8gYVm
 4yZETXO64iGNYy4wz9tD5fohC3xo1t9WRcskBV97uNrntDQlagoEdAnh1VF2K3yC
 SAwF3O8J60xh7osNx5YE4ENXynvh7UaAc75kliSsWoZKoTb1TFlyu8abE7NQRRXc
 9fjzwB6tQ6BJRFpmQGF6RHAhuoddqm9nPaOYOfQ/wXbVV1ajYvXmK+eGhHfR6/VO
 YYzhXksd9LGX34RWAcs9lJbV+EjG4buYnSThkvCYcPs2Ys+JppwlPYSd9sC8vqgP
 6D7slAoa8rh99WWz4mZ7ZuOveiOUS3Yzie5Vms2Dlwl/kHW5E1WpeEw+fXAqq08M
 m83whU2cod/oUvJtcuYFLoNAlhYYngVOI9XGgdM+eL/dpdKjTpyH9JHHEGj/Nejr
 W7Kts9CLLBShNKR8Wo2fyTu1n9dwY/eFVA1P48Mt03345G/fMNtPxy+M1Rt6LHQ2
 fmeBSU1P6mqoFeji4xCRXdJ4oDveNnHlyW9J9QJGXG44mN89PCc=
 =EW4i
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Fixes from the last week (or 3 weeks in amdgpu case), after amdgpu,
  it's xe and nouveau then a few scattered core fixes.

  core:
   - fix rounding in drm_fixp2int_round()

  bridge:
   - fix documentation for DRM_BRIDGE_OP_EDID

  sun4i:
   - fix 64-bit division on 32-bit architectures

  tests:
   - fix dependency on DRM_KMS_HELPER

  probe-helper:
   - never return negative values from .get_modes() plus driver fixes

  xe:
   - invalidate userptr vma on page pin fault
   - fail early on sysfs file creation error
   - skip VMA pinning on xe_exec if no batches

  nouveau:
   - clear bo resource bus after eviction
   - documentation fixes
   - don't check devinit disable on GSP

  amdgpu:
   - Freesync fixes
   - UAF IOCTL fixes
   - Fix mmhub client ID mapping
   - IH 7.0 fix
   - DML2 fixes
   - VCN 4.0.6 fix
   - GART bind fix
   - GPU reset fix
   - SR-IOV fix
   - OD table handling fixes
   - Fix TA handling on boards without display hardware
   - DML1 fix
   - ABM fix
   - eDP panel fix
   - DPPCLK fix
   - HDCP fix
   - Revert incorrect error case handling in ioremap
   - VPE fix
   - HDMI fixes
   - SDMA 4.4.2 fix
   - Other misc fixes

  amdkfd:
   - Fix duplicate BO handling in process restore"

* tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
  drm/amdgpu/pm: Don't use OD table on Arcturus
  drm/amdgpu: drop setting buffer funcs in sdma442
  drm/amd/display: Fix noise issue on HDMI AV mute
  drm/amd/display: Revert Remove pixle rate limit for subvp
  Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"
  Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"
  drm/amd/display: Add a dc_state NULL check in dc_state_release
  drm/amd/display: Return the correct HDCP error code
  drm/amd/display: Implement wait_for_odm_update_pending_complete
  drm/amd/display: Lock all enabled otg pipes even with no planes
  drm/amd/display: Amend coasting vtotal for replay low hz
  drm/amd/display: Fix idle check for shared firmware state
  drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane
  drm/amd/display: Init DPPCLK from SMU on dcn32
  drm/amd/display: Add monitor patch for specific eDP
  drm/amd/display: Allow dirty rects to be sent to dmub when abm is active
  drm/amd/display: Override min required DCFCLK in dml1_validate
  drm/amdgpu: Bypass display ta if display hw is not available
  drm/amdgpu: correct the KGQ fallback message
  drm/amdgpu/pm: Check the validity of overdiver power limit
  ...
2024-03-21 19:04:31 -07:00
Hawking Zhang a61e2ce9d4 drm/amdgpu: Enable smuio v14_0_2 callbacks
Enable smuio v14_0_2_callbacks

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:16 -04:00
Hawking Zhang d80e44a34e drm/amdgpu: Add smuio callback to get gpu clk counter
Add smuio callback to get gpu clk counter

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:16 -04:00
Hawking Zhang 2d93151de8 drm/amdgpu: Add smuio v14_0_2 ip block support
Add smuio v14_0_2 ip block support

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:16 -04:00
Yang Wang b93d759f54 drm/amdgpu: add umc v12.0.0 deferred error support
add umc v12.0.0 deferred error support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang 865d339763 drm/amdgpu: add aca deferred error type support
add aca deferred error type support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Tao Zhou 2fc46e0b2f drm/amdgpu: make reset method configurable for RAS poison
Each RAS block has different requirement for gpu reset in poison
consumption handling.
Add support for mmhub RAS poison consumption handling.

v2: remove the mmhub poison support for kfd int v10.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang e3d4de8d8b drm/amdgpu: retire unused aca_bank_report data structure
retire unused aca_bank_report data structure.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Candice Li f26c4e3fc9 drm/amdgpu: Update setting EEPROM table version
Use helper function instead of umc callback to set
EEPROM table version.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang 69bf42fbb2 drm/amdgpu: refine aca error cache for umc v12.0
refine aca error cache for umc v12.0

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang 87428b4054 drm/amdgpu: refine aca error cache for sdma v4.4.2
refine aca error cache for sdma v4.4.2

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Yang Wang 62d2aaa7d4 drm/amdgpu: refine aca error cache for xgmi v6.4.0
refine aca error cache for xgmi v6.4.0

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:15 -04:00
Tao Zhou d8070c4241 drm/amdgpu: support utcl2 RAS poison query for mmhub
Support the query for both gfxhub and mmhub, also replace
xcc_id with hub_inst.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Tao Zhou 176c3e8956 drm/amdgpu: add utcl2 RAS poison query for mmhub
Add it for mmhub v1.8.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Yang Wang 5275114a70 drm/amdgpu: refine aca error cache for mmhub v1.8
refine aca error cache for mmhub v1.8

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Christian Koenig d8a3f0a034 drm/amdgpu: implement TLB flush fence
The problem is that when (for example) 4k pages are replaced
with a single 2M page we need to wait for change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
    - rebase
    - set dma_fence_error only in case of error
    - add tlb_flush fence only when PT/PD BO is locked (Felix)
    - use vm->pasid when f is NULL (Mukul)

V4: - add a wait for (f->dependency) in tlb_fence_work (Christian)
    - move the misplaced fence_create call to the end (Philip)

V5: - free the f->dependency properly

V6: (Shashank)
    - light code movement, moved all the clean-up in previous patch
    - introduce params.needs_flush and its usage in this patch
    - rebase without TLB HW sequence patch

V7:
   - Keep the vm->last_update_fence and tlb_cb code until
     we can fix the HW sequencing (Christian)
   - Move all the tlb_fence related code in a separate function so that
     its easier to read and review

V9: Addressed review comments from Christian
    - start PT update only when we have callback memory allocated

V10:
    - handle device unlock in OOM case (Christian, Mukul)
    - added Christian's R-B

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Yang Wang e6136150cd drm/amdgpu: refine aca error cache for gfx v9.4.3
refine aca error cache for gfx 9.4.3

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Yang Wang 949899cbac drm/amdgpu: add new api to save error count into aca cache
add new api to save error count into aca cache.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Yang Wang abc3b5d21d drm/amdgpu: add new aca_smu_type support
Add new types to distinguish between ACA error type and smu mca type.

e.g.:
the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank
channel, so add new type 'aca_smu_type' to distinguish aca error type
and smu mca type.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:14 -04:00
Sunil Khatri 6fe4dab331 drm/amdgpu: remove the adev check for NULL
adev is a global data structure and isn't expected
to be NULL and hence removing the redundant adev
check from the devcoredump code.

Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:13 -04:00
Likun Gao 3cfaadbe0f drm/amdgpu: add support for atom fw version v3_5
Support for atom_firmware_info_v3_5.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:13 -04:00
Hawking Zhang 765bea0d73 drm/amdgpu: Apply retry to IP discovery v2 and v4
To ensure GPU driver touch the local framebuffer until
it is initialized by integrated firmware.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:13 -04:00
Yang Wang 9dc57c2adf drm/amdgpu: add ras event id support
add amdgpu ras event id support to better distinguish different
error information sources in dmesg logs.

the following log will be identify by event id:
{event_id} interrupt to inform RAS event
{event_id} ACA logs
{event_id} errors statistic since from current injection/error query
{event_id} errors statistic since from gpu load

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:13 -04:00
Zhigang Luo ab66c83284 drm/amdgpu: trigger flr_work if reading pf2vf data failed
if reading pf2vf data failed 30 times continuously, it means something is
wrong. Need to trigger flr_work to recover the issue.

also use dev_err to print the error message to get which device has
issue and add warning message if waiting IDH_FLR_NOTIFICATION_CMPL
timeout.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:13 -04:00
Sunil Khatri d72e2bdac4 drm/amdgpu: add the hw_ip version of all IP's
Add all the IP's version information on a SOC to the
devcoredump.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:11 -04:00
Victor Skvortsov 6bb89d1340 drm/amdgpu: Skip virt_exchange_init on SDMA poison consumption
Host will initiate an FLR in SDMA poison consumption scenario.
Guest should wait for FLR message to re-init data exchange.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:38 -04:00
Lijo Lazar dfe9c3cde2 drm/amdgpu: Do a basic health check before reset
Check if the device is present in the bus before trying to recover. It
could be that device itself is lost from the bus in some hang
situations.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:38 -04:00
Shashank Sharma 97d5aa6030 drm/amdgpu: cleanup unused variable
This patch removes an unused input variable in the MES
doorbell function.

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <Christian.Koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:37 -04:00
Tao Zhou 0c501d3c11 drm/amdgpu: skip GFX FED error in page fault handling
Let kfd interrupt handler process it.

v2: return 0 instead of 1 for fed error.
drop the usage of strcmp in interrupt handler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Tao Zhou 71a8d61ebc drm/amdgpu: retire gfx ras query_utcl2_poison_status
Replace it with related interface in gfxhub functions.

v2: replace node id with xcc id.
    get node id for query_utcl2_poison_status

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Sunil Khatri 3eb899c40a drm/amdgpu: add ring buffer information in devcoredump
Add relevant ringbuffer information such as
rptr, wptr,rb mask, ring name, ring size and also
the rings content for each ring on a gpu reset.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Sunil Khatri 583681d4a4 drm/amdgpu: add vm fault information to devcoredump
Add page fault information to the devcoredump.

Output of devcoredump:
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.7.0-amd-staging-drm-next
module: amdgpu
time: 29.725011811
process_name: soft_recovery_p PID: 1720

Ring timed out details
IP Type: 0 Ring Name: gfx_0.0.0

[gfxhub] Page fault observed
Faulty page starting at address: 0x0000000000000000
Protection fault status register: 0x301031

VRAM is lost due to GPU reset!

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Tao Zhou fb0f5f5414 drm/amdgpu: add utcl2 poison query for gfxhub
Implement it for gfxhub 1.0 and 1.2.

v2: input logical xcc id for poison query interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Sunil Khatri dc406d92a0 drm/amdgpu: add recent pagefault info in vm_manager
Currently page fault information is stored per
vm and which could be freed or stale during
reset. Add it pagefault information in the
vm_manager which is a global space for vm's
and remains valid across.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:37:36 -04:00
Le Ma ad550dbe8a drm/amdgpu: drop setting buffer funcs in sdma442
To fix the entity rq NULL issue. This setting has been moved
to upper level.

Fixes: b70438004a ("drm/amdgpu: move buffer funcs setting up a level")
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:36:29 -04:00
Lang Yu 1b7eec6bf3 Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"
Ready now. Remove this workaround.
This reverts commit d40f6213b5.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Tested-by: Alan Liu <haoping.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:59 -04:00
Ma Jun 03c6284df1 Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"
This patch causes the following iounmap erorr and calltrace
iounmap: bad address 00000000d0b3631f

The original patch was unjustified because amdgpu_device_fini_sw() will
always cleanup the rmmio mapping.

This reverts commit eb4f139888.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:59 -04:00
Hawking Zhang 9b3fec307f drm/amdgpu: Bypass display ta if display hw is not available
Do not load/invoke display TA if display hardware
is not available.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:58 -04:00
Prike Liang 43bda3e782 drm/amdgpu: correct the KGQ fallback message
Fix the KGQ fallback function name, as this will
help differentiate the failure in the KCQ enablement.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:58 -04:00
ZhenGuo Yin 56b30ac84c drm/amdgpu: Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOV
[Why]
RLCG interface returns "out-of-range" error under SRIOV VF when accessing
PF-only registers.

[How]
Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOV.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:57 -04:00
Ahmad Rehman f679fd6057 drm/amdgpu: Init zone device and drm client after mode-1 reset on reload
In passthrough environment, when amdgpu is reloaded after unload, mode-1
is triggered after initializing the necessary IPs, That init does not
include KFD, and KFD init waits until the reset is completed. KFD init
is called in the reset handler, but in this case, the zone device and
drm client is not initialized, causing app to create kernel panic.

v2: Removing the init KFD condition from amdgpu_amdkfd_drm_client_create.
As the previous version has the potential of creating DRM client twice.

v3: v2 patch results in SDMA engine hung as DRM open causes VM clear to SDMA
before SDMA init. Adding the condition to in drm client creation, on top of v1,
to guard against drm client creation call multiple times.

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:57 -04:00
Philip Yang 6c6064cbe5 drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag
Otherwise after the GTT bo is released, the GTT and gart space is freed
but amdgpu_ttm_backend_unbind will not clear the gart page table entry
and leave valid mapping entry pointing to the stale system page. Then
if GPU access the gart address mistakely, it will read undefined value
instead page fault, harder to debug and reproduce the real issue.

Cc: stable@vger.kernel.org
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:57 -04:00
Saleemkhan Jamadar 6a7cbbc267 drm/amdgpu/vcn: enable vcn1 fw load for VCN 4_0_6
v1 - update the fw header for each vcn instance (Veera)

VCN1 has different FW binary in VCN v4_0_6.
Add changes to load the VCN1 fw binary

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:57 -04:00
Friedrich Vock c6ba60af01 drm/amdgpu: Reset IH OVERFLOW_EN bit for IH 7.0
IH 7.0 support landed shortly after the original patch for resetting the
bit on all other generations, but without that patch applied.

Fixes: 12443fc53e ("drm/amdgpu: Add ih v7_0 ip block support")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:57 -04:00
Lang Yu 6540ff6482 drm/amdgpu: fix mmhub client id out-of-bounds access
Properly handle cid 0x140.

Fixes: aba2be4147 ("drm/amdgpu: add mmhub 3.3.0 support")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:57 -04:00
Vitaly Prosyak 22207fd5c8 drm/amdgpu: fix use-after-free bug
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
to the AMDGPU DRM driver on any ASICs with an invalid address and size.
The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>.
For example the following code:

static void Syzkaller1(int fd)
{
	struct drm_amdgpu_gem_userptr arg;
	int ret;

	arg.addr = 0xffffffffffff0000;
	arg.size = 0x80000000; /*2 Gb*/
	arg.flags = 0x7;
	ret = drmIoctl(fd, 0xc1186451/*amdgpu_gem_userptr_ioctl*/, &arg);
}

Due to the address and size are not valid there is a failure in
amdgpu_hmm_register->mmu_interval_notifier_insert->__mmu_interval_notifier_insert->
check_shl_overflow, but we even the amdgpu_hmm_register failure we still call
amdgpu_hmm_unregister into  amdgpu_gem_object_free which causes access to a bad address.
The following stack is below when the issue is reproduced when Kazan is enabled:

[  +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[  +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340
[  +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80
[  +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246
[  +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b
[  +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260
[  +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25
[  +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00
[  +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260
[  +0.000011] FS:  00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000
[  +0.000012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0
[  +0.000010] Call Trace:
[  +0.000006]  <TASK>
[  +0.000007]  ? show_regs+0x6a/0x80
[  +0.000018]  ? __warn+0xa5/0x1b0
[  +0.000019]  ? mmu_interval_notifier_remove+0x327/0x340
[  +0.000018]  ? report_bug+0x24a/0x290
[  +0.000022]  ? handle_bug+0x46/0x90
[  +0.000015]  ? exc_invalid_op+0x19/0x50
[  +0.000016]  ? asm_exc_invalid_op+0x1b/0x20
[  +0.000017]  ? kasan_save_stack+0x26/0x50
[  +0.000017]  ? mmu_interval_notifier_remove+0x23b/0x340
[  +0.000019]  ? mmu_interval_notifier_remove+0x327/0x340
[  +0.000019]  ? mmu_interval_notifier_remove+0x23b/0x340
[  +0.000020]  ? __pfx_mmu_interval_notifier_remove+0x10/0x10
[  +0.000017]  ? kasan_save_alloc_info+0x1e/0x30
[  +0.000018]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? __kasan_kmalloc+0xb1/0xc0
[  +0.000018]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? __kasan_check_read+0x11/0x20
[  +0.000020]  amdgpu_hmm_unregister+0x34/0x50 [amdgpu]
[  +0.004695]  amdgpu_gem_object_free+0x66/0xa0 [amdgpu]
[  +0.004534]  ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu]
[  +0.004291]  ? do_syscall_64+0x5f/0xe0
[  +0.000023]  ? srso_return_thunk+0x5/0x5f
[  +0.000017]  drm_gem_object_free+0x3b/0x50 [drm]
[  +0.000489]  amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu]
[  +0.004295]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[  +0.004270]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? __this_cpu_preempt_check+0x13/0x20
[  +0.000015]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[  +0.000020]  ? srso_return_thunk+0x5/0x5f
[  +0.000014]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  +0.000022]  ? drm_ioctl_kernel+0x17b/0x1f0 [drm]
[  +0.000496]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[  +0.004272]  ? drm_ioctl_kernel+0x190/0x1f0 [drm]
[  +0.000492]  drm_ioctl_kernel+0x140/0x1f0 [drm]
[  +0.000497]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[  +0.004297]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  +0.000489]  ? srso_return_thunk+0x5/0x5f
[  +0.000011]  ? __kasan_check_write+0x14/0x20
[  +0.000016]  drm_ioctl+0x3da/0x730 [drm]
[  +0.000475]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
[  +0.004293]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  +0.000506]  ? __pfx_rpm_resume+0x10/0x10
[  +0.000016]  ? srso_return_thunk+0x5/0x5f
[  +0.000011]  ? __kasan_check_write+0x14/0x20
[  +0.000010]  ? srso_return_thunk+0x5/0x5f
[  +0.000011]  ? _raw_spin_lock_irqsave+0x99/0x100
[  +0.000015]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.000014]  ? srso_return_thunk+0x5/0x5f
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000011]  ? srso_return_thunk+0x5/0x5f
[  +0.000011]  ? preempt_count_sub+0x18/0xc0
[  +0.000013]  ? srso_return_thunk+0x5/0x5f
[  +0.000010]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[  +0.000019]  amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu]
[  +0.004272]  __x64_sys_ioctl+0xcd/0x110
[  +0.000020]  do_syscall_64+0x5f/0xe0
[  +0.000021]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  +0.000015] RIP: 0033:0x7ff9ed31a94f
[  +0.000012] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[  +0.000013] RSP: 002b:00007fff25f66790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.000016] RAX: ffffffffffffffda RBX: 000055b3f7e133e0 RCX: 00007ff9ed31a94f
[  +0.000012] RDX: 000055b3f7e133e0 RSI: 00000000c1186451 RDI: 0000000000000003
[  +0.000010] RBP: 00000000c1186451 R08: 0000000000000000 R09: 0000000000000000
[  +0.000009] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fff25f66ca8
[  +0.000009] R13: 0000000000000003 R14: 000055b3f7021ba8 R15: 00007ff9ed7af040
[  +0.000024]  </TASK>
[  +0.000007] ---[ end trace 0000000000000000 ]---

v2: Consolidate any error handling into amdgpu_hmm_register
    which applied to kfd_bo also. (Christian)
v3: Improve syntax and comment (Christian)

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Joonkyo Jung <joonkyoj@yonsei.ac.kr>
Cc: Dokyung Song <dokyungs@yonsei.ac.kr>
Cc: <jisoo.jang@yonsei.ac.kr>
Cc: <yw9865@yonsei.ac.kr>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:56 -04:00
Mukul Joshi 71b9d19220 drm/amdgpu: Handle duplicate BOs during process restore
In certain situations, some apps can import a BO multiple times
(through IPC for example). To restore such processes successfully,
we need to tell drm to ignore duplicate BOs.
While at it, also add additional logging to prevent silent failures
when process restore fails.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:12:56 -04:00
Linus Torvalds e5eb28f6d1 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
 
 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".
 
 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits.  The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".
 
 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".
 
 - Ryusuke Konishi continues nilfs2 maintenance work in the series
 
 	"nilfs2: eliminate kmap and kmap_atomic calls"
 	"nilfs2: fix kernel bug at submit_bh_wbc()"
 
 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".
 
 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".
 
 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".
 
 Plus the usual shower of singleton patches in various parts of the tree.
 Please see the individual changelogs for details.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfMnvgAKCRDdBJ7gKXxA
 jjKMAP4/Upq07D4wjkMVPb+QrkipbbLpdcgJ++q3z6rba4zhPQD+M3SFriIJk/Xh
 tKVmvihFxfAhdDthseXcIf1nBjMALwY=
 =8rVc
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
   heap optimizations".

 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".

 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits. The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".

 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".

 - Ryusuke Konishi continues nilfs2 maintenance work in the series

	"nilfs2: eliminate kmap and kmap_atomic calls"
	"nilfs2: fix kernel bug at submit_bh_wbc()"

 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".

 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".

 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".

Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.

* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  nilfs2: prevent kernel bug at submit_bh_wbc()
  nilfs2: fix failure to detect DAT corruption in btree and direct mappings
  ocfs2: enable ocfs2_listxattr for special files
  ocfs2: remove SLAB_MEM_SPREAD flag usage
  assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
  buildid: use kmap_local_page()
  watchdog/core: remove sysctl handlers from public header
  nilfs2: use div64_ul() instead of do_div()
  mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
  kexec: copy only happens before uchunk goes to zero
  get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
  get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
  get_signal: don't abuse ksig->info.si_signo and ksig->sig
  const_structs.checkpatch: add device_type
  Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
  dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
  list: leverage list_is_head() for list_entry_is_head()
  nilfs2: MAINTAINERS: drop unreachable project mirror site
  smp: make __smp_processor_id() 0-argument macro
  fat: fix uninitialized field in nostale filehandles
  ...
2024-03-14 18:03:09 -07:00
Dave Airlie 119b225f01 amd-drm-next-6.9-2024-03-08-1:
amdgpu:
 - DCN 3.5.1 support
 - Fixes for IOMMUv2 removal
 - UAF fix
 - Misc small fixes and cleanups
 - SR-IOV fixes
 - MCBP cleanup
 - devcoredump update
 - NBIF 6.3.1 support
 - VPE 6.1.1 support
 
 amdkfd:
 - Misc fixes and cleanups
 - GFX10.1 trap fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZetE3AAKCRC93/aFa7yZ
 2K4iAQC9mqfalMDJQegU7lUHUFzklfyWMMwrxt6Ull9HjaIcBAEAvKzRdXGWl64j
 R7qt7aF8/U/90RuplIyukCBbrh7FEQI=
 =bDXu
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-03-08-1:

amdgpu:
- DCN 3.5.1 support
- Fixes for IOMMUv2 removal
- UAF fix
- Misc small fixes and cleanups
- SR-IOV fixes
- MCBP cleanup
- devcoredump update
- NBIF 6.3.1 support
- VPE 6.1.1 support

amdkfd:
- Misc fixes and cleanups
- GFX10.1 trap fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240308170741.3691166-1-alexander.deucher@amd.com
2024-03-11 13:32:12 +10:00
Thomas Zimmermann d6eb77731c Merge drm/drm-next into drm-misc-next
Backmerging to get the latest fixes from drm-next; specifically the
build fix from the patchset at [1]. Also fixes the build by removing
an unused variable from rzg2l_du_vsp_atomic_flush().

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/series/130720/ # 1
2024-03-08 10:53:46 +01:00
Dave Airlie af165fb00a amd-drm-next-6.9-2024-03-01:
amdgpu:
 - GC 11.5.1 updates
 - Misc display cleanups
 - NBIO 7.9 updates
 - Backlight fixes
 - DMUB fixes
 - MPO fixes
 - atomfirmware table updates
 - SR-IOV fixes
 - VCN 4.x updates
 - use RMW accessors for pci config registers
 - PSR fixes
 - Suspend/resume fixes
 - RAS fixes
 - ABM fixes
 - Misc code cleanups
 - SI DPM fix
 - Revert freesync video
 
 amdkfd:
 - Misc cleanups
 - Error handling fixes
 
 radeon:
 - use RMW accessors for pci config registers
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZeI9fgAKCRC93/aFa7yZ
 2DmCAQCVUsteinfHeJiyd7+pWnZdJVKzy5fRayvfOV8InHapawD/RAMlJhxy08IO
 B8tw7HY+YqRqW12T3klwRiBTk3Iw9Ac=
 =vBW4
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-03-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-03-01:

amdgpu:
- GC 11.5.1 updates
- Misc display cleanups
- NBIO 7.9 updates
- Backlight fixes
- DMUB fixes
- MPO fixes
- atomfirmware table updates
- SR-IOV fixes
- VCN 4.x updates
- use RMW accessors for pci config registers
- PSR fixes
- Suspend/resume fixes
- RAS fixes
- ABM fixes
- Misc code cleanups
- SI DPM fix
- Revert freesync video

amdkfd:
- Misc cleanups
- Error handling fixes

radeon:
- use RMW accessors for pci config registers

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240301204857.13960-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-03-08 11:21:13 +10:00
lima1002 7c5fde53b1 drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.1
Set the default reset method to mode2 for SMU IP v14.0.1

Signed-off-by: lima1002 <li.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:33:25 -05:00
Alex Deucher 155d46835c drm/amdgpu: add VPE 6.1.1 discovery support
Enable VPE 6.1.1.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:33:16 -05:00
Lang Yu f9070b0f2f drm/amdgpu/vpe: add VPE 6.1.1 support
Add initial support for VPE 6.1.1.

v2: squash in updates (Alex)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:33:10 -05:00
Lang Yu d40f6213b5 drm/amdgpu/vpe: don't emit cond exec command under collaborate mode
Not ready now.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:33:04 -05:00
Lang Yu 26f5f34e6e drm/amdgpu/vpe: add collaborate mode support for VPE
Under clollaborate mode, multiple VPE instances share a ring buferr
and work together to finish a job.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:33:01 -05:00
Lang Yu 72f4ae0a64 drm/amdgpu/vpe: add PRED_EXE and COLLAB_SYNC OPCODE
To support multi VPE collaborate mode.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:32:58 -05:00
Lang Yu 709ef39f95 drm/amdgpu/vpe: add multi instance VPE support
Add support for multi instance VPE processing.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:32:48 -05:00
Likun Gao 79698b145f drm/amdgpu/discovery: add nbif v6_3_1 ip block
Add nbif v6_3_1 ip block.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:32:45 -05:00
Hawking Zhang 894c6d3522 drm/amdgpu: Add nbif v6_3_1 ip block support
Add nbif v6_3_1 ip block support.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-07 15:32:42 -05:00
Sunil Khatri 5e592956cc drm/amdgpu: add ring timeout information in devcoredump
Add ring timeout related information in the amdgpu
devcoredump file for debugging purposes.

During the gpu recovery process the registered call
is triggered and add the debug information in data
file created by devcoredump framework under the
directory /sys/class/devcoredump/devcdx/

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-06 15:24:50 -05:00
Yifan Zhang 2bdebcb1e4 drm/amdgpu: add dcn3.5.1 support
This patch to add dcn3.5.1 support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-06 15:24:50 -05:00
Pierre-Eric Pelloux-Prayer bf909454fe drm/amdgpu: disable ring_muxer if mcbp is off
Using the ring_muxer without preemption adds overhead for no
reason since mcbp cannot be triggered.

Moving back to a single queue in this case also helps when
high priority app are used: in this case the gpu_scheduler
priority handling will work as expected - much better than
ring_muxer with its 2 independant schedulers competing for
the same hardware queue.

This change requires moving amdgpu_device_set_mcbp above
amdgpu_device_ip_early_init because we use adev->gfx.mcbp.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-06 15:24:49 -05:00
Jesse Zhang bb8863cc9d drm/amdgpu: remove unused code
Remove the unused function - amdgpu_vm_pt_is_root_clean
and remove the impossible condition

v1: entries == 0 is not possible any more,
    so this condition could probably be removed (Felix)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Suggested-by:Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-06 15:24:24 -05:00
Christian König 8bc75586ea drm/amdgpu: workaround to avoid SET_Q_MODE packets v2
It turned out that executing the SET_Q_MODE packet on every submission
creates to much overhead.

Implement a workaround which allows skipping the SET_Q_MODE packet if
subsequent submissions all use the same parameters.

v2: add a NULL check for ring_obj

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04 15:59:08 -05:00
Christian König c68cbbfd54 drm/amdgpu: cleanup conditional execution
First of all calculating the number of dw to patch into a
conditional execution is not something HW generation specific.
This is just standard ring buffer calculations. While at it also
reduce the BUG_ON() into WARN_ON().

Then instead of a random bit pattern use 0 as default value for
the number of dw skipped, this way it's not mandatory any more
to patch the conditional execution.

And last make the address to check a parameter of the
conditional execution instead of getting this from the ring.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04 15:59:08 -05:00
Ma Jun 86e14a7386 drm/amdgpu: Use rpm_mode flag instead of checking it again for rpm
Because the rpm_mode flag is already set when the driver
is initialized, we use it directly for runtime suspend/resume
instead of checking it again

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04 15:59:08 -05:00
Shashank Sharma b8f67b9ddf drm/amdgpu: change vm->task_info handling
This patch changes the handling and lifecycle of vm->task_info object.
The major changes are:
- vm->task_info is a dynamically allocated ptr now, and its uasge is
  reference counted.
- introducing two new helper funcs for task_info lifecycle management
    - amdgpu_vm_get_task_info: reference counts up task_info before
      returning this info
    - amdgpu_vm_put_task_info: reference counts down task_info
- last put to task_info() frees task_info from the vm.

This patch also does logistical changes required for existing usage
of vm->task_info.

V2: Do not block all the prints when task_info not found (Felix)

V3: Fixed review comments from Felix
   - Fix wrong indentation
   - No debug message for -ENOMEM
   - Add NULL check for task_info
   - Do not duplicate the debug messages (ti vs no ti)
   - Get first reference of task_info in vm_init(), put last
     in vm_fini()

V4: Fixed review comments from Felix
   - fix double reference increment in create_task_info
   - change amdgpu_vm_get_task_info_pasid
   - additional changes in amdgpu_gem.c while porting

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04 15:59:08 -05:00
Jesse Zhang feb13f52c8 Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven
fix the issue:
"amdgpu: Failed to create process VM object".

[Why]when amdgpu initialized, seq64 do mampping and update bo mapping in vm page table.
But when clifo run. It also initializes a vm for a process device through the function kfd_process_device_init_vm and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean.
So they have a conflict, and clinfo  always failed.

v1:
  - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04 15:58:45 -05:00
Christian König 216c1282dd drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations.

This fixes the issue that when VRAM was evacuated for suspend it's never
filled up again unless the application is restarted.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240229134003.3688-2-christian.koenig@amd.com
2024-03-01 17:12:26 +01:00
Bjorn Helgaas b07395d5d5 drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() comment
After 4020c22802 ("drm/amdgpu: don't runtime suspend if there are
displays attached (v3)"), "ret" is unconditionally set later before being
used, so there's point in initializing it and the associated comment is no
longer meaningful.

Remove the comment and the unnecessary initialization.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-29 20:35:39 -05:00
Alex Deucher 959143dab1 Revert "drm/amd: Remove freesync video mode amdgpu parameter"
This reverts commit e94e787e37.

This conflicts with how compositors want to handle VRR.  Now
that compositors actually handle VRR, we probably don't need
freesync video.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2985
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-29 20:35:31 -05:00
Tao Zhou 2c684b9342 drm/amdgpu: add deferred error check for UMC v12 address query
Both RAS UE and deferred errors need page retirement.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-29 20:35:14 -05:00
Eric Huang 1761d9a688 amd/amdkfd: remove unused parameter
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev),
and adev is also not used in the function
amdgpu_amdkfd_map_gtt_bo_to_gart().

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-28 17:10:53 -05:00
Srinivasan Shanmugam eb4f139888 drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()
This ensures that the memory mapped by ioremap for adev->rmmio, is
properly handled in amdgpu_device_init(). If the function exits early
due to an error, the memory is unmapped. If the function completes
successfully, the memory remains mapped.

Reported by smatch:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4337 amdgpu_device_init() warn: 'adev->rmmio' from ioremap() not released on lines: 4035,4045,4051,4058,4068,4337

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-27 11:06:58 -05:00
Srinivasan Shanmugam 7cf1ad2fe1 drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int()
Missing break statement in the ATOM_ARG_IMM case of a switch statement,
adds the missing break statement, ensuring that the program's control
flow is as intended.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/atom.c:323 atom_get_src_int() warn: ignoring unreachable code.

Fixes: d38ceaf99e ("drm/amdgpu: add core driver (v4)")
Cc: Jammy Zhou <Jammy.Zhou@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-27 11:06:29 -05:00
Mario Limonciello 0887054d14 drm/amd: Drop abm_level property
This vendor specific property has never been used by userspace
software and conflicts with the panel_power_savings sysfs file.
That is a compositor and user could fight over the same data.

Fixes: 63d0b87213 ("drm/amd/display: add panel_power_savings sysfs entry to eDP connectors")
Suggested-by: Harry Wentland <Harry.Wentland@amd.com>
Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com>
Cc: "Sun peng Li (Leo)" <Sunpeng.Li@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-27 10:46:59 -05:00
Tim Huang 93d64097f7 drm/amdgpu: reserve more memory for MES runtime DRAM
This patch fixes a MES firmware boot failure issue
when backdoor loading the MES firmware.

MES firmware runtime DRAM size is changed to 512k,
the driver needs to reserve this amount of memory in
FB, otherwise adjacent memory will be overwritten by
the MES firmware startup code.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:15:32 -05:00
Prike Liang 63fcd306c0 drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
Currently, GPU resets can now be performed successfully on the Raven
series. While GPU reset is required for the S3 suspend abort case.
So now can enable gpu reset for S3 abort cases on the Raven series.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:15:25 -05:00
Victor Lu 56f7d2ac6d drm/amdgpu: Do not program SQ_TIMEOUT_CONFIG in SRIOV
VF should not program this register.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:14:46 -05:00
Stanley.Yang 7ec11c2f65 drm/amdgpu: Fix ineffective ras_mask settings
Check amdgpu_ras_mask to fix ineffective ras_mask setting
due to special asic without sram ecc enable but with poison
supported.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:14:37 -05:00
Lijo Lazar e1f6746f33 drm/amdkfd: Skip packet submission on fatal error
If fatal error is detected, packet submission won't go through. Return
error in such cases. Also, avoid waiting for fence when fatal error is
detected.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:14:31 -05:00
Lijo Lazar 1b6ef74b2b drm/amdgpu: Add fatal error detected flag
For a RAS error that needs a full reset to recover, set the fatal error
status. Clear the status once the device is reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:14:24 -05:00
Ma Jun f435b5156b drm/amdgpu: Fix the runtime resume failure issue
Don't set power state flag when system enter runtime suspend,
or it may cause runtime resume failure issue.

Fixes: 3a9626c816 ("drm/amd: Stop evicting resources on APUs in suspend")
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-26 11:11:46 -05:00
Daniel Vetter f112b68f27 Linux 6.8-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmXb0T4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5YQH/3eCV90sNGch0Y94
 8rtTdqFrVx7QPNl0pz+Mo6OUIKUUHvTuwime16ckLxG+3x2Y3I0MjP1edd1NB99C
 Kje//JTpaZBPpTZ/jY4u8B1Shov2Drdx/J4NFnE/9rG6yXzKQBtvON/xAxXDCVHT
 mLhst2LR0FeCSMk9jAX6CoqUPEgwlylNyAetKxaDQgoHl4GTZC7FDO17WxyjpIxe
 1rVHsrV9Eq8kD4uxrzpTYWgZrwTObPmlZjvefa1JfzSwRNABIBJj/C1nra1Zc1oi
 b7xVaXS1cMOxrtuuG00fmHsPnWivu0tuND7H3/yLd1mRCZAPSsVbVvrI/KNtoeV4
 1euINlY=
 =7IFt
 -----END PGP SIGNATURE-----

Merge v6.8-rc6 into drm-next

Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches,
there's a few same-area-changed conflicts (xe and amdgpu mostly) that
are getting a bit too annoying.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-02-26 11:41:07 +01:00
Daniel Vetter 71ab34f72f drm-misc-next for v6.9:
UAPI Changes:
 
 - changes to fdinfo stats
 
 Cross-subsystem Changes:
 
 agp:
 - remove unused type field from struct agp_bridge_data
 
 Core Changes:
 
 ci:
 - update test names
 - cleanups
 
 gem:
 - add stats for shared buffers plus updates to amdgpu, i915, xe
 
 Documentation:
 - fixes
 
 syncobj:
 - fixes to waiting and sleeping
 
 Driver Changes:
 
 bridge:
 - adv7511: fix crash on irq during probe
 - dw_hdmi: set bridge type
 
 host1x:
 - cleanups
 
 ivpu:
 - updates to firmware API
 - refactor BO allocation
 
 meson:
 - fix error handling in probe
 
 panel:
 - revert "drm/panel-edp: Add auo_b116xa3_mode"
 - add Himax HX83112A plus DT bindings
 - ltk500hd1829: add support for ltk101b4029w and admatec 9904370
 - simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs
 
 renesas:
 - add RZ/G2L DU support plus DT bindings
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmXXUhsACgkQaA3BHVML
 eiNVFQf+IoOXCACGkWEVmVaen50pjEfLq0OjSGHdbTJqhc9wU7Q/kPC+jEpZLyqo
 OUMdXlA55BeLX52O+bvLordDPNETUsYH1QX2BYKDwcNIrvj8ISXcvdbnDcbVmttD
 ZUaaBgZ0g2M6sZQvTVU88/1RtaG64+zuk9VA1dPlh6WnBtXBUeXNtD6YQjH6xY+a
 MjZpB5VafwJTmQxy7qJ4yTLX291Ao8J2YZK8cCSyEr3FQKkAx9sJyp3hPurVIjLM
 f1y1rtoHhxUV/OVg4M559fp6F6tUkFauv4qu5VUvmPPihJTaU0eSQxir0za4VJ4e
 Jr2GOkju0oRRpKfjd0aKvaoWhl+MNg==
 =aaTQ
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2024-02-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v6.9:

UAPI Changes:

- changes to fdinfo stats

Cross-subsystem Changes:

agp:
- remove unused type field from struct agp_bridge_data

Core Changes:

ci:
- update test names
- cleanups

gem:
- add stats for shared buffers plus updates to amdgpu, i915, xe

Documentation:
- fixes

syncobj:
- fixes to waiting and sleeping

Driver Changes:

bridge:
- adv7511: fix crash on irq during probe
- dw_hdmi: set bridge type

host1x:
- cleanups

ivpu:
- updates to firmware API
- refactor BO allocation

meson:
- fix error handling in probe

panel:
- revert "drm/panel-edp: Add auo_b116xa3_mode"
- add Himax HX83112A plus DT bindings
- ltk500hd1829: add support for ltk101b4029w and admatec 9904370
- simple: add BOE BP082WX1-100 8.2" panel plus DT bindungs

renesas:
- add RZ/G2L DU support plus DT bindings

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222135841.GA6677@localhost.localdomain
2024-02-26 09:51:49 +01:00
Nathan Chancellor 2947a4567f treewide: update LLVM Bugzilla links
LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues.  While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.

Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number.  Thankfully, LLVM maintains this mapping
through two shortlinks:

  https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
  https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>

Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information.  Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.

Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:38:51 -08:00
Ma Jun bbfaf2aea7 drm/amdgpu: Fix the runtime resume failure issue
Don't set power state flag when system enter runtime suspend,
or it may cause runtime resume failure issue.

Fixes: 3a9626c816 ("drm/amd: Stop evicting resources on APUs in suspend")
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-02-22 12:28:27 -05:00
Yifan Zhang a24029cc40 drm/amdgpu: add vcn 4.0.6 discovery support
This patch is to add vcn 4.0.6 support

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 12:05:23 -05:00
Ilpo Järvinen bb87e511b2 drm/amdgpu: Use RMW accessors for changing LNKCTL2
Convert open coded RMW accesses for LNKCTL2 to use
pcie_capability_clear_and_set_word() which makes its easier to
understand what the code tries to do.

LNKCTL2 is not really owned by any driver because it is a collection of
control bits that PCI core might need to touch. RMW accessors already
have support for proper locking for a selected set of registers
(LNKCTL2 is not yet among them but likely will be in the future) to
avoid losing concurrent updates.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 12:05:20 -05:00
Kunwu Chan a5fc4e5014 drm/amdgpu: Simplify the allocation of sync slab caches
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 12:05:16 -05:00
Veerabadhran Gopalakrishnan 84eaa2c2c6 drm/amdgpu/soc21: Enabling PG and CG flags for VCN 4.0.6
Enabled the VCN Power Gating and Clock Gating flags for VCN 4.0.6.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 12:05:12 -05:00
Kunwu Chan e4e4618bc1 drm/amdgpu: Simplify the allocation of mux_chunk slab caches
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:28:24 -05:00
Kunwu Chan 3d14cb0263 drm/amdgpu: Simplify the allocation of fence slab caches
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create
to simplify the creation of SLAB caches.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:28:19 -05:00
Veerabadhran Gopalakrishnan 437591d237 drm/amdgpu/soc21: Added Video Capabilities for VCN 406
Updated Query Video codecs for VCN 406

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:28:16 -05:00
Veerabadhran Gopalakrishnan 2b53b3668e drm/amdgpu/vcn: Enable VCN 4.0.6 Support
Modified driver to use the appropriate FW files and instance.

v2: squash in fixes (Alex)

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:28:11 -05:00
Saleemkhan Jamadar 07cb7fd0fd drm/amdgpu/jpeg: add support for jpeg multi instance
Enable support for multi instance on JPEG 4.0.6.

v2: squash in fixes (Alex)

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:28:04 -05:00
Victor Lu 8f4de8f72e drm/amdgpu: Use correct SRIOV macro for gmc_v9_0_vm_fault_interrupt_state
Under SRIOV, programming to VM_CONTEXT*_CNTL regs failed because the
current macro does not pass through the correct xcc instance.
Use the *REG32_XCC macro in this case.

The behaviour without SRIOV is the same without this patch.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:42 -05:00
Victor Lu bea07b215d drm/amdgpu: Do not program IH_CHICKEN in vega20_ih.c under SRIOV
IH_CHICKEN is blocked for VF writes; this access should be skipped.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:31 -05:00
Victor Lu 8093383ae7 drm/amdgpu: Improve error checking in amdgpu_virt_rlcg_reg_rw (v2)
The current error detection only looks for a timeout.
This should be changed to also check scratch_reg1 for any errors
returned from RLCG.

v2: remove new error value

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:23 -05:00
Yifan Zhang d6a76c0a5a drm/amdgpu: enable MES discovery for GC 11.5.1
This patch to enable MES for GC 11.5.1

Reviewed-by: shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:18 -05:00
Yifan Zhang e2442d3e32 drm/amdgpu: add GC 11.5.1 discovery support
This patch to add GC 11.5.1 support

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:13 -05:00
Tim Huang 455918cf28 drm/amdgpu: enable CGPG for GFX ip v11.5.1
Enable CGPG support for GFX ip v11.5.1

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:07 -05:00
Yifan Zhang 7c15ac1183 drm/amdgpu: initialize gfx11.5.1
Initialize gfx 11.5.0 and set gfx hw configuration.

v2: squash in CG, PG, GFXOFF fixes (Alex)

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:03 -05:00
Yifan Zhang 846f7385bf drm/amdgpu: add mes firmware support for GC 11.5.1
This patch to add MES PIPE0 and PIPE1 firmware support for gc_11_5_1.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:27:01 -05:00
Yifan Zhang fa744c0dd2 drm/amdgpu: add imu firmware support for GC 11.5.1
This patch is to add imu firmware support for GC 11.5.1

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:26:58 -05:00
Yifan Zhang dad4f543ac drm/amdgpu: add firmware for GC 11.5.1
This patch is to add firmware for GC 11.5.1

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:26:55 -05:00
Yifan Zhang 93c5cc8312 drm/amdgpu: add GC 11.5.1 to GC 11.5.0 family
This patch to add GC 11.5.1 to GC 11.5.0 family.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:26:52 -05:00
Yifan Zhang f1c40b6ea4 drm/amdgpu: enable soc21 discovery support for GC 11.5.1
This patch to enable soc21 support for GC 11.5.1

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:26:49 -05:00
Yifan Zhang e971995657 drm/amdgpu: add initial GC 11.5.1 soc21 support
Disable clock gating and power gating for now.

v2: squash in revision fix (Alex)

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:26:44 -05:00
Yifan Zhang 278318d371 drm/amdgpu: enable gmc11 discovery support for GC 11.5.1
This patch to enable gmc11 for GC 11.5.1

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:26:40 -05:00
Asad Kamal 5fe4a8d3c6 drm/amdgpu: Remove pcie bw sys entry
Remove pcie bw sys entry for asics not supporting
such function

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:23:45 -05:00
Asad Kamal c607e76e64 Revert "drm/amdgpu: Add pcie usage callback to nbio"
pcie usage is now handled by fw

This reverts commit 8d759dc664.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:21:27 -05:00
Ma Jun 4acd31e6c2 drm/amdgpu: Drop redundant parameter in amdgpu_gfx_kiq_init_ring
Drop redundant parameters in function amdgpu_gfx_kiq_init_ring
to simplify the code

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:17:45 -05:00
Asad Kamal 86a08f1af2 Revert "drm/amdgpu: Add pci usage to nbio v7.9"
Remove implementation to get pcie usage for nbio v7.9
as pcie usage is handled by fw

This reverts commit 59070fd9cc.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:15:26 -05:00
Yifan Zhang 2612c8313f drm/amdgpu: add tmz support for GC IP v11.5.1
Add tmz support for GC 11.5.1.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:14:30 -05:00
Hawking Zhang 5c07015619 drm/amdgpu: Do not toggle bif ras irq from guest
Only do this from host side.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:14:24 -05:00
Yifan Zhang 46e5de77b3 drm/amdgpu: add GFXHUB 11.5.1 support
This patch to add GFXHUB 11.5.1  support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-22 10:14:17 -05:00
Dave Airlie 40d47c5fb4 amd-drm-next-6.9-2024-02-19:
amdgpu:
 - ATHUB 4.1 support
 - EEPROM support updates
 - RAS updates
 - LSDMA 7.0 support
 - JPEG DPG support
 - IH 7.0 support
 - HDP 7.0 support
 - VCN 5.0 support
 - Misc display fixes
 - Retimer fixes
 - DCN 3.5 fixes
 - VCN 4.x fixes
 - PSR fixes
 - PSP 14.0 support
 - VA_RESERVED cleanup
 - SMU 13.0.6 updates
 - NBIO 7.11 updates
 - SDMA 6.1 updates
 - MMHUB 3.3 updates
 - Suspend/resume fixes
 - DMUB updates
 
 amdkfd:
 - Trap handler enhancements
 - Fix cache size reporting
 - Relocate the trap handler
 
 radeon:
 - fix typo in print statement
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZdPLUgAKCRC93/aFa7yZ
 2AakAQDmhlQkJAxIxJw4/5mEQY5zaMJ033lcZGzBQbj8uL42pQD/aQ/gdN/bOfPZ
 gsdidzgL5MThBOFfw72pBkEoE+kQXgc=
 =oP2J
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-02-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-02-19:

amdgpu:
- ATHUB 4.1 support
- EEPROM support updates
- RAS updates
- LSDMA 7.0 support
- JPEG DPG support
- IH 7.0 support
- HDP 7.0 support
- VCN 5.0 support
- Misc display fixes
- Retimer fixes
- DCN 3.5 fixes
- VCN 4.x fixes
- PSR fixes
- PSP 14.0 support
- VA_RESERVED cleanup
- SMU 13.0.6 updates
- NBIO 7.11 updates
- SDMA 6.1 updates
- MMHUB 3.3 updates
- Suspend/resume fixes
- DMUB updates

amdkfd:
- Trap handler enhancements
- Fix cache size reporting
- Relocate the trap handler

radeon:
- fix typo in print statement

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219214810.4911-1-alexander.deucher@amd.com
2024-02-22 13:21:19 +10:00
Yifan Zhang 31e0a586f3 drm/amdgpu: add MMHUB 3.3.1 support
This patch to add MMHUB 3.3.1 support.

v2: squash in fault info fix (Alex)

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-19 14:50:46 -05:00
Mario Limonciello 2bb2ad58f6 drm/amd: Change `jpeg_v4_0_5_start_dpg_mode()` to void
jpeg_v4_0_5_start_dpg_mode() always returns 0 and the return value
doesn't get used in the caller jpeg_v4_0_5_start(). Modify the
function to be void.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1583635 ("Code maintainability issues")
Fixes: 0a119d53f7 ("drm/amdgpu/jpeg: add support for jpeg DPG mode")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:44:30 -05:00
Srinivasan Shanmugam 6f18d7ad9d drm/amdgpu: Fix missing parameter descriptions in ih_v7_0.c
Rectifies kdoc warnings related to the 'ih' parameter in the
'ih_v7_0_get_wptr', 'ih_v7_0_irq_rearm', and 'ih_v7_0_set_rptr'
functions within the 'ih_v7_0.c' file.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/ih_v7_0.c:392: warning: Function parameter or member 'ih' not described in 'ih_v7_0_get_wptr'
drivers/gpu/drm/amd/amdgpu/ih_v7_0.c:432: warning: Function parameter or member 'ih' not described in 'ih_v7_0_irq_rearm'
drivers/gpu/drm/amd/amdgpu/ih_v7_0.c:458: warning: Function parameter or member 'ih' not described in 'ih_v7_0_set_rptr'

Fixes: 12443fc53e ("drm/amdgpu: Add ih v7_0 ip block support")
Cc: Likun Gao <Likun.Gao@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:26 -05:00
Yifan Zhang a02cfac90f drm/amdgpu: add SDMA 6.1.1 discovery support
This patch to add SDMA 6.1.1 support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:23 -05:00
Yifan Zhang c40797d320 drm/amdgpu: add sdma 6.1.1 firmware
This patch to add sdma 6.1.1 firmware declaration.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:21 -05:00
Yifan Zhang aec765a4dc drm/amdgpu: add psp 14.0.1 discovery support
This patch to add psp 14.0.1 support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:18 -05:00
Yifan Zhang 24b5a5df94 drm/amdgpu: add PSP 14.0.1 support
This patch to add PSP 14.0.1 support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:15 -05:00
Yifan Zhang c5ce1f1a21 drm/amdgpu: add smuio 14.0.1 support
This patch to add smuio 14.0.1 support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:11 -05:00
Yifan Zhang bd377b1281 drm/amdgpu: add nbio 7.11.1 discovery support
This patch to add nbio 7.11.1 support.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:08 -05:00
Yifan Zhang dc84f52eb2 drm/amdgpu/nbio: Add NBIO 7.11.1 Support
Fix up doorbell setup and clockgating.

v2: squash in fixes (Alex)

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:42:03 -05:00
Felix Kuehling 34a1de0f79 drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole
The TBA and TMA, along with an unused IB allocation, reside at low
addresses in the VM address space. A stray VM fault which hits these
pages must be serviced by making their page table entries invalid.
The scheduler depends upon these pages being resident and fails,
preventing a debugger from inspecting the failure state.

By relocating these pages above 47 bits in the VM address space they
can only be reached when bits [63:48] are set to 1. This makes it much
less likely for a misbehaving program to generate accesses to them.
The current placement at VA (PAGE_SIZE*2) is readily hit by a NULL
access with a small offset.

v2:
- Move it to the reserved space to avoid concflicts with Mesa
- Add macros to make reserved space management easier

v3:
- Move VM  max PFN calculation into AMDGPU_VA_RESERVED macros

Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-16 15:41:50 -05:00
Alex Deucher ba1a58d5b9 drm/amdgpu: add shared fdinfo stats
Add shared stats.  Useful for seeing shared memory.

v2: take dma-buf into account as well
v3: use the new gem helper

Link: https://lore.kernel.org/all/20231207180225.439482-1-alexander.deucher@amd.com/
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian König <christian.keonig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2024-02-16 12:52:50 +01:00
Thong 2f542421a4 drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
Update the maximum resolution reported for HEVC encoding on VCN 4
devices to reflect its 8K encoding capability.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159
Signed-off-by: Thong <thong.thai@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-02-15 14:18:43 -05:00
Mario Limonciello 9163616853 Revert "drm/amd: flush any delayed gfxoff on suspend entry"
commit ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee726395 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.

Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.

This reverts commit 0dee726395.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-15 14:18:43 -05:00
Mario Limonciello 3a9626c816 drm/amd: Stop evicting resources on APUs in suspend
commit 5095d54181 ("drm/amd: Evict resources during PM ops prepare()
callback") intentionally moved the eviction of resources to earlier in
the suspend process, but this introduced a subtle change that it occurs
before adev->in_s0ix or adev->in_s3 are set. This meant that APUs
actually started to evict resources at suspend time as well.

Explicitly set s0ix or s3 in the prepare() stage, and unset them if the
prepare() stage failed.

v2: squash in warning fix from Stephen Rothwell

Reported-by: Jürg Billeter <j@bitron.ch>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
Fixes: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-15 14:18:43 -05:00
Hamza Mahfooz d16df040c8 drm/amdgpu: make damage clips support configurable
We have observed that there are quite a number of PSR-SU panels on the
market that are unable to keep up with what user space throws at them,
resulting in hangs and random black screens. So, make damage clips
support configurable and disable it by default for PSR-SU displays.

Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-15 14:18:42 -05:00
Likun Gao efc11f34e2 drm/amdgpu: support psp ip block discovery for psp v14
Support PSP ip block discovery for psp v14.
Add psp ip block for psp v14_0_2 and v14_0_3.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:18:27 -05:00
Likun Gao 8152825498 drm/amdgpu: add psp_timeout to limit PSP related operation
Add a new parameter psp_timeout to limit psp related operation
to unify the timeout limition for psp.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:18:24 -05:00
Likun Gao e71658299d drm/amdgpu/psp: set boot_time_tmr flag
Set boot_time_tmr flag for the ASIC which MP0 ip version
newer than 14.0.2
For runtime TMR:
     Init tmr and load tmr should did.
For boottime TMR:
     If do not support autoload, skip init TMR.
     If support autoload, excute init TMR but skip load tmr.

v2: rebase (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:18:20 -05:00
Likun Gao 2fb4460fb8 drm/amdgpu/psp: handle TMR type via flag
Add flag boot_time_tmr to indicate boot time TMR or runtime TMR
instead of function.

v2: rework logic (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:18:16 -05:00
Likun Gao 8d339b0df2 drm/amdgpu/psp: set autoload support by default
Set psp->autoload_supported to true by default,
as only a few version of ASIC not support autoload,
and the furture version of PSP should support this.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:18:06 -05:00
Likun Gao a78791c2b2 drm/amdgpu: support psp ip block for psp v14
Support PSP ip block for psp v14.
Add psp ip block for psp v14_0_2 and v14_0_3.

v2: sqaush in 14.0.3 firmware fix (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:16:12 -05:00
Likun Gao f19cb91615 drm/amdgpu: use spirom update wait_for helper for psp v14
Spirom update typically requires extremely long
duration for command execution, and special helper
function to wait for it's completion.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:16:07 -05:00
Felix Kuehling efe0f34c2b drm/amdgpu: Reduce VA_RESERVED_BOTTOM to 64KB
The reservation is there to catch NULL pointer dereferences from the
GPU. Reduce the size to 64KB to make sure that shared virtual address
programming models can map all CPU-accessible virtual addresses for GPU
access. This is also the default for CPU virtual address mappings as
seen in /proc/sys/vm/mmap_min_addr.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:15:50 -05:00
Hawking Zhang 876fa5f8a0 drm/amdgpu: Add psp v14_0 ip block support
Add psp v14_0 ip block support.

v2: rebase (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:15:46 -05:00
Thong fc2d4230e5 drm/amdgpu/soc21: update VCN 4 max HEVC encoding resolution
Update the maximum resolution reported for HEVC encoding on VCN 4
devices to reflect its 8K encoding capability.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3159
Signed-off-by: Thong <thong.thai@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-14 17:13:46 -05:00
Mario Limonciello ce311df91d Revert "drm/amd: flush any delayed gfxoff on suspend entry"
commit ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee726395 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.

Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.

This reverts commit 0dee726395.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: ab4750332d ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-13 08:59:50 -05:00
Mario Limonciello 226db36032 drm/amd: Stop evicting resources on APUs in suspend
commit 5095d54181 ("drm/amd: Evict resources during PM ops prepare()
callback") intentionally moved the eviction of resources to earlier in
the suspend process, but this introduced a subtle change that it occurs
before adev->in_s0ix or adev->in_s3 are set. This meant that APUs
actually started to evict resources at suspend time as well.

Explicitly set s0ix or s3 in the prepare() stage, and unset them if the
prepare() stage failed.

v2: squash in warning fix from Stephen Rothwell

Reported-by: Jürg Billeter <j@bitron.ch>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3132#note_2271038
Fixes: 5095d54181 ("drm/amd: Evict resources during PM ops prepare() callback")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-13 08:59:49 -05:00
Dave Airlie b344e64fbd amd-drm-next-6.9-2024-02-09:
amdgpu:
 - Validate DMABuf imports in compute VMs
 - Add RAS ACA framework
 - PSP 13 fixes
 - Misc code cleanups
 - Replay fixes
 - Atom interpretor PS, WS bounds checking
 - DML2 fixes
 - Audio fixes
 - DCN 3.5 Z state fixes
 - Remove deprecated ida_simple usage
 - UBSAN fixes
 - RAS fixes
 - Enable seq64 infrastructure
 - DC color block enablement
 - Documentation updates
 - DC documentation updates
 - DMCUB updates
 - S3 fixes
 - VCN 4.0.5 fixes
 - DP MST fixes
 - SR-IOV fixes
 
 amdkfd:
 - Validate DMABuf imports in compute VMs
 - SVM fixes
 - Trap handler updates
 
 radeon:
 - Atom interpretor PS, WS bounds checking
 - Misc code cleanups
 
 UAPI:
 - Bump KFD version so UMDs know that the fixes that enable the management of
   VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present
 - Add INFO query for input power.  This matches the existing INFO query for average
   power.  Used in gaming HUDs, etc.
   Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZcaM8gAKCRC93/aFa7yZ
 2L64AP9S8Wh5T2dEm3Nr8zBR008KdFQyOGVoO4qwlmyJMgin3wEA57gHiUrvs3o7
 HRR+PU4JMo4OxQZNpVQtYYHc1BL6nQU=
 =3AqF
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.9-2024-02-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.9-2024-02-09:

amdgpu:
- Validate DMABuf imports in compute VMs
- Add RAS ACA framework
- PSP 13 fixes
- Misc code cleanups
- Replay fixes
- Atom interpretor PS, WS bounds checking
- DML2 fixes
- Audio fixes
- DCN 3.5 Z state fixes
- Remove deprecated ida_simple usage
- UBSAN fixes
- RAS fixes
- Enable seq64 infrastructure
- DC color block enablement
- Documentation updates
- DC documentation updates
- DMCUB updates
- S3 fixes
- VCN 4.0.5 fixes
- DP MST fixes
- SR-IOV fixes

amdkfd:
- Validate DMABuf imports in compute VMs
- SVM fixes
- Trap handler updates

radeon:
- Atom interpretor PS, WS bounds checking
- Misc code cleanups

UAPI:
- Bump KFD version so UMDs know that the fixes that enable the management of
  VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present
- Add INFO query for input power.  This matches the existing INFO query for average
  power.  Used in gaming HUDs, etc.
  Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240209221459.5453-1-alexander.deucher@amd.com
2024-02-13 11:32:23 +10:00
Alex Deucher ddc23e6e23 drm/amdgpu/psp: update define to better align with its meaning
MEM_TRAINING_ENCROACHED_SIZE is for BIST training data.  It's
not memory type specific.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:14:12 -05:00
Hamza Mahfooz 040fdcde28 drm/amdgpu: respect the abmlevel module parameter value if it is set
Currently, if the abmlevel module parameter is set, it is possible for
user space to override the ABM level at some point after boot. However,
that is undesirable because it means that we aren't respecting the
user's wishes with regard to the level that they want to use. So,
prevent user space from changing the ABM level if the module parameter
is set to a non-auto value.

Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:14:05 -05:00
Sonny Jiang cff9960317 drm/amdgpu: Add jpeg_v5_0_0 ip block support
Enable support for jpeg_v5_0_0 ip  block.

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:13:44 -05:00
Sonny Jiang 75a178926c drm/amdgpu/jpeg5: Enable doorbell
Add doorbell for JPEG5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:13:39 -05:00
Sonny Jiang 785e53a83b drm/amdgpu/jpeg5: add power gating support
Add PG support for JPEG5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:13:35 -05:00
Sonny Jiang dfad65c657 drm/amdgpu: Add JPEG5 support
Add support for JPEG5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:12:00 -05:00
Sonny Jiang 470675f6bf amdgpu/drm: Add vcn_v5_0_0_ip_block support
Enable support for vcn_v5_0_0_ip_block

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:11:53 -05:00
Hamza Mahfooz fc184dbe9f drm/amdgpu: make damage clips support configurable
We have observed that there are quite a number of PSR-SU panels on the
market that are unable to keep up with what user space throws at them,
resulting in hangs and random black screens. So, make damage clips
support configurable and disable it by default for PSR-SU displays.

Cc: stable@vger.kernel.org
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:10:24 -05:00
Sonny Jiang b6d1a06320 drm/amdgpu: add VCN_5_0_0 IP block support
Add VCN_5_0_0 IP init, ring functions, DPG support.

v2: squash in warning fixes (Alex)
v3: squash in block and ring init, boot, doorbell enablement,
    DPG support (Alex)

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:10:18 -05:00
Sonny Jiang 816dae1d69 drm/amdgpu: add VCN_5_0_0 firmware support
Add VCN5_0_0 firmware support

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:10:14 -05:00
Likun Gao ca46c25909 drm/amdgpu/discovery: Add hdp v7_0 ip block
Add hdp v7_0 ip block

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:10:00 -05:00
Likun Gao f3bcdf2d90 drm/amdgpu: Add hdp v7_0 ip block support
Add hdp v7_0 ip block support.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:09:57 -05:00
Likun Gao 56018e8363 drm/amdgpu/discovery: Add ih v7_0 ip block
Add ih v7_0 ip block.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:09:45 -05:00
Likun Gao 12443fc53e drm/amdgpu: Add ih v7_0 ip block support
Add ih v7_0 ip block support.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:09:42 -05:00
Saleemkhan Jamadar 0a119d53f7 drm/amdgpu/jpeg: add support for jpeg DPG mode
Jpeg DPG support for GC IP v11_5_0

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:09:32 -05:00
Saleemkhan Jamadar 617efef4af drm/amdgpu: add ucode id for jpeg DPG support
add ucode id and cmd buffer for jpeg psp sram programming
and Jpeg DPG support.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:09:11 -05:00
Likun Gao 39df603d2c drm/amdgpu/discovery: Add lsdma v7_0 ip block
Add lsdma v7_0 ip block.

v2: squash in updates (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:08:46 -05:00
Likun Gao aa2fb23605 drm/amdgpu: Add lsdma v7_0 ip block support
Add lsdma v7_0 ip block support.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:08:41 -05:00
Yang Wang f579c06bdc drm/amdgpu: send smu rma reason event in ras eeprom driver
send smu rma reason event to smu in ras eeprom driver.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:08:27 -05:00
Hawking Zhang 53edf77179 drm/amdgpu: Add athub v4_1_0 ip block support
Add athub v4_1_0 ip block support.

v2: fix clang warning (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:08:12 -05:00
Likun Gao b35c3feafe drm/amdgpu: support rlc auotload type set
Support to set fw_load_type=3 to use backdoor
rlc autoload.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:07:45 -05:00
Likun Gao 45b801c24c drm/amdgpu: skip ucode bo reserve for RLC AUTOLOAD
Skip ucode BO reservation for backdoor RLC autoload.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-12 16:07:42 -05:00
Lijo Lazar 534c8a5b9d drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
HDP flush remapping is not done for VFs. Keep the original offsets in VF
environment.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 18:30:11 -05:00
Lijo Lazar 55173942a6 drm/amdgpu: Avoid fetching VRAM vendor info
The present way to fetch VRAM vendor information turns out to be not
reliable on GFX 9.4.3 dGPUs as well. Avoid using the data.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-02-07 18:28:31 -05:00
Stanley.Yang 2dcf82a8e8 drm/amdgpu: Fix shared buff copy to user
ta if invoke node buffer
|-------- ta type ----------|
|--------  ta id  ----------|
|-------- cmd  id ----------|
|------ shared buf len -----|
|------ shared buffer ------|

ta if invoke node buffer is as above, copy shared buffer data to correct location

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 18:22:04 -05:00
Li Ma 897925dcc5 drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend
A supplement to commit: 615dd56ac5
There is an irq warning of jpeg during resume in s2idle process. No irq enabled in jpeg 4.0.5 resume.

Fixes: 615dd56ac5 ("drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend")
Signed-off-by: Li Ma <li.ma@amd.com>
Acked-By: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 18:19:22 -05:00
Prike Liang 6ef82ac664 drm/amdgpu: reset gpu for s3 suspend abort case
In the s3 suspend abort case some type of gfx9 power
rail not turn off from FCH side and this will put the
GPU in an unknown power status, so let's reset the gpu
to a known good power state before reinitialize gpu
device.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 18:19:08 -05:00
Prike Liang 93bafa32a6 drm/amdgpu: skip to program GFXDEC registers for suspend abort
In the suspend abort cases, the gfx power rail doesn't turn off so
some GFXDEC registers/CSB can't reset to default value and at this
moment reinitialize GFXDEC/CSB will result in an unexpected error.
So let skip those program sequence for the suspend abort case.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 18:19:04 -05:00
Lijo Lazar d559744403 drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
HDP flush remapping is not done for VFs. Keep the original offsets in VF
environment.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:24 -05:00
Lijo Lazar 3d1554d999 drm/amdgpu: Avoid fetching VRAM vendor info
The present way to fetch VRAM vendor information turns out to be not
reliable on GFX 9.4.3 dGPUs as well. Avoid using the data.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:23 -05:00
Stanley.Yang 28b34ad207 drm/amdgpu: Fix shared buff copy to user
ta if invoke node buffer
|-------- ta type ----------|
|--------  ta id  ----------|
|-------- cmd  id ----------|
|------ shared buf len -----|
|------ shared buffer ------|

ta if invoke node buffer is as above, copy shared buffer data to correct location

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:22 -05:00
Srinivasan Shanmugam cdb637d339 drm/amdgpu: Fix potential out-of-bounds access in 'amdgpu_discovery_reg_base_init()'
The issue arises when the array 'adev->vcn.vcn_config' is accessed
before checking if the index 'adev->vcn.num_vcn_inst' is within the
bounds of the array.

The fix involves moving the bounds check before the array access. This
ensures that 'adev->vcn.num_vcn_inst' is within the bounds of the array
before it is used as an index.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1289 amdgpu_discovery_reg_base_init() error: testing array offset 'adev->vcn.num_vcn_inst' after use.

Fixes: a0ccc717c4 ("drm/amdgpu/discovery: validate VCN and SDMA instances")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:22 -05:00
Li Ma 0a8ff0cbee drm/amdgpu: remove asymmetrical irq disabling in jpeg 4.0.5 suspend
A supplement to commit: 615dd56ac5
There is an irq warning of jpeg during resume in s2idle process. No irq enabled in jpeg 4.0.5 resume.

Fixes: 615dd56ac5 ("drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend")
Signed-off-by: Li Ma <li.ma@amd.com>
Acked-By: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:22 -05:00
Prike Liang 3f719cf22f drm/amdgpu: reset gpu for s3 suspend abort case
In the s3 suspend abort case some type of gfx9 power
rail not turn off from FCH side and this will put the
GPU in an unknown power status, so let's reset the gpu
to a known good power state before reinitialize gpu
device.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:22 -05:00
Prike Liang 0326de4c44 drm/amdgpu: skip to program GFXDEC registers for suspend abort
In the suspend abort cases, the gfx power rail doesn't turn off so
some GFXDEC registers/CSB can't reset to default value and at this
moment reinitialize GFXDEC/CSB will result in an unexpected error.
So let skip those program sequence for the suspend abort case.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 12:26:22 -05:00
Qiang Ma aeaf3e6cf8 drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization
Problem:
The computer in the bios initialization process, unplug the HDMI display,
wait until the system up, plug in the HDMI display, did not enter the
hotplug interrupt function, the display is not bright.

Fix:
After the above problem occurs, and the hpd ack interrupt bit is 1,
the interrupt should be cleared during hpd_init initialization so that
when the driver is ready, it can respond to the hpd interrupt normally.

Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 10:01:10 -05:00
Alex Deucher 39a82d304b drm/amdgpu: fix typo in parameter description
Missing space.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 10:01:05 -05:00
shaoyunl 3fad156572 drm/amdgpu: Only create mes event log debugfs when mes is enabled
Skip the debugfs file creation for mes event log if the GPU
doesn't use MES. This to prevent potential kernel oops when
user try to read the event log in debugfs on a GPU without MES

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-02-07 10:00:56 -05:00
Thomas Zimmermann 0e85f1ae4a Merge drm/drm-next into drm-misc-next
Backmerging to update drm-misc-next to the state of v6.8-rc3. Also
fixes a build problem with xe.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-02-07 13:02:20 +01:00
Friedrich Vock 7330256268 drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
Allows us to detect subsequent IH ring buffer overflows as well.

Cc: Joshua Ashton <joshua@froggi.es>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:39:47 -05:00
Yifan Zhang 4f56acdee4 drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Beside, current set function callbacks are empty with no real effect.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:39:40 -05:00
Yifan Zhang de4a733868 drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
No need to set GC golden settings in driver from gfx 11.5.0 onwards.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:38:02 -05:00
Lang Yu 9c29282ecb drm/amdkfd: reserve the BO before validating it
Fix a warning.

v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.

v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)

[   41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.708989] Call Trace:
[   41.708992]  <TASK>
[   41.708996]  ? show_regs+0x6c/0x80
[   41.709000]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709008]  ? __warn+0x93/0x190
[   41.709014]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709024]  ? report_bug+0x1f9/0x210
[   41.709035]  ? handle_bug+0x46/0x80
[   41.709041]  ? exc_invalid_op+0x1d/0x80
[   41.709048]  ? asm_exc_invalid_op+0x1f/0x30
[   41.709057]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709185]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709197]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709337]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709346]  kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[   41.709467]  amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[   41.709586]  kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[   41.709710]  kfd_ioctl+0x1ec/0x650 [amdgpu]
[   41.709822]  ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[   41.709945]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709949]  ? tomoyo_file_ioctl+0x20/0x30
[   41.709959]  __x64_sys_ioctl+0x9c/0xd0
[   41.709967]  do_syscall_64+0x3f/0x90
[   41.709973]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 101b810430 ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:37:53 -05:00
Srinivasan Shanmugam 16da399091 drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'

Fixes: fac4ebd79f ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:36:44 -05:00
David McFarland 8ef85a0ce2 drm/amd: Don't init MEC2 firmware when it fails to load
The same calls are made directly above, but conditional on the firmware
loading and validating successfully.

Cc: stable@vger.kernel.org
Fixes: 9931b67690 ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland <corngood@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:34:14 -05:00
Ma Jun bb34bc2cd3 drm/amdgpu: Fix the warning info in mode1 reset
Fix the warning info below during mode1 reset.
[  +0.000004] Call Trace:
[  +0.000004]  <TASK>
[  +0.000006]  ? show_regs+0x6e/0x80
[  +0.000011]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000005]  ? __warn+0x91/0x150
[  +0.000009]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000006]  ? report_bug+0x19d/0x1b0
[  +0.000013]  ? handle_bug+0x46/0x80
[  +0.000012]  ? exc_invalid_op+0x1d/0x80
[  +0.000011]  ? asm_exc_invalid_op+0x1f/0x30
[  +0.000014]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000007]  ? __flush_work.isra.0+0x208/0x390
[  +0.000007]  ? _prb_read_valid+0x216/0x290
[  +0.000008]  __cancel_work_timer+0x11d/0x1a0
[  +0.000007]  ? try_to_grab_pending+0xe8/0x190
[  +0.000012]  cancel_work_sync+0x14/0x20
[  +0.000008]  amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[  +0.000032]  amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
 - Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 17:34:05 -05:00
Le Ma db2aad036e drm/amdgpu: move the drm client creation behind drm device registration
This patch is to eliminate interrupt warning below:

  "[drm] Fence fallback timer expired on ring sdma0.0".

An early vm pt clearing job is sent to SDMA ahead of interrupt enabled.
And re-locating the drm client creation following after drm_dev_register
looks like a more proper flow.

v2: wrap the drm client creation

Fixes: 1819200166 ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 15:33:52 -05:00
Friedrich Vock 9217b91c64 drm/amdgpu: Reset IH OVERFLOW_CLEAR bit
Allows us to detect subsequent IH ring buffer overflows as well.

Cc: Joshua Ashton <joshua@froggi.es>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:28 -05:00
Yifan Zhang 615dd56ac5 drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend
There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Beside, current set function callbacks are empty with no real effect.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:20 -05:00
Tao Zhou 01087a1974 drm/amdgpu: use PSP address query command
Get UMC physical address from PSP in RAS error address coversion.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
Tao Zhou a1eac5bd91 drm/amdgpu: add PSP RAS address query command
Convert mca address to physical address or vice versa via RAS TA.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
Yifan Zhang e4d65510e8 drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0
No need to set GC golden settings in driver from gfx 11.5.0 onwards.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
Lang Yu 0c93bd4957 drm/amdkfd: reserve the BO before validating it
Fix a warning.

v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.

v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)

[   41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.708989] Call Trace:
[   41.708992]  <TASK>
[   41.708996]  ? show_regs+0x6c/0x80
[   41.709000]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709008]  ? __warn+0x93/0x190
[   41.709014]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709024]  ? report_bug+0x1f9/0x210
[   41.709035]  ? handle_bug+0x46/0x80
[   41.709041]  ? exc_invalid_op+0x1d/0x80
[   41.709048]  ? asm_exc_invalid_op+0x1f/0x30
[   41.709057]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709185]  ? ttm_bo_validate+0x146/0x1b0 [ttm]
[   41.709197]  ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[   41.709337]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709346]  kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[   41.709467]  amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[   41.709586]  kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[   41.709710]  kfd_ioctl+0x1ec/0x650 [amdgpu]
[   41.709822]  ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[   41.709945]  ? srso_alias_return_thunk+0x5/0x7f
[   41.709949]  ? tomoyo_file_ioctl+0x20/0x30
[   41.709959]  __x64_sys_ioctl+0x9c/0xd0
[   41.709967]  do_syscall_64+0x3f/0x90
[   41.709973]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 101b810430 ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
Srinivasan Shanmugam fa8a91b0e5 drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'
Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'

Fixes: fac4ebd79f ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:19 -05:00
YiPeng Chai adb4d6a40d drm/amdgpu: Need to resume ras during gpu reset for gfx v9_4_3 sriov
Need to resume ras during gpu reset for
gfx v9_4_3 sriov

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:18 -05:00
Tao Zhou edfdde9013 drm/amdgpu: disable RAS feature when fini
Send RAS disable feature command in fini.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:05:18 -05:00
Hawking Zhang 1731ba9b64 drm/amdgpu: Update boot time errors polling sequence
Update boot time errors polling sequence to align with
the latest firmware change.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 14:04:55 -05:00
David McFarland c3ec8c4f9a drm/amd: Don't init MEC2 firmware when it fails to load
The same calls are made directly above, but conditional on the firmware
loading and validating successfully.

Cc: stable@vger.kernel.org
Fixes: 9931b67690 ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland <corngood@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 13:53:06 -05:00
Ma Jun 9749c86843 drm/amdgpu: Fix the warning info in mode1 reset
Fix the warning info below during mode1 reset.
[  +0.000004] Call Trace:
[  +0.000004]  <TASK>
[  +0.000006]  ? show_regs+0x6e/0x80
[  +0.000011]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000005]  ? __warn+0x91/0x150
[  +0.000009]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000006]  ? report_bug+0x19d/0x1b0
[  +0.000013]  ? handle_bug+0x46/0x80
[  +0.000012]  ? exc_invalid_op+0x1d/0x80
[  +0.000011]  ? asm_exc_invalid_op+0x1f/0x30
[  +0.000014]  ? __flush_work.isra.0+0x2e8/0x390
[  +0.000007]  ? __flush_work.isra.0+0x208/0x390
[  +0.000007]  ? _prb_read_valid+0x216/0x290
[  +0.000008]  __cancel_work_timer+0x11d/0x1a0
[  +0.000007]  ? try_to_grab_pending+0xe8/0x190
[  +0.000012]  cancel_work_sync+0x14/0x20
[  +0.000008]  amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[  +0.000032]  amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
 - Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-31 09:40:42 -05:00
Jani Nikula 345a36c4f1 drm/amdgpu: prefer snprintf over sprintf
This will trade the W=1 warning -Wformat-overflow to
-Wformat-truncation. This lets us enable -Wformat-overflow subsystem
wide.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pan, Xinhui <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com
2024-01-31 11:04:25 +02:00
Yang Wang 788686e2d9 drm/amdgpu: use helper macro HW_ERR instead of Hardware error string
use helper macro HW_ERR to instead of Hardware error string.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29 15:47:02 -05:00
Le Ma c0125b848a drm/amdgpu: move the drm client creation behind drm device registration
This patch is to eliminate interrupt warning below:

  "[drm] Fence fallback timer expired on ring sdma0.0".

An early vm pt clearing job is sent to SDMA ahead of interrupt enabled.
And re-locating the drm client creation following after drm_dev_register
looks like a more proper flow.

v2: wrap the drm client creation

Fixes: 1819200166 ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29 15:35:13 -05:00
Mario Limonciello c92c108403 Revert "drm/amd/pm: fix the high voltage and temperature issue"
This reverts commit 5f38ac54e6.
This causes issues with rebooting and the 7800XT.

Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: stable@vger.kernel.org
Fixes: 5f38ac54e6 ("drm/amd/pm: fix the high voltage and temperature issue")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-29 08:56:13 -05:00
Maxime Ripard 4db102dcb0
Merge drm/drm-next into drm-misc-next
Kickstart 6.9 development cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-01-29 14:20:23 +01:00
Alex Deucher 3380fcad2c drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-25 15:48:57 -05:00
Alex Deucher 03ff6d7238 drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0.  This can lead to hangs in some mixed
graphics and compute workloads.  Updated firmware is also
required for AQL.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-25 15:48:51 -05:00
Tom St Denis e7a8594cc2 drm/amd/amdgpu: Assign GART pages to AMD device mapping
This allows kernel mapped pages like the PDB and PTB to be
read via the iomem debugfs when there is no vram in the system.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.7.x
2024-01-25 15:47:36 -05:00
Lijo Lazar 89a7c0bd74 drm/amdgpu: Show vram vendor only if available
Ony if vram vendor info is available, show in sysfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.7.x
2024-01-25 15:44:11 -05:00
Lijo Lazar 90751bdeee drm/amdgpu: Avoid fetching vram vendor information
For GFX 9.4.3 APUs, the current method of fetching vram vendor
information is not reliable. Avoid fetching the information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.7.x
2024-01-25 15:41:57 -05:00
Victor Skvortsov 362936d613 amdgpu/drm: Use vram manager for virtualization page retirement
In runtime, use vram manager for virtualization page retirement.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:03 -05:00
Victor Skvortsov 2474414c60 drm/amdgpu: Add RAS_POISON_READY host response message
In a non-FLR page avoidance scenario, the host driver will
provide the bad pages in the pf2vf exchange region.

Adding a new host response message to indicate when the
pf2vf exchange region has been updated.

Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:03 -05:00
YiPeng Chai ed1e1e42fd drm/amdgpu: Support passing poison consumption ras block to SRIOV
Support passing poison consumption ras blocks
to SRIOV.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:03 -05:00
Yang Wang c0c48f0d61 drm/amdgpu: adjust aca init/fini sequence to match gpu reset
- move aca init/fini function into ras init/fini to adapt gpu reset
  sequence.
- add new function amdgpu_aca_reset()

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:02 -05:00
Yang Wang 6eb726a082 drm/amdgpu: add aca sysfs remove support
add aca sysfs remove support.

Fixes: 37973b69ea ("drm/amdgpu: add aca sysfs support")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:58:02 -05:00
Mukul Joshi c84a7e21db drm/amdgpu: Fix module unload hang with RAS enabled
The driver unload hangs because the page retirement
kthread cannot be stopped as it is sleeping and waiting
on page retirement event to occur. Add kthread_should_stop()
to the event condition to wake up the kthread when kthread
stop is called during driver unload.

Fixes: 3fdcd0a31d ("drm/amdgpu: Prepare for asynchronous processing of umc page retirement")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:57:52 -05:00
Alex Deucher fc8f5a29d4 drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads. Updated firmware is also
required for AQL.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-25 14:49:03 -05:00
Alex Deucher ca01082353 drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0.  This can lead to hangs in some mixed
graphics and compute workloads.  Updated firmware is also
required for AQL.

Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-25 14:48:58 -05:00
Tom St Denis 8352ca1090 drm/amd/amdgpu: Assign GART pages to AMD device mapping
This allows kernel mapped pages like the PDB and PTB to be
read via the iomem debugfs when there is no vram in the system.

Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:48:52 -05:00
Srinivasan Shanmugam 8d1717fb64 drm/amdgpu: Fix return type in 'aca_bank_hwip_is_matched()'
Change the return type of "if (!bank || type == ACA_HWIP_TYPE_UNKNOW)"
to be bool instead of int.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c:185 aca_bank_hwip_is_matched() warn: signedness bug returning '(-22)'

Fixes: f5e4cc8461 ("drm/amdgpu: implement RAS ACA driver framework")
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-25 14:47:03 -05:00
Somalapuram Amaranath a78a8da51b drm/ttm: replace busy placement with flags v6
Instead of a list of separate busy placement add flags which indicate
that a placement should only be used when there is room or if we need to
evict.

v2: add missing TTM_PL_FLAG_IDLE for i915
v3: fix auto build test ERROR on drm-tip/drm-tip
v4: fix some typos pointed out by checkpatch
v5: cleanup some rebase problems with VMWGFX
v6: implement some missing VMWGFX functionality pointed out by Zack,
    rename the flags as suggested by Michel, rebase on drm-tip and
    adjust XE as well

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240112125158.2748-4-christian.koenig@amd.com
2024-01-25 09:59:44 +01:00
Mario Limonciello 7055c5856a Revert "drm/amd/pm: fix the high voltage and temperature issue"
This reverts commit 5f38ac54e6.
This causes issues with rebooting and the 7800XT.

Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: stable@vger.kernel.org
Fixes: 5f38ac54e6 ("drm/amd/pm: fix the high voltage and temperature issue")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3062
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:28 -05:00
Yang Wang 2866a4549c drm/amdgpu: skip call ras_late_init if ras block is not supported
skip call ras_late_init callback if ras block is not supported.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:28 -05:00
Lijo Lazar 9c3f6e2c4e drm/amdgpu: Show vram vendor only if available
Ony if vram vendor info is available, show in sysfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:28 -05:00
Lijo Lazar e0eb08dcec drm/amdgpu: Avoid fetching vram vendor information
For GFX 9.4.3 APUs, the current method of fetching vram vendor
information is not reliable. Avoid fetching the information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:28 -05:00
Tao Zhou 1757bb7dab drm/amdgpu: update check condition of query for ras page retire
Support page retirement handling in debug mode.

v2: revert smu_v13_0_6_get_ecc_info directly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:26 -05:00
Srinivasan Shanmugam be91a828d0 drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting

Cc: Le Ma <Le.Ma@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:26 -05:00
YiPeng Chai 0795b5d234 drm/amdgpu:Support retiring multiple MCA error address pages
Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
YiPeng Chai afb617f38f drm/amdgpu: add interface to check mca umc status
Add interface to check mca umc status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
YiPeng Chai 6c23f3d12a drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning
Use asynchronous polling to handle umc_v12_0 poisoning.

v2:
  1. Change function name.
  2. Change the debugging information content.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
Stanley.Yang ee9c3031d0 drm/amdgpu: Fix ras features value calltrace
The high three bits of ras features mask indicate socket
id, it should skip to check high three bits of ras features
mask before disable all ras features.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
YiPeng Chai 3fdcd0a31d drm/amdgpu: Prepare for asynchronous processing of umc page retirement
Preparing for asynchronous processing of umc page retirement.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
YiPeng Chai 22f6e3e112 drm/amdgpu: Add log info for umc_v12_0
Add log info for umc_v12_0.

v2:
 Delete redundant logs.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22 17:13:25 -05:00
Arunpravin Paneer Selvam 00a11f977b drm/amdgpu: Enable seq64 manager and fix bugs
- Enable the seq64 mapping sequence.
- Fix wflinfo va conflict and other bugs.

v1:
  - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE
    otherwise the areas will conflict with user space allocations (Alex)

  - It needs to be mapped read only in the user VM (Alex)

v2:
  - Instead of just one define for TOP/BOTTOM
    reserved space separate them into two (Christian)

  - Fix the CPU and VA calculations and while at it
    also cleanup error handling and kerneldoc (Christian)

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2024-01-22 17:13:18 -05:00
Linus Torvalds e08b575815 drm fixes for 6.8-rc1
amdgpu:
 - DSC fixes
 - DC resource pool fixes
 - OTG fix
 - DML2 fixes
 - Aux fix
 - GFX10 RLC firmware handling fix
 - Revert a broken workaround for SMU 13.0.2
 - DC writeback fix
 - Enable gfxoff when ROCm apps are active on gfx11 with the proper FW version
 
 amdkfd:
 - Fix dma-buf exports using GEM handles
 
 nouveau:
 - fix a unneeded WARN_ON triggering
 
 xe:
 - Fix for definition of wakeref_t
 - Fix for an error code aliasing
 - Fix for VM_UNBIND_ALL in the case there are no bound VMAs
 - Fixes for a number of __iomem address space mismatches reported by sparse
 - Fixes for the assignment of exec_queue priority
 - A Fix for skip_guc_pc not taking effect
 - Workaround for a build problem on GCC 11
 - A couple of fixes for error paths
 - Fix a Flat CCS compression metadata copy issue
 - Fix a misplace array bounds checking
 - Don't have display support depend on EXPERT (as discussed on IRC)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmWqGdoACgkQDHTzWXnE
 hr5tiA//ZIos/mK+70JprAhJkXN/Lo5IDBOsldDQ1BkakkVLU1taHIsrER6iDT8g
 WmDzuC4ZIkHZyJ1V8zcIZ4wjE+sIOUeje0fSuMRgwPD+rrdn3WjUiAuvofuxQ5fD
 LFW+O/9hzTl3xBoxidPdupf33WGRMAKhNuYvhwfH14LaqNDVVdAHU7MPTORmFsyY
 CbCOLze5dwAmlB4rk+LDsO0gFtXjB7/Ewg2vHUzlEmaYmPpRxu9MHCpadQGq2Sal
 nwxwI5lb8DqR8jbel3pA0/kz06EdSKKr20YTdw+RVrp/tPqDq4njkZfJMPK+2VYf
 VpSPnGJOvvdtehCrKBBOBK4WRZWgbTUSuxrjgtIy+fp9aWt84NOEG2xDk1W+qWHS
 sqp5PeFb3meK1bsCdBBVSjj1fKApKmzJWqiCr0Z8dzV8QsZWSpPzhyfp3puxKESV
 dORGhMWdtdTMSPRysDUiTMGxn+fxaFTTx9Jsd0LbLizf/sob+ol9EjbNihaa5aKM
 FdHpUgO08X3OAppmcGmGQdIXG2K+TmKMsvguRAK98OVYXhVUpB/BrCGnd9eGyjYY
 mSQWrwKGm+Y/dGnr+ylHKxcyRQMmhc33gHmPAItNdMC8OemiHWZKKzoKBOzT2aME
 pZc6ZRJesG46bfrNqID6ASAIw6SEA+Zj3rk3QsWNPAbHVJcZboY=
 =LInS
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2024-01-19' of git://anongit.freedesktop.org/drm/drm

Pull more drm fixes from Dave Airlie:
 "This is mostly amdgpu and xe fixes, with an amdkfd and nouveau fix
  thrown in.

  The amdgpu ones are just the usual couple of weeks of fixes. The xe
  ones are bunch of cleanups for the new xe driver, the fix you put in
  on the merge commit and the kconfig fix that was hiding the problem
  from me.

  amdgpu:
   - DSC fixes
   - DC resource pool fixes
   - OTG fix
   - DML2 fixes
   - Aux fix
   - GFX10 RLC firmware handling fix
   - Revert a broken workaround for SMU 13.0.2
   - DC writeback fix
   - Enable gfxoff when ROCm apps are active on gfx11 with the proper FW
     version

  amdkfd:
   - Fix dma-buf exports using GEM handles

  nouveau:
   - fix a unneeded WARN_ON triggering

  xe:
   - Fix for definition of wakeref_t
   - Fix for an error code aliasing
   - Fix for VM_UNBIND_ALL in the case there are no bound VMAs
   - Fixes for a number of __iomem address space mismatches reported by
     sparse
   - Fixes for the assignment of exec_queue priority
   - A Fix for skip_guc_pc not taking effect
   - Workaround for a build problem on GCC 11
   - A couple of fixes for error paths
   - Fix a Flat CCS compression metadata copy issue
   - Fix a misplace array bounds checking
   - Don't have display support depend on EXPERT (as discussed on IRC)"

* tag 'drm-next-2024-01-19' of git://anongit.freedesktop.org/drm/drm: (71 commits)
  nouveau/vmm: don't set addr on the fail path to avoid warning
  drm/amdgpu: Enable GFXOFF for Compute on GFX11
  drm/amd/display: Drop 'acrtc' and add 'new_crtc_state' NULL check for writeback requests.
  drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"
  drm/amdkfd: init drm_client with funcs hook
  drm/amd/display: Fix a switch statement in populate_dml_output_cfg_from_stream_state()
  drm/amdgpu: Fix the null pointer when load rlc firmware
  drm/amd/display: Align the returned error code with legacy DP
  drm/amd/display: Fix DML2 watermark calculation
  drm/amd/display: Clear OPTC mem select on disable
  drm/amd/display: Port DENTIST hang and TDR fixes to OTG disable W/A
  drm/amd/display: Add logging resource checks
  drm/amd/display: Init link enc resources in dc_state only if res_pool presents
  drm/amd/display: Fix late derefrence 'dsc' check in 'link_set_dsc_pps_packet()'
  drm/amd/display: Avoid enum conversion warning
  drm/amd/pm: Fix smuv13.0.6 current clock reporting
  drm/amd/pm: Add error log for smu v13.0.6 reset
  drm/amdkfd: Fix 'node' NULL check in 'svm_range_get_range_boundaries()'
  drm/amdgpu: drop exp hw support check for GC 9.4.3
  drm/amdgpu: move debug options init prior to amdgpu device init
  ...
2024-01-19 11:50:00 -08:00
Linus Torvalds ed8d84530a This cycle, I2C removes the currently unused CLASS_DDC support
(controllers set the flag, but there is no client to use it). Also,
 CLASS_SPD support gets simplified to prepare removal in the future.
 Class based instantiation is not recommended these days anyhow.
 Furthermore, I2C core now creates a debugfs directory per I2C adapter.
 Current bus driver users were converted to use it. Then, there are also
 quite some driver updates. Standing out are patches for the wmt-driver
 which is refactored to support more variants. This is the rebased pull
 request where a large series for the designware driver was dropped.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmWph0UPHHdzYUBrZXJu
 ZWwub3JnAAoJEBQN5MwUoCm2kbIQAJotSmX0mM+nNPReYCMMiloxoxUwgpiErNwY
 WDrYQSezthAJ1LDsGOEeLcE4f4I+UcUHBO1BoERtOZg3cGtE0Ii5N845sp100S9O
 ktyaKS5utoErymThWFFrnZX60/8yKXUMzZmNzy96560gPcxbFyyyVhKfBSPzK9T+
 O8CGu7GRNqgWHlvH3yqGeCbreWYrYVSrluEpBu6807cp3zDxrU+autOnsewm5+md
 ka3DdqrbxJSblYK8fJKESAUgkRmZgYKbgl0iiCuqX+ib6I4OA3Z68ny7dl0fY3Ws
 vwt7d88SaBKDdJmUZyb/sm4aJsW69GN+ECZolxrn4TIw45k4tes2s6Ma5+TV3E9h
 Fd1RuqduFEqQ7cj31UPe2x8rgj5Fo5nbjCWxdZv+/3zF8+cHwi8iwkp2PScsPCsa
 fmCdehUE5DrgobsRNANe6XJzxY5wp2VNpGEWKeaQz2Z0/d9T1YFS7a8aewvhXoPC
 isZboi6GQh2XoE8UgGJa29VUuaIkUW513DwCGw8mz1yKN+kHGcsRXXjkjaZoQn3U
 MMvh/zkI2Hpy/m2R8PWeIq5XhLJvmlZ19JJzUHJIjXh9Fn9EVtXhlUleh6mzMfeM
 n8NOg7Eukep2sBgmaufkUKz2Jtogs59YDSXZEvqJjIkPM2Wi0hA18Qj+pilES1ff
 3ckk3mxY
 =8D3Q
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
 "This removes the currently unused CLASS_DDC support (controllers set
  the flag, but there is no client to use it).

  Also, CLASS_SPD support gets simplified to prepare removal in the
  future. Class based instantiation is not recommended these days
  anyhow.

  Furthermore, I2C core now creates a debugfs directory per I2C adapter.
  Current bus driver users were converted to use it.

  Finally, quite some driver updates. Standing out are patches for the
  wmt-driver which is refactored to support more variants.

  This is the rebased pull request where a large series for the
  designware driver was dropped"

* tag 'i2c-for-6.8-rc1-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
  MAINTAINERS: use proper email for my I2C work
  i2c: stm32f7: add support for stm32mp25 soc
  i2c: stm32f7: perform I2C_ISR read once at beginning of event isr
  dt-bindings: i2c: document st,stm32mp25-i2c compatible
  i2c: stm32f7: simplify status messages in case of errors
  i2c: stm32f7: perform most of irq job in threaded handler
  i2c: stm32f7: use dev_err_probe upon calls of devm_request_irq
  i2c: i801: Add lis3lv02d for Dell XPS 15 7590
  i2c: i801: Add lis3lv02d for Dell Precision 3540
  i2c: wmt: Reduce redundant: REG_CR setting
  i2c: wmt: Reduce redundant: function parameter
  i2c: wmt: Reduce redundant: clock mode setting
  i2c: wmt: Reduce redundant: wait event complete
  i2c: wmt: Reduce redundant: bus busy check
  i2c: mux: reg: Remove class-based device auto-detection support
  i2c: make i2c_bus_type const
  dt-bindings: at24: add ROHM BR24G04
  eeprom: at24: use of_match_ptr()
  i2c: cpm: Remove linux,i2c-index conversion from be32
  i2c: imx: Make SDA actually optional for bus recovering
  ...
2024-01-18 17:29:01 -08:00
Ori Messinger aa0901a900 drm/amdgpu: Enable GFXOFF for Compute on GFX11
On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware
issue, which has since been fixed after version 64.
This patch only re-enables GFXOFF for GFX version 11 if the GPU's
MES KIQ firmware version is newer than version 64.

V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64.
V3: Add parentheses to avoid GCC warning for parentheses:
"suggest parentheses around comparison in operand of ‘&’"
V4: Remove "V3" from commit title
V5: Change commit description and insert 'Acked-by'

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-18 16:45:19 -05:00
Christian König fb1c93c2e9 drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both is a complete no-go for driver unload so completely revert the
workaround for now.

This reverts commit f5c7e77970.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-18 16:43:42 -05:00
Flora Cui 3c4e4eb5d8 drm/amdkfd: init drm_client with funcs hook
otherwise drm_client_dev_unregister() would try to
kfree(&adev->kfd.client).

Fixes: 1819200166 ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 16:42:54 -05:00
Ma Jun bc03c02cc1 drm/amdgpu: Fix the null pointer when load rlc firmware
If the RLC firmware is invalid because of wrong header size,
the pointer to the rlc firmware is released in function
amdgpu_ucode_request. There will be a null pointer error
in subsequent use. So skip validation to fix it.

Fixes: 3da9b71563 ("drm/amd: Use `amdgpu_ucode_*` helpers for GFX10")
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-18 16:34:45 -05:00
Tao Zhou a9e4f61df1 drm/amdgpu: update error condition check for umc_v12_0_query_error_address
Deferred error is also taken into account.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:47:24 -05:00
Stanley.Yang 601429cca9 drm/amdgpu: Skip do PCI error slot reset during RAS recovery
Why:
    The PCI error slot reset maybe triggered after inject ue to UMC multi times, this
    caused system hang.
    [  557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume
    [  557.373718] [drm] PCIE GART of 512M enabled.
    [  557.373722] [drm] PTB located at 0x0000031FED700000
    [  557.373788] [drm] VRAM is lost due to GPU reset!
    [  557.373789] [drm] PSP is resuming...
    [  557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset
    [  557.547067] [drm] PCI error: detected callback, state(1)!!
    [  557.547069] [drm] No support for XGMI hive yet...
    [  557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter
    [  557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations
    [  557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered
    [  557.610492] [drm] PCI error: slot reset callback!!
    ...
    [  560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!
    [  560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!
    [  560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI
    [  560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G           OE     5.15.0-91-generic #101-Ubuntu
    [  560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023
    [  560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]
    [  560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
    [  560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00
    [  560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202
    [  560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0
    [  560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010
    [  560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08
    [  560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000
    [  560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000
    [  560.803889] FS:  0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000
    [  560.812973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0
    [  560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
    [  560.843444] PKRU: 55555554
    [  560.846480] Call Trace:
    [  560.849225]  <TASK>
    [  560.851580]  ? show_trace_log_lvl+0x1d6/0x2ea
    [  560.856488]  ? show_trace_log_lvl+0x1d6/0x2ea
    [  560.861379]  ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
    [  560.867778]  ? show_regs.part.0+0x23/0x29
    [  560.872293]  ? __die_body.cold+0x8/0xd
    [  560.876502]  ? die_addr+0x3e/0x60
    [  560.880238]  ? exc_general_protection+0x1c5/0x410
    [  560.885532]  ? asm_exc_general_protection+0x27/0x30
    [  560.891025]  ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]
    [  560.898323]  amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]
    [  560.904520]  process_one_work+0x228/0x3d0
How:
    In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected
    all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:47:15 -05:00
Stanley.Yang 2c7a1560e8 drm/amdgpu: Show deferred error count for UMC
Show deferred error count for UMC syfs node

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:47:07 -05:00
Ori Messinger 776b0953ab drm/amdgpu: Enable GFXOFF for Compute on GFX11
On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware
issue, which has since been fixed after version 64.
This patch only re-enables GFXOFF for GFX version 11 if the GPU's
MES KIQ firmware version is newer than version 64.

V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64.
V3: Add parentheses to avoid GCC warning for parentheses:
"suggest parentheses around comparison in operand of ‘&’"
V4: Remove "V3" from commit title
V5: Change commit description and insert 'Acked-by'

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:46:56 -05:00
Yang Wang 7ed97155b2 drm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[]
fix array index out of bounds issue for ras_block_string[] array.

Fixes: 30df05fb74 ("drm/amdgpu: Align ras block enum with firmware")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:46:07 -05:00
YuanShang b5387349ca drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg in guest
Submit command of wreg in GFX and COMPUTE ring to update
RLC_SPM_MC_CNT in guest machine during runtime.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:45:58 -05:00
Felix Kuehling 5394fb2a5b drm/amdgpu: Remove unnecessary NULL check
A static checker pointed out, that bo_va->base.bo was already derefenced
earlier in the same scope. Therefore this check is unnecessary here.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 50661eb1a2 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:44:49 -05:00
Christian König 087a3e13ec drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"
Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the
HW in an active state and is an unbalanced use of the IP callbacks.

Using the IP callbacks like this can lead to memory leaks, double
free and imbalanced reference counters.

Leaving the HW in an active state can lead to DMA accesses to memory now
freed by the driver.

Both is a complete no-go for driver unload so completely revert the
workaround for now.

This reverts commit f5c7e77970.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:42:20 -05:00
Christophe JAILLET 8a1f7fddab drm/amdgpu: Remove usage of the deprecated ida_simple_xx() API
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().

Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_range() is inclusive. So a -1 has been added when needed.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:42:13 -05:00
Flora Cui 733965a90f drm/amdkfd: init drm_client with funcs hook
otherwise drm_client_dev_unregister() would try to
kfree(&adev->kfd.client).

Fixes: 1819200166 ("drm/amdkfd: Export DMABufs from KFD using GEM handles")
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:41:08 -05:00
chenxuebing 7937b6f63f drm/amdgpu: Clean up errors in umc_v6_0.c
Fix the following errors reported by checkpatch:

ERROR: space required after that ',' (ctx:VxV)

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:38:00 -05:00
chenxuebing ac4d654f3d drm/amdgpu: Clean up errors in clearstate_si.h
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:37:52 -05:00
Heiner Kallweit e965a70727 drm: remove I2C_CLASS_DDC support
After removal of the legacy EEPROM driver and I2C_CLASS_DDC support in
olpc_dcon there's no i2c client driver left supporting I2C_CLASS_DDC.
Class-based device auto-detection is a legacy mechanism and shouldn't
be used in new code. So we can remove this class completely now.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2024-01-18 21:10:41 +01:00
chenxuebing 762343f79e drm/amdgpu: Clean up errors in amdgpu.h
Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:40 -05:00
chenxuebing f5e1f90b67 drm/amdgpu: Clean up errors in amdgpu_gmc.c
Fix the following errors reported by checkpatch:

ERROR: need consistent spacing around '-' (ctx:WxV)

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:40 -05:00
chenxuebing 0b0fb6da9b drm/amdgpu: Clean up errors in jpeg_v2_5.c
Fix the following errors reported by checkpatch:

ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:40 -05:00
chenxuebing 7230ebeb0a drm/amdgpu: Clean up errors in gfx_v9_4.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:40 -05:00
chenxuebing b679566bf0 drm/amdgpu: Clean up errors in amdgpu_drv.c
Fix the following errors reported by checkpatch:

ERROR: do not initialise globals to 0

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:40 -05:00
chenxuebing 995d629f74 drm/amd: Clean up errors in amdgpu_vkms.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Ma Jun 849e133c97 drm/amdgpu: Fix the null pointer when load rlc firmware
If the RLC firmware is invalid because of wrong header size,
the pointer to the rlc firmware is released in function
amdgpu_ucode_request. There will be a null pointer error
in subsequent use. So skip validation to fix it.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Hawking Zhang 4e2965bd3b drm/amdgpu: Centralize ras cap query to amdgpu_ras_check_supported
Move ras capablity check to amdgpu_ras_check_supported.
Driver will query ras capablity through psp interace, or
vbios interface, or specific ip callbacks.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
Hawking Zhang 0e14eb0cef drm/amdgpu: Query ras capablity from psp v2
Instead of traditional atomfirmware interfaces for RAS
capability, host driver can query ras capability from
psp starting from psp v13_0_6.

v2: drop redundant local variable from get_ras_capability.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:39 -05:00
chenxuebing 73888bad4d drm/amdgpu: Clean up errors in amdgpu_rlc.c
Fix the following errors reported by checkpatch:

ERROR: space prohibited before that '++' (ctx:WxB)

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing 1ed8ccf268 drm/amd: Clean up errors in sdma_v2_4.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: trailing statements should be on next line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing 32a0a398fc drm/amd/amdgpu: Clean up errors in amdgpu_umr.h
Fix the following errors reported by checkpatch:

spaces required around that '=' (ctx:VxV)

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing 2bb012138d drm/amdgpu: Clean up errors in amdgpu_atomfirmware.h
Fix the following errors reported by checkpatch:

ERROR: "foo* bar" should be "foo *bar"

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing 05ec623147 drm/amdgpu: Clean up errors in clearstate_gfx9.h
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
chenxuebing 2763da27f9 drm/amdgpu: Clean up errors in navi10_ih.c
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: chenxuebing <chenxb_99091@126.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:38 -05:00
Alexander Richards 4630d5031c drm/amdgpu: check PS, WS index
Theoretically, it would be possible for a buggy or malicious VBIOS to
overwrite past the bounds of the passed parameters (or its own
workspace); add bounds checking to prevent this from happening.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3093
Signed-off-by: Alexander Richards <electrodeyt@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Hawking Zhang 30df05fb74 drm/amdgpu: Align ras block enum with firmware
Driver and firmware share the same ras block enum.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Yang Wang 1714a1ffaf drm/amdgpu: replace MCA macro with ACA for XGMI
use new ACA macro to instead of MCA

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Candice Li 46e2231ce0 drm/amdgpu: Log deferred error separately
Separate deferred error from UE and CE and log it
individually.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Candice Li 9c97bf88f4 drm/amdgpu: Do bad page retirement for deferred errors
Needs to do bad page retirement for deferred errors.

v2: Drop unused dev_info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Yang Wang bbcbfd4363 drm/amdgpu: add xgmi v6.4.0 ACA support
add xgmi v6.4.0 ACA driver support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:37 -05:00
Yang Wang f45e6f2b5c drm/amdgpu: add mmhub v1.8 ACA support
v1:
add mmhub v1.8 ACA driver support

v2:
use macro to define smn address value.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang 373e970a4a drm/amdgpu: add sdma v4.4.2 ACA support
v1:
add sdma v4.4.2 ACA driver support

v2:
use macro to define smn address value.
v3: squash in fix for unbalanced irqs

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang 0f3cd24e96 drm/amdgpu: add gfx v9.4.3 ACA support
v1:
add gfx v9.4.3 ACA driver support

v2:
use macro to define smn address value.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Ma Jun e372baeb3d drm/amdgpu: Check extended configuration space register when system uses large bar
Some customer platforms do not enable mmconfig for various reasons,
such as bios bug, and therefore cannot access the GPU extend configuration
space through mmio.

When the system enters the d3cold state and resumes, the amdgpu driver
fails to resume because the extend configuration space registers of
GPU can't be restored. At this point, Usually we only see some failure
dmesg log printed by amdgpu driver, it is difficult to find the root
cause.

Therefor print a warnning message if the system can't access the
extended configuration space register when using large bar.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang f38765de83 drm/amdgpu: add umc v12.0 ACA support
add umc v12.0 ACA driver support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang 37973b69ea drm/amdgpu: add aca sysfs support
add aca sysfs node support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang 04c4fcd263 drm/amdgpu: add amdgpu ras aca query interface
v1:
add ACA error query interface

v2:
Add a new helper function to determine whether to use ACA or MCA.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Alex Deucher 26405ff430 drm/amdgpu: move kiq_reg_write_reg_wait() out of amdgpu_virt.c
It's used for more than just SR-IOV now, so move it to
amdgpu_gmc.c and rename it to better match the functionality and
update the comments in the code paths to better document
when each path is used and why.  No functional change.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun.Liu@amd.com
Cc: Christian.Koenig@amd.com
2024-01-15 18:35:36 -05:00
Alex Deucher d3f452f3a0 drm/amdgpu: add new INFO IOCTL query for input power
Some chips provide both average and input power.  Previously
we just exposed average power, add a new query for input
power.

Example userspace:
https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power

Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Hawking Zhang d4b9cfe2c7 drm/amdgpu: Query boot status if boot failed
Check and report firmware boot status if it doesn't
reach steady status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:36 -05:00
Yang Wang 33dcda51e9 drm/amdgpu: add ACA bank dump debugfs support
add ACA bank dump debugfs support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Yang Wang 0599849c32 drm/amdgpu: add ACA kernel hardware error log support
add ACA kernel hardware error log support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang c8cb7e09db drm/amdgpu: Query boot status if discovery failed
Check and report boot status if discovery failed.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang cce4febb27 drm/amdgpu: Add ras helper to query boot errors v2
Add ras helper function to query boot time gpu
errors.
v2: use aqua_vanjaram smn addressing pattern

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Yang Wang f5e4cc8461 drm/amdgpu: implement RAS ACA driver framework
v1:
implement new RAS ACA driver code framework.

v2:
- rename aca_bank_set to aca_banks.
- rename aca_source_xxx to aca_handle_xxx.

v3:
Optimize some function implementation details. (from Hawking's suggestion)

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang ad390542ec drm/amdgpu: Init pcie_index/data address as fallback (v2)
To allow using this helper for indirect access when
nbio funcs is not available. For instance, in ip
discovery phase.

v2: define macro for pcie_index/data/index_hi fallback.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang ea0f6dfeec drm/amdgpu: drop psp v13 query_boot_status implementation
Will replace it with new implementation to cover
boot fails in ip discovery phase.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Hawking Zhang ac3ff8a906 drm/amdgpu: Replace DRM_* with dev_* in amdgpu_psp.c
So kernel message has the device pcie bdf information,
which helps issue debugging especially in multiple GPU
system.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Felix Kuehling 50661eb1a2 drm/amdgpu: Auto-validate DMABuf imports in compute VMs
DMABuf imports in compute VMs are not wrapped in a kgd_mem object on the
process_info->kfd_bo_list. There is no explicit KFD API call to validate
them or add eviction fences to them.

This patch automatically validates and fences dymanic DMABuf imports when
they are added to a compute VM. Revalidation after evictions is handled
in the VM code.

v2:
* Renamed amdgpu_vm_validate_evicted_bos to amdgpu_vm_validate
* Eliminated evicted_user state, use evicted state for VM BOs and user BOs
* Fixed and simplified amdgpu_vm_fence_imports, depends on reserved BOs
* Moved dma_resv_reserve_fences for amdgpu_vm_fence_imports into
  amdgpu_vm_validate, outside the vm->status_lock
* Added dummy version of amdgpu_amdkfd_bo_validate_and_fence for builds
  without KFD

v4: Eliminate amdgpu_vm_fence_imports. It's not needed because the
reservation with its fences is shared with the export, as long as all
imports are from KFD, with the exports already reserved, validated and
fenced by the KFD restore worker.

v5: Reintroduced separate evicted_user state to simplify the state machine
and CS error handling when amdgpu_vm_validate is called without a ticket.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:35:35 -05:00
Alex Deucher c3d5e297dc drm/amdgpu: drop exp hw support check for GC 9.4.3
No longer needed.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.7.x
2024-01-15 18:32:56 -05:00
Le Ma 51258acdc4 drm/amdgpu: move debug options init prior to amdgpu device init
To bring debug options into effect in early initialization phase

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:32:54 -05:00
Le Ma d20e1aec88 drm/amdgpu: add debug flag to place fw bo on vram for frontdoor loading
Use debug_mask=0x8 param to help isolating data path issues
on new systems in early phase.

v2: rename the flag for explicitness (lijo)

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:32:49 -05:00
Le Ma 6c5683bd9e Revert "drm/amdgpu: add param to specify fw bo location for front-door loading"
This reverts commit c572abffe9.

Will use debug module param instead of independent module param.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:32:23 -05:00
Yifan Zhang 2b9a073b73 drm/amdgpu: update regGL2C_CTRL4 value in golden setting
This patch to update regGL2C_CTRL4 in golden setting.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.7.x
2024-01-15 18:32:15 -05:00
Srinivasan Shanmugam 8a44fdd3cf drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' -
'adev->pm.fw' may not be released before return.

Using the function release_firmware() to release adev->pm.fw.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:1571 amdgpu_device_need_post() warn: 'adev->pm.fw' from request_firmware() not released on lines: 1554.

Cc: Monk Liu <Monk.Liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:32:06 -05:00
Srinivasan Shanmugam 8e8272f0dc drm/amdgpu: Fix unsigned comparison with less than zero in vpe_u1_8_from_fraction()
The variables 'numerator' and 'denominator', are unsigned 16-bit integer
types, that can never be less than 0.

Thus fixing the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:62 vpe_u1_8_from_fraction() warn: unsigned 'numerator' is never less than zero.
drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c:63 vpe_u1_8_from_fraction() warn: unsigned 'denominator' is never less than zero.

Cc: Peyton Lee <peytolee@amd.com>
Cc: Lang Yu <lang.yu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Peyton Lee <peyton.lee@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:32:03 -05:00
Srinivasan Shanmugam fac4ebd79f drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'
The amdgpu_gmc_vram_checking() function in emulation checks whether
all of the memory range of shared system memory could be accessed by
GPU, from this aspect, -EIO is returned for error scenarios.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:919 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1103 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1223 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2344 gmc_v9_0_hw_init() warn: missing error code? 'r'

Cc: Xiaojian Du <Xiaojian.Du@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:31:59 -05:00
Victor Lu 30d8dffab7 drm/amdgpu: Do not program VM_L2_CNTL under SRIOV
VM_L2_CNTL* should not be programmed on driver unload under SRIOV.
These regs are skipped during SRIOV driver init.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:31:56 -05:00
Yifan Zhang c9edcc1864 drm/amdgpu: update ATHUB_MISC_CNTL offset for athub v3.3
This patch to update ATHUB_MISC_CNTL offset for athub v3.3

v2: correct a typo (Tim)
v3: correct patch title (Lang)

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:31:45 -05:00
Alex Deucher d02069850f drm/amdgpu: fall back to INPUT power for AVG power via INFO IOCTL
For backwards compatibility with userspace.

Fixes: 47f1724db4 ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15 18:31:24 -05:00
James Zhu 50e60184bf drm/amdgpu: make a correction on comment
Use a generic comment for AMDGPU_VM_RESERVED_VRAM size.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-09 15:44:13 -05:00
Felix Kuehling c147ddc68e drm/amdkfd: Fix sparse __rcu annotation warnings
Properly mark kfd_process->ef as __rcu and consistently use the right
accessor functions.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312052245.yFpBSgNH-lkp@intel.com/
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-09 15:44:13 -05:00
Hawking Zhang 73cb81dc54 drm/amdgpu: Packed socket_id to ras feature mask
Initialize RAS feature mask bit[31:29] with socket_id.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-09 15:44:13 -05:00
Candice Li fb1e917199 drm/amdgpu: Support poison error injection via ras_ctrl debugfs
Support poison error injection.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-09 15:44:13 -05:00
Likun Gao f4a94dbb6d drm/amdgpu: correct the cu count for gfx v11
Correct the algorithm of active CU to skip disabled
sa for gfx v11.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2024-01-09 15:43:54 -05:00
Candice Li 90bd01471d drm/amdgpu: Drop unnecessary sentences about CE and deferred error.
Remove "no user action is needed" for correctable and deferred error
to avoid confusion.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-09 15:43:54 -05:00
Dave Airlie e54478fbda amd-drm-next-6.8-2024-01-05:
amdgpu:
 - VRR fixes
 - PSR-SU fixes
 - SubVP fixes
 - DCN 3.5 fixes
 - Documentation updates
 - DMCUB fixes
 - DML2 fixes
 - UMC 12.0 updates
 - GPUVM fix
 - Misc code cleanups and whitespace cleanups
 - DP MST fix
 - Let KFD sync with GPUVM fences
 - GFX11 reset fix
 - SMU 13.0.6 fixes
 - VSC fix for DP/eDP
 - Navi12 display fix
 - RN/CZN system aperture fix
 - DCN 2.1 bandwidth validation fix
 - DCN INIT cleanup
 
 amdkfd:
 - SVM fixes
 - Revert TBA/TMA location change
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZZh7ZQAKCRC93/aFa7yZ
 2EPJAQDkoOlSjRLZoqwOPvBCo3WIzVO+4N6pOgohbTrjhHvDFAD+ONEgIH/wydk1
 IOdtyizh9o7spo2qN2Oi06MDimclDg8=
 =bFa3
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.8-2024-01-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.8-2024-01-05:

amdgpu:
- VRR fixes
- PSR-SU fixes
- SubVP fixes
- DCN 3.5 fixes
- Documentation updates
- DMCUB fixes
- DML2 fixes
- UMC 12.0 updates
- GPUVM fix
- Misc code cleanups and whitespace cleanups
- DP MST fix
- Let KFD sync with GPUVM fences
- GFX11 reset fix
- SMU 13.0.6 fixes
- VSC fix for DP/eDP
- Navi12 display fix
- RN/CZN system aperture fix
- DCN 2.1 bandwidth validation fix
- DCN INIT cleanup

amdkfd:
- SVM fixes
- Revert TBA/TMA location change

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240105220522.4976-1-alexander.deucher@amd.com
2024-01-09 09:07:50 +10:00
Alex Deucher 16783d8ef0 drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well
These chips needs the same fix.  This was previously not seen
on then since the AGP aperture expanded the system aperture,
but this showed up again when AGP was disabled.

Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:10:44 -05:00
Srinivasan Shanmugam bf2ad4fb8a drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'
Return value of container_of(...) can't be null, so null check is not
required for 'fence'. Hence drop its NULL check.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c:93 to_amdgpu_amdkfd_fence() warn: can 'fence' even be NULL?

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:10:43 -05:00
Srinivasan Shanmugam 13a1851f92 drm/amdgpu: Fix '*fw' from request_firmware() not released in 'amdgpu_ucode_request()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c:1404 amdgpu_ucode_request() warn: '*fw' from request_firmware() not released on lines: 1404.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:10:43 -05:00
Srinivasan Shanmugam 4f32504a2f drm/amdgpu: Fix variable 'mca_funcs' dereferenced before NULL check in 'amdgpu_mca_smu_get_mca_entry()'
Fixes the below:

drivers/gpu/drm/amd/amdgpu/amdgpu_mca.c:377 amdgpu_mca_smu_get_mca_entry() warn: variable dereferenced before check 'mca_funcs' (see line 368)

357 int amdgpu_mca_smu_get_mca_entry(struct amdgpu_device *adev,
				     enum amdgpu_mca_error_type type,
358                                  int idx, struct mca_bank_entry *entry)
359 {
360         const struct amdgpu_mca_smu_funcs *mca_funcs =
						adev->mca.mca_funcs;
361         int count;
362
363         switch (type) {
364         case AMDGPU_MCA_ERROR_TYPE_UE:
365                 count = mca_funcs->max_ue_count;

mca_funcs is dereferenced here.

366                 break;
367         case AMDGPU_MCA_ERROR_TYPE_CE:
368                 count = mca_funcs->max_ce_count;

mca_funcs is dereferenced here.

369                 break;
370         default:
371                 return -EINVAL;
372         }
373
374         if (idx >= count)
375                 return -EINVAL;
376
377         if (mca_funcs && mca_funcs->mca_get_mca_entry)
	        ^^^^^^^^^

Checked too late!

Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:10:43 -05:00
Le Ma c572abffe9 drm/amdgpu: add param to specify fw bo location for front-door loading
This param can help isolating data path issues on new systems in early phase.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:10:43 -05:00
Srinivasan Shanmugam 8e317a811f drm/amdgpu: Remove unreachable code in 'atom_skip_src_int()'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/atom.c:398 atom_skip_src_int() warn: ignoring unreachable code.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:10:33 -05:00
Alex Deucher fb915c87ed drm/amdgpu: skip gpu_info fw loading on navi12
It's no longer required.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2318
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:05:08 -05:00
Hawking Zhang 6697dbf0af Revert "drm/amdgpu: enable mca debug mode on APU by default"
Not needed any more with firmware fixes

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-05 16:04:36 -05:00
Srinivasan Shanmugam b8d55a90fd drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper()
Return invalid error code -EINVAL for invalid block id.

Fixes the below:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1183 amdgpu_ras_query_error_status_helper() error: we previously assumed 'info' could be null (see line 1176)

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam 5ce9a6ad8e drm/amdgpu: Drop redundant unsigned >=0 comparision 'amdgpu_gfx_rlc_init_microcode()'
unsigned int "version_minor" is always >= 0

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c:534 amdgpu_gfx_rlc_init_microcode() warn: always true condition '(version_minor >= 0) => (0-u16max >= 0)'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam b57e3ca1fb drm/amdgpu: Use kvcalloc instead of kvmalloc_array in amdgpu_cs_parser_bos()
kvmalloc_array + __GFP_ZERO is the same with kvcalloc.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:873 amdgpu_cs_parser_bos() warn: Please consider using kvcalloc instead of kvmalloc_array

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam 091411be7a drm/amdgpu: Use kzalloc instead of kmalloc+__GFP_ZERO in amdgpu_ras.c
Fixes the below smatch warnings:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2543 amdgpu_ras_recovery_init() warn: Please consider using kzalloc instead of kmalloc
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2830 amdgpu_ras_init() warn: Please consider using kzalloc instead of kmalloc

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 11:16:06 -05:00
Srinivasan Shanmugam abaf0666a6 drm/amdgpu: Cleanup indenting in amdgpu_connector_dvi_detect()
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c:1106 amdgpu_connector_dvi_detect() warn: inconsistent indenting

Fixes: 8a1de314d1 ("drm/amdgpu: Refactor 'amdgpu_connector_dvi_detect' in amdgpu_connectors.c")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 11:16:05 -05:00
Jack Xiao 4b5c5f5ad3 drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET v2
It's required to take the gfx mutex before access to CP_VMID_RESET,
for there is a race condition with CP firmware to write the register.

v2: add extra code to ensure the mutex releasing is successful.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 10:46:52 -05:00
Felix Kuehling ec9ba4821f drm/amdgpu: Let KFD sync with VM fences
Change the rules for amdgpu_sync_resv to let KFD synchronize with VM
fences on page table reservations. This fixes intermittent memory
corruption after evictions when using amdgpu_vm_handle_moved to update
page tables for VM mappings managed through render nodes.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 10:46:13 -05:00
Stanley.Yang a32c6f7f57 drm/amdgpu: Fix ecc irq enable/disable unpaired
The ecc_irq is disabled while GPU mode2 reset suspending process,
but not be enabled during GPU mode2 reset resume process.

Changed from V1:
	only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip
	delete amdgpu_ras_late_resume function

Changed from V2:
	check umc ras supported before put ecc_irq

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 10:30:49 -05:00
Mangesh Gadre 5eb8094a9b drm/amdgpu: Add register read/write debugfs support for AID's
SMN address is larger than 32 bits for registers on different AID's
Updating existing interface to support access to such registers.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 10:30:19 -05:00
Dave Airlie 22a2decedf Merge tag 'drm-msm-next-2023-12-15' of https://gitlab.freedesktop.org/drm/msm into drm-next
Updates for v6.8:

Core:
- Add support for SDM670, SM8650
- Handle the CFG interconnect to fix the obscure hangs / timeouts
  on register write
- Kconfig fix for QMP dependency
- DT schema fixes

DPU:
- Add support for SDM670, SM8650
- Enable SmartDMA on SM8350 and SM8450
- Correct UBWC settings for SC8280XP
- Fix catalog settings for SC8180X
- Actually make use of the version to switch between QSEED3/3LITE/4
  scalers
- Use devres-managed and drm-managed allocations where appropriate
- misc other fixes
- Enabled YUV writeback on SC7280, SM8250
- Enabled writeback on SM8350, SM8450
- CRC fix when encoder is selected as the input source
- other misc fixes

MDP4:
- Use devres-managed and drm-managed allocations where appropriate
- flush vblank event on CRTC disable

MDP5:
- Use devres-managed and drm-managed allocations where appropriate

DP:
- Add support for SM8650
- Enable PM runtime support
- Merge msm-specific debugfs dir with the generic one
- Described DisplayPort on SM8150 in DeviceTree bindings
- Moved dp_display_get_next_bridge() to probe()

DSI:
- Add support for SM8650
- Enable PM runtime support

GPU/GEM:
- demote userspace triggerable warnings to debug
- add GEM object metadata UAPI
- move GPU devcoredumps to GPU device
- fix hangcheck to skip retired submits
- expose UBWC config to userspace
- fix a680 chip-id
- drm_exec conversion
- drm/ci: remove rebase-merge directory (to unblock CI)

[airlied: fix drm_exec/amd interaction]
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9auYqmo-7NSd9FsbNBCDf7aBevd=4xkcF3A5G_OGvMQ@mail.gmail.com
2023-12-20 07:54:03 +10:00
ZhenGuo Yin 87825c860e drm/amdgpu: re-create idle bo's PTE during VM state machine reset
Idle bo's PTE needs to be re-created when resetting VM state machine.
Set idle bo's vm_bo as moved to mark it as invalid.

Fixes: 55bf196f60 ("drm/amdgpu: reset VM when an error is detected")
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19 14:59:03 -05:00
YiPeng Chai 99cab331a4 drm/amdgpu: Add umc page retirement for umc v12_0
Add umc page retirement for umc v12_0.

V2:
  1. Changed umc page retirement check condition
     to call umc_v12_0_is_uncorrectable_error.
  2. Use memset to clear the contents of the umc
     error address structure.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19 14:59:03 -05:00
YiPeng Chai a8c77a121c drm/amdgpu: Add poison mode check error condition for umc v12_0
Add poison mode check error condition for umc v12_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19 14:59:03 -05:00
YiPeng Chai 9f91e983ee drm/amdgpu: MCA supports recording umc address information
MCA supports recording umc address information.

V2:
  Move err_addr variable from struct ras_err_node to
struct ras_err_info.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19 14:59:03 -05:00
James Zhu 0c8c0e7a9e drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages
Only schedule when hmm_range_fault returns error.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-15 12:16:42 -05:00
James Zhu 78b4dfd359 drm/amdgpu: increase hmm range get pages timeout
When application tries to allocate all system memory and cause memory
to swap out. Needs more time for hmm_range_fault to validate the
remaining page for allocation. To be safe, increase timeout value to
1 second for 64MB range.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-15 12:16:34 -05:00
Alex Deucher afe58346d5 drm/amdgpu/debugfs: fix error code when smc register accessors are NULL
Should be -EOPNOTSUPP.

Fixes: 5104fdf50d ("drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-14 15:27:42 -05:00
Peyton Lee 5f82a0c90c drm/amdgpu/vpe: enable vpe dpm
enable vpe dpm

Signed-off-by: Peyton Lee <peytolee@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-14 15:26:21 -05:00
Melissa Wen b8b92c1bd7 drm/amd/display: add plane CTM driver-specific property
Plane CTM for pre-blending color space conversion. Only enable
driver-specific plane CTM property on drivers that support both pre- and
post-blending gamut remap matrix, i.e., DCN3+ family. Otherwise it
conflits with DRM CRTC CTM property.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-14 15:26:15 -05:00
Wang, Beyond 94aeb41173 drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap
Issue: during evict or validate happened on amdgpu_bo, the 'from' and
'to' is always same in ftrace event of amdgpu_bo_move

where calling the 'trace_amdgpu_bo_move', the comment says move_notify
is called before move happens, but actually it is called after move
happens, here the new_mem is same as bo->resource

Fix: move trace_amdgpu_bo_move from move_notify to amdgpu_bo_move

Signed-off-by: Wang, Beyond <Wang.Beyond@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-14 15:25:48 -05:00
Christian König 65d2765d62 drm/amdgpu: warn when there are still mappings when a BO is destroyed v2
This can only happen when there is a reference counting bug.

v2: fix typo

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:33:02 -05:00
Christian König 683b8c7e7a drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
When freeing PD/PT with shadows it can happen that the shadow
destruction races with detaching the PD/PT from the VM causing a NULL
pointer dereference in the invalidation code.

Fix this by detaching the the PD/PT from the VM first and then
freeing the shadow instead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867
Cc: <stable@vger.kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:08:01 -05:00
Jani Nikula d9501844d5 drm/amd: include drm/drm_edid.h only where needed
Including drm_edid.h from amdgpu_mode.h causes the rebuild of literally
hundreds of files when drm_edid.h is modified, while there are only a
handful of files that actually need to include drm_edid.h.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:08:01 -05:00
Melissa Wen 0f5afa190b drm/amd/display: add CRTC gamma TF driver-specific property
Add AMD pre-defined transfer function property to default DRM CRTC gamma
to convert to wire encoding with or without a user gamma LUT. There is
no post-blending regamma ROM for pre-defined TF. When setting Gamma TF
(!= Identity) and LUT at the same time, the color module will combine
the pre-defined TF and the custom LUT values into the LUT that's
actually programmed.

v2:
- enable CRTC prop in the end of driver-specific prop sequence
- define inverse EOTFs as supported regamma TFs
- reword driver-specific function doc to remove shaper/3D LUT

v3:
- spell out TF+LUT behavior in the commit and comments (Harry)

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Co-developed-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:08:01 -05:00
Joshua Ashton 0ef47454dc drm/amd/display: add plane blend LUT and TF driver-specific properties
Blend 1D LUT or a pre-defined transfer function (TF) can be set to
linearize content before blending, so that it's positioned just before
blending planes in the AMD color mgmt pipeline, and after 3D LUT
(non-linear space). Shaper and Blend LUTs are 1D LUTs that sandwich 3D
LUT. Drivers should advertize blend properties according to HW caps.

There is no blend ROM for pre-defined TF. When setting blend TF (!=
Identity) and LUT at the same time, the color module will combine the
pre-defined TF and the custom LUT values into the LUT that's actually
programmed.

v3:
- spell out TF+LUT behavior in the commit and comments (Harry)

v5:
- get blend blob correctly

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:08:00 -05:00
Melissa Wen f545d82479 drm/amd/display: add plane shaper LUT and TF driver-specific properties
On AMD HW, 3D LUT always assumes a preceding shaper 1D LUT used for
delinearizing and/or normalizing the color space before applying a 3D
LUT. Add pre-defined transfer function to enable delinearizing content
with or without shaper LUT, where AMD color module calculates the
resulted shaper curve. We apply an inverse EOTF to go from linear
values to encoded values. If we are already in a non-linear space and/or
don't need to normalize values, we can bypass shaper LUT with a linear
transfer function that is also the default TF value.

There is no shaper ROM. When setting shaper TF (!= Identity) and LUT at
the same time, the color module will combine the pre-defined TF and the
custom LUT values into the LUT that's actually programmed.

v2:
- squash commits for shaper LUT and shaper TF
- define inverse EOTF as supported shaper TFs

v3:
- spell out TF+LUT behavior in the commit and comments (Harry)
- replace BT709 EOTF by inv OETF

v5:
- get shaper blob correctly (Joshua)

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:08:00 -05:00
Jonathan Kim bd33bb1409 drm/amdkfd: fix mes set shader debugger process management
MES provides the driver a call to explicitly flush stale process memory
within the MES to avoid a race condition that results in a fatal
memory violation.

When SET_SHADER_DEBUGGER is called, the driver passes a memory address
that represents a process context address MES uses to keep track of
future per-process calls.

Normally, MES will purge its process context list when the last queue
has been removed.  The driver, however, can call SET_SHADER_DEBUGGER
regardless of whether a queue has been added or not.

If SET_SHADER_DEBUGGER has been called with no queues as the last call
prior to process termination, the passed process context address will
still reside within MES.

On a new process call to SET_SHADER_DEBUGGER, the driver may end up
passing an identical process context address value (based on per-process
gpu memory address) to MES but is now pointing to a new allocated buffer
object during KFD process creation.  Since the MES is unaware of this,
access of the passed address points to the stale object within MES and
triggers a fatal memory violation.

The solution is for KFD to explicitly flush the process context address
from MES on process termination.

Note that the flush call and the MES debugger calls use the same MES
interface but are separated as KFD calls to avoid conflicting with each
other.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Alice Wong <shiwei.wong@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 16:07:43 -05:00
Mario Limonciello c01b9be7b2 drm/amd: Fix a probing order problem on SDMA 2.4
commit 751e293f2c ("drm/amd: Move microcode init from sw_init to
early_init for SDMA v2.4") made a fateful mistake in
`adev->sdma.num_instances` wasn't declared when sdma_v2_4_init_microcode()
was run. This caused probing to fail.

Move the declaration to right before sdma_v2_4_init_microcode().

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3043
Fixes: 751e293f2c ("drm/amd: Move microcode init from sw_init to early_init for SDMA v2.4")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:28:55 -05:00
Hawking Zhang 058eb51912 drm/amdgpu: Switch to aca bank for xgmi pcs err cnt
Instead of software managed counters.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:28:47 -05:00
Melissa Wen 671994e3bf drm/amd/display: add plane 3D LUT driver-specific properties
Add 3D LUT property for plane color transformations using a 3D lookup
table. 3D LUT allows for highly accurate and complex color
transformations and is suitable to adjust the balance between color
channels. It's also more complex to manage and require more
computational resources.

Since a 3D LUT has a limited number of entries in each dimension we want
to use them in an optimal fashion. This means using the 3D LUT in a
colorspace that is optimized for human vision, such as sRGB, PQ, or
another non-linear space. Therefore, userpace may need one 1D LUT
(shaper) before it to delinearize content and another 1D LUT after 3D
LUT (blend) to linearize content again for blending. The next patches
add these 1D LUTs to the plane color mgmt pipeline.

v3:
- improve commit message about 3D LUT
- describe the 3D LUT entries and size (Harry)

v4:
- advertise 3D LUT max size as the size of a single-dimension

v5:
- get lut3d blob correctly (Joshua)
- fix doc about 3d-lut dimension size (Sebastian)

Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:28:38 -05:00
Friedrich Vock 91963397c4 drm/amdgpu: Enable tunneling on high-priority compute queues
This improves latency if the GPU is already busy with other work.
This is useful for VR compositors that submit highly latency-sensitive
compositing work on high-priority compute queues while the GPU is busy
rendering the next frame.

Userspace merge request:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462

v2: bump driver version (Alex)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:23:59 -05:00
Alex Deucher 94b1e028e1 drm/amdgpu/sdma5.2: add begin/end_use ring callbacks
Add begin/end_use ring callbacks to disallow GFXOFF when
SDMA work is submitted and allow it again afterward.

This should avoid corner cases where GFXOFF is erroneously
entered when SDMA is still active.  For now just allow/disallow
GFXOFF in the begin and end helpers until we root cause the
issue.  This should not impact power as SDMA usage is pretty
minimal and GFXOSS should not be active when SDMA is active
anyway, this just makes it explicit.

v2: move everything into sdma5.2 code.  No reason for this
to be generic at this point.
v3: Add comments in new code

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
Tested-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.15+
2023-12-13 15:23:51 -05:00
Evan Quan b8b39de646 drm/amd/pm: setup the framework to support Wifi RFI mitigation feature
With WBRF feature supported, as a driver responding to the frequencies,
amdgpu driver is able to do shadow pstate switching to mitigate possible
interference(between its (G-)DDR memory clocks and local radio module
frequency bands used by Wifi 6/6e/7).

--
v1->v2:
  - update the prompt for feature support(Lijo)
v8->v9:
  - update parameter document for smu_wbrf_event_handler(Simon)
v9->v10:
v10->v11:
 - correct the logics for wbrf range sorting(Lijo)
v13:
 - Fix the format issue (IIpo Jarvinen)

Signed-off-by: Evan Quan <quanliangl@hotmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:23:50 -05:00
Felix Kuehling 0188006d7c drm/amdkfd: Import DMABufs for interop through DRM
Use drm_gem_prime_fd_to_handle to import DMABufs for interop. This
ensures that a GEM handle is created on import and that obj->dma_buf
will be set and remain set as long as the object is imported into KFD.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Xiaogang.Chen <Xiaogang.Chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:55 -05:00
Felix Kuehling 1819200166 drm/amdkfd: Export DMABufs from KFD using GEM handles
Create GEM handles for exporting DMABufs using GEM-Prime APIs. The GEM
handles are created in a drm_client_dev context to avoid exposing them
in user mode contexts through a DMABuf import.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:55 -05:00
Vignesh Chander 4e95669ecb drm/amdgpu: xgmi_fill_topology_info
1. Use the mirrored topology info to fill links for VF.
The new solution is required to simplify and optimize host driver logic.
Only use the new solution for VFs that support full duplex and
extended_peer_link_info otherwise the info would be incomplete.

2. avoid calling extended_link_info on VF as its not supported

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:55 -05:00
Joshua Ashton ec7b2a5546 drm/amd/display: add plane HDR multiplier driver-specific property
Multiplier to 'gain' the plane. When PQ is decoded using the fixed func
transfer function to the internal FP16 fb, 1.0 -> 80 nits (on AMD at
least) When sRGB is decoded, 1.0 -> 1.0.  Therefore, 1.0 multiplier = 80
nits for SDR content. So if you want, 203 nits for SDR content, pass in
(203.0 / 80.0).

v4:
- comment about the PQ TF need for L-to-NL (from Harry's review)

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Co-developed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:55 -05:00
Joshua Ashton d5a348d96e drm/amd/display: add plane degamma TF driver-specific property
Allow userspace to tell the kernel driver the input space and,
therefore, uses correct predefined transfer function (TF) to go from
encoded values to linear values.

v2:
- rename TF enum prefix from DRM_ to AMDGPU_ (Harry)
- remove HLG TF

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Co-developed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:54 -05:00
Saleemkhan Jamadar b70aed8f5d drm/amdgpu/jpeg: configure doorbell for each playback
Doorbell is configured during start of each playback.

v1 - add comment for the doorbell programming change

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:53 -05:00
Lijo Lazar ed342a2e78 drm/amdgpu: Use the right method to get IP version
Replace direct usage of adev->ip_versions with amdgpu_ip_version.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:53 -05:00
Melissa Wen 9342a9ae54 drm/amd/display: add driver-specific property for plane degamma LUT
Hook up driver-specific atomic operations for managing AMD color
properties. Create AMD driver-specific color management properties
and attach them according to HW capabilities defined by `struct
dc_color_caps`.

First add plane degamma LUT properties that means user-blob and its
size. We will add more plane color properties in the next patches. In
addition, we define AMD_PRIVATE_COLOR to guard these driver-specific
plane properties.

Plane degamma can be used to linearize input space for arithmetical
operations that are more accurate when applied in linear color.

v2:
- update degamma LUT prop description
- move private color operations from amdgpu_display to amdgpu_dm_color

v5:
- get degamma blob correctly (Joshua)

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Co-developed-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:53 -05:00
Alex Deucher 51ea405c47 drm/amdgpu: fix buffer funcs setting order on suspend harder
Part of commit c035819862 ("drm/amdgpu: fix buffer funcs setting order on suspend")
got dropped accidently.  Add it back.

Fixes: c035819862 ("drm/amdgpu: fix buffer funcs setting order on suspend")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13 15:09:52 -05:00
Dave Airlie a0a28956b4 amd-drm-next-6.8-2023-12-08:
amdgpu:
 - SR-IOV fixes
 - DCN 3.5 updates
 - Backlight fixes
 - MST fixes
 - DMCUB fixes
 - DPIA fixes
 - Display powergating updates
 - Enable writeback connectors
 - Misc code cleanups
 - Add more register state debugging for aquavanjaram
 - Suspend fix
 - Clockgating fixes
 - SMU 14 updates
 - PSR fixes
 - MES logging updates
 - Misc fixes
 
 amdkfd:
 - SVM fix
 
 radeon:
 - Fix potential memory leaks in error paths
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZXOBHwAKCRC93/aFa7yZ
 2E9hAP9Dqv/NKWf9aMZ01msdOysScu7FmAsnFrXjR0Mx0zX4WQD/er2/UrSR146+
 VDaxFbcjINoq0Q8b7CWLG4gy5Q9fDAo=
 =JU7h
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.8-2023-12-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.8-2023-12-08:

amdgpu:
- SR-IOV fixes
- DCN 3.5 updates
- Backlight fixes
- MST fixes
- DMCUB fixes
- DPIA fixes
- Display powergating updates
- Enable writeback connectors
- Misc code cleanups
- Add more register state debugging for aquavanjaram
- Suspend fix
- Clockgating fixes
- SMU 14 updates
- PSR fixes
- MES logging updates
- Misc fixes

amdkfd:
- SVM fix

radeon:
- Fix potential memory leaks in error paths

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231208205613.4861-1-alexander.deucher@amd.com
2023-12-13 15:55:55 +10:00
Dave Airlie c1ee197d64 Linux 6.7-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmV2PMQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgQIAKSdjtpkv+IeDwph
 umLHxDlCTT8nl+TAhEYuxSdMiHt1i7+zyF0snnHhGthBRnBZBf+PDdkQOmqtjLZj
 e591VGY7tdNCAVUwmRRaI82JFz9Pl1k8DXL3f+CE1+MfbpsirAecoIAL0rsvg+4f
 ICKSAOBZonL7GT7IGrsbxgqGKCLC+2PcgNT4pADBdjtbC1nmVSLK21dtWr6i4xsn
 CT9MxAnej1+EOpreyHhNZUVqfQy0/04x4OY0u/EOXU3PgpfGTrAzkyfwzcd/eit0
 x59TZ0owDVXfvvdXUUAC0zG1th2ek5LKNtL+C4rKOoAbq7GZoQfjN5M/Ug+yykwE
 aqm6CJ8=
 =tZ+R
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.7-rc5' into drm-next

Linux 6.7-rc5

Alex requested this for some amdkfd work relying on the symbols exports.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-12-12 11:32:33 +10:00
Rob Clark 05d249352f drm/exec: Pass in initial # of objects
In cases where the # is known ahead of time, it is silly to do the table
resize dance.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Patchwork: https://patchwork.freedesktop.org/patch/568338/
2023-12-10 10:38:47 -08:00
Dave Airlie a60501d7c2 drm-misc-next for 6.8:
UAPI Changes:
   - Remove Userspace Mode-Setting ioctls
   - v3d: New uapi to handle jobs involving the CPU
 
 Cross-subsystem Changes:
 
 Core Changes:
   - atomic: Add support for FB-less planes which got reverted a bit
     later for lack of IGT tests and userspace code, Dump private objects
     state in drm_state_dump.
   - dma-buf: Add fence deadline support
   - encoder: Create per-encoder debugfs directory, move the bridge chain
     file to that directory
 
 Driver Changes:
   - Include drm_auth.h in driver that use it but don't include it, Drop
     drm_plane_helper.h from drivers that include it but don't use it
   - imagination: Plenty of small fixes
   - panfrost: Improve interrupt handling at poweroff
   - qaic: Convert to persistent DRM devices
   - tidss: Support for the AM62A7, a few probe improvements, some cleanups
   - v3d: Support for jobs involving the CPU
 
   - bridge:
     - Create transparent aux-bridge for DP/USB-C
     - lt8912b: Add suspend/resume support and power regulator support
 
   - panel:
     - himax-hx8394: Drop prepare, unprepare and shutdown logic, Support
       panel rotation
     - New panels: BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G,
       Evervision VGG644804, SDC ATNA45AF01
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZXGXPAAKCRDj7w1vZxhR
 xblIAP4tz7X0mqa7KFr6L0sIfVILHANho4L81IUFi+kpTn64WgEAnqJ3IqJtdLbi
 czXzgJPae3Ifstm7CW7+72fLQCR6Cg8=
 =pU2c
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2023-12-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.8:

UAPI Changes:
  - Remove Userspace Mode-Setting ioctls
  - v3d: New uapi to handle jobs involving the CPU

Cross-subsystem Changes:

Core Changes:
  - atomic: Add support for FB-less planes which got reverted a bit
    later for lack of IGT tests and userspace code, Dump private objects
    state in drm_state_dump.
  - dma-buf: Add fence deadline support
  - encoder: Create per-encoder debugfs directory, move the bridge chain
    file to that directory

Driver Changes:
  - Include drm_auth.h in driver that use it but don't include it, Drop
    drm_plane_helper.h from drivers that include it but don't use it
  - imagination: Plenty of small fixes
  - panfrost: Improve interrupt handling at poweroff
  - qaic: Convert to persistent DRM devices
  - tidss: Support for the AM62A7, a few probe improvements, some cleanups
  - v3d: Support for jobs involving the CPU

  - bridge:
    - Create transparent aux-bridge for DP/USB-C
    - lt8912b: Add suspend/resume support and power regulator support

  - panel:
    - himax-hx8394: Drop prepare, unprepare and shutdown logic, Support
      panel rotation
    - New panels: BOE BP101WX1-100, Powkiddy X55, Ampire AM8001280G,
      Evervision VGG644804, SDC ATNA45AF01

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/yu5heqaufyeo4nlowzieu4s5unwqrqyx4jixbfjmzdon677rpk@t53vceua2dao
2023-12-08 16:27:00 +10:00
shaoyunl 47c4533543 drm/amdgpu: Enable event log on MES 11
Enable event log through the HW specific FW API

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-07 17:43:28 -05:00
shaoyunl b2662d4cc4 drm/amdgpu: SW part of MES event log enablement
This is the generic SW part, prepare the event log buffer and dump it through debugfs

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-07 17:43:13 -05:00
Alex Deucher dab96d8b61 drm/amdgpu: fix buffer funcs setting order on suspend
We need to disable this after the last eviction
call, but before we disable the SDMA IP.

Fixes: b70438004a ("drm/amdgpu: move buffer funcs setting up a level")
Link: https://lore.kernel.org/r/87edgv4x3i.fsf@vps.thesusis.net
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Tested-by: Phillip Susi <phill@thesusis.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Phillip Susi <phill@thesusis.net>
Cc: Luben Tuikov <ltuikov89@gmail.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar 27b024a88a drm/amdgpu: Avoid querying DRM MGCG status
MP0 v13.0.6 SOCs don't support DRM MGCG.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar 555e39f027 drm/amdgpu: Update HDP 4.4.2 clock gating flags
HDP 4.4.2 clockgating is enabled by default, update the flags
accordingly.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar 81577503ef drm/amdgpu: Add NULL checks for function pointers
Check if function is implemented before making the call.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar 6fce23a4d8 drm/amdgpu: Restrict extended wait to PSP v13.0.6
Only PSPv13.0.6 SOCs take a longer time to reach steady state. Other
PSPv13 based SOCs don't need extended wait. Also, reduce PSPv13.0.6 wait
time.

Cc: stable@vger.kernel.org
Fixes: fc59889071 ("drm/amdgpu: update retry times for psp vmbx wait")
Fixes: d8c1925ba8 ("drm/amdgpu: update retry times for psp BL wait")
Link: https://lore.kernel.org/amd-gfx/34dd4c66-f7bf-44aa-af8f-c82889dd652c@amd.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Yang Wang dbf3850d12 drm/amdgpu: optimize the printing order of error data
sort error data list to optimize the printing order.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Hawking Zhang 0e8af20517 drm/amdgpu: Update fw version for boot time error query
Boot time error query is not available until fw a10109

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Yang Wang 37c57631c1 drm/amd/pm: support new mca smu error code decoding
support new mca smu error code decoding from smu 85.86.0 for smu v13.0.6

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Jiadong Zhu d6a5758866 drm/amdgpu: disable MCBP by default
Disable MCBP(mid command buffer preemption) by default as old Mesa
hangs with it. We shall not enable the feature that breaks old usermode
driver.

Fixes: 50a7c8765c ("drm/amdgpu: enable mcbp by default on gfx9")
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-12-06 15:39:59 -05:00
Bokun Zhang e17768691d drm/amd/amdgpu: SRIOV full reset issue with VCN
- After a full reset, VF's FB will be cleaned. This
  includes the VCN's fw_shared memory.

  However, there is no suspend-resume routine for
  SRIOV VF. Therefore, the data in the fw_shared
  memory will be lost forever and it causes engine
  hang later on.

  We must repopulate the data in fw_shared during
  SRIOV hw_init

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Alex Deucher c035819862 drm/amdgpu: fix buffer funcs setting order on suspend
We need to disable this after the last eviction
call, but before we disable the SDMA IP.

Fixes: b70438004a ("drm/amdgpu: move buffer funcs setting up a level")
Link: https://lore.kernel.org/r/87edgv4x3i.fsf@vps.thesusis.net
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Tested-by: Phillip Susi <phill@thesusis.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Phillip Susi <phill@thesusis.net>
Cc: Luben Tuikov <ltuikov89@gmail.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar b12fb29539 drm/amdgpu: Avoid querying DRM MGCG status
MP0 v13.0.6 SOCs don't support DRM MGCG.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar 828afefd4b drm/amdgpu: Update HDP 4.4.2 clock gating flags
HDP 4.4.2 clockgating is enabled by default, update the flags
accordingly.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar 6146081d58 drm/amdgpu: Add NULL checks for function pointers
Check if function is implemented before making the call.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar 4657b3e456 drm/amdgpu: Restrict extended wait to PSP v13.0.6
Only PSPv13.0.6 SOCs take a longer time to reach steady state. Other
PSPv13 based SOCs don't need extended wait. Also, reduce PSPv13.0.6 wait
time.

Cc: stable@vger.kernel.org
Fixes: fc59889071 ("drm/amdgpu: update retry times for psp vmbx wait")
Fixes: d8c1925ba8 ("drm/amdgpu: update retry times for psp BL wait")
Link: https://lore.kernel.org/amd-gfx/34dd4c66-f7bf-44aa-af8f-c82889dd652c@amd.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar 650f0487d6 drm/amdgpu: Read aquavanjaram USR register state
Add support to read state of USR links in aquavanjaram SOC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:37 -05:00
Lijo Lazar 13ac7c0e30 drm/amdgpu: Read aquavanjaram WAFL register state
Add support to read state of WAFL links in aquavanjaram SOC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:36 -05:00
Yang Wang 04a71f1104 drm/amdgpu: optimize the printing order of error data
sort error data list to optimize the printing order.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:36 -05:00
Hawking Zhang 71a9d7a2a1 drm/amdgpu: Update fw version for boot time error query
Boot time error query is not available until fw a10109

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:36 -05:00
Yang Wang 0d65efcbe3 drm/amd/pm: support new mca smu error code decoding
support new mca smu error code decoding from smu 85.86.0 for smu v13.0.6

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:36 -05:00
Alex Hung f872e2f5f0 drm/amd/display: Add writeback enable field (wb_enabled)
[WHAT]
Add a new field to keep track whether a crtc is previously
writeback-enabled.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:35 -05:00
Alex Hung c81e13b929 drm/amd/display: Hande writeback request from userspace
[WHAT]
Handle writeback requests and fill in the required information for DWB
programming and setup.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:35 -05:00
Bokun Zhang d5e78f1c26 drm/amd/amdgpu: Move vcn4 fw_shared init to a single function
- Move VCN4's fw_shared initialization to a separated function.
  This way, the function can be reused at different locations.

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:22:20 -05:00
Jiadong Zhu fd2ef5fa35 drm/amdgpu: disable MCBP by default
Disable MCBP(mid command buffer preemption) by default as old Mesa
hangs with it. We shall not enable the feature that breaks old usermode
driver.

Fixes: 50a7c8765c ("drm/amdgpu: enable mcbp by default on gfx9")
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-12-06 15:21:24 -05:00
Dave Airlie 5edfd7d94b amd-drm-next-6.8-2023-12-01:
amdgpu:
 - Add new 64 bit sequence number infrastructure.
   This will ultimately be used for user queue synchronization.
 - GPUVM updates
 - Misc code cleanups
 - RAS updates
 - DCN 3.5 updates
 - Rework PCIe link speed handling
 - Document GPU reset types
 - DMUB fixes
 - eDP fixes
 - NBIO 7.9 updates
 - NBIO 7.11 updates
 - SubVP updates
 - DCN 3.1.4 fixes
 - ABM fixes
 - AGP aperture fix
 - DCN 3.1.5 fix
 - Fix some potential error path memory leaks
 - Enable PCIe PMEs
 - Add XGMI, PCIe state dumping for aqua vanjaram
 - GFX11 golden register updates
 - Misc display fixes
 
 amdkfd:
 - Migrate TLB flushing logic to amdgpu
 - Trap handler fixes
 - Fix restore workers handling on suspend and reset
 - Fix possible memory leak in pqm_uninit()
 
 radeon:
 - Fix some possible overflows in command buffer checking
 - Check for errors in ring_lock
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZWohuwAKCRC93/aFa7yZ
 2A6XAP9a6/3noDYxabnLmr3B+13wtweSkvEwD2bjOk7KinyS6AEAr8H3HZ5U0HMP
 zj35QhcNcYaTsjrip7e53XunpMBQSwQ=
 =DvId
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.8-2023-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.8-2023-12-01:

amdgpu:
- Add new 64 bit sequence number infrastructure.
  This will ultimately be used for user queue synchronization.
- GPUVM updates
- Misc code cleanups
- RAS updates
- DCN 3.5 updates
- Rework PCIe link speed handling
- Document GPU reset types
- DMUB fixes
- eDP fixes
- NBIO 7.9 updates
- NBIO 7.11 updates
- SubVP updates
- DCN 3.1.4 fixes
- ABM fixes
- AGP aperture fix
- DCN 3.1.5 fix
- Fix some potential error path memory leaks
- Enable PCIe PMEs
- Add XGMI, PCIe state dumping for aqua vanjaram
- GFX11 golden register updates
- Misc display fixes

amdkfd:
- Migrate TLB flushing logic to amdgpu
- Trap handler fixes
- Fix restore workers handling on suspend and reset
- Fix possible memory leak in pqm_uninit()

radeon:
- Fix some possible overflows in command buffer checking
- Check for errors in ring_lock

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231201181743.5313-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-12-05 12:11:41 +10:00
Yang Wang f875f61b1f drm/amdgpu: enable mca debug mode on APU by default
enable MCA debug mode on APU device by default.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30 18:26:31 -05:00
Likun Gao 9596ffe1cc drm/amdgpu: distinguish rlc fw for different SKU
For some SKU, rlc firmware should use different one
compared with the normal rlc firmware.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30 18:26:31 -05:00
Yang Wang 00f9d49bce drm/amdgpu: Fix missing mca debugfs node
Use amdgpu_ip_version() helper function to check ip version.

The ip version contains other information,
use the helper function to avoid reading wrong value.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30 18:26:22 -05:00
ZhenGuo Yin 04fcc3fec5 drm/amdgpu: Skip access gfx11 golden registers under SRIOV
[Why]
Golden registers are PF-only registers on gfx11.
RLCG interface will return "out-of-range" under SRIOV VF.

[How]
Skip access gfx11 golden registers under SRIOV.

Reviewed-by: Horace Chen <horace.chen@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30 18:21:30 -05:00
Lijo Lazar ed6e4f0a27 drm/amdgpu: Use another offset for GC 9.4.3 remap
The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs.
Use 0x1A000 offset instead as MMIO register remap region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 18:13:35 -05:00
Candice Li e0409021e3 drm/amdgpu: Update EEPROM I2C address for smu v13_0_0
Check smu v13_0_0 SKU type to select EEPROM I2C address.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-11-29 18:11:39 -05:00
Lu Yao 2161e09cd0 drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init
'didt_rreg' and 'didt_wreg' to 'NULL'. But in func
'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT'
lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use
these two definitions won't be added for 'AMDGPU_FAMILY_SI'.

So, add null pointer judgment before calling.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 18:09:53 -05:00
Mario Limonciello 6967741d26 drm/amd: Enable PCIe PME from D3
When dGPU is put into BOCO it may be in D3cold but still able send
PME on display hotplug event. For this to work it must be enabled
as wake source from D3.

When runpm is enabled use pci_wake_from_d3() to mark wakeup as
enabled by default.

Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 18:09:34 -05:00
Alex Deucher e222b36e96 drm/amdgpu: fix AGP addressing when GART is not at 0
This worked by luck if the GART aperture ended up at 0.  When
we ended up moving GART on some chips, the GART aperture ended
up offsetting the AGP address since the resource->start is
a GART offset, not an MC address.  Fix this by moving the AGP
address setup into amdgpu_bo_gpu_offset_no_check().

v2: check mem_type before checking agp
v3: check if the ttm bo has a ttm_tt allocated yet

Fixes: 67318cb843 ("drm/amdgpu/gmc11: set gart placement GC11")
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reported-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: christian.koenig@amd.com
Cc: mario.limonciello@amd.com
2023-11-29 18:09:00 -05:00
Prike Liang c6df7f3137 drm/amdgpu: correct the amdgpu runtime dereference usage count
Fix the amdgpu runpm dereference usage count.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-29 18:04:55 -05:00
Tim Huang 6b0b7789a7 drm/amdgpu: fix memory overflow in the IB test
Fix a memory overflow issue in the gfx IB test
for some ASICs. At least 20 bytes are needed for
the IB test packet.

v2: correct code indentation errors. (Christian)

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-29 18:03:11 -05:00
Li Ma 5c908a3586 drm/amdgpu: add init_registers for nbio v7.11
enable init_registers callback func for nbio v7.11.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 18:02:36 -05:00
Alex Sierra 4b27a33c3b drm/amdgpu: Force order between a read and write to the same address
Setting register to force ordering to prevent read/write or write/read
hazards for un-cached modes.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-11-29 17:57:33 -05:00
Hawking Zhang 884e9b0827 drm/amdgpu: Do not issue gpu reset from nbio v7_9 bif interrupt
In nbio v7_9, host driver should not issu gpu reset

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 17:57:02 -05:00
Perry Yuan 8c4e9105b2 drm/amdgpu: optimize RLC powerdown notification on Vangogh
The smu needs to get the rlc power down message to sync the rlc state
with smu, the rlc state updating message need to be sent at while smu
begin suspend sequence , otherwise SMU will crash while RLC state is not
notified by driver, and rlc state probally changed after that
notification, so it needs to notify rlc state to smu at the end of the
suspend sequence in amdgpu_device_suspend() that can make sure the rlc
state  is correctly set to SMU.

[  101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[  101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[  110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[  110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[  110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[  110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[  110.884394] PM: suspend of devices aborted after 21213.620 msecs
[  110.884402] PM: start suspend of devices aborted after 21213.882 msecs
[  110.884405] PM: Some devices failed to suspend, or early wake event detected

Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 17:55:47 -05:00
Jonathan Kim b9eab9e0aa drm/amdgpu: update xgmi num links info post gc9.4.2
GC IP 9.4.2 and up support TA reporting of the number
of xGMI links between peers.

Tested-by: Vignesh Chander <vignesh.chander@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 17:53:16 -05:00
Lijo Lazar 36fd9969fa drm/amdgpu: Use another offset for GC 9.4.3 remap
The legacy region at 0x7F000 maps to valid registers in GC 9.4.3 SOCs.
Use 0x1A000 offset instead as MMIO register remap region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:35 -05:00
Lijo Lazar 92e508eaf3 drm/amdgpu: Read aquavanjaram XGMI register state
Add support to read state of XGMI links in aquavanjaram SOC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:35 -05:00
Lijo Lazar 081a6eda2b drm/amdgpu: Read aquavanjaram PCIE register state
Add support to read aqua vanjaram PCIE register state

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:35 -05:00
Lijo Lazar af39e6f4d8 drm/amdgpu: Add reg_state sysfs attribute
Add reg_state attribute to fetch the register snapshot of different
IPs like XGMI, WAFL,PCIE and USR. To get a snapshot for a particular IP
	1) Open the sysfs file
	2) Seek to the offset as defined in amdgpu_sysfs_reg_offset
	3) Read

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:35 -05:00
Alex Deucher 9a5095e785 drm/amdgpu: add amdgpu_reg_state.h
This header defines the reg state structures exposed via
sysfs for umr debugging.

v2: add content type

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
2023-11-29 16:49:24 -05:00
Candice Li ca0ad76089 drm/amdgpu: Update EEPROM I2C address for smu v13_0_0
Check smu v13_0_0 SKU type to select EEPROM I2C address.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:23 -05:00
Felix Kuehling 9a1c1339ab drm/amdkfd: Run restore_workers on freezable WQs
Make restore workers freezable so we don't have to explicitly flush them
in suspend and GPU reset code paths, and we don't accidentally try to
restore BOs while the GPU is suspended. Not having to flush restore_work
also helps avoid lock/fence dependencies in the GPU reset case where we're
not allowed to wait for fences.

A side effect of this is, that we can now have multiple concurrent threads
trying to signal the same eviction fence. Rework eviction fence signaling
and replacement to account for that.

The GPU reset path can no longer rely on restore_process_worker to resume
queues because evict/restore workers can run independently of it. Instead
call a new restore_process_helper directly.

This is an RFC and request for testing.

v2:
- Reworked eviction fence signaling
- Introduced restore_process_helper

v3:
- Handle unsignaled eviction fences in restore_process_bos

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:23 -05:00
Mario Limonciello bd1f6a31e7 drm/amd: Enable PCIe PME from D3
When dGPU is put into BOCO it may be in D3cold but still able send
PME on display hotplug event. For this to work it must be enabled
as wake source from D3.

When runpm is enabled use pci_wake_from_d3() to mark wakeup as
enabled by default.

Cc: stable@vger.kernel.org # 6.1+
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:23 -05:00
Lu Yao 7b194fdccb drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer
For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init
'didt_rreg' and 'didt_wreg' to 'NULL'. But in func
'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT'
lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use
these two definitions won't be added for 'AMDGPU_FAMILY_SI'.

So, add null pointer judgment before calling.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:23 -05:00
Alex Deucher ca0b006939 drm/amdgpu: fix AGP addressing when GART is not at 0
This worked by luck if the GART aperture ended up at 0.  When
we ended up moving GART on some chips, the GART aperture ended
up offsetting the AGP address since the resource->start is
a GART offset, not an MC address.  Fix this by moving the AGP
address setup into amdgpu_bo_gpu_offset_no_check().

v2: check mem_type before checking agp
v3: check if the ttm bo has a ttm_tt allocated yet

Fixes: 67318cb843 ("drm/amdgpu/gmc11: set gart placement GC11")
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reported-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: christian.koenig@amd.com
Cc: mario.limonciello@amd.com
2023-11-29 16:49:22 -05:00
Lijo Lazar 201761b5eb drm/amdgpu: Move mca debug mode decision to ras
Refactor code such that ras block decides the default mca debug mode,
and not swsmu block.

By default mca debug mode is set to false.

v2: squash in uninitialized value fix (Alex)

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:01 -05:00
Prike Liang 0e6a12884c drm/amdgpu: correct the amdgpu runtime dereference usage count
Fix the amdgpu runpm dereference usage count.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:00 -05:00
Tim Huang 88f4b10a79 drm/amdgpu: fix memory overflow in the IB test
Fix a memory overflow issue in the gfx IB test
for some ASICs. At least 20 bytes are needed for
the IB test packet.

v2: correct code indentation errors. (Christian)

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:00 -05:00
Li Ma ee95135bfe drm/amdgpu: add init_registers for nbio v7.11
enable init_registers callback func for nbio v7.11.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:49:00 -05:00
Alex Sierra 20b07b0cb3 drm/amdgpu: Force order between a read and write to the same address
Setting register to force ordering to prevent read/write or write/read
hazards for un-cached modes.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:48:59 -05:00
Hawking Zhang 4b8251e019 drm/amdgpu: Do not issue gpu reset from nbio v7_9 bif interrupt
In nbio v7_9, host driver should not issu gpu reset

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:48:59 -05:00
Perry Yuan 2e9b152325 drm/amdgpu: optimize RLC powerdown notification on Vangogh
The smu needs to get the rlc power down message to sync the rlc state
with smu, the rlc state updating message need to be sent at while smu
begin suspend sequence , otherwise SMU will crash while RLC state is not
notified by driver, and rlc state probally changed after that
notification, so it needs to notify rlc state to smu at the end of the
suspend sequence in amdgpu_device_suspend() that can make sure the rlc
state  is correctly set to SMU.

[  101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[  101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[  110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[  110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[  110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[  110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[  110.884394] PM: suspend of devices aborted after 21213.620 msecs
[  110.884402] PM: start suspend of devices aborted after 21213.882 msecs
[  110.884405] PM: Some devices failed to suspend, or early wake event detected

Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:48:58 -05:00
Hawking Zhang 702e2fb579 drm/amdgpu: Retire query/reset_ras_err_status from gfx_v9_4_3
Not needed anymore.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:48:58 -05:00
Jonathan Kim 35c425f5cc drm/amdgpu: update xgmi num links info post gc9.4.2
GC IP 9.4.2 and up support TA reporting of the number
of xGMI links between peers.

Tested-by: Vignesh Chander <vignesh.chander@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:48:45 -05:00
André Almeida 613ecd6563 drm/amd: Document device reset methods
Document what each amdgpu driver reset method does.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-29 16:23:31 -05:00
Thomas Zimmermann 26b9a880d2 Merge drm/drm-next into drm-misc-next
Backmerging to get commit 8d6ef26501 ("drm/ast: Disconnect BMC if
physical connector is connected") into drm-misc-next.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-11-28 15:32:24 +01:00
Luben Tuikov 38f922a563 drm/sched: Reverse run-queue priority enumeration
Reverse run-queue priority enumeration such that the higest priority is now 0,
and for each consecutive integer the prioirty diminishes.

Run-queues correspond to priorities. To an external observer a scheduler
created with a single run-queue, and another created with
DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule
sched->sched_rq[0] with the same "priority", as that index run-queue exists in
both schedulers, i.e. a scheduler with one run-queue or many. This patch makes
it so.

In other words, the "priority" of sched->sched_rq[n], n >= 0, is the same for
any scheduler created with any allowable number of run-queues (priorities), 0
to DRM_SCHED_PRIORITY_COUNT.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com
2023-11-24 23:03:53 -05:00
Luben Tuikov fe375c7480 drm/sched: Rename priority MIN to LOW
Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW.

This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities
in ascending order,
  DRM_SCHED_PRIORITY_LOW,
  DRM_SCHED_PRIORITY_NORMAL,
  DRM_SCHED_PRIORITY_HIGH,
  DRM_SCHED_PRIORITY_KERNEL.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com
2023-11-24 23:03:53 -05:00
Daniel Vetter c79b972eb8 drm-misc-next for 6.8:
UAPI Changes:
   - drm: Introduce CLOSE_FB ioctl
   - drm/dp-mst: Documentation for the PATH property
   - fdinfo: Do not align to a MB if the size is larger than 1MiB
   - virtio-gpu: add explicit virtgpu context debug name
 
 Cross-subsystem Changes:
   - dma-buf: Add dma_fence_timestamp helper
 
 Core Changes:
   - client: Do not acquire module reference
   - edid: split out drm_eld, add SAD helpers
   - format-helper: Cache format conversion buffers
   - sched: Move from a kthread to a workqueue, rename some internal
     functions to make it clearer, implement dynamic job-flow control
   - gpuvm: Provide more features to handle GEM objects
   - tests: Remove slow kunit tests
 
 Driver Changes:
   - ivpu: Update FW API, new debugfs file, a new NOP job submission test
     mode, improve suspend/resume, PM improvements, MMU PT optimizations,
     firmware profiling frequency support, support for uncached buffers,
     switch to gem shmem helpers, replace kthread with threaded
     interrupts
   - panfrost: PM improvements
   - qaic: Allow to run with a single MSI, support host/device time
     synchronization, misc improvements
   - simplefb: Support memory-regions, support power-domains
   - ssd130x: Unitialized variable fixes
   - omapdrm: dma-fence lockdep annotation fix
   - tidss: dma-fence lockdep annotation fix
   - v3d: Support BCM2712 (RaspberryPi5), Support fdinfo and gputop
   - panel:
     - edp: Support AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49
       V8.0, plus a whole bunch of panels used on Mediatek chromebooks.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZVc02QAKCRDj7w1vZxhR
 xRXbAPsHQkuUz2n1XWx/8uExPCn+PmS5Lg6IcbtvqpfuFrGt1QEA+DaZf2yfekFj
 FgY66UExyddH/2LDHzpKa0WJk6l6vQw=
 =1HSK
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2023-11-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.8:

UAPI Changes:
  - drm: Introduce CLOSE_FB ioctl
  - drm/dp-mst: Documentation for the PATH property
  - fdinfo: Do not align to a MB if the size is larger than 1MiB
  - virtio-gpu: add explicit virtgpu context debug name

Cross-subsystem Changes:
  - dma-buf: Add dma_fence_timestamp helper

Core Changes:
  - client: Do not acquire module reference
  - edid: split out drm_eld, add SAD helpers
  - format-helper: Cache format conversion buffers
  - sched: Move from a kthread to a workqueue, rename some internal
    functions to make it clearer, implement dynamic job-flow control
  - gpuvm: Provide more features to handle GEM objects
  - tests: Remove slow kunit tests

Driver Changes:
  - ivpu: Update FW API, new debugfs file, a new NOP job submission test
    mode, improve suspend/resume, PM improvements, MMU PT optimizations,
    firmware profiling frequency support, support for uncached buffers,
    switch to gem shmem helpers, replace kthread with threaded
    interrupts
  - panfrost: PM improvements
  - qaic: Allow to run with a single MSI, support host/device time
    synchronization, misc improvements
  - simplefb: Support memory-regions, support power-domains
  - ssd130x: Unitialized variable fixes
  - omapdrm: dma-fence lockdep annotation fix
  - tidss: dma-fence lockdep annotation fix
  - v3d: Support BCM2712 (RaspberryPi5), Support fdinfo and gputop
  - panel:
    - edp: Support AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49
      V8.0, plus a whole bunch of panels used on Mediatek chromebooks.

Note that the one missing s-o-b for 0da611a870 ("dma-buf: add
dma_fence_timestamp helper") has been supplied here, and rebasing the
entire tree with upsetting committers didn't seem worth the trouble:
https://lore.kernel.org/dri-devel/ce94020e-a7d4-4799-b87d-fbea7b14a268@gmail.com/

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/y4awn5vcfy2lr2hpauo7rc4nfpnc6kksr7btmnwaz7zk63pwoi@gwwef5iqpzva
2023-11-20 09:50:09 +01:00
Srinivasan Shanmugam 699d392903 drm/amdgpu: Add function parameter 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb'
Fixes the below:

drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1373: warning: Function parameter or member 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:30:51 -05:00
Prike Liang 425285d39a drm/amdgpu: add amdgpu runpm usage trace for separate funcs
Add trace for amdgpu runpm separate funcs usage and this will
help debugging on the case of runpm usage missed to dereference.
In the normal case the runpm usage count referred by one kind
of functionality pairwise and usage should be changed from 1 to 0,
otherwise there will be an issue in the amdgpu runpm usage
dereference.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:30:51 -05:00
Shiwu Zhang 75fb313c55 drm/amdgpu: expose the connected port num info through sysfs
By catting the xgmi_port_num sysfs node, it prints out the info in the
format of <src node id>:<src port num> -> <dst node id>:<dst port num>
for one xgmi link.

For example, in case of 4 sockets fully and evenly connected setup, it
would be like as below for the first node in the hive.
01:02 -> 02:03
01:03 -> 02:02
01:07 -> 03:04
01:04 -> 03:07
01:06 -> 04:05
01:05 -> 04:06
Based on the fact that there is two xgmi links between each socket pair,
"01:02 -> 02:03" means that the current socket in question use the port 2
to connect with port 3 of the second node in the hive and so on.

v2: print out the src/dst node id for each xgmi link (lijo)
v3: replace the current_node++ with +1 to align with dst node (le)
    and use the dev_err instead of pr_err (lijo)
v4: fix checkpatch warning (alex)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:30:51 -05:00
Mario Limonciello d9b3a066df drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks
The PCIe speed capabilities advertised by a USB4 or TBT3 link are
limited to PCIe gen 1 per the USB4 spec. In reality the speed will
change dynamically based on fabric conditions and other traffic.

DPM is disabled when dGPUs are connected directly to Intel hosts
since the PCIe root port isn't able to handle dynamic speed
switching.

As this limitation is specifically for PCIe root ports in the SoC,
don't apply it when connected to an eGPU enclosure connected to an
Intel host.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2885
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:30:51 -05:00
Mario Limonciello 466a7d1153 drm/amd: Use the first non-dGPU PCI device for BW limits
When bandwidth limits are looked up using pcie_bandwidth_available()
virtual links such as USB4 are analyzed which might not represent the
real speed. Furthermore devices may change speeds autonomously which
may introduce conditional variation to the results reported in the
status registers.

Instead look at the capabilities of first PCI device outside of
dGPU to decide upper limits that the dGPU will work at.

For eGPU this effectively means that it will use the speed of the link
partner.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925#note_2145860
Link: https://www.usb.org/document-library/usb4r-specification-v20
      USB4 V2 with Errata and ECN through June 2023
      Section 11.2.1
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:30:50 -05:00
Srinivasan Shanmugam 8a1de314d1 drm/amdgpu: Refactor 'amdgpu_connector_dvi_detect' in amdgpu_connectors.c
Fixes the below:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations
WARNING: Too many leading tabs - consider code refactoring
+                                               if (list_connector->connector_type != DRM_MODE_CONNECTOR_VGA) {
WARNING: Too many leading tabs - consider code refactoring
+                                                       if (!amdgpu_display_hpd_sense(adev, amdgpu_connector->hpd.hpd)) {

Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:29:54 -05:00
Sam James b5a52d2afe amdgpu: Adjust kmalloc_array calls for new -Walloc-size
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
on various files in drivers/gpu/drm/amd/amdgpu like:
```
amdgpu_amdkfd_gfx_v8.c:241:15: error: allocation of insufficient size ‘4’ for type ‘uint32_t[2]’ {aka ‘unsigned int[2]'} with size ‘8’ [-Werror=alloc-size]
```

This is because each HQD_N_REGS is actually a uint32_t[2]. Move the * 2 to
the size argument so GCC sees we're allocating enough.

Originally did 'sizeof(uint32_t) * 2' for the size but a friend suggested
'sizeof(**dump)' better communicates the intent.

Link: https://lore.kernel.org/all/87wmuwo7i3.fsf@gentoo.org/
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:29:54 -05:00
Felix Kuehling 94e2dae0a8 drm/amdkfd: Move TLB flushing logic into amdgpu
This will make it possible for amdgpu GEM ioctls to flush TLBs on compute
VMs.

This removes VMID-based TLB flushing and always uses PASID-based
flushing. This still works because it scans the VMID-PASID mapping
registers to find the right VMID. It's only slightly less efficient. This
is not a production use case.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:29:53 -05:00
Felix Kuehling e6ed364efa drm/amdgpu: update mappings not managed by KFD
When restoring after an eviction, use amdgpu_vm_handle_moved to update
BO VA mappings in KFD VMs that are not managed through the KFD API. This
should allow using the render node API to create more flexible memory
mappings in KFD VMs.

v2: rebase on drm_exec changes (Alex)

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:29:53 -05:00
Arunpravin Paneer Selvam c8031019dc drm/amdgpu: Implement a new 64bit sequence memory driver
Developed a new driver which allocates a 64bit memory on
each request in sequence order. At the moment, user queue
fence memory is the main consumer of this seq64 driver.

v2: Worked on review comments from Christian for the following
    modifications

    - Move driver name from "semaphore" to "seq64"
    - Remove unnecessary PT/PD mapping
    - Move enable_mes check into init/fini functions.

v3: Worked on review comments from Christian

    - drop enable_mes check
    - use DECLARE_BITMAP for bit array
    - added kerneldoc for seq64

v4: Worked on review comments from Christian
    - Rename amdgpu_seq64_get name with amdgpu_seq64_alloc

v5: Worked on review comments from Christian
    - Fix seq64 lockdep warning
    - move fpriv->seq64_va check into amdgpu_seq64_unmap()
    - make the function amdgpu_seq64_unmap() return as void.
    - reserve the buffers as not interruptible.

v6: port to drm_exec (Alex)
v7: disable for now (Arun)

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 09:29:53 -05:00
Alex Deucher e8c2d3e25b drm/amdgpu/gmc9: disable AGP aperture
We've had misc reports of random IOMMU page faults when
this is used.  It's just a rarely used optimization anyway, so
let's just disable it.  It can still be toggled via the
module parameter for testing.

v2: leave it configurable via module parameter

Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:41 -05:00
Alex Deucher 61fc93695b drm/amdgpu/gmc10: disable AGP aperture
We've had misc reports of random IOMMU page faults when
this is used.  It's just a rarely used optimization anyway, so
let's just disable it.  It can still be toggled via the
module parameter for testing.

v2: leave it configurable via module parameter

Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:34 -05:00
Alex Deucher 0db062eac3 drm/amdgpu/gmc11: disable AGP aperture
We've had misc reports of random IOMMU page faults when
this is used.  It's just a rarely used optimization anyway, so
let's just disable it.  It can still be toggled via the
module parameter for testing.

v2: leave it configurable via module parameter

Fixes: 67318cb843 ("drm/amdgpu/gmc11: set gart placement GC11")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:28 -05:00
Alex Deucher 6ba5b61383 drm/amdgpu: add a module parameter to control the AGP aperture
Add a module parameter to control the AGP aperture.  The AGP
aperture is an aperture in the GPU's internal address space
which provides direct non-paged access to the platform address
space.  This access is non-snooped so only uncached memory
can be accessed.

Add a knob so that we can toggle this for debugging.

Fixes: 67318cb843 ("drm/amdgpu/gmc11: set gart placement GC11")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:20 -05:00
Alex Deucher 564ca1b53e drm/amdgpu/gmc11: fix logic typo in AGP check
Should be && rather than ||.

Fixes: b2e1cbe628 ("drm/amdgpu/gmc11: disable AGP on GC 11.5")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:58:11 -05:00
Shiwu Zhang 9ddea8c977 drm/amdgpu: add and populate the port num into xgmi topology info
The port num info is firstly introduced with 20.00.01.13 xgmi ta and
make them as part of topology info.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:53:49 -05:00
Yang Wang 07ee43faeb drm/amdgpu: fix ras err_data null pointer issue in amdgpu_ras.c
fix ras err_data null pointer issue in amdgpu_ras.c

Fixes: 8cc0f5669e ("drm/amdgpu: Support multiple error query modes")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:51:58 -05:00
YuanShang 50d51374b4 drm/amdgpu: correct chunk_ptr to a pointer to chunk.
The variable "chunk_ptr" should be a pointer pointing
to a struct drm_amdgpu_cs_chunk instead of to a pointer
of that.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:48:41 -05:00
Srinivasan Shanmugam 8a0173cd90 drm/amdgpu: Address member 'ring' not described in 'amdgpu_ vce, uvd_entity_init()'
Fixes the following:

drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:237: warning: Function parameter or member 'ring' not described in 'amdgpu_vce_entity_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:405: warning: Function parameter or member 'ring' not described in 'amdgpu_uvd_entity_init'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:47:14 -05:00
Le Ma bdb72185d3 drm/amdgpu: finalizing mem_partitions at the end of GMC v9 sw_fini
The valid num_mem_partitions is required during ttm pool fini,
thus move the cleanup at the end of the function.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:47:05 -05:00
Victor Lu 0288603040 drm/amdgpu: Do not program VF copy regs in mmhub v1.8 under SRIOV (v2)
MC_VM_AGP_* registers should not be programmed by guest driver.

v2: move early return outside of loop

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17 00:46:27 -05:00
Maxime Ripard 3bf3e21c15
Merge drm/drm-next into drm-misc-next
Let's kickstart the v6.8 release cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2023-11-15 10:56:44 +01:00
Linus Torvalds c0d12d7692 drm fixes for 6.7-rc1
- big pile of amd fixes, but mostly hw support newly added in 6.7
 - i915 fixes, mostly minor things
 - qxl memory leak fix
 - vc4 uaf fix in mock helpers
 - syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEb4nG6jLu8Y5XI+PfTA9ye/CYqnEFAmVOlEQACgkQTA9ye/CY
 qnGawQ//d7M2CZpd9LvkRnaV2+fG9yGONOwZsG5fVtXrfT4RDQmITC9KMEh2TxJb
 s1W6HA+UijMMx6RtQN6cTYNHeYDaX2b55/g3lMnreXydii0COJwkWe52iFn0Dpcm
 RpsT5cYLEiRtiTvEzKbrkxS+rrQMu9jwxcA23b+lMkmybVgqQe1m9hYtRxZCFqr7
 6BMKOgrCRoY1mYZrNaccXBHvvgOtcpWPOsuNNgjW3MDKB663BmpABT1iDg3xxPdX
 BI8SAl6PX6ju2Jwi3WPlscmI199fhcUXDuqb8LXhJsuqynhwU940aUlxcbI7Hz6Y
 LaFwSK4OiaIsIC8yisa7cZ1z2mqnMIiGXasP6mfYVmYqpGYMW+AZcOmzui/MLiGd
 duOcvK/bLxN7moSqcKgz+mmrLfZzJkPLV8pGEVk0IbTn239AnR76bqrEQJ+Iqukx
 d4yLGE4OJchD4zzw5RlVmQIhwA8M/5croJBIo6yNyQB5xgN/Krd4QUc6KjjgzNo8
 e402NjaW2/PqWIiPtsL5tK4XdkVtvMvVUq+bJSch6Wfn3j9u06/wgFl1FBRyX7zS
 3QYvMtF/QM5/QTpbdl82hSSsJ78iO62tPYSOhycIxgB/BHoc/fap+IOFtKKnT3RZ
 xr7UiwAQA043gAMvS+TkZAc6bFW8U8Dzxu5XxEPE1L+WU6XCSbs=
 =l45e
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Daniel Vetter:
 "Dave's VPN to the big machine died, so it's on me to do fixes pr this
  and next week while everyone else is at plumbers.

   - big pile of amd fixes, but mostly for hw support newly added in 6.7

   - i915 fixes, mostly minor things

   - qxl memory leak fix

   - vc4 uaf fix in mock helpers

   - syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE"

* tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm: (78 commits)
  drm/amdgpu: fix error handling in amdgpu_vm_init
  drm/amdgpu: Fix possible null pointer dereference
  drm/amdgpu: move UVD and VCE sched entity init after sched init
  drm/amdgpu: move kfd_resume before the ip late init
  drm/amd: Explicitly check for GFXOFF to be enabled for s0ix
  drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2)
  drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)
  drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support
  drm/amdgpu: fix software pci_unplug on some chips
  drm/amd/display: remove duplicated argument
  drm/amdgpu: correct mca debugfs dump reg list
  drm/amdgpu: correct acclerator check architecutre dump
  drm/amdgpu: add pcs xgmi v6.4.0 ras support
  drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3
  drm/amdgpu: disable smu v13.0.6 mca debug mode by default
  drm/amdgpu: Support multiple error query modes
  drm/amdgpu: refine smu v13.0.6 mca dump driver
  drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2)
  drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV
  drm: amd: Resolve Sphinx unexpected indentation warning
  ...
2023-11-10 14:59:30 -08:00
Daniel Vetter 03df0fc007 amd-drm-next-6.7-2023-11-10:
amdgpu:
 - SR-IOV fixes
 - DMCUB fixes
 - DCN3.5 fixes
 - DP2 fixes
 - SubVP fixes
 - SMU14 fixes
 - SDMA4.x fixes
 - Suspend/resume fixes
 - AGP regression fix
 - UAF fixes for some error cases
 - SMU 13.0.6 fixes
 - Documentation fixes
 - RAS fixes
 - Hotplug fixes
 - Scheduling entity ordering fix
 - GPUVM fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZU58aAAKCRC93/aFa7yZ
 2PvJAQDF1IHj90BAqH3EzOx7p2jkGVeK1p+em2sS051kOvpgiAD/fvZovVUBmt/V
 tD0NOtkL8bqmIavP3vDV0Yvf9tW48Qs=
 =Z4Je
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.7-2023-11-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-11-10:

amdgpu:
- SR-IOV fixes
- DMCUB fixes
- DCN3.5 fixes
- DP2 fixes
- SubVP fixes
- SMU14 fixes
- SDMA4.x fixes
- Suspend/resume fixes
- AGP regression fix
- UAF fixes for some error cases
- SMU 13.0.6 fixes
- Documentation fixes
- RAS fixes
- Hotplug fixes
- Scheduling entity ordering fix
- GPUVM fixes

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110190703.4741-1-alexander.deucher@amd.com
2023-11-10 20:51:38 +01:00
Christian König 8473bfdcb5 drm/amdgpu: fix error handling in amdgpu_vm_init
When clearing the root PD fails we need to properly release it again.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-10 11:33:28 -05:00
Felix Kuehling 256503071c drm/amdgpu: Fix possible null pointer dereference
mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update.

Fixes: 1802537820 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-10 11:33:28 -05:00
Alex Deucher 037b98a231 drm/amdgpu: move UVD and VCE sched entity init after sched init
We need kernel scheduling entities to deal with handle clean up
if apps are not cleaned up properly.  With commit 56e449603f
("drm/sched: Convert the GPU scheduler to variable number of run-queues")
the scheduler entities have to be created after scheduler init, so
change the ordering to fix this.

v2: Leave logic in UVD and VCE code

Fixes: 56e449603f ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: ltuikov89@gmail.com
2023-11-10 11:33:08 -05:00
Tim Huang 8ed79c409e drm/amdgpu: move kfd_resume before the ip late init
The kfd_resume needs to touch GC registers to enable the interrupts,
it needs to be done before GFXOFF is enabled to ensure that the GFX is
not off and GC registers can be touched. So move kfd_resume before the
amdgpu_device_ip_late_init which enables the CGPG/GFXOFF.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10 11:08:33 -05:00
Mario Limonciello e4c44b1a19 drm/amd: Explicitly check for GFXOFF to be enabled for s0ix
If a user has disabled GFXOFF this may cause problems for the suspend
sequence.  Ensure that it is enabled in amdgpu_acpi_is_s0ix_active().

The system won't reach the deepest state but it also won't hang.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10 11:08:20 -05:00
Daniel Vetter aec3e2e23b Merge tag 'drm-misc-fixes-2023-11-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-fixes for v6.7-rc1:

qxl:
- qxl memory leak fix.
syncobj:
- Fix waiting for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE
vc4:
- Fix UAF in mock helpers

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[sima: Stitch together both changelogs from Maarten. Also because of
branch history this contains a few more bugfixes which are already in
v6.6, but I didn't feel like this justifies some backmerge since there
wasn't any real conflict.]
Link: https://patchwork.freedesktop.org/patch/msgid/bc8598ee-d427-4616-8ebd-64107ab9a2d8@linux.intel.com
2023-11-10 16:57:49 +01:00
Danilo Krummrich a78422e9df drm/sched: implement dynamic job-flow control
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.

This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.

However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.

In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
2023-11-10 02:54:29 +01:00
Victor Lu 1972642843 drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2)
W/RREG32_RLC is hardedcoded to use instance 0. W/RREG32_SOC15_RLC
should be used instead when inst != 0.

v2: rebase

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:03:16 -05:00
Victor Lu 85150626ea drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0.

Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC
and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter.

Using amdgpu_sriov_runtime to determine whether to access via kiq or
RLC is sufficient for now.

v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call

v4: avoid using amdgpu_sriov_w/rreg

v3: use W/RREG32_XCC to handle non-kiq case

v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters
    of amdgpu_device_wreg/rreg

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:03:07 -05:00
Yang Wang 76d2da18af drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support
add pcs xgmi ras error query support for smu v13.0.6.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:59 -05:00
Vitaly Prosyak 4638e0c29a drm/amdgpu: fix software pci_unplug on some chips
When software 'pci unplug' using IGT is executed we got a sysfs directory
entry is NULL for differant ras blocks like hdp, umc, etc.
Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group'
check that 'sd' is  not NULL.

[  +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90
[  +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff <0f> 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[  +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246
[  +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000
[  +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000
[  +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000
[  +0.000001] FS:  00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000
[  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0
[  +0.000001] Call Trace:
[  +0.000001]  <TASK>
[  +0.000001]  ? show_regs+0x72/0x90
[  +0.000002]  ? sysfs_remove_group+0x83/0x90
[  +0.000002]  ? __warn+0x8d/0x160
[  +0.000001]  ? sysfs_remove_group+0x83/0x90
[  +0.000001]  ? report_bug+0x1bb/0x1d0
[  +0.000003]  ? handle_bug+0x46/0x90
[  +0.000001]  ? exc_invalid_op+0x19/0x80
[  +0.000002]  ? asm_exc_invalid_op+0x1b/0x20
[  +0.000003]  ? sysfs_remove_group+0x83/0x90
[  +0.000001]  dpm_sysfs_remove+0x61/0x70
[  +0.000002]  device_del+0xa3/0x3d0
[  +0.000002]  ? ktime_get_mono_fast_ns+0x46/0xb0
[  +0.000002]  device_unregister+0x18/0x70
[  +0.000001]  i2c_del_adapter+0x26d/0x330
[  +0.000002]  arcturus_i2c_control_fini+0x25/0x50 [amdgpu]
[  +0.000236]  smu_sw_fini+0x38/0x260 [amdgpu]
[  +0.000241]  amdgpu_device_fini_sw+0x116/0x670 [amdgpu]
[  +0.000186]  ? mutex_lock+0x13/0x50
[  +0.000003]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
[  +0.000192]  drm_minor_release+0x4f/0x80 [drm]
[  +0.000025]  drm_release+0xfe/0x150 [drm]
[  +0.000027]  __fput+0x9f/0x290
[  +0.000002]  ____fput+0xe/0x20
[  +0.000002]  task_work_run+0x61/0xa0
[  +0.000002]  exit_to_user_mode_prepare+0x150/0x170
[  +0.000002]  syscall_exit_to_user_mode+0x2a/0x50

Cc: Hawking Zhang <hawking.zhang@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:49 -05:00
Yang Wang 8140b07b0a drm/amdgpu: correct mca debugfs dump reg list
avoid driver to touch invalid mca reg.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:32 -05:00
Hawking Zhang d406aec8dc drm/amdgpu: correct acclerator check architecutre dump
So driver doesn't touch invalid aca entries.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:26 -05:00
Yang Wang 27d80f7d68 drm/amdgpu: add pcs xgmi v6.4.0 ras support
add pcs xgmi v6.4.0 ras support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:20 -05:00
David Yat Sin 4abf0b0bdf drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3
Change local memory type to MTYPE_UC on revision id 0

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:02:14 -05:00
Hawking Zhang 8cc0f5669e drm/amdgpu: Support multiple error query modes
Direct error query mode and firmware error query mode
are supported for now.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:58 -05:00
Yang Wang 07c1db7036 drm/amdgpu: refine smu v13.0.6 mca dump driver
refine smu mca driver to support query ras error from pmfw path.
- correct gfx smu bank hwid (from mp5 to smu bank)
- retire unused callback function in amdgpu_mca_smu_funcs{}
- add new mca_bank_set{} structure to collect mca bank
- move enum mca_reg_idx into amdgpu_mca.h header
- add mca status register field decode macro

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:51 -05:00
Victor Lu 0b1695710a drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2)
The following regs can only be programmed by the PF:
HDP_MISC_CNTL
HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE_HI

v2: update commit message

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:42 -05:00
Victor Lu a78b481469 drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV
PCTL0_MMHUB_DEEPSLEEP_IB is blocked for VF access

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:35 -05:00
Yang Wang bf13da6ae1 drm/amdgpu: correct smu v13.0.6 umc ras error check
correct smu v13.0.0 umc ras error check

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:20 -05:00
Victor Lu bc3c566071 drm/amdgpu: Add xcc param to SRIOV kiq write and WREG32_SOC15_IP_NO_KIQ (v4)
WREG32/RREG32_SOC15_IP_NO_KIQ and amdgpu_virt_kiq_reg_write_reg_wait
are not using the correct rlcg interface or mec engine, respectively.

Add xcc instance parameter to them.

v4: Use GET_INST and squash commit with:
"drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait"

v3: xcc not needed for MMMHUB

v2: rebase

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:10 -05:00
Victor Lu f64c3fce46 drm/amdgpu: Add flag to enable indirect RLCG access for gfx v9.4.3
The "rlcg_reg_access_supported" flag is missing. Add it back in.

Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:01:01 -05:00
Yang Wang 4eaa007c73 drm/amdgpu: correct amdgpu ip block rev info
correct following amdgpu ip block version information:
- gfx_v9_4_3
- sdma_v4_4_2

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:00:48 -05:00
Tao Zhou 61d7052216 drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_mode
set_xgmi_plpd_mode may be unsupported and this isn't error, no need to
print warning for it.

v2: add ret2 to save the status of psp_ras_trigger_error.

Suggested-by: lijo.lazar@amd.com
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09 17:00:32 -05:00
Christian König 17daf01ab4 drm/amdgpu: lower CS errors to debug severity
Otherwise userspace can spam the logs by using incorrect input values.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-09 17:00:17 -05:00
Christian König 12f76050d8 drm/amdgpu: fix error handling in amdgpu_bo_list_get()
We should not leak the pointer where we couldn't grab the reference
on to the caller because it can be that the error handling still
tries to put the reference then.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-09 16:59:57 -05:00
Alex Deucher bff3315ba8 drm/amdgpu: fix AGP init order
The default AGP settings were overwriting the IP selected
ones since the default was getting set after the IP ones
were selected.

Fixes: de59b69932 ("drm/amdgpu/gmc: set a default disable value for AGP")
Link: https://lists.freedesktop.org/archives/amd-gfx/2023-November/100966.html
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
2023-11-09 16:59:46 -05:00
Linus Torvalds 25b6377007 drm next and fixes for 6.7-rc1
renesas:
 - atomic conversion
 - DT support
 
 ssd13xx:
 - dt binding fix for ssd132x
 - Initialize ssd130x crtc_state to NULL.
 
 amdgpu:
 - Fix RAS support check
 - RAS fixes
 - MES fixes
 - SMU13 fixes
 - Contiguous memory allocation fix
 - BACO fixes
 - GPU reset fixes
 - Min power limit fixes
 - GFX11 fixes
 - USB4/TB hotplug fixes
 - ARM regression fix
 - GFX9.4.3 fixes
 - KASAN/KCSAN stack size check fixes
 - SR-IOV fixes
 - SMU14 fixes
 - PSP13 fixes
 - Display blend fixes
 - Flexible array size fixes
 
 amdkfd:
 - GPUVM fix
 
 radeon:
 - Flexible array size fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVJmdsACgkQDHTzWXnE
 hr5hphAAoFdk4ma7TauUNyP3JoUwht+Ohm9NcUHq/9P9kOCwPIIehxMZnwPoTGyw
 VWwXpqpeVsW6zyMgfxWq/+P1S1C5LvZ1HLccbP3xv327fUZ1QnLapHwFxT1SYNpi
 Tw5qhQN/cwNOX0Pc9uBavYmJzf54OhvPxt2CHHPDShsHBOBc0Gd88gKJr7GWUF5M
 Ri6i20Tfsgq8AopWQj9628TT+y/aN3rVIfYYBiNejxejlFtt1HODkKFX3DuBNPOI
 bZOtQm11cbxmX7/2RI92mI20axUb4UMNIQFDYEl3bVlyPyEhhwKPQMSDxhQQodPg
 8zMY9Fbl4Z4VaOFEDbpiRv/0/HeWoLefmpQ5LZbz35RhKLTkwsWXHUELPUEj3uTr
 7+EnLwuvQdtbT9W2J8btO7v1dHOy86ArlnmqNjB2cEnvGaR3DNM4jxPVLn60SyAc
 N7CFWNU4EoJf02XhwAludYa5pQHEaTFL8ss6TSoRWXuHHg1vNdu2SorOvkcGkg28
 q/t28gZDZaOpXfMmf3ec+PHhO6nxXLXJRbiksVP/rpXQQ42cEI72hr6//UYLRuzg
 BbYPZ8uXmDjsSZIqYceZwc3Vr6oEmD6EAzHM9+zwS5h0IZ12jKTmFiDEAhG2DwoG
 8PCaj5UXkxVK+6iHndz8Qwg4+Fu1j5nodKM+vBNem/iSnxC/OVw=
 =TsFN
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm

Pull more drm updates from Dave Airlie:
 "Geert pointed out I missed the renesas reworks in my main pull, so
  this pull contains the renesas next work for atomic conversion and DT
  support.

  It also contains a bunch of amdgpu and some small ssd13xx fixes.

  renesas:
   - atomic conversion
   - DT support

  ssd13xx:
   - dt binding fix for ssd132x
   - Initialize ssd130x crtc_state to NULL.

  amdgpu:
   - Fix RAS support check
   - RAS fixes
   - MES fixes
   - SMU13 fixes
   - Contiguous memory allocation fix
   - BACO fixes
   - GPU reset fixes
   - Min power limit fixes
   - GFX11 fixes
   - USB4/TB hotplug fixes
   - ARM regression fix
   - GFX9.4.3 fixes
   - KASAN/KCSAN stack size check fixes
   - SR-IOV fixes
   - SMU14 fixes
   - PSP13 fixes
   - Display blend fixes
   - Flexible array size fixes

  amdkfd:
   - GPUVM fix

  radeon:
   - Flexible array size fixes"

* tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
  drm/amd/display: Enable fast update on blendTF change
  drm/amd/display: Fix blend LUT programming
  drm/amd/display: Program plane color setting correctly
  drm/amdgpu: Query and report boot status
  drm/amdgpu: Add psp v13 function to query boot status
  drm/amd/swsmu: remove fw version check in sw_init.
  drm/amd/swsmu: update smu v14_0_0 driver if and metrics table
  drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks
  drm/amdgpu: Optimize the asic type fix code
  drm/amdgpu: fix GRBM read timeout when do mes_self_test
  drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
  drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs
  drm/amdgpu: Attach eviction fence on alloc
  drm/amdkfd: Improve amdgpu_vm_handle_moved
  drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2
  drm/amd/display: Avoid NULL dereference of timing generator
  drm/amdkfd: Update cache info for GFX 9.4.3
  drm/amdkfd: Populate cache info for GFX 9.4.3
  drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
  drm/amdgpu/smu13: drop compute workload workaround
  ...
2023-11-07 17:10:02 -08:00
Tao Zhou 20238a2cc9 drm/amdgpu: add RAS reset/query operations for XGMI v6_4
Reset/query RAS error status and count.

v2: use XGMI IP version instead of WAFL version.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07 12:03:31 -05:00
Tao Zhou 61fe5536d0 drm/amdgpu: handle extra UE register entries for gfx v9_4_3
The UE registe list is larger than CE list.

Reported-by: yipeng.chai@amd.com
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07 12:03:31 -05:00
Lijo Lazar 0553eb9f33 drm/amdgpu: Fix sdma 4.4.2 doorbell rptr/wptr init
Doorbell rptr/wptr can be set through multiple ways including direct
register initialization. Disable doorbell during hw_fini once the ring
is disabled so that during next module reload direct initialization
takes effect. Also, move the direct initialization after minor update is
set to 1 since rptr/wptr are reinitialized back to 0 which could be
lower than the previous doorbell value (ex: cases like module reload).

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07 12:03:30 -05:00
Jiadong Zhu 9c561ca2d3 drm/amdgpu/soc21: add mode2 asic reset for SMU IP v14.0.0
Set the default reset method to mode2 for SMU IP v14.0.0

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07 12:03:30 -05:00
Surbhi Kakarya 9256e8d47a drm/amd: Disable XNACK on SRIOV environment
The purpose of this patch is to disable XNACK or set XNACK OFF mode
on SRIOV platform which doesn't support it.

This will prevent user-space application to fail or result into
unexpected behaviour whenever the application need to run test-case
in XNACK ON mode.

Signed-off-by: Surbhi Kakarya <surbhi.kakarya@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07 11:15:37 -05:00
Hawking Zhang 23618280cc drm/amdgpu: Query and report boot status
Query boot status and report boot errors. A follow
up change is needed to stop GPU initialization if boot
fails.

v2: only invoke the call for dGPU (Le/Lijo)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Hawking Zhang df57e019d5 drm/amdgpu: Add psp v13 function to query boot status
Add psp v13 function to query boot status.

v2: limit the use case to dGPU only (Lijo)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Ma Jun dbab63561b drm/amdgpu: Optimize the asic type fix code
Use a new struct array to define the asic information which
asic type needs to be fixed.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Tim Huang 36e7ff5c13 drm/amdgpu: fix GRBM read timeout when do mes_self_test
Use a proper MEID to make sure the CP_HQD_* and CP_GFX_HQD_* registers
can be touched when initialize the compute and gfx mqd in mes_self_test.
Otherwise, we expect no response from CP and an GRBM eventual timeout.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 12:18:32 -04:00
Tao Zhou 18eae367cb drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
Handle xgmi hive case.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Felix Kuehling 0e2e7c5b3d drm/amdgpu: Attach eviction fence on alloc
Instead of attaching the eviction fence when a KFD BO is first mapped,
attach it when it is allocated or imported. This in preparation to allow
KFD BOs to be mapped using the render node API.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Felix Kuehling 5a104cb97c drm/amdkfd: Improve amdgpu_vm_handle_moved
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by
the caller. This will be useful for handling extra BO VA mappings in
KFD VMs that are managed through the render node API.

v2: rebase against drm_exec changes (Alex)

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:32 -04:00
Alex Deucher ba0fb4b48c drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
Issues were reported with commit 1cfb4d6121
("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere
Altra Developer Platform (AVA developer platform).

Various ARM systems seem to have problems related
to PCIe and MMIO access.  In this case, I'm not sure
if this is specific to the ADLINK platform or ARM
in general.  Seems to be some coherency issue with
VRAM.  For now, just don't put MQDs in VRAM on ARM.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html
Fixes: 1cfb4d6121 ("drm/amdgpu: put MQDs in VRAM")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: alexey.klimov@linaro.org
2023-11-03 11:59:51 -04:00
Alex Deucher 3938eb956e drm/amdgpu: add a retry for IP discovery init
AMD dGPUs have integrated FW that runs as soon as the
device gets power and initializes the board (determines
the amount of memory, provides configuration details to
the driver, etc.).  For direct PCIe attached cards this
happens as soon as power is applied and normally completes
well before the OS has even started loading.  However, with
hotpluggable ports like USB4, the driver needs to wait for
this to complete before initializing the device.

This normally takes 60-100ms, but could take longer on
some older boards periodically due to memory training.

Retry for up to a second.  In the non-hotplug case, there
should be no change in behavior and this should complete
on the first try.

v2: adjust test criteria
v3: adjust checks for the masks, only enable on removable devices
v4: skip bif_fb_en check

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:51 -04:00
Perry Yuan 886b92f635 drm/amdgpu: ungate power gating when system suspend
[Why] During suspend, if GFX DPM is enabled and GFXOFF feature is
enabled the system may get hung. So, it is suggested to disable
GFXOFF feature during suspend and enable it after resume.

[How] Update the code to disable GFXOFF feature during suspend and enable
it after resume.

[  311.396526] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[  311.396530] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[  311.396531] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62

Acked-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Kun Liu <kun.liu2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:51 -04:00
Alex Deucher 7b1c6263ea drm/amdgpu: don't use pci_is_thunderbolt_attached()
It's only valid on Intel systems with the Intel VSEC.
Use dev_is_removable() instead.  This should do the right
thing regardless of the platform.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:44 -04:00
Alex Deucher 432e664e7c drm/amdgpu: don't use ATRM for external devices
The ATRM ACPI method is for fetching the dGPU vbios rom
image on laptops and all-in-one systems.  It should not be
used for external add in cards.  If the dGPU is thunderbolt
connected, don't try ATRM.

v2: pci_is_thunderbolt_attached only works for Intel.  Use
    pdev->external_facing instead.
v3: dev_is_removable() seems to be what we want

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-11-03 11:59:13 -04:00
Alex Deucher b3c942bb6c drm/amdgpu/gfx10,11: use memcpy_to/fromio for MQDs
Since they were moved to VRAM, we need to use the IO
variants of memcpy.

Fixes: 1cfb4d6121 ("drm/amdgpu: put MQDs in VRAM")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:38:19 -04:00
Tao Zhou f7aeee7346 drm/amdgpu: use mode-2 reset for RAS poison consumption
Switch from mode-1 reset to mode-2 for poison consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:38:13 -04:00
Lin.Cao b77cc85bdb drm/amdgpu doorbell range should be set when gpu recovery
GFX doorbell range should be set after flr otherwise the gfx doorbell
range will be overlap with MEC.

v2: remove "amdgpu_sriov_vf" and "amdgpu_in_reset" check, and add grbm
select for the case of 2 gfx rings.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 11:38:04 -04:00
Linus Torvalds 27beb3ca34 pci-v6.7-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmVBaU8UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwEdxAAo++s98+ZaaTdUuoV0Zpft1fuY6Yr
 mR80jUDxjHDbcI1G4iNVUSWG6pGIdlURnrBp5kU74FV9R2Ps3Fl49XQUHowE0HfH
 D/qmihiJQdnMsQKwzw3XGoTSINrDcF6nLafl9brBItVkgjNxfxSEbnweJMBf+Boc
 rpRXHzxbVHVjwwhBLODF2Wt/8sQ24w9c+wcQkpo7im8ZZReoigNMKgEa4J7tLlqA
 vTyPR/K6QeU8IBUk2ObCY3GeYrVuqi82eRK3Uwzu7IkQwA9orE416Okvq3Z026/h
 TUAivtrcygHaFRdGNvzspYLbc2hd2sEXF+KKKb6GNAjxuDWUhVQW4ObY4FgFkZ65
 Gqz/05D6c1dqTS3vTxp3nZYpvPEbNnO1RaGRL4h0/mbU+QSPSlHXWd9Lfg6noVVd
 3O+CcstQK8RzMiiWLeyctRPV5XIf7nGVQTJW5aCLajlHeJWcvygNpNG4N57j/hXQ
 gyEHrz3idXXHXkBKmyWZfre6YpLkxZtKyONZDHWI/AVhU0TgRdJWmqpRfC1kVVUe
 IUWBRcPUF4/r3jEu6t10N/aDWQN1uQzIsJNnCrKzAddPDTTYQJk8VVzKPo8SVxPD
 X+OjEMgBB/fXUfkJ7IMwgYnWaFJhxthrs6/3j1UqRvGYRoulE4NdWwJDky9UYIHd
 qV3dzuAxC/cpv08=
 =G//C
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Use acpi_evaluate_dsm_typed() instead of open-coding _DSM
     evaluation to learn device characteristics (Andy Shevchenko)

   - Tidy multi-function header checks using new PCI_HEADER_TYPE_MASK
     definition (Ilpo Järvinen)

   - Simplify config access error checking in various drivers (Ilpo
     Järvinen)

   - Use pcie_capability_clear_word() (not
     pcie_capability_clear_and_set_word()) when only clearing (Ilpo
     Järvinen)

   - Add pci_get_base_class() to simplify finding devices using base
     class only (ignoring subclass and programming interface) (Sui
     Jingfeng)

   - Add pci_is_vga(), which includes ancient PCI_CLASS_NOT_DEFINED_VGA
     devices from before the Class Code was added to PCI (Sui Jingfeng)

   - Use pci_is_vga() for vgaarb, sysfs "boot_vga", virtio, qxl to
     include ancient VGA devices (Sui Jingfeng)

  Resource management:

   - Make pci_assign_unassigned_resources() non-init because sparc uses
     it after init (Randy Dunlap)

  Driver binding:

   - Retain .remove() and .probe() callbacks (previously __init) because
     sysfs may cause them to be called later (Uwe Kleine-König)

   - Prevent xHCI driver from claiming AMD VanGogh USB3 DRD device, so
     it can be claimed by dwc3 instead (Vicki Pfau)

  PCI device hotplug:

   - Add Ampere Altra Attention Indicator extension driver for acpiphp
     (D Scott Phillips)

  Power management:

   - Quirk VideoPropulsion Torrent QN16e with longer delay after reset
     (Lukas Wunner)

   - Prevent users from overriding drivers that say we shouldn't use
     D3cold (Lukas Wunner)

   - Avoid PME from D3hot/D3cold for AMD Rembrandt and Phoenix USB4
     because wakeup interrupts from those states don't work if amd-pmc
     has put the platform in a hardware sleep state (Mario Limonciello)

  IOMMU:

   - Disable ATS for Intel IPU E2000 devices with invalidation message
     endianness erratum (Bartosz Pawlowski)

  Error handling:

   - Factor out interrupt enable/disable into helpers (Kai-Heng Feng)

  Peer-to-peer DMA:

   - Fix flexible-array usage in struct pci_p2pdma_pagemap in case we
     ever use pagemaps with multiple entries (Gustavo A. R. Silva)

  ASPM:

   - Revert a change that broke when drivers disabled L1 and users later
     enabled an L1.x substate via sysfs, and fix a similar issue when
     users disabled L1 via sysfs (Heiner Kallweit)

  Endpoint framework:

   - Fix double free in __pci_epc_create() (Dan Carpenter)

   - Use IS_ERR_OR_NULL() to simplify endpoint core (Ruan Jinjie)

  Cadence PCIe controller driver:

   - Drop unused "is_rc" member (Li Chen)

  Freescale Layerscape PCIe controller driver:

   - Enable 64-bit addressing in endpoint mode (Guanhua Gao)

  Intel VMD host bridge driver:

   - Fix multi-function header check (Ilpo Järvinen)

  Microsoft Hyper-V host bridge driver:

   - Annotate struct hv_dr_state with __counted_by (Kees Cook)

  NVIDIA Tegra194 PCIe controller driver:

   - Drop setting of LNKCAP_MLW (max link width) since dw_pcie_setup()
     already does this via dw_pcie_link_set_max_link_width() (Yoshihiro
     Shimoda)

  Qualcomm PCIe controller driver:

   - Use PCIE_SPEED2MBS_ENC() to simplify encoding of link speed
     (Manivannan Sadhasivam)

   - Add a .write_dbi2() callback so DBI2 register writes, e.g., for
     setting the BAR size, work correctly (Manivannan Sadhasivam)

   - Enable ASPM for platforms that use 1.9.0 ops, because the PCI core
     doesn't enable ASPM states that haven't been enabled by the
     firmware (Manivannan Sadhasivam)

  Renesas R-Car Gen4 PCIe controller driver:

   - Add DesignWare core support (set max link width, EDMA_UNROLL flag,
     .pre_init(), .deinit(), etc) for use by R-Car Gen4 driver
     (Yoshihiro Shimoda)

   - Add driver and DT schema for DesignWare-based Renesas R-Car Gen4
     controller in both host and endpoint mode (Yoshihiro Shimoda)

  Xilinx NWL PCIe controller driver:

   - Update ECAM size to support 256 buses (Thippeswamy Havalige)

   - Stop setting bridge primary/secondary/subordinate bus numbers,
     since PCI core does this (Thippeswamy Havalige)

  Xilinx XDMA controller driver:

   - Add driver and DT schema for Zynq UltraScale+ MPSoCs devices with
     Xilinx XDMA Soft IP (Thippeswamy Havalige)

  Miscellaneous:

   - Use FIELD_GET()/FIELD_PREP() to simplify and reduce use of _SHIFT
     macros (Ilpo Järvinen, Bjorn Helgaas)

   - Remove logic_outb(), _outw(), outl() duplicate declarations (John
     Sanpe)

   - Replace unnecessary UTF-8 in Kconfig help text because menuconfig
     doesn't render it correctly (Liu Song)"

* tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (102 commits)
  PCI: qcom-ep: Add dedicated callback for writing to DBI2 registers
  PCI: Simplify pcie_capability_clear_and_set_word() to ..._clear_word()
  PCI: endpoint: Fix double free in __pci_epc_create()
  PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver
  dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx XDMA PCIe Root Port Bridge
  PCI: xilinx-cpm: Move IRQ definitions to a common header
  PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses
  PCI: xilinx-nwl: Rename the NWL_ECAM_VALUE_DEFAULT macro
  dt-bindings: PCI: xilinx-nwl: Modify ECAM size in the DT example
  PCI: xilinx-nwl: Remove redundant code that sets Type 1 header fields
  PCI: hotplug: Add Ampere Altra Attention Indicator extension driver
  PCI/AER: Factor out interrupt toggling into helpers
  PCI: acpiphp: Allow built-in drivers for Attention Indicators
  PCI/portdrv: Use FIELD_GET()
  PCI/VC: Use FIELD_GET()
  PCI/PTM: Use FIELD_GET()
  PCI/PME: Use FIELD_GET()
  PCI/ATS: Use FIELD_GET()
  PCI/ATS: Show PASID Capability register width in bitmasks
  PCI/ASPM: Fix L1 substate handling in aspm_attr_store_common()
  ...
2023-11-02 14:05:18 -10:00
Matthew Brost a6149f0393 drm/sched: Convert drm scheduler to use a work queue rather than kthread
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.

2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.

A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-11-01 17:29:21 -04:00
Matthew Brost 35963cf2cd drm/sched: Add drm_sched_wqueue_* helpers
Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.

v2:
  - s/sched_wqueue/sched_wqueue (Luben)
  - Remove the extra white line after the return-statement (Luben)
  - update drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-11-01 17:29:20 -04:00
Linus Torvalds 7d461b291e drm for 6.7-rc1
kernel:
 - add initial vmemdup-user-array
 
 core:
 - fix platform remove() to return void
 - drm_file owner updated to reflect owner
 - move size calcs to drm buddy allocator
 - let GPUVM build as a module
 - allow variable number of run-queues in scheduler
 
 edid:
 - handle bad h/v sync_end in EDIDs
 
 panfrost:
 - add Boris as maintainer
 
 fbdev:
 - use fb_ops helpers more
 - only allow logo use from fbcon
 - rename fb_pgproto to pgprot_framebuffer
 - add HPD state to drm_connector_oob_hotplug_event
 - convert to fbdev i/o mem helpers
 
 i915:
 - Enable meteorlake by default
 - Early Xe2 LPD/Lunarlake display enablement
 - Rework subplatforms into IP version checks
 - GuC based TLB invalidation for Meteorlake
 - Display rework for future Xe driver integration
 - LNL FBC features
 - LNL display feature capability reads
 - update recommended fw versions for DG2+
 - drop fastboot module parameter
 - added deviceid for Arrowlake-S
 - drop preproduction workarounds
 - don't disable preemption for resets
 - cleanup inlines in headers
 - PXP firmware loading fix
 - Fix sg list lengths
 - DSC PPS state readout/verification
 - Add more RPL P/U PCI IDs
 - Add new DG2-G12 stepping
 - DP enhanced framing support to state checker
 - Improve shared link bandwidth management
 - stop using GEM macros in display code
 - refactor related code into display code
 - locally enable W=1 warnings
 - remove PSR watchdog timers on LNL
 
 amdgpu:
 - RAS/FRU EEPROM updatse
 - IP discovery updatses
 - GC 11.5 support
 - DCN 3.5 support
 - VPE 6.1 support
 - NBIO 7.11 support
 - DML2 support
 - lots of IP updates
 - use flexible arrays for bo list handling
 - W=1 fixes
 - Enable seamless boot in more cases
 - Enable context type property for HDMI
 - Rework GPUVM TLB flushing
 - VCN IB start/size alignment fixes
 
 amdkfd:
 - GC 10/11 fixes
 - GC 11.5 support
 - use partial migration in GPU faults
 
 radeon:
 - W=1 Fixes
 - fix some possible buffer overflow/NULL derefs
 nouveau:
 - update uapi for NO_PREFETCH
 - scheduler/fence fixes
 - rework suspend/resume for GSP-RM
 - rework display in preparation for GSP-RM
 
 habanalabs:
 - uapi: expose tsc clock
 - uapi: block access to eventfd through control device
 - uapi: force dma-buf export to PAGE_SIZE alignments
 - complete move to accel subsystem
 - move firmware interface include files
 - perform hard reset on PCIe AXI drain event
 - optimise user interrupt handling
 
 msm:
 - DP: use existing helpers for DPCD
 - DPU: interrupts reworked
 - gpu: a7xx (a730/a740) support
 - decouple msm_drv from kms for headless devices
 
 mediatek:
 - MT8188 dsi/dp/edp support
 - DDP GAMMA - 12 bit LUT support
 - connector dynamic selection capability
 
 rockchip:
 - rv1126 mipi-dsi/vop support
 - add planar formats
 
 ast:
 - rename constants
 
 panels:
 - Mitsubishi AA084XE01
 - JDI LPM102A188A
 - LTK050H3148W-CTA6
 
 ivpu:
 - power management fixes
 
 qaic:
 - add detach slice bo api
 
 komeda:
 - add NV12 writeback
 
 tegra:
 - support NVSYNC/NHSYNC
 - host1x suspend fixes
 
 ili9882t:
 - separate into own driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmVAgzYACgkQDHTzWXnE
 hr7ZEQ//UXne3tyGOsU3X8r+lstLFDMa90a3hvTg6hX+Q0MjHd/clwkKFkLpkipL
 n7gIZlaHl11dRs0FzrIZA5EVAAgjMLKmIl10NBDFec6ZFA3VERcggx8y61uifI15
 VviMR1VbLHYZaCdyrQOK0A4wcktWnKXyoXp7cwy9crdc2GOBMUZkdIqtvD7jHxQx
 UMIFnzi1CyKUX/Fjt/JceYcNk9y2ZGkzakYO3sHcUdv4DPu9qX4kNzpjF691AZBP
 UeKWvCswTRVg2M0kuo/RYIBzqaTmOlk6dHLWBognIeZPyuyhCcaGC2d64c6tShwQ
 dtHdi+IgyQ8s2qb350ymKTQUP7xA/DfZBwH7LvrZALBxeQGYQN1CnsgDMOS2wcUc
 XrRFiS7PxEOtMMBctcPBnnoV5ttnsLLlPpzM9puh9sUFMn6CgLzcAMqXdqxzMajH
 +dz2aD1N0vMqq4varozOg9SC2QamgUiPN/TQfrulhCTCfQaXczy5x1OYiIz65+Sl
 mKoe2WASuP9Ve8do4N/wEwH5SZY2ItipBdUTRxttY9NTanmV0X5DjZBXH5b9XGci
 Zl5Ar613f9zwm5T5BVA5k6s3ZbGY6QcP5pDNTCPaSgitfFXIdReBZ2CaYzK3MPg/
 Wit/TXrud9yT6VPpI1igboMyasf5QubV1MY1K83kOCWr9u8R2CM=
 =l79u
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "Highlights:
   - AMD adds some more upcoming HW platforms
   - Intel made Meteorlake stable and started adding Lunarlake
   - nouveau has a bunch of display rework in prepartion for the NVIDIA
     GSP firmware support
   - msm adds a7xx support
   - habanalabs has finished migration to accel subsystem

  Detail summary:

  kernel:
   - add initial vmemdup-user-array

  core:
   - fix platform remove() to return void
   - drm_file owner updated to reflect owner
   - move size calcs to drm buddy allocator
   - let GPUVM build as a module
   - allow variable number of run-queues in scheduler

  edid:
   - handle bad h/v sync_end in EDIDs

  panfrost:
   - add Boris as maintainer

  fbdev:
   - use fb_ops helpers more
   - only allow logo use from fbcon
   - rename fb_pgproto to pgprot_framebuffer
   - add HPD state to drm_connector_oob_hotplug_event
   - convert to fbdev i/o mem helpers

  i915:
   - Enable meteorlake by default
   - Early Xe2 LPD/Lunarlake display enablement
   - Rework subplatforms into IP version checks
   - GuC based TLB invalidation for Meteorlake
   - Display rework for future Xe driver integration
   - LNL FBC features
   - LNL display feature capability reads
   - update recommended fw versions for DG2+
   - drop fastboot module parameter
   - added deviceid for Arrowlake-S
   - drop preproduction workarounds
   - don't disable preemption for resets
   - cleanup inlines in headers
   - PXP firmware loading fix
   - Fix sg list lengths
   - DSC PPS state readout/verification
   - Add more RPL P/U PCI IDs
   - Add new DG2-G12 stepping
   - DP enhanced framing support to state checker
   - Improve shared link bandwidth management
   - stop using GEM macros in display code
   - refactor related code into display code
   - locally enable W=1 warnings
   - remove PSR watchdog timers on LNL

  amdgpu:
   - RAS/FRU EEPROM updatse
   - IP discovery updatses
   - GC 11.5 support
   - DCN 3.5 support
   - VPE 6.1 support
   - NBIO 7.11 support
   - DML2 support
   - lots of IP updates
   - use flexible arrays for bo list handling
   - W=1 fixes
   - Enable seamless boot in more cases
   - Enable context type property for HDMI
   - Rework GPUVM TLB flushing
   - VCN IB start/size alignment fixes

  amdkfd:
   - GC 10/11 fixes
   - GC 11.5 support
   - use partial migration in GPU faults

  radeon:
   - W=1 Fixes
   - fix some possible buffer overflow/NULL derefs

  nouveau:
   - update uapi for NO_PREFETCH
   - scheduler/fence fixes
   - rework suspend/resume for GSP-RM
   - rework display in preparation for GSP-RM

  habanalabs:
   - uapi: expose tsc clock
   - uapi: block access to eventfd through control device
   - uapi: force dma-buf export to PAGE_SIZE alignments
   - complete move to accel subsystem
   - move firmware interface include files
   - perform hard reset on PCIe AXI drain event
   - optimise user interrupt handling

  msm:
   - DP: use existing helpers for DPCD
   - DPU: interrupts reworked
   - gpu: a7xx (a730/a740) support
   - decouple msm_drv from kms for headless devices

  mediatek:
   - MT8188 dsi/dp/edp support
   - DDP GAMMA - 12 bit LUT support
   - connector dynamic selection capability

  rockchip:
   - rv1126 mipi-dsi/vop support
   - add planar formats

  ast:
   - rename constants

  panels:
   - Mitsubishi AA084XE01
   - JDI LPM102A188A
   - LTK050H3148W-CTA6

  ivpu:
   - power management fixes

  qaic:
   - add detach slice bo api

  komeda:
   - add NV12 writeback

  tegra:
   - support NVSYNC/NHSYNC
   - host1x suspend fixes

  ili9882t:
   - separate into own driver"

* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
  drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
  drm/amdgpu: Remove duplicate fdinfo fields
  drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
  drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
  drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
  drm/amdgpu: Identify data parity error corrected in replay mode
  drm/amdgpu: Fix typo in IP discovery parsing
  drm/amd/display: fix S/G display enablement
  drm/amdxcp: fix amdxcp unloads incompletely
  drm/amd/amdgpu: fix the GPU power print error in pm info
  drm/amdgpu: Use pcie domain of xcc acpi objects
  drm/amd: check num of link levels when update pcie param
  drm/amdgpu: Add a read to GFX v9.4.3 ring test
  drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
  drm/amdgpu: get RAS poison status from DF v4_6_2
  drm/amdgpu: Use discovery table's subrevision
  drm/amd/display: 3.2.256
  drm/amd/display: add interface to query SubVP status
  drm/amd/display: Read before writing Backlight Mode Set Register
  drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
  ...
2023-11-01 06:28:35 -10:00
Yang Wang 2bfb0ca3dd drm/amdgpu: remove unused macro HW_REV
remove unused macro HW_REV

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 17:13:59 -04:00
Arunpravin Paneer Selvam 9ae587f850 drm/amdgpu: Fix the vram base start address
If the size returned by drm buddy allocator is higher than
the required size, we take the higher size to calculate
the buffer start address. This is required if we couldn't
trim the buffer to the requested size. This will fix the
display corruption issue on APU's which has limited VRAM
size.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2859
Fixes: 0a1844bf0b ("drm/buddy: Improve contiguous memory allocation")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 17:10:13 -04:00
Tao Zhou d539b0ad7c drm/amdgpu: set XGMI IP version manually for v6_4
The version can't be queried from discovery table.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 17:09:53 -04:00
Tong Liu01 853eebe6ec drm/amdgpu: add unmap latency when gfx11 set kiq resources
[why]
If driver does not set unmap latency for KIQ, the default value of KIQ
unmap latency is zero. When do unmap queue, KIQ will return that almost
immediately after receiving unmap command. So, the queue status will be
saved to MQD incorrectly or lost in some chance.

[how]
Set unmap latency when do kiq set resources. The unmap latency is set to
be 1 second that is synchronized with Windows driver.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:16 -04:00
Kenneth Feng 5f38ac54e6 drm/amd/pm: fix the high voltage and temperature issue
fix the high voltage and temperature issue after the driver is unloaded on smu 13.0.0,
smu 13.0.7 and smu 13.0.10
v2 - fix the code format and make sure it is used on the unload case only.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:16 -04:00
Yifan Zhang a17f574ab4 drm/amdgpu: remove amdgpu_mes_self_test in gpu recover
gpu tlb flush is skipped if reset sem is held, it makes
mes_self_test fail since it involves add_hw_queue/remove_hw_queue
which needs tlb flush functional. Remove mes_self_test in gpu
recover sequence.

This patch is to fix the recover failure in gfx11.

[ 1831.768292] [drm] ring sdma_32769.3.3 was added
[ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass
[ 1831.768337] [drm] ring compute_32769.2.2 ib test pass
[ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process  pid 0 thread  pid 0)
[ 1831.768434] amdgpu 0000:c2:00.0: amdgpu:   in page starting at address 0x0000aec200000000 from client 10
[ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30
[ 1831.768473] amdgpu 0000:c2:00.0: amdgpu:      Faulty UTCL2 client ID: CPC (0x5)
[ 1831.768489] amdgpu 0000:c2:00.0: amdgpu:      MORE_FAULTS: 0x0
[ 1831.768501] amdgpu 0000:c2:00.0: amdgpu:      WALKER_ERROR: 0x0
[ 1831.768513] amdgpu 0000:c2:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[ 1831.768521] amdgpu 0000:c2:00.0: amdgpu:      MAPPING_ERROR: 0x0
[ 1831.768529] amdgpu 0000:c2:00.0: amdgpu:      RW: 0x0
[ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma_32769.3.3 test failed (-110)
[ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] *ERROR* failed to remove hardware queue, queue id = 3

Fixes: e2e3788850 ("drm/amdgpu: rework lock handling for flush_tlb v2")
Reported-by: Li Ma <li.ma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00
Candice Li e020d01575 drm/amdgpu: Drop deferred error in uncorrectable error check
Drop checking deferred error which can be handled by poison
consumption.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00
Tao Zhou d1d4c0b7b6 drm/amdgpu: check RAS supported first in ras_reset_error_count
Not all platforms support RAS.

Fixes: 73582be11a ("drm/amdgpu: bypass RAS error reset in some conditions")
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-31 16:40:15 -04:00
Dave Airlie 631808095a amd-drm-next-6.7-2023-10-27:
amdgpu:
 - RAS fixes
 - Seamless boot fixes
 - NBIO 7.7 fix
 - SMU 14.0 fixes
 - GC 11.5 fixes
 - DML2 fixes
 - ASPM fixes
 - VPE fixes
 - Misc code cleanups
 - SRIOV fixes
 - Add some missing copyright notices
 - DCN 3.5 fixes
 - FAMS fixes
 - Backlight fix
 - S/G display fix
 - fdinfo cleanups
 - EXT_COHERENT fixes for APU and NUMA systems
 
 amdkfd:
 - Misc fixes
 - Misc code cleanups
 - SVM fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTwWOAAKCRC93/aFa7yZ
 2Lz3AP0cInVS4qZYTZCh2O/k5AoidJjcmRl2DVm8OdowBPCa4wEAoNsekTIQnZsI
 Ru4SoVKhT2bs1LEMOcdzexsVrwlaxA8=
 =G3QX
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.7-2023-10-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-10-27:

amdgpu:
- RAS fixes
- Seamless boot fixes
- NBIO 7.7 fix
- SMU 14.0 fixes
- GC 11.5 fixes
- DML2 fixes
- ASPM fixes
- VPE fixes
- Misc code cleanups
- SRIOV fixes
- Add some missing copyright notices
- DCN 3.5 fixes
- FAMS fixes
- Backlight fix
- S/G display fix
- fdinfo cleanups
- EXT_COHERENT fixes for APU and NUMA systems

amdkfd:
- Misc fixes
- Misc code cleanups
- SVM fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027200343.57132-1-alexander.deucher@amd.com
2023-10-31 12:37:19 +10:00
Dave Airlie 915b6d034b Merge tag 'drm-misc-next-2023-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.7-rc1:

drm-misc-next-2023-10-19 + following:

UAPI Changes:

Cross-subsystem Changes:
- Convert fbdev drivers to use fbdev i/o mem helpers.

Core Changes:
- Use cross-references for macros in docs.
- Make drm_client_buffer_addb use addfb2.
- Add NV20 and NV30 YUV formats.
- Documentation updates for create_dumb ioctl.
- CI fixes.
- Allow variable number of run-queues in scheduler.

Driver Changes:
- Rename drm/ast constants.
- Make ili9882t its own driver.
- Assorted fixes in ivpu, vc4, bridge/synopsis, amdgpu.
- Add planar formats to rockchip.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3d92fae8-9b1b-4165-9ca8-5fda11ee146b@linux.intel.com
2023-10-31 10:47:50 +10:00
Umio Yasuno dd3dd9829b drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
Remove unused variables from amdgpu_show_fdinfo

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:23:01 -04:00
Rob Clark e8e696c307 drm/amdgpu: Remove duplicate fdinfo fields
Some of the fields that are handled by drm_show_fdinfo() crept back in
when rebasing the patch.  Remove them again.

Fixes: 376c25f8ca ("drm/amdgpu: Switch to fdinfo helper")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: <alexander.deucher@amd.com>
Co-developed-by: Umio Yasuno <coelacanth_dream@protonmail.com>
Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:23:01 -04:00
Kenneth Feng 3ea8dd3758 drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
avoid to disable gfxhub interrupt when driver is unloaded on gmc 11

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:15:39 -04:00
David Francis 142262a1c0 drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and
MTYPE_UC for nonlocal memory.

On NUMA systems, local memory gets the local mtype, set by an
override callback. If EXT_COHERENT is set, memory will be set as
MTYPE_UC by default, with local memory MTYPE_CC.

Add an option in the override function for this case, and
add a check to ensure it is not used on UNCACHED memory.

V2: Combined APU and NUMA code into one patch
V3: Fixed a potential nullptr in amdgpu_vm_bo_update

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:15:16 -04:00
Candice Li a395f7ffce drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
Retrieve correctable error count from ce_count_lo_chip instead of
mca_umc_status.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:15:10 -04:00
Candice Li d59fcfb084 drm/amdgpu: Identify data parity error corrected in replay mode
Use ErrorCodeExt field to identify data parity error in replay mode.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:15:03 -04:00
Mukul Joshi f7a17b2b36 drm/amdgpu: Fix typo in IP discovery parsing
Fix a typo in parsing of the GC info table header when
reading the IP discovery table.

Fixes: 0e64c9aad0 ("drm/amdgpu: add type conversion for gc info")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27 14:13:27 -04:00
Dave Airlie 44117828ed amd-drm-fixes-6.6-2023-10-25:
amdgpu:
 - Extend VI APSM quirks to more platforms
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTngEQAKCRC93/aFa7yZ
 2AJrAP432Pqn+TmmQnm564esdTzy/aH8m/5eeB5KSTE3SQt+agD/bAoDPRCk/jgi
 PMgVEWp8pQo1Kezzcb/+iJq0tUfxQgc=
 =kRQ2
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.6-2023-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.6-2023-10-25:

amdgpu:
- Extend VI APSM quirks to more platforms

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026035452.14921-1-alexander.deucher@amd.com
2023-10-27 12:17:26 +10:00
Dave Airlie 6366ffa6ed Short summary of fixes pull:
amdgpu:
 - ignore duplicated BOs in CS parser
 - remove redundant call to amdgpu_ctx_priority_is_valid()
 
 amdkfd:
 - reserve fence slot while locking BO
 
 dp_mst:
 - Fix NULL deref in get_mst_branch_device_by_guid_helper()
 
 logicvc:
 - Kconfig: Select REGMAP and REGMAP_MMIO
 
 ivpu:
 - Fix missing VPUIP interrupts
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmU6RugACgkQaA3BHVML
 eiOH1ggAsc+3EcnoNbeKuPRTefAGb7xsoljmyY0EJriSQZtiropmVGpXYmLSSSSC
 4x8TLztWsDXkM/+AKJEQWT56egKwP4Z+QgASOo25p050z56IlPScSsENasuwumW1
 vI/hlUQRkDjAixEd6WHcD/hGPu9kKNPhS/DUVNeWYr2237rFJChHg+hWw3PycTzm
 2aD8fx1275sx6/OXkKW5bQKmshjHtGYnT86w3xgAJUfvTDPq/eZtwqCKhqCqaQja
 5tAccKVmnWAaXneCvs7XwAXYou1Qny3GOBOOpCrBQYaj49N9cMyflovY2y2s1uyR
 pQyquVzLZLVhFHc2B58++49NyES8bw==
 =d+WA
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-10-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

amdgpu:
- ignore duplicated BOs in CS parser
- remove redundant call to amdgpu_ctx_priority_is_valid()

amdkfd:
- reserve fence slot while locking BO

dp_mst:
- Fix NULL deref in get_mst_branch_device_by_guid_helper()

logicvc:
- Kconfig: Select REGMAP and REGMAP_MMIO

ivpu:
- Fix missing VPUIP interrupts

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20231026110132.GA10591@linux-uq9g.fritz.box
2023-10-27 11:51:35 +10:00
Lijo Lazar d055714a21 drm/amdgpu: Use pcie domain of xcc acpi objects
PCI domain/segment information of xccs is available through ACPI DSM
methods. Consider that also while looking for devices.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 19:04:39 -04:00
Lijo Lazar 3f69d5860f drm/amdgpu: Add a read to GFX v9.4.3 ring test
Issue a read to confirm the register write before ringing doorbell. With
multiple XCCs there is chance for race condition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 19:04:13 -04:00
Tao Zhou 2cea7bb911 drm/amdgpu: get RAS poison status from DF v4_6_2
Add DF block and RAS poison mode query for DF v4_6_2.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 19:02:52 -04:00
Lijo Lazar dd2687f5d9 drm/amdgpu: Use discovery table's subrevision
Use subrevision of IP version in discovery table to identify SOC
revision id for NBIO v7.9 SOCs. Only newer bootloaders update
subrevision field.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 19:02:44 -04:00
Mario Limonciello 2757a848cb drm/amd: Explicitly disable ASPM when dynamic switching disabled
Currently there are separate but related checks:
* amdgpu_device_should_use_aspm()
* amdgpu_device_aspm_support_quirk()
* amdgpu_device_pcie_dynamic_switching_supported()

Simplify into checking whether DPM was enabled or not in the auto
case.  This works because amdgpu_device_pcie_dynamic_switching_supported()
populates that value.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:23 -04:00
Mario Limonciello 1a6513de49 drm/amd: Move AMD_IS_APU check for ASPM into top level function
There is no need for every ASIC driver to perform the same check.
Move the duplicated code into amdgpu_device_should_use_aspm().

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:23 -04:00
Mario Limonciello fbf1035b03 drm/amd: Disable PP_PCIE_DPM_MASK when dynamic speed switching not supported
Rather than individual ASICs checking for the quirk, set the quirk at the
driver level.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:23 -04:00
Lin.Cao 9ee819285c drm/amdgpu remove restriction of sriov max_pfn on Vega10
Remove restriction of sriov max_pfn so that TBA and TMA can move to high
47 bits address.

Regression test: change range alloc flag of libdrm as
AMDGPU_VA_RANGE_HIGH and there is no flr occur when testing amdgpu_test
of drm.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Qu Huang 5104fdf50d drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL
In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log:

1. Navigate to the directory: /sys/kernel/debug/dri/0
2. Execute command: cat amdgpu_regs_smc
3. Exception Log::
[4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000
[4005007.702562] #PF: supervisor instruction fetch in kernel mode
[4005007.702567] #PF: error_code(0x0010) - not-present page
[4005007.702570] PGD 0 P4D 0
[4005007.702576] Oops: 0010 [#1] SMP NOPTI
[4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G           OE     5.15.0-43-generic #46-Ubunt       u
[4005007.702590] RIP: 0010:0x0
[4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206
[4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68
[4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000
[4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980
[4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000
[4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000
[4005007.702622] FS:  00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000
[4005007.702626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0
[4005007.702633] Call Trace:
[4005007.702636]  <TASK>
[4005007.702640]  amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu]
[4005007.703002]  full_proxy_read+0x5c/0x80
[4005007.703011]  vfs_read+0x9f/0x1a0
[4005007.703019]  ksys_read+0x67/0xe0
[4005007.703023]  __x64_sys_read+0x19/0x20
[4005007.703028]  do_syscall_64+0x5c/0xc0
[4005007.703034]  ? do_user_addr_fault+0x1e3/0x670
[4005007.703040]  ? exit_to_user_mode_prepare+0x37/0xb0
[4005007.703047]  ? irqentry_exit_to_user_mode+0x9/0x20
[4005007.703052]  ? irqentry_exit+0x19/0x30
[4005007.703057]  ? exc_page_fault+0x89/0x160
[4005007.703062]  ? asm_exc_page_fault+0x8/0x30
[4005007.703068]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[4005007.703075] RIP: 0033:0x7f5e07672992
[4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f        1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e       c 28 48 89 54 24
[4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992
[4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003
[4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010
[4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[4005007.703105]  </TASK>
[4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_       iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t       tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm       i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo       mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v       2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core        drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca
[4005007.703184] CR2: 0000000000000000
[4005007.703188] ---[ end trace ac65a538d240da39 ]---
[4005007.800865] RIP: 0010:0x0
[4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206
[4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68
[4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000
[4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980
[4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000
[4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000
[4005007.800891] FS:  00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000
[4005007.800895] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0

Signed-off-by: Qu Huang <qu.huang@linux.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Tao Zhou 73582be11a drm/amdgpu: bypass RAS error reset in some conditions
PMFW is responsible for RAS error reset in some conditions, driver can
skip the operation.

v2: add check for ras->in_recovery, it's set earlier than
amdgpu_in_reset.

v3: fix error in gpu reset check.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Tao Zhou f3a3bbf156 drm/amdgpu: enable RAS poison mode for APU
Enable it by default on APU platform.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Lang Yu fc4981b69c drm/amdgpu/vpe: correct queue stop programing
Otherwise IB test would fail during GPU reset.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Mario Limonciello e5f52a84bf drm/amd: Disable ASPM for VI w/ all Intel systems
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.

Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.

Cc: stable@vger.kernel.org # 5.15+
Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:22 -04:00
Lijo Lazar 8eece69ace drm/amdgpu: Add API to get full IP version
Fetch the full version of IP including variant and subrevision.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:21 -04:00
Jiadong Zhu 037fb9c600 drm/amdgpu: add tmz support for GC IP v11.5.0
Add tmz support for GC 11.5.0.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:21 -04:00
Li Ma 493c75bbe3 drm/amdgpu: modify if condition in nbio_v7_7.c
remove unnecessary "enable" in if condition.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:21 -04:00
Yang Wang ec3e0a9167 drm/amdgpu: refine ras error kernel log print
refine ras error kernel log to avoid user-ridden ambiguity.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:21 -04:00
Yang Wang 53d4d77927 drm/amdgpu: fix find ras error node error
the origin function might return the wrong node.

Fixes: 5b1270beb3 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-26 18:41:21 -04:00
Alex Deucher b70438004a drm/amdgpu: move buffer funcs setting up a level
Rather than doing this in the IP code for the SDMA paging
engine, move it up to the core device level init level.
This should fix the scheduler init ordering.

v2: drop extra parens
v3: drop SDMA helpers
v4: Added a Fixes tag because amdgpu dereferences an uninitialized
    scheduler without this patch, and this patch fixes this. (Luben)

Tested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20231025171928.3318505-1-alexander.deucher@amd.com
Acked-by: Christian König <christian.koenig@amd.com>
Fixes: 56e449603f ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-10-26 16:04:24 -04:00
Luben Tuikov 56e449603f drm/sched: Convert the GPU scheduler to variable number of run-queues
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.

Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Danilo Krummrich <dakr@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Emma Anholt <emma@anholt.net>
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
2023-10-26 12:03:47 -04:00
Mario Limonciello 64ffd2f1d0 drm/amd: Disable ASPM for VI w/ all Intel systems
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.

Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.

Cc: stable@vger.kernel.org # 5.15+
Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-25 09:53:17 -04:00
Dave Airlie 0ecf4aa32b amd-drm-next-6.7-2023-10-20:
amdgpu:
 - SMU 13 updates
 - UMSCH updates
 - DC MPO fixes
 - RAS updates
 - MES 11 fixes
 - Fix possible memory leaks in error pathes
 - GC 11.5 fixes
 - Kernel doc updates
 - PSP updates
 - APU IMU fixes
 - Misc code cleanups
 - SMU 11 fixes
 - OD fix
 - Frame size warning fixes
 - SR-IOV fixes
 - NBIO 7.11 updates
 - NBIO 7.7 updates
 - XGMI fixes
 - devcoredump updates
 
 amdkfd:
 - Misc code cleanups
 - SVM fixes
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZTLX4QAKCRC93/aFa7yZ
 2ORJAP9lnoyvUPIP63Hx5TADQtZHiA+ShkATGQmDia94ABtCxwEAlo88TipxAo7c
 tRX8Mn+rix3M739FDFxV0bp7hCXsbgQ=
 =fY9I
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.7-2023-10-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-10-20:

amdgpu:
- SMU 13 updates
- UMSCH updates
- DC MPO fixes
- RAS updates
- MES 11 fixes
- Fix possible memory leaks in error pathes
- GC 11.5 fixes
- Kernel doc updates
- PSP updates
- APU IMU fixes
- Misc code cleanups
- SMU 11 fixes
- OD fix
- Frame size warning fixes
- SR-IOV fixes
- NBIO 7.11 updates
- NBIO 7.7 updates
- XGMI fixes
- devcoredump updates

amdkfd:
- Misc code cleanups
- SVM fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020195043.4937-1-alexander.deucher@amd.com
2023-10-25 10:54:22 +10:00
Christian König 4984fc578a drm/amdkfd: reserve a fence slot while locking the BO
Looks like the KFD still needs this.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 8abc1eb298 ("drm/amdkfd: switch over to using drm_exec v3")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com
2023-10-23 14:48:47 +02:00
Dave Airlie 7cd62eab9b Linux 6.6-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmU1ngkeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGrsIH/0k/+gdBBYFFdEym
 foRhKir9WV3ZX4oIozJjA1f7T+qVYclKs6kaYm3gNepRBb6AoG8pdgv4MMAqhYsf
 QMe2XHi0MrO/qKBgfNfivxEa9jq+0QK5uvTbqCRqCAB8LfwVyDqapCmg3EuiZcPW
 UbMITmnwLIfXgPxvp9rabmCsTqO6FLbf0GDOVIkNSAIDBXMpcO1iffjrWUbhRa7n
 oIoiJmWJLcXLxPWDsRKbpJwzw2cIG08YhfQYAiQnC3YaeRm1FKLDIICRBsmfYzja
 rWv9r4dn4TDfV4/AnjggQnsZvz2yPCxNaFSQIT88nIeiLvyuUTJ9j8aidsSfMZQf
 xZAbzbA=
 =NoQv
 -----END PGP SIGNATURE-----

BackMerge tag 'v6.6-rc7' into drm-next

This is needed to add the msm pr which is based on a higher base.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-10-23 18:20:06 +10:00
Luben Tuikov d3df66fd98 drm/amdgpu: Remove redundant call to priority_is_valid()
Remove a redundant call to amdgpu_ctx_priority_is_valid() from
amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is
called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where
we've called amdgpu_ctx_priority_is_valid() already first thing in the
function.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20231018010359.30393-1-luben.tuikov@amd.com
2023-10-21 20:27:15 -04:00
André Almeida de009982c6 drm/amdgpu: Create version number for coredumps
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.

Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:29 -04:00
André Almeida 69619868d3 drm/amdgpu: Move coredump code to amdgpu_reset file
Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:29 -04:00
André Almeida 2d6a2a28cd drm/amdgpu: Encapsulate all device reset info
To better organize struct amdgpu_device, keep all reset information
related fields together in a separated struct.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Shiwu Zhang 723fac64d0 drm/amdgpu: support the port num info based on the capability flag
XGMI TA will set the capability flag to indicate whether the port_num
info is supported or not. KGD checks the flag and accordingly picks up
the right buffer format and send the right command to TA to retrieve
the info.

v2: simplify the code by reusing the same statement (lijo)

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Shiwu Zhang e8a5ded36b drm/amdgpu: prepare the output buffer for GET_PEER_LINKS command
Per the xgmi ta implementation, KGD needs to fill in node_ids
in concern into the shared command output buffer rather than the
command input buffer.

Input buffer is not used for GET_PEER_LINKS command execution.

In this way, xgmi ta can reuse the node info in the output buffer
just filled in and populate the same buffer with link info directly.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Tao Zhou d9443ac4f9 drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8
PMFW will be responsible for them.

v2: remove query interfaces.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Shiwu Zhang 626121fce4 drm/amdgpu: update the xgmi ta interface header
Update the header file to the v20.00.00.13

v1: rename TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO to
TA_COMMAND_XGMI__GET_TOPOLOGY_INFO

And also rename struct ta_xgmi_cmd_get_peer_link_info_output to
ta_xgmi_cmd_get_peer_link_info accordingly

v2: add structs to support xgmi GET_EXTEND_PEER_LINK command

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Tao Zhou 8096df7664 drm/amdgpu: add set/get mca debug mode operations
Record the debug mode status in RAS.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Tao Zhou 21226f02d7 drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count
Simplify the code.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Li Ma 9d7a965e22 drm/amdgpu: add clockgating support for NBIO v7.7.1
add clockgating support for NBIO ip 7.7.1

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Li Ma fa9dd7a285 drm/amdgpu: fix missing stuff in NBIO v7.11
add get_clockgating_state, update_medium_grain_light_sleep and
update_medium_grain_clock_gating in nbio_v7_11_funcs
v1:
add missing funcs in nbio_v7_11.c
v2:
modify the if condition and add spport for nbio v7.11 clockgating.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Stanley.Yang 66d64e4e03 drm/amdgpu: Enable RAS feature by default for APU
Enable RAS feature by default for aqua vanjaram on apu
platform.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Yang Wang 49c260bef3 drm/amdgpu: fix typo for amdgpu ras error data print
typo fix.

Fixes: 5b1270beb3 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:28 -04:00
Bokun Zhang 017634a68d drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P4
- In VCN 4 SRIOV code path, add code to enable RB decouple feature

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Bokun Zhang eb9d6256b9 drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P3
- Update VCN header for RB decouple feature
- Add metadata struct, metadata will be placed after each RB

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Bokun Zhang fc3136730b drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P2
- Add function to check if RB decouple is enabled under SRIOV

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Bokun Zhang 97b2821643 drm/amd/amdgpu/vcn: Add RB decouple feature under SRIOV - P1
- Update SRIOV header with RB decouple flag

Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Stanley.Yang 8a65661114 drm/amdgpu: Fix delete nodes that have been relesed
Fix delete nodes that it has been freed.

Fixes: 5b1270beb3 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Hawking Zhang f2176d7063 drm/amdgpu: Add UVD_VCPU_INT_EN2 to dpg sram
Add RAS sepcifc programming to dpg sram.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:27 -04:00
Hawking Zhang 9248462d7e drm/amdgpu: Enable software RAS in vcn v4_0_3
Set VCN/JPEG RAS masks to enable software RAS for
VCN and JPEG.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Tao Zhou 472c5fb297 drm/amdgpu: define ras_reset_error_count function
Make the code architecture more simple.

v2: reuse ras_reset_error_count in ras_reset_error_status.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Candice Li afcf949cf3 drm/amdgpu: Log UE corrected by replay as correctable error
Support replay mode where UE could be converted to CE.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20 15:11:26 -04:00
Dave Airlie d43c76c820 Short summary of fixes pull:
amdgpu:
 - Disable AMD_CTX_PRIORITY_UNSET
 
 bridge:
 - ti-sn65dsi86: Fix device lifetime
 
 edid:
 - Add quirk for BenQ GW2765
 
 ivpu:
 - Extend address range for MMU mmap
 
 nouveau:
 - DP-connector fixes
 - Documentation fixes
 
 panel:
 - Move AUX B116XW03 into panel-simple
 
 scheduler:
 - Eliminate DRM_SCHED_PRIORITY_UNSET
 
 ttm:
 - Fix possible NULL-ptr deref in cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmUxFsoACgkQaA3BHVML
 eiNkkwf/YORvSPLN2LEqh4P8to7X92moFR7XwkA1GVofKOC3C04wDBq+Ezdy5pnw
 7T/uvEnmrnDP5Iucerly8bE5IdJGkafIB8+oBIcGhGvPgtYNGSV8NWoeA/lhBmsS
 av56+zBPpBF3s5fZa4rB/WciGRuCsLhbcYUN/LGGVBXdhLgb8LRb/seTv/Ah9Sra
 LQtFmrDNNE0+FIWuKkUsY/CQYysbMzuHYFiLumtY59lE1R5kyvlzE8OrdcqlDXhC
 HCFt0KuA+4onzdHKnge/as5T3jBJucGV9mTcgal3158n94U/ODrXEJqVD/KFKle6
 052vhpApOL3+RyDKGjXQLm/GumEc9w==
 =LJB1
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-10-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Short summary of fixes pull:

amdgpu:
- Disable AMD_CTX_PRIORITY_UNSET

bridge:
- ti-sn65dsi86: Fix device lifetime

edid:
- Add quirk for BenQ GW2765

ivpu:
- Extend address range for MMU mmap

nouveau:
- DP-connector fixes
- Documentation fixes

panel:
- Move AUX B116XW03 into panel-simple

scheduler:
- Eliminate DRM_SCHED_PRIORITY_UNSET

ttm:
- Fix possible NULL-ptr deref in cleanup

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20231019114605.GA22540@linux-uq9g
2023-10-20 14:07:58 +10:00
Felix Kuehling 316baf09d3 drm/amdgpu: Reserve fences for VM update
In amdgpu_dma_buf_move_notify reserve fences for the page table updates
in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON
in dma_resv_add_fence when using SDMA for page table updates.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:56:57 -04:00
Felix Kuehling 51b79f3381 drm/amdgpu: Fix possible null pointer dereference
abo->tbo.resource may be NULL in amdgpu_vm_bo_update.

Fixes: 1802537820 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:56:50 -04:00
Felix Kuehling 207430b76a drm/amdgpu: Reserve fences for VM update
In amdgpu_dma_buf_move_notify reserve fences for the page table updates
in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON
in dma_resv_add_fence when using SDMA for page table updates.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Felix Kuehling e6f8588733 drm/amdgpu: Fix possible null pointer dereference
abo->tbo.resource may be NULL in amdgpu_vm_bo_update.

Fixes: 1802537820 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Stanley.Yang b1338a8e71 drm/amdgpu: Workaround to skip kiq ring test during ras gpu recovery
This is workaround, kiq ring test failed in suspend stage when do ras
recovery.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:52 -04:00
Mario Limonciello e56690bb37 drm/amd: Read IMU FW version from scratch register during hw_init
If the IMU version wasn't discovered from the header, such as when
the firmware was directly loaded by PSP then there is no firmware
version to show to userspace from sysfs or IOCTL.

The IMU F/W stores the version in the first scratch register though,
so fetch it in these cases to let the driver export.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Mario Limonciello 4916615fe9 drm/amd: Don't parse IMU ucode version if it won't be loaded
When the IMU ucode is loaded by the PSP parsing the version that comes from
Linux will vary. Rather than showing the wrong data to kernel interface
consumers, avoid populating it in this case.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Mario Limonciello d757dfd667 drm/amd: Move microcode init step to early_init()
The intention for early init is to find any missing microcode early
and fail the driver load if it's missing.  Move this step to earlier
in driver init to match other IP blocks.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Asad Kamal d8c1925ba8 drm/amdgpu: update retry times for psp BL wait
Increase retry time for PSP BL wait, to compensate
for longer time to set c2pmsg 35 ready bit during
mode1 with RAS

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Alex Deucher 28ab9a02b6 drm/amdgpu/mes11: remove aggregated doorbell code
It's not enabled in hardware so the code is dead.
Remove it.

Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Asad Kamal 53dd920c1f drm/amdgpu : Add hive ras recovery check
If one of the devices in the hive detects a
fatal error, need to send ras recovery reset
message to PMFW of all devices in the hive.
For that add a flag in hive to indicate that
it's undergoing ras recovery

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:51 -04:00
Mangesh Gadre 2d955a06a5 Revert "drm/amdgpu: Program xcp_ctl registers as needed"
This reverts commit 0bdebfef3f.

XCP_CTL register is programmed by firmware and
register access is protected.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:50 -04:00
Lang Yu ab29ac57ad drm/amdgpu/umsch: add suspend and resume callback
Add missing IP callbacks.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-19 18:26:50 -04:00
Christian König 6b18ef481f drm/amdgpu: ignore duplicate BOs again
Looks like RADV is actually hitting this.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: ca6c1e210a ("drm/amdgpu: use the new drm_exec object for CS v3")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231017121015.1336786-1-christian.koenig@amd.com
2023-10-19 13:19:44 +02:00
Dave Airlie 27442758e9 amd-drm-next-6.7-2023-10-13:
amdgpu:
 - DC replay fixes
 - Misc code cleanups and spelling fixes
 - Documentation updates
 - RAS EEPROM Updates
 - FRU EEPROM Updates
 - IP discovery updates
 - SR-IOV fixes
 - RAS updates
 - DC PQ fixes
 - SMU 13.0.6 updates
 - GC 11.5 Support
 - NBIO 7.11 Support
 - GMC 11 Updates
 - Reset fixes
 - SMU 11.5 Updates
 - SMU 13.0 OD support
 - Use flexible arrays for bo list handling
 - W=1 Fixes
 - SubVP fixes
 - DPIA fixes
 - DCN 3.5 Support
 - Devcoredump fixes
 - VPE 6.1 support
 - VCN 4.0 Updates
 - S/G display fixes
 - DML fixes
 - DML2 Support
 - MST fixes
 - VRR fixes
 - Enable seamless boot in more cases
 - Enable content type property for HDMI
 - OLED fixes
 - Rework and clean up GPUVM TLB flushing
 - DC ODM fixes
 - DP 2.x fixes
 - AGP aperture fixes
 - SDMA firmware loading cleanups
 - Cyan Skillfish GPU clock counter fix
 - GC 11 GART fix
 - Cache GPU fault info for userspace queries
 - DC cursor check fixes
 - eDP fixes
 - DC FP handling fixes
 - Variable sized array fixes
 - SMU 13.0.x fixes
 - IB start and size alignment fixes for VCN
 - SMU 14 Support
 - Suspend and resume sequence rework
 - vkms fix
 
 amdkfd:
 - GC 11 fixes
 - GC 10 fixes
 - Doorbell fixes
 - CWSR fixes
 - SVM fixes
 - Clean up GC info enumeration
 - Rework memory limit handling
 - Coherent memory handling fixes
 - Use partial migrations in GPU faults
 - TLB flush fixes
 - DMA unmap fixes
 - GC 9.4.3 fixes
 - SQ interrupt fix
 - GTT mapping fix
 - GC 11.5 Support
 
 radeon:
 - Misc code cleanups
 - W=1 Fixes
 - Fix possible buffer overflow
 - Fix possible NULL pointer dereference
 
 UAPI:
 - Add EXT_COHERENT memory allocation flags.  These allow for system scope atomics.
   Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88
 - Add support for new VPE engine.  This is a memory to memory copy engine with advanced scaling, CSC, and color management features
   Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713
 - Add INFO IOCTL interface to query GPU faults
   Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238
   Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZSmDAQAKCRC93/aFa7yZ
 2EdeAQC2lkQ9IHLOon5kIZUK+r9IPYlgFsii+qfmMPLBaMcuwgEA8F4eJln/cc9V
 02EKhlapkggYXYa+uhOE2KTnWgMFJgI=
 =SEXq
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.7-2023-10-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.7-2023-10-13:

amdgpu:
- DC replay fixes
- Misc code cleanups and spelling fixes
- Documentation updates
- RAS EEPROM Updates
- FRU EEPROM Updates
- IP discovery updates
- SR-IOV fixes
- RAS updates
- DC PQ fixes
- SMU 13.0.6 updates
- GC 11.5 Support
- NBIO 7.11 Support
- GMC 11 Updates
- Reset fixes
- SMU 11.5 Updates
- SMU 13.0 OD support
- Use flexible arrays for bo list handling
- W=1 Fixes
- SubVP fixes
- DPIA fixes
- DCN 3.5 Support
- Devcoredump fixes
- VPE 6.1 support
- VCN 4.0 Updates
- S/G display fixes
- DML fixes
- DML2 Support
- MST fixes
- VRR fixes
- Enable seamless boot in more cases
- Enable content type property for HDMI
- OLED fixes
- Rework and clean up GPUVM TLB flushing
- DC ODM fixes
- DP 2.x fixes
- AGP aperture fixes
- SDMA firmware loading cleanups
- Cyan Skillfish GPU clock counter fix
- GC 11 GART fix
- Cache GPU fault info for userspace queries
- DC cursor check fixes
- eDP fixes
- DC FP handling fixes
- Variable sized array fixes
- SMU 13.0.x fixes
- IB start and size alignment fixes for VCN
- SMU 14 Support
- Suspend and resume sequence rework
- vkms fix

amdkfd:
- GC 11 fixes
- GC 10 fixes
- Doorbell fixes
- CWSR fixes
- SVM fixes
- Clean up GC info enumeration
- Rework memory limit handling
- Coherent memory handling fixes
- Use partial migrations in GPU faults
- TLB flush fixes
- DMA unmap fixes
- GC 9.4.3 fixes
- SQ interrupt fix
- GTT mapping fix
- GC 11.5 Support

radeon:
- Misc code cleanups
- W=1 Fixes
- Fix possible buffer overflow
- Fix possible NULL pointer dereference

UAPI:
- Add EXT_COHERENT memory allocation flags.  These allow for system scope atomics.
  Proposed userspace: https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88
- Add support for new VPE engine.  This is a memory to memory copy engine with advanced scaling, CSC, and color management features
  Proposed mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25713
- Add INFO IOCTL interface to query GPU faults
  Proposed Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238
  Proposed libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/298

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231013175758.1735031-1-alexander.deucher@amd.com
2023-10-18 16:08:07 +10:00
Luben Tuikov fa8391ad68 gpu/drm: Eliminate DRM_SCHED_PRIORITY_UNSET
Eliminate DRM_SCHED_PRIORITY_UNSET, value of -2, whose only user was
amdgpu. Furthermore, eliminate an index bug, in that when amdgpu boots, it
calls drm_sched_entity_init() with DRM_SCHED_PRIORITY_UNSET, which uses it to
index sched->sched_rq[].

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Link: https://lore.kernel.org/r/20231017035656.8211-2-luben.tuikov@amd.com
2023-10-17 20:35:38 -04:00
Luben Tuikov eab0261967 drm/amdgpu: Unset context priority is now invalid
A context priority value of AMD_CTX_PRIORITY_UNSET is now invalid--instead of
carrying it around and passing it to the Direct Rendering Manager--and it
becomes AMD_CTX_PRIORITY_NORMAL in amdgpu_ctx_ioctl(), the gateway to context
creation.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Link: https://lore.kernel.org/r/20231017035656.8211-1-luben.tuikov@amd.com
2023-10-17 20:35:38 -04:00
Ma Ke cd90511557 drm/amdgpu/vkms: fix a possible null pointer dereference
In amdgpu_vkms_conn_get_modes(), the return value of drm_cvt_mode()
is assigned to mode, which will lead to a NULL pointer dereference
on failure of drm_cvt_mode(). Add a check to avoid null pointer
dereference.

Signed-off-by: Ma Ke <make_ruc2021@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:36:25 -04:00
Yang Wang 3bba4bc6a0 drm/amdgpu: add RAS error info support for umc_v12_0
add RAS error info support for umc_v12_0.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:36:11 -04:00
Yang Wang 8736d17a7f drm/amdgpu: add RAS error info support for mmhub_v1_8
add RAS error info support for mmhub_v1_8.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:36:03 -04:00
Yang Wang 156c2814c2 drm/amdgpu: add RAS error info support for gfx_v9_4_3
add RAS error info support for gfx_v9_4_3.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:35:55 -04:00
Yang Wang dd401cd29a drm/amdgpu: add RAS error info support for sdma_v4_4_2.
add RAS error info support for sdma_v4_4_2.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:35:45 -04:00
Yang Wang 5b1270beb3 drm/amdgpu: add ras_err_info to identify RAS error source
introduced "ras_err_info" to better identify a RAS ERROR source.

NOTE:
For legacy chips, keep the original RAS error print format.

v1:
RAS errors may come from different dies during a RAS error query,
therefore, need a new data structure to identify the source of RAS ERROR.

v2:
- use new data structure 'amdgpu_smuio_mcm_config_info' instead of
  ras_err_id (in v1 patch)
- refine ras error dump function name
- refine ras error dump log format

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:35:35 -04:00
Yifan Zhang 6a1c31c7a8 drm/amdgpu: flush the correct vmid tlb for specific pasid
flush the correct vmid tlb for specific pasid on gmc 11.

Fixes: 041a574388 ("drm/amdgpu: fix and cleanup gmc_v11_0_flush_gpu_tlb_pasid")
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:34:29 -04:00
Yang Wang 1a00cfab37 drm/amdgpu: make err_data structure built-in for ras_manager
(No effect outside the ras_mgr data structure)

Since a new member was added to the ras_err_data data structure,
it becomes unreasonable for the ras_mgr instance to contain this data,
because ras mgr only uses the 2 member information of ue_count/ce_count in err_data.

This patch changes the code err_data into built-in structure members,
making the code directly compatible.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:34:13 -04:00
Jesse Zhang e341631f4a drm/amdgpu: disable GFXOFF and PG during compute for GFX9
Temporary workaround to fix issues observed in some compute
applications when GFXOFF is enabled on GFX9.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:56 -04:00
Lang Yu ef2354c70f drm/amdgpu/umsch: fix missing stuff during rebase
These are missed during rebase.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:49 -04:00
Lang Yu fb5b73acf7 drm/amdgpu/umsch: correct IP version format
FW uses IP_VERSION_MAJ_MIN_REV format.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:42 -04:00
Lang Yu 1c1f14a472 drm/amdgpu: don't use legacy invalidation on MMHUB v3.3
Legacy invalidation is not supported.
This is missed during rebase.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:29 -04:00
Lang Yu 4661482b9c drm/amdgpu: correct NBIO v7.11 programing
Use v7.7 before, switch to v7.11 now.
Fix incorrect programing.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:21 -04:00
Xiaogang Chen ffa88b0019 drm/amdgpu: Correctly use bo_va->ref_count in compute VMs
This is needed to correctly handle BOs imported into compute VM from gfx.
Both kfd and gfx should use same bo_va and set bo_va->ref_count correctly
when map the Bos into same VM, otherwise we may trigger kernel general
protection when iterate mappings over bo_va's valids or invalids list.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Tested-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:08 -04:00
Lijo Lazar f20f3b0d6c drm/amd/pm: Add P2S tables for SMU v13.0.6
Add P2S table load support on SMU v13.0.6 ASICs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:33:01 -04:00
Lijo Lazar 79daf69246 drm/amdgpu: Add support to load P2S tables
Add support to load P2S tables through PSP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:32:55 -04:00
Lijo Lazar cd21cb1fcb drm/amdgpu: Update PSP interface header
Adds FW id for P2S table.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:32:47 -04:00
Lijo Lazar a8558fce7a drm/amdgpu: Avoid FRU EEPROM access on APU
FRU EEPROM access is not valid for APU devices.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:32:41 -04:00
Lin.Cao f74f19c440 drm/amdgpu: save VCN instances init info before jpeg init
JPEG init header will overwirte vcn init header info which will loss
some debug information

Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:32:34 -04:00
Alex Hung 98a80bb3dd Revert "drm/amd/display: Hande writeback request from userspace"
This reverts commit cd1a4bc228.

[WHY & HOW]
The writeback series cause a regression in thunderbolt display.

Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:24:47 -04:00
Alex Hung 731a20cb89 Revert "drm/amd/display: Add writeback enable field (wb_enabled)"
This reverts commit f6893fcb10.

[WHY & HOW]
The writeback series cause a regression in thunderbolt display.

Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:08:57 -04:00
Asad Kamal 625e5f3851 drm/amdgpu: Expose ras version & schema info
Expose ras table version & schema info to sysfs

v2: Updated schema to get poison support info
from ras context, removed asic specific checks

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:02:54 -04:00
Lijo Lazar d4a02673b3 drm/amdgpu: Read PSPv13 OS version from register
PSP OS updates the version information in register. On APUs with PSPv13,
PSP OS will already be loaded with SBIOS. Hence use the version register
instead of using information in driver binary header.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:02:43 -04:00
Lang Yu faeddb6eab drm/amdgpu/umsch: enable doorbell for umsch
Program vcn_doorbell_range with vcn_ring0_1.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:01:37 -04:00
Mario Limonciello db99889065 drm/amd: Split up UVD suspend into prepare and suspend steps
amdgpu_uvd_suspend() allocates memory and copies objects into that
allocated memory.  This fails under memory pressure.  Instead move
majority of this code into a prepare step when swap can still be
allocated.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:01:04 -04:00
Mario Limonciello cb11ca3233 drm/amd: Add concept of running prepare_suspend() sequence for IP blocks
If any IP blocks allocate memory during their hw_fini() sequence
this can cause the suspend to fail under memory pressure.  Introduce
a new phase that IP blocks can use to allocate memory before suspend
starts so that it can potentially be evicted into swap instead.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:00:58 -04:00
Mario Limonciello 5095d54181 drm/amd: Evict resources during PM ops prepare() callback
Linux PM core has a prepare() callback run before suspend.

If the system is under high memory pressure, the resources may need
to be evicted into swap instead.  If the storage backing for swap
is offlined during the suspend() step then such a call may fail.

So move this step into prepare() to move evict majority of
resources and update all non-pmops callers to call the same callback.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:00:18 -04:00
Li Ma 31715a8620 drm/amdgpu: enable GFX IP v11.5.0 CG and PG support
Add CG support for GFX/MC/HDP/ATHUB/IH/BIF.
Add PG support for GFX.

Signed-off-by: Li Ma <li.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:00:15 -04:00
Li Ma ad3e54ab9e drm/amdgpu/discovery: add SMU 14 support
add smu 14 into the IP discovery list.

Signed-off-by: Li Ma <li.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 11:00:00 -04:00
Lang Yu 4acf679f86 drm/amdgpu/umsch: power on/off UMSCH by DLDO
VCN 4.0.5 uses DLDO.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:59:32 -04:00
Lang Yu 617b472431 drm/amdgpu/umsch: fix psp frontdoor loading
These changes are missed in rebase.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:59:24 -04:00
Lijo Lazar 558fcb7d11 drm/amdgpu: Increase IP discovery region size
IP discovery region has increased to > 8K on some SOCs.Maximum reserve
size is upto 12K, but not used. For now increase to 10K.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:59:16 -04:00
Lin.Cao b053117e86 drm/amdgpu: Return -EINVAL when MMSCH init status incorrect
Return -EINVAL when MMSCH init fail which can be handle by function
amdgpu_device_reset_sriov correctly.

Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:58:48 -04:00
Lang Yu 9a37f65c4e drm/amdgpu/vpe: fix insert_nop ops
Avoid infinite loop when count is 0.
This is missed in rebase.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:58:33 -04:00
Srinivasan Shanmugam 54967d5683 drm/amdgpu: Address member 'gart_placement' not described in 'amdgpu_gmc_gart_location'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c:274: warning: Function parameter or member 'gart_placement' not described in 'amdgpu_gmc_gart_location'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:58:22 -04:00
Lang Yu 84aa39ab1e drm/amdgpu/vpe: align with mcbp changes
MCBP is decided by adev->gfx.mcbp now.
This is missed in rebase.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:58:13 -04:00
Lang Yu 99ea82f424 drm/amdgpu/vpe: remove IB end boundary requirement
Remove IB end boundary requirement,
VPE has no such limitions, use existing
amdgpu_ring_generic_pad_ib() instead.
This is missed in rebase.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:58:01 -04:00
Jay Cornwall 757920585d drm/amdgpu: Improve MES responsiveness during oversubscription
When MES is oversubscribed it may not frequently check for new
command submissions from driver if the scheduling load is high.
Response latency as high as 5 seconds has been observed.

Enable a flag which adds a check for new commands between
scheduling quantums.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Cc: Alexandru Tudor <alexandru.tudor@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:57:46 -04:00
Thomas Zimmermann 57390019b6 Merge drm/drm-next into drm-misc-next
Updating drm-misc-next to the state of Linux v6.6-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-10-11 09:50:59 +02:00
Icenowy Zheng 3806a8c647 drm/amdgpu: fix SI failure due to doorbells allocation
SI hardware does not have doorbells at all, however currently the code
will try to do the allocation and thus fail, makes SI AMDGPU not usable.

Fix this failure by skipping doorbells allocation when doorbells count
is zero.

Fixes: 54c30d2a8d ("drm/amdgpu: create kernel doorbell pages")
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 17:59:29 -04:00
Christian König ff89f064dc drm/amdgpu: add missing NULL check
bo->tbo.resource can easily be NULL here.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2902
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2023-10-09 17:59:29 -04:00
Icenowy Zheng 219223eca4 drm/amdgpu: fix SI failure due to doorbells allocation
SI hardware does not have doorbells at all, however currently the code
will try to do the allocation and thus fail, makes SI AMDGPU not usable.

Fix this failure by skipping doorbells allocation when doorbells count
is zero.

Fixes: 54c30d2a8d ("drm/amdgpu: create kernel doorbell pages")
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 17:02:52 -04:00
Aaron Liu ce862c4995 drm/amdgpu/discovery: enable DCN 3.5.0 support
Enable DCN 3.5.0 support.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 17:02:46 -04:00
Arvind Yadav 367a0af433 drm/amdkfd: get doorbell's absolute offset based on the db_size
Here, Adding db_size in byte to find the doorbell's
absolute offset for both 32-bit and 64-bit doorbell sizes.
So that doorbell offset will be aligned based on the doorbell
size.

v2:
- Addressed the review comment from Felix.
v3:
- Adding doorbell_size as parameter to get db absolute offset.
v4:
  Squash the two patches into one.

Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 17:02:34 -04:00
Christian König 31220ee9dc drm/amdgpu: add missing NULL check
bo->tbo.resource can easily be NULL here.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2902
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
2023-10-09 17:01:32 -04:00
Yifan Zhang 061863e5db drm/amdgpu: add hub->ctx_distance in setup_vmid_config
add hub->ctx_distance when read CONTEXT1_CNTL, align w/
write back operation.

v2: fix coding style errors reported by checkpatch.pl (Christian)

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:59:06 -04:00
Stanley.Yang 80285ae1ec drm/amdgpu: Fix potential null pointer derefernce
The amdgpu_ras_get_context may return NULL if device
not support ras feature, so add check before using.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:52:46 -04:00
Lijo Lazar b3e73b5a8f Documentation/amdgpu: Add FRU attribute details
Add documentation for the newly added manufacturer and fru_id attributes
in sysfs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:52:25 -04:00
Lijo Lazar ac6b1f275f drm/amdgpu: Add more FRU field information
Add support to read Manufacturer Name and FRU File Id fields. Also add
sysfs device attributes for external usage.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:52:17 -04:00
Lijo Lazar 8a2b51392a drm/amdgpu: Refactor FRU product information
Keep FRU related information together in a separate structure.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:52:08 -04:00
Yang Wang be2e8aca06 drm/amdgpu: enable FRU device for SMU v13.0.6
v1:
enable GFX v9.4.3 FRU device to query board information.

v2:
use MP1 version to identify different asic

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:51:58 -04:00
Boyuan Zhang 6cb8e3ee3a drm/amdgpu: update ib start and size alignment
Update IB starting address alignment and size alignment with correct values
for decode and encode IPs.

Decode IB starting address alignment: 256 bytes
Decode IB size alignment: 64 bytes
Encode IB starting address alignment: 256 bytes
Encode IB size alignment: 4 bytes

Also bump amdgpu driver version for this update.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-09 16:51:39 -04:00
Kees Cook c8e7df374b drm/amdgpu: Annotate struct amdgpu_bo_list with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct amdgpu_bo_list.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-hardening@vger.kernel.org
Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1]
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Srinivasan Shanmugam e0a3e7bf62 drm/amdgpu: Drop unnecessary return statements
There is no reason to call return at the end of function that
returns void.

Fixes the below:

WARNING: void function return statements are not generally useful

Thus remove such a statement in the affected functions.

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Lijo Lazar 4798db85b7 Documentation/amdgpu: Add board info details
Add documentation for board info sysfs attribute.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Lijo Lazar 76da73f026 drm/amdgpu: Add sysfs attribute to get board info
Add a sysfs attribute which shows the board form factor like OAM or
CEM.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Lijo Lazar b0a4553336 drm/amdgpu: Get package types for smuio v13.0
Add support to query package types supported in smuio v13.0 ASICs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Lijo Lazar 4365d2ed09 drm/amdgpu: Add more smuio v13.0.3 package types
Expand support to get other board types like OAM or CEM.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Sathishkumar S cbad0dd13a drm/amdgpu: fix ip count query for xcp partitions
fix wrong ip count INFO on spatial partitions. update the query
to return the instance count corresponding to the partition id.

v2:
 initialize variables only when required to be (Christian)
 move variable declarations to the beginning of function (Christian)

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Lijo Lazar 28a3f49609 drm/amdgpu: Move package type enum to amdgpu_smuio
Move definition of package type to amdgpu_smuio header and add new
package types for CEM and OAM.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Srinivasan Shanmugam 2b6b29f33f drm/amdgpu: Fix complex macros error
Fixes the below:

ERROR: Macros with complex values should be enclosed in parentheses

WARNING: macros should not use a trailing semicolon
+#define amdgpu_inc_vram_lost(adev) atomic_inc(&((adev)->vram_lost_counter));

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:59:35 -04:00
Alex Deucher 7d3f1d76f3 drm/amdgpu: refine fault cache updates
Don't update the fault cache if status is 0.  In the multiple
fault case, subsequent faults will return a 0 status which is
useless for userspace and replaces the useful fault status, so
only update if status is non-0.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:49:51 -04:00
Alex Deucher 7a41ed8b59 drm/amdgpu: add new INFO ioctl query for the last GPU page fault
Add a interface to query the last GPU page fault for the process.
Useful for debugging context lost errors.

v2: split vmhub representation between kernel and userspace
v3: add locking when fetching fault info in INFO IOCTL

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238
libdrm MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23238

Cc: samuel.pitoiset@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05 17:49:39 -04:00
Kees Cook ac8e62ab25 drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct ip_hw_instance.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230922173216.3823169-2-keescook@chromium.org
2023-10-05 11:29:09 +02:00
Mario Limonciello 134b8c5d86 drm/amd: Fix detection of _PR3 on the PCIe root port
On some systems with Navi3x dGPU will attempt to use BACO for runtime
PM but fails to resume properly.  This is because on these systems
the root port goes into D3cold which is incompatible with BACO.

This happens because in this case dGPU is connected to a bridge between
root port which causes BOCO detection logic to fail.  Fix the intent of
the logic by looking at root port, not the immediate upstream bridge for
_PR3.

Cc: stable@vger.kernel.org
Suggested-by: Jun Ma <Jun.Ma2@amd.com>
Tested-by: David Perry <David.Perry@amd.com>
Fixes: b10c1c5b3a ("drm/amdgpu: add check for ACPI power resources")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 22:52:05 -04:00
Luben Tuikov 5d061675b7 drm/amdgpu: Fix a memory leak
Fix a memory leak in amdgpu_fru_get_product_info().

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Reported-by: Yang Wang <kevinyang.wang@amd.com>
Fixes: 0dbf2c5626 ("drm/amdgpu: Interpret IPMI data for product information (v2)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 22:43:26 -04:00
Philip Yang fdac890966 drm/amdgpu: ratelimited override pte flags messages
Use ratelimited version of dev_dbg to avoid flooding dmesg log. No
functional change.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:40:11 -04:00
Mario Limonciello b8e6aec146 drm/amd: Drop all hand-built MIN and MAX macros in the amdgpu base driver
Several files declare MIN() or MAX() macros that ignore the types of the
values being compared.  Drop these macros and switch to min() min_t(),
and max() from `linux/minmax.h`.

Suggested-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:39:52 -04:00
Alex Deucher 8dbf1ba867 drm/amdgpu: cache gpuvm fault information for gmc7+
Cache the current fault info in the vm struct.  This can be queried
by userspace later to help debug UMDs.

Cc: samuel.pitoiset@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:37:07 -04:00
Alex Deucher 2e8ef6a561 drm/amdgpu: add cached GPU fault structure to vm struct
When we get a GPU page fault, cache the fault for later
analysis.

Cc: samuel.pitoiset@gmail.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:57 -04:00
Rajneesh Bhardwaj f4bff6e0b9 drm/amdgpu: Use ttm_pages_limit to override vram reporting
On GFXIP9.4.3 APU, allow the memory reporting as per the ttm pages
limit in NPS1 mode.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:42 -04:00
Rajneesh Bhardwaj 9b37d45d79 drm/amdgpu: Rework KFD memory max limits
To allow bigger allocations specially on systems such as GFXIP 9.4.3
that use GTT memory for VRAM allocations, relax the limits to
maximize ROCm allocations.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:17 -04:00
Alex Deucher 67318cb843 drm/amdgpu/gmc11: set gart placement GC11
Needed to avoid a hardware issue.

v2: force high for all GC11 parts for consistency (Alex)
v3: rebase

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:12 -04:00
Alex Deucher 917f91d8d8 drm/amdgpu/gmc: add a way to force a particular placement for GART
We normally place GART based on the location of VRAM and the
available address space around that, but provide an option
to force a particular location for hardware that needs it.

v2: Switch to passing the placement via parameter

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-04 18:36:07 -04:00
Lang Yu a19d934986 drm/amdgpu: correct gpu clock counter query on cyan skilfish
Cayn skilfish uses SMUIO v11.0.8 offset.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <stable@vger.kernel.org> # v5.15+
2023-10-04 18:35:28 -04:00
Mario Limonciello c4c8955b8a drm/amd: Fix detection of _PR3 on the PCIe root port
On some systems with Navi3x dGPU will attempt to use BACO for runtime
PM but fails to resume properly.  This is because on these systems
the root port goes into D3cold which is incompatible with BACO.

This happens because in this case dGPU is connected to a bridge between
root port which causes BOCO detection logic to fail.  Fix the intent of
the logic by looking at root port, not the immediate upstream bridge for
_PR3.

Cc: stable@vger.kernel.org
Suggested-by: Jun Ma <Jun.Ma2@amd.com>
Tested-by: David Perry <David.Perry@amd.com>
Fixes: b10c1c5b3a ("drm/amdgpu: add check for ACPI power resources")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-03 15:43:11 -04:00
Alex Hung f6893fcb10 drm/amd/display: Add writeback enable field (wb_enabled)
[WHAT]
Add a new field to keep track whether a crtc is previously
writeback-enabled.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-03 15:42:38 -04:00
Alex Hung cd1a4bc228 drm/amd/display: Hande writeback request from userspace
[WHAT]
Handle writeback requests and fill in the required information for DWB
programming and setup.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-03 15:42:06 -04:00
Alex Deucher 0021d70a06 drm/amdkfd: drop struct kfd_cu_info
I think this was an abstraction back from when
kfd supported both radeon and amdgpu.  Since we just
support amdgpu now, there is no more need for this and
we can use the amdgpu structures directly.

This also avoids having the kfd_cu_info structures on
the stack when inlining which can blow up the stack.

Cc: Arnd Bergmann <arnd@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-03 15:41:13 -04:00
Dave Airlie 79fb229b88 Merge tag 'drm-misc-next-2023-09-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.7-rc1:

UAPI Changes:
- drm_file owner is now updated during use, in the case of a drm fd
  opened by the display server for a client, the correct owner is
  displayed.
- Qaic gains support for the QAIC_DETACH_SLICE_BO ioctl to allow bo
  recycling.

Cross-subsystem Changes:
- Disable boot logo for au1200fb, mmpfb and unexport logo helpers.
  Only fbcon should manage display of logo.
- Update freescale in MAINTAINERS.
- Add some bridge files to bridge in MAINTAINERS.
- Update gma500 driver repo in MAINTAINERS to point to drm-misc.

Core Changes:
- Move size computations to drm buddy allocator.
- Make drm_atomic_helper_shutdown(NULL) a nop.
- Assorted small fixes in drm_debugfs, DP-MST payload addition error handling.
- Fix DRM_BRIDGE_ATTACH_NO_CONNECTOR handling.
- Handle bad (h/v)sync_end in EDID by clipping to htotal.
- Build GPUVM as a module.

Driver Changes:
- Simple drivers don't need to cache prepared result.
- Call drm_atomic_helper_shutdown() in shutdown/unbind for a whole lot
  more drm drivers.
- Assorted small fixes in amdgpu, ssd130x, bridge/it6621, accel/qaic,
  nouveau, tc358768.
- Add NV12 for komeda writeback.
- Add arbitration lost event to synopsis/dw-hdmi-cec.
- Speed up s/r in nouveau by not restoring some big bo's.
- Assorted nouveau display rework in preparation for GSP-RM,
  especially related to how the modeset sequence works and
  the DP sequence in relation to link training.
- Update anx7816 panel.
- Support NVSYNC and NHSYNC in tegra.
- Allow multiple power domains in simple driver.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f1fae5eb-25b8-192a-9a53-215e1184ce81@linux.intel.com
2023-09-29 08:27:15 +10:00
Sui Jingfeng 18bf400530 drm/amdgpu: Use pci_get_base_class() to reduce duplicated code
Use pci_get_base_class() to reduce duplicated code.  No functional change
intended.

Link: https://lore.kernel.org/r/20230825062714.6325-5-sui.jingfeng@linux.dev
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 16:54:54 -05:00
Tao Zhou fc59889071 drm/amdgpu: update retry times for psp vmbx wait
Increase the retry loops and replace the constant number with macro.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:44:36 -04:00
Tao Zhou 1934907234 drm/amdgpu: exit directly if gpu reset fails
No need to perform the full reset operation in case of gpu reset
failure.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:44:36 -04:00
Mario Limonciello 93499bd6cd drm/amd: Move microcode init from sw_init to early_init for CIK SDMA
As part of IP discovery early_init is run for all HW IP blocks.
During this phase all firmware is supposed to be identified that may
be missing so that the driver can avoid releasing resources used by
the EFI framebuffer or simpledrm until the last possible moment.

Move microcode loading from sw_init to early_init.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:38:17 -04:00
Mario Limonciello 751e293f2c drm/amd: Move microcode init from sw_init to early_init for SDMA v2.4
As part of IP discovery early_init is run for all HW IP blocks.
During this phase all firmware is supposed to be identified that may
be missing so that the driver can avoid releasing resources used by
the EFI framebuffer or simpledrm until the last possible moment.

Move microcode loading from sw_init to early_init.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:38:11 -04:00
Mario Limonciello cc76630483 drm/amd: Move microcode init from sw_init to early_init for SDMA v3.0
As part of IP discovery early_init is run for all HW IP blocks.
During this phase all firmware is supposed to be identified that may
be missing so that the driver can avoid releasing resources used by
the EFI framebuffer or simpledrm until the last possible moment.

Move microcode loading from sw_init to early_init.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:38:05 -04:00
Mario Limonciello e0d4fbb58c drm/amd: Move microcode init from sw_init to early_init for SDMA v5.2
As part of IP discovery early_init is run for all HW IP blocks.
During this phase all firmware is supposed to be identified that may
be missing so that the driver can avoid releasing resources used by
the EFI framebuffer or simpledrm until the last possible moment.

Move microcode loading from sw_init to early_init.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:37:50 -04:00
Mario Limonciello 95b456d3b0 drm/amd: Move microcode init from sw_init to early_init for SDMA v6.0
As part of IP discovery early_init is run for all HW IP blocks.
During this phase all firmware is supposed to be identified that may
be missing so that the driver can avoid releasing resources used by
the EFI framebuffer or simpledrm until the last possible moment.

Move microcode loading from sw_init to early_init.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:37:43 -04:00
Mario Limonciello ed1c1053cd drm/amd: Move microcode init from sw_init to early_init for SDMA v5.0
As part of IP discovery early_init is run for all HW IP blocks.
During this phase all firmware is supposed to be identified that may
be missing so that the driver can avoid releasing resources used by
the EFI framebuffer or simpledrm until the last possible moment.

Move microcode loading from sw_init to early_init.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:37:34 -04:00
Mario Limonciello 161d076c2d drm/amd: Drop error message about failing to load SDMA firmware
The error path for SDMA firmware loading is unnecessarily noisy.
When a firmware is missing 3 errors show up:
```
amdgpu 0000:07:00.0: Direct firmware load for amdgpu/green_sardine_sdma.bin failed with error -2
[drm:sdma_v4_0_early_init [amdgpu]] *ERROR* Failed to load sdma firmware!
[drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <sdma_v4_0> failed -19
```

The error code for the device init is bubbled up already, remove the
second one.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:36:41 -04:00
Le Ma 3152d01e88 drm/amd/pm: deprecate allow_xgmi_power_down interface
Replace with set_plpd_mode uniformly for places to use.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:36:23 -04:00
Mario Limonciello 3657a1d5ac drm/amd: Limit seamless boot by default to APUs
A hang is reported on DCN 3.2 with seamless boot enabled.
As the benefits come from an eDP setup, limit it to only enabled
by default with APUs.

Suggested-by: Alexander.Deucher@amd.com
Reported-by: feifei.xu@amd.com
Closes: https://lore.kernel.org/amd-gfx/85b427f6-11ec-4249-bf6f-eadf9c375f88@amd.com/T/#m2887e919d7c01b2a4860d2261b366d22e070f309
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28 15:35:31 -04:00
Alex Deucher b2e1cbe628 drm/amdgpu/gmc11: disable AGP on GC 11.5
AGP aperture is deprecated and no longer functional.

v2: fix typo (Alex)
v3: just skip the agp setup call
v4: revert back to the original model
v5: back to v3

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:23 -04:00
David (Ming Qiang) Wu fa1f1cc09d drm/amdgpu: not to save bo in the case of RAS err_event_athub
err_event_athub will corrupt VCPU buffer and not good to
be restored in amdgpu_vcn_resume() and in this case
the VCPU buffer needs to be cleared for VCN firmware to
work properly.

Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:23 -04:00
Luben Tuikov 9ed630c5c4 drm/amdgpu: Fix a memory leak
Fix a memory leak in amdgpu_fru_get_product_info().

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Reported-by: Yang Wang <kevinyang.wang@amd.com>
Fixes: 0dbf2c5626 ("drm/amdgpu: Interpret IPMI data for product information (v2)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:23 -04:00
Alex Deucher de59b69932 drm/amdgpu/gmc: set a default disable value for AGP
To disable AGP, the start needs to be set to a higher
value than the end.  Set a default disable value for
the AGP aperture and allow the IP specific GMC code
to enable it selectively be calling amdgpu_gmc_agp_location().

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:22 -04:00
Alex Deucher 29495d8145 drm/amdgpu/gmc6-8: properly disable the AGP aperture
The BOT register needs to be larger than the TOP register
for this to be properly disabled.  The lower 22 bits
of the BOT address are always 0 and the lower 22 bits of
the TOP register are always 1 so you need to make
the upper bits of BOT larger than the upper bits of BOT.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:22 -04:00
Mangesh Gadre cd956e7531 drm/amdgpu:Expose physical id of device in XGMI hive
This identifies the physical ordering of devices in the hive

v2: fix compilation issue

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:22 -04:00
Lang Yu 7021b397c6 drm/amdgpu/vpe: fix truncation warnings
Fix truncation warnings.

Fixes: 9d4346bdbc ("drm/amdgpu: add VPE 6.1.0 support")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202309200028.aUVuM8os-lkp@intel.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 17:00:21 -04:00
Philip Yang 101b810430 drm/amdkfd: Move dma unmapping after TLB flush
Otherwise GPU may access the stale mapping and generate IOMMU
IO_PAGE_FAULT.

Move this to inside p->mutex to prevent multiple threads mapping and
unmapping concurrently race condition.

After kfd_mem_dmaunmap_attachment is removed from unmap_bo_from_gpuvm,
kfd_mem_dmaunmap_attachment is called if failed to map to GPUs, and
before free the mem attachment in case failed to unmap from GPUs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:10 -04:00
Christian König 08abccc9a7 drm/amdgpu: further move TLB hw workarounds a layer up
For the PASID flushing we already handled that at a higher layer, apply
those workarounds to the standard flush as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König e2e3788850 drm/amdgpu: rework lock handling for flush_tlb v2
Instead of each implementation doing this more or less correctly
move taking the reset lock at a higher level.

v2: fix typo

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König 3983c9fd2d drm/amdgpu: drop error return from flush_gpu_tlb_pasid
That function never fails, drop the error return.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König 041a574388 drm/amdgpu: fix and cleanup gmc_v11_0_flush_gpu_tlb_pasid
The same PASID can be used by more than one VMID, reset each of them.

Use the common KIQ handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König 72cc99205c drm/amdgpu: cleanup gmc_v10_0_flush_gpu_tlb_pasid
The same PASID can be used by more than one VMID, reset each of them.

Use the common KIQ handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König e7b90e99fa drm/amdgpu: fix and cleanup gmc_v9_0_flush_gpu_tlb_pasid
Testing for reset is pointless since the reset can start right after the
test.

The same PASID can be used by more than one VMID, invalidate each of them.

Move the KIQ and all the workaround handling into common GMC code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König 0c525aa406 drm/amdgpu: fix and cleanup gmc_v8_0_flush_gpu_tlb_pasid
Testing for reset is pointless since the reset can start right after the
test. Grab the reset semaphore instead.

The same PASID can be used by more than once VMID, build a mask of VMIDs
to invalidate instead of just restting the first one.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König fb4c52db69 drm/amdgpu: fix and cleanup gmc_v7_0_flush_gpu_tlb_pasid
Testing for reset is pointless since the reset can start right after the
test. Grab the reset semaphore instead.

The same PASID can be used by more than once VMID, build a mask of VMIDs
to invalidate instead of just restting the first one.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König a54db42ff3 drm/amdgpu: cleanup gmc_v11_0_flush_gpu_tlb
Remove leftovers from copying this from the gmc v10 code.

v2: squash in fix from Yifan

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:55:09 -04:00
Christian König a70cb2176f drm/amdgpu: rework gmc_v10_0_flush_gpu_tlb v2
Move the SDMA workaround necessary for Navi 1x into a higher layer.

v2: use dev_err

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:52 -04:00
Tao Zhou 8c14a67bdf drm/amdgpu: change if condition for bad channel bitmap update
The amdgpu_ras_eeprom_control.bad_channel_bitmap is u32 type, but the
channel index could be larger than 32. For the ASICs whose channel
number is more than 32, the amdgpu_dpm_send_hbm_bad_channel_flag
interface is not supported, so we simply bypass channel bitmap update under
this condition.

v2: replace sizeof with BITS_PER_TYPE, we should check bit number
instead of byte number.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:52 -04:00
Tao Zhou 6205b558e1 drm/amdgpu: fix value of some UMC parameters for UMC v12
Prepare for bad page retirement.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:52 -04:00
Philip Yang e61801f162 drm/amdkfd: Don't use sw fault filter if retry cam enabled
If retry cam enabled, we don't use sw retry fault filter and add fault
into sw filter ring, so we shouldn't remove fault from sw filter.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:51 -04:00
Christian König 24a6eb92b7 drm/amdgpu: fix and cleanup gmc_v9_0_flush_gpu_tlb
The KIQ code path was ignoring the second flush. Also avoid long lines and
re-calculating the register offsets over and over again.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:51 -04:00
Philip Yang bcfb9cee61 drm/amdgpu: Increase IH soft ring size for GFX v9.4.3 dGPU
On GFX v9.4.3 dGPU, applications have random timeout failure when XNACK
on, dmesg log has "amdgpu: IH soft ring buffer overflow 0x900, 0x900",
because dGPU mode has 272 cam entries. After increasing IH soft ring
to 512 entries, no more IH soft ring overflow message and application
passed.

Fixes: bf80d34b6c ("drm/amdgpu: Increase soft IH ring size")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:51 -04:00
Lijo Lazar c45e38f217 drm/amdgpu: Restore partition mode after reset
On a full device reset, PSP FW gets unloaded. Hence restore the
partition mode by placing a new request.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26 16:54:51 -04:00
Cong Liu f387bb578d drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function.
The leak occurs when the function sends a command to the firmware to enable
or disable a RAS feature for a GFX block. If the command fails, the kfree()
function is not called to free the info memory.

Fixes: 9f051d6ff1 ("drm/amdgpu: Free ras cmd input buffer properly")
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 17:27:04 -04:00
Lijo Lazar 06cce38ef5 Revert "drm/amdgpu: Report vbios version instead of PN"
This reverts commit 7748ce5b69.

vbios_version sysfs node is used to identify Part Number also. Revert to
the same so that it doesn't break scripts/software which parse this.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 17:26:15 -04:00
Lijo Lazar ff96ddc3f2 drm/amdgpu: Add more fields to IP version
Include subrevision and variant fileds also to IP version.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:25:17 -04:00
Tao Zhou f8754f58d6 drm/amdgpu: print channel index for UMC bad page
Print channel index for UMC v12.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:25:17 -04:00
Sathishkumar S 5aba51233b drm/amdgpu: update IP count INFO query
update the query to return the number of functional
instances where there is more than an instance of the requested
type and for others continue to return one.

v2: count must reflect the actual number of engines (Alex)
v3: fix wrong number of engines for vcn (Alex)

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:09 -04:00
Stanley.Yang a83f2bf1f4 drm/amdgpu: Fix false positive error log
It should first check block ras obj whether be set, it should
return 0 directly if block ras obj or hw_ops is not set.
If block doesn't support RAS just return 0 is fine.

Changed from V1:
	return 0 directly if block ras obj or hw ops is not set

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:09 -04:00
Vignesh Chander 8c95cda3e1 drm/amdgpu/jpeg: skip set pg for sriov
Host handles PG.

Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:09 -04:00
André Almeida a769178585 drm/amdgpu: Rework coredump to use memory dynamically
Instead of storing coredump information inside amdgpu_device struct,
move if to a proper separated struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:07 -04:00
Cong Liu 5838f74c29 drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function.
The leak occurs when the function sends a command to the firmware to enable
or disable a RAS feature for a GFX block. If the command fails, the kfree()
function is not called to free the info memory.

Fixes: 9f051d6ff1 ("drm/amdgpu: Free ras cmd input buffer properly")
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:07 -04:00
Lijo Lazar 24f60ddc4b drm/amdgpu: Fix vbios version string search
Search for vbios version string in STRING_OFFSET-ATOM_ROM_HEADER region
first. If those offsets are not populated, use the hardcoded region.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:06 -04:00
Lijo Lazar 2af351d692 Revert "drm/amdgpu: Report vbios version instead of PN"
This reverts commit 7748ce5b69.

vbios_version sysfs node is used to identify Part Number also. Revert to
the same so that it doesn't break scripts/software which parse this.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:06 -04:00
David Francis 5f248462c6 drm/amdgpu: Add EXT_COHERENT memory allocation flags
These flags (for GEM and SVM allocations) allocate
memory that allows for system-scope atomic semantics.

On GFX943 these flags cause caches to be avoided on
non-local memory.

On all other ASICs they are identical in functionality to the
equivalent COHERENT flags.

Corresponding Thunk patch is at
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88

Reviewed-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 16:24:06 -04:00
Yang Wang 4051844c66 drm/amdgpu: add amdgpu mca debug sysfs support
add amdgpu mca debug sysfs support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:25:19 -04:00
Alex Deucher d11bbacee3 drm/amdgpu: add VPE IP discovery info to HW IP info query
Add missing IP discovery info.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:25:11 -04:00
Yang Wang 7ff607e272 drm/amdgpu: add amdgpu smu mca dump feature support
add amdgpu smu mca dump feature support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:25:01 -04:00
Mario Limonciello 7f4ce7b50a drm/amd: Enable seamless boot by default on newer ASICs
Seamless boot can technically be supported as far back as DCN1
but to avoid regressions on older hardware, enable it for DCN3 and
later.

If users report using the module parameter that it works on older
ASICs as well, this can be adjusted.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:24:46 -04:00
Mario Limonciello 5dc270d366 drm/amd: Add a module parameter for seamless boot
The module parameter can be used to test more easily enabling seamless
boot support on additional ASICs.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:24:39 -04:00
Timmy Tsai 2fa73a101c drm/amd: Add HDP flush during jpeg init
During jpeg init, CPU writes to frame buffer which can be cached by HDP,
occasionally causing invalid header to be sent to MMSCH.  Perform HDP flush
after writing to frame buffer before continuing with jpeg init sequence.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:24:28 -04:00
Mario Limonciello bb0f84293e drm/amd: Move seamless boot check out of display
This will allow base driver to dictate whether seamless should be
enabled.  No intended functional changes.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:24:21 -04:00
Mario Limonciello 3ef07651a5 drm/amd: Drop special case for yellow carp without discovery
`amdgpu_gmc_get_vbios_allocations` has a special case for how to
bring up yellow carp when amdgpu discovery is turned off. As this ASIC
ships with discovery turned on, it's generally dead code and worse it
causes `adev->mman.keep_stolen_vga_memory` to not be initialized for
yellow carp.

Remove it.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:24:04 -04:00
Lijo Lazar 4e8303cf2c drm/amdgpu: Use function for IP version check
Use an inline function for version check. Gives more flexibility to
handle any format changes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20 12:23:28 -04:00
Tvrtko Ursulin 1c7a387ffe drm: Update file owner during use
With the typical model where the display server opens the file descriptor
and then hands it over to the client(*), we were showing stale data in
debugfs.

Fix it by updating the drm_file->pid on ioctl access from a different
process.

The field is also made RCU protected to allow for lockless readers. Update
side is protected with dev->filelist_mutex.

Before:

$ cat /sys/kernel/debug/dri/0/clients
             command   pid dev master a   uid      magic
                Xorg  2344   0   y    y     0          0
                Xorg  2344   0   n    y     0          2
                Xorg  2344   0   n    y     0          3
                Xorg  2344   0   n    y     0          4

After:

$ cat /sys/kernel/debug/dri/0/clients
             command  tgid dev master a   uid      magic
                Xorg   830   0   y    y     0          0
       xfce4-session   880   0   n    y     0          1
               xfwm4   943   0   n    y     0          2
           neverball  1095   0   n    y     0          3

*)
More detailed and historically accurate description of various handover
implementation kindly provided by Emil Velikov:

"""
The traditional model, the server was the orchestrator managing the
primary device node. From the fd, to the master status and
authentication. But looking at the fd alone, this has varied across
the years.

IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open
fd(s) and reuse those whenever needed, DRI2 the client was responsible
for open() themselves and with DRI3 the fd was passed to the client.

Around the inception of DRI3 and systemd-logind, the latter became
another possible orchestrator. Whereby Xorg and Wayland compositors
could ask it for the fd. For various reasons (hysterical and genuine
ones) Xorg has a fallback path going the open(), whereas Wayland
compositors are moving to solely relying on logind... some never had
fallback even.

Over the past few years, more projects have emerged which provide
functionality similar (be that on API level, Dbus, or otherwise) to
systemd-logind.
"""

v2:
 * Fixed typo in commit text and added a fine historical explanation
   from Emil.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230621094824.2348732-1-tvrtko.ursulin@linux.intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
2023-09-20 15:27:44 +02:00
Dave Airlie 1216d49178 amd-drm-fixes-6.6-2023-09-13:
amdgpu:
 - GC 9.4.3 fixes
 - Fix white screen issues with S/G display on system with >= 64G of ram
 - Replay fixes
 - SMU 13.0.6 fixes
 - AUX backlight fix
 - NBIO 4.3 SR-IOV fixes for HDP
 - RAS fixes
 - DP MST resume fix
 - Fix segfault on systems with no vbios
 - DPIA fixes
 
 amdkfd:
 - CWSR grace period fix
 - Unaligned doorbell fix
 - CRIU fix for GFX11
 - Add missing TLB flush on gfx10 and newer
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZQIRSAAKCRC93/aFa7yZ
 2O/nAP4zB0fdLB46Hhz11aYsE9Zghe91b2rcmF4EYpEAQs7awwEAhSjy0Wiy6EYb
 prEGCdW0O8Tq7fdjr7+JrPmF7dasAQk=
 =SUbg
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-fixes-6.6-2023-09-13' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.6-2023-09-13:

amdgpu:
- GC 9.4.3 fixes
- Fix white screen issues with S/G display on system with >= 64G of ram
- Replay fixes
- SMU 13.0.6 fixes
- AUX backlight fix
- NBIO 4.3 SR-IOV fixes for HDP
- RAS fixes
- DP MST resume fix
- Fix segfault on systems with no vbios
- DPIA fixes

amdkfd:
- CWSR grace period fix
- Unaligned doorbell fix
- CRIU fix for GFX11
- Add missing TLB flush on gfx10 and newer

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com
2023-09-15 09:50:50 +10:00
Daniel Vetter 15794f9dc3 One doc fix for drm/connector, one fix for amdgpu for an crash when
VRAM usage is high, and one fix in gm12u320 to fix the timeout units in
 the code
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCZPl/TAAKCRDj7w1vZxhR
 xWZZAP0b3k5vIuQdbiZBdXy7+guakiJ2DqOMxJJ+sYS5Mun53AEA73Cu1gmBNMoT
 d8H1uBjOfvPcXANNI0t0OgJfrESOdg8=
 =atPC
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2023-09-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

One doc fix for drm/connector, one fix for amdgpu for an crash when
VRAM usage is high, and one fix in gm12u320 to fix the timeout units in
the code

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
2023-09-14 14:00:51 +02:00
Alex Deucher addd7aef25 drm/amdgpu: add remap_hdp_registers callback for nbio 7.11
Implement support for remapping the HDP aperture registers for
NBIO 7.11.

Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12 17:30:22 -04:00
Alex Deucher b85a17d354 drm/amdgpu: add vcn_doorbell_range callback for nbio 7.11
Implement support for setting up the VCN doorbell range for
NBIO 7.11.

Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-12 17:30:18 -04:00
Thomas Zimmermann c900529f3d Merge drm/drm-fixes into drm-misc-fixes
Forwarding to v6.6-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-09-12 08:53:30 +02:00
David Francis 5e7e822542 drm/amdgpu: Handle null atom context in VBIOS info ioctl
On some APU systems, there is no atom context and so the
atom_context struct is null.

Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl
to handle this case, returning all zeroes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:25:26 -04:00
Hawking Zhang ffd6bde302 drm/amdgpu: fallback to old RAS error message for aqua_vanjaram
So driver doesn't generate incorrect message until
the new format is settled down for aqua_vanjaram

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:20:07 -04:00
Alex Deucher ab43213e7a drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV
Needed for HDP flush to work correctly.

Reviewed-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:19:48 -04:00
Alex Deucher 1832403cd4 drm/amdgpu/soc21: don't remap HDP registers for SR-IOV
This matches the behavior for soc15 and nv.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:19:42 -04:00
Hamza Mahfooz 169ed4ece8 Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"
This reverts commit 70e64c4d52.

Since, we now have an actual fix for this issue, we can get rid of this
workaround as it can cause pin failures if enough VRAM isn't carved out
by the BIOS.

Cc: stable@vger.kernel.org # 6.1+
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:18:17 -04:00
Mukul Joshi 97e3c6a853 drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3
Currently, we store CU info only for a single XCC assuming
that it is the same for all XCCs. However, that may not be
true. As a result, store CU info for all XCCs. This info is
later used for CU masking.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:16:31 -04:00
Mukul Joshi 81faf9e0c3 drm/amdkfd: Fix reg offset for setting CWSR grace period
This patch fixes the case where the code currently passes
absolute register address and not the reg offset, which HWS
expects, when sending the PM4 packet to set/update CWSR grace
period. Additionally, cleanup the signature of
build_grace_period_packet_info function as it no longer needs
the inst parameter.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:15:43 -04:00
David Francis 86f2ec2265 drm/amdgpu: Handle null atom context in VBIOS info ioctl
On some APU systems, there is no atom context and so the
atom_context struct is null.

Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl
to handle this case, returning all zeroes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:22:32 -04:00
André Almeida ffde72107b drm/amdgpu: Create an option to disable soft recovery
Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:22:23 -04:00
André Almeida 887db1e49a drm/amdgpu: Merge debug module parameters
Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.

Drop the obsolete module options in favor of the new ones.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:19:54 -04:00
Yifan Zhang 0e64c9aad0 drm/amdgpu: add type conversion for gc info
gc info usage misses type conversion.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Li Ma <li.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:17:30 -04:00
Mukul Joshi 68fa72a437 drm/amdgpu: Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES
Rename KGD_MAX_QUEUES to AMDGPU_MAX_QUEUES to conform with
the naming convention followed in amdgpu_gfx.h. No functional
change.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:15:41 -04:00
Tao Zhou ced575203a drm/amdgpu: print more address info of UMC bad page
Print out row, column and bank value of UMC error address for UMC v12.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:15:15 -04:00
Hawking Zhang 9f9d4651f7 drm/amdgpu: fallback to old RAS error message for aqua_vanjaram
So driver doesn't generate incorrect message until
the new format is settled down for aqua_vanjaram

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:15:02 -04:00
Alex Deucher 6a82822b90 drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV
Needed for HDP flush to work correctly.

Reviewed-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:14:59 -04:00
Alex Deucher 8a6e26e7ef drm/amdgpu/soc21: don't remap HDP registers for SR-IOV
This matches the behavior for soc15 and nv.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Timmy Tsai <timmtsai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:14:50 -04:00
Hamza Mahfooz 601c63ad8e Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"
This reverts commit 70e64c4d52.

Since, we now have an actual fix for this issue, we can get rid of this
workaround as it can cause pin failures if enough VRAM isn't carved out
by the BIOS.

Cc: stable@vger.kernel.org # 6.1+
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:12:20 -04:00
Tao Zhou 3cb9ebc9d6 drm/amdgpu: add channel index table for UMC v12
Get UMC phyical channel index according to node id, umc instance and
channel instance.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:58 -04:00
Tao Zhou 40a08fe890 drm/amdgpu: add address conversion for UMC v12
Convert MCA error address to physical address and find out all pages in
one physical row.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:35 -04:00
Lijo Lazar ca7aa3bf31 drm/amdgpu: Use default reset method handler
When reset method is not passed in reset context, look for the handler
for default reset method. On Aldebaran, default reset method for SOCs
connected to CPU over XGMI is MODE2.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:25 -04:00
Mukul Joshi f705a6f021 drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3
Currently, we store CU info only for a single XCC assuming
that it is the same for all XCCs. However, that may not be
true. As a result, store CU info for all XCCs. This info is
later used for CU masking.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:19 -04:00
Ma Jun a1ce3e1f7c drm/amd: Fix the flag setting code for interrupt request
[1] Remove the irq flags setting code since pci_alloc_irq_vectors()
handles these flags.
[2] Free the msi vectors in case of error.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:10:09 -04:00
Lang Yu dbb8052151 drm/amdgpu: fix unsigned error codes
Fixes: 5d5eac7e83 ("drm/amdgpu: add selftest framework for UMSCH")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/all/ZPhddADtKmOuVyDq@lang-desktop
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:09:12 -04:00
Mukul Joshi 56d6daa3c7 drm/amdkfd: Fix reg offset for setting CWSR grace period
This patch fixes the case where the code currently passes
absolute register address and not the reg offset, which HWS
expects, when sending the PM4 packet to set/update CWSR grace
period. Additionally, cleanup the signature of
build_grace_period_packet_info function as it no longer needs
the inst parameter.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 17:07:33 -04:00
Arunpravin Paneer Selvam 2eb412aa25 drm/amdgpu: Move the size computations to drm buddy
- Move roundup_power_of_two() and IS_ALIGNED() computations to
  drm buddy file to support the new try harder mechanism for
  contiguous allocation.

- Move trim function call to drm_buddy_alloc_blocks() function.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230909160902.15644-2-Arunpravin.PaneerSelvam@amd.com
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2023-09-11 20:18:06 +02:00
Linus Torvalds a48fa7efaf drm fixes for 6.6-rc1
amdgpu:
 - Display replay fixes
 - Fixes for headless boards
 - Fix documentation breakage
 - RAS fixes
 - Handle newer IP discovery tables
 - SMU 13.0.6 fixes
 - SR-IOV fixes
 - Display vstartup fixes
 - NBIO 7.9 fixes
 - Display scaling mode fixes
 - Debugfs power reporting fix
 - GC 9.4.3 fixes
 - Dirty framebuffer fixes for fbcon
 - eDP fixes
 - DCN 3.1.5 fix
 - Display ODM fixes
 - GPU core dump fix
 - Re-enable zops property now that IGT test is fixed
 - Fix possible UAF in CS code
 - Cursor degamma fix
 
 amdkfd:
 - HMM fixes
 - Interrupt masking fix
 - GFX11 MQD fixes
 
 i915:
 - Mark requests for GuC virtual engines to avoid use-after-free
 
 nouveau:
 - Fix fence state in nouveau_fence_emit()
 
 ivpu:
 - replace strncpy
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmT6ihUACgkQDHTzWXnE
 hr6Vqg/+OGfbxx0qev5C93bLYpg8d4zbply0zTed2hU48zczARkOyDH+h2uYM4tP
 rmB6mK/0KRy3H8vkngKduR/IF6QBwzOnLDpS+C/TrHYQYqMDwvs3qEDVYqXh3V5H
 GPIFuu9sb2Nb/o9Fid70pvNACbgGsAUgqMUhQ/is3NjzOR8S/qjdBQ7wSdoUoOqx
 okTdlwuMK5SEYrihSyTZvhNcwpR/1L8JuxOUXXXUSQ0tRXBer/ZNF2lcEyYmQ0zs
 bZHKM4dNbdew/EhygbH6LVB5RjFaT5pGw08Xm8zJry+q5tXQV/NIXPQHL3vWqQoX
 i2QLbvGX/Uu8LJg9YNdsa1kPwNKADAxF64cW38Llv8ybsPHyva/I255j689/TvSG
 Se7HkTooURKS6GWFHPOkyMNC0Y+Fb/7WG5zUPSDVk9stqJz9pzx48okTsCWeuovD
 cBOssp8If1QsTyPvDq5A47l5z1oO3J1rdJ9fL0GnpOuXulPhgJGIoUSkftyI6lbw
 rhUoRd7w6VcgOsA9WIkAf+/325em0Y0AKZBgQnr2jfF56IE4iLa8yFuJNeReZ9oy
 W9yf14AB0orVm9+P4+PqATXBh+PdCTD1CPcB0MyK1SZAjh836Tc0HPKvJAUU6jp+
 8aw3BKXcaLjIP/dCyhSddXCnuuTPWreff5isktwgEXUtNFv9jx0=
 =fPeN
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three
  weeks in one go, one i915, one nouveau and one ivpu.

  I think there might be a few more fixes in misc that I haven't pulled
  in yet, but we should get them all for rc2.

  amdgpu:
   - Display replay fixes
   - Fixes for headless boards
   - Fix documentation breakage
   - RAS fixes
   - Handle newer IP discovery tables
   - SMU 13.0.6 fixes
   - SR-IOV fixes
   - Display vstartup fixes
   - NBIO 7.9 fixes
   - Display scaling mode fixes
   - Debugfs power reporting fix
   - GC 9.4.3 fixes
   - Dirty framebuffer fixes for fbcon
   - eDP fixes
   - DCN 3.1.5 fix
   - Display ODM fixes
   - GPU core dump fix
   - Re-enable zops property now that IGT test is fixed
   - Fix possible UAF in CS code
   - Cursor degamma fix

  amdkfd:
   - HMM fixes
   - Interrupt masking fix
   - GFX11 MQD fixes

  i915:
   - Mark requests for GuC virtual engines to avoid use-after-free

  nouveau:
   - Fix fence state in nouveau_fence_emit()

  ivpu:
   - replace strncpy"

* tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits)
  drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
  drm/amd/display: prevent potential division by zero errors
  drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
  drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
  Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
  drm/amdgpu: fix amdgpu_cs_p1_user_fence
  Revert "Revert "drm/amd/display: Implement zpos property""
  drm/amdkfd: Add missing gfx11 MQD manager callbacks
  drm/amdgpu: Free ras cmd input buffer properly
  drm/amdgpu: Hide xcp partition sysfs under SRIOV
  drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting
  drm/amdkfd: use mask to get v9 interrupt sq data bits correctly
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Support query ecc cap for aqua_vanjaram
  drm/amdgpu: Add umc_info v4_0 structure
  drm/amd/display: always switch off ODM before committing more streams
  drm/amd/display: Remove wait while locked
  drm/amd/display: update blank state on ODM changes
  drm/amd/display: Add smu write msg id fail retry process
  drm/amdgpu: Add SMU v13.0.6 default reset methods
  ...
2023-09-07 19:47:04 -07:00
Lijo Lazar fbe1a9e0c7 drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
Restrict the wait for boot loader steady state only to SMUv13.0.6. For
older SOCs, ASIC init has a longer wait period and that takes care.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 22:11:51 -04:00
Candice Li 7e6ec09974 drm/amdgpu: Add umc v12_0 ras functions
Add umc v12_0 ras error querying.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:38:00 -04:00
Lijo Lazar 6b7d211740 drm/amdgpu: Fix refclk reporting for SMU v13.0.6
SMU v13.0.6 SOCs have 100MHz reference clock.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:37:03 -04:00
Hawking Zhang c2c23a10f1 drm/amdgpu: Correct se_num and reg_inst for gfx v9_4_3 ras counters
gfx_v9_4_3_ue|ce_reg_list is an array per gfx core instance
correct the settings of se_num and reg_inst for some of
gfx ras counters so all the available register instances
can be polled for ras status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:36:57 -04:00
Lijo Lazar 1b8e56b994 drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
Restrict the wait for boot loader steady state only to SMUv13.0.6. For
older SOCs, ASIC init has a longer wait period and that takes care.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:36:52 -04:00
Lijo Lazar b93fb0fe24 drm/amdgpu: Add only valid firmware version nodes
Show only firmware version attributes that have valid version. Hide
others.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:36:44 -04:00
Lang Yu d519072d26 drm/amdgpu: fix incompatible types in conditional expression
Use proper type.

Fixes: 9d4346bdbc ("drm/amdgpu: add VPE 6.1.0 support")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Solomon Chiu <solomon.chiu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202309020608.FwP8QMht-lkp@intel.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:35:29 -04:00
Srinivasan Shanmugam eb3b214c37 drm/amdgpu: Use min_t to replace min
Use min_t to replace min, min_t is a bit fast because min use
twice typeof.

And using min_t is cleaner here since the min/max macros
do a typecheck while min_t()/max_t() to an explicit type cast.

Fixes the below checkpatch warning:

WARNING: min() should probably be min_t()

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:35:22 -04:00
Candice Li d57e24aa56 drm/amdgpu: Update amdgpu_device_indirect_r/wreg_ext
Only calculate pcie_index_hi for register address greater than 32bits.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:34:26 -04:00
Candice Li a76b2870bd drm/amdgpu: Add RREG64_PCIE_EXT/WREG64_PCIE_EXT functions
Add 64bits register access support on register whose address
is greater than 32bits.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:34:18 -04:00
Srinivasan Shanmugam 9b70a1d414 drm/amdgpu: Declare array with strings as pointers constant
This warning is for the declaration of a static array, and it is
recommended to declare it as type "static const char * const" instead of
"static const char *".

an array pointer declared as type "static const char *" can point to a
different character constant because the pointer is mutable. However, if
it is declared as type "static const char * const", the pointer will
point to an immutable character constant, preventing it from being
modified which can better ensure the safety and stability of the
program.

Fixes the below:

WARNING: static const char * array should probably be static const char * const

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:34:11 -04:00
Jiapeng Chong df04434cb5 drm/amdgpu: clean up some inconsistent indenting
No functional modification involved.

drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c:34 nbio_v7_11_get_rev_id() warn: inconsistent indenting.

v2: drop leftover printk (Alex)

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6316
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:32:25 -04:00
Yifan Zhang 0bdf09cc5e drm/amdgpu: calling address translation functions to simplify codes
Use amdgpu_gmc_vram_pa to simplify codes.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-06 14:31:52 -04:00
Simon Pilkington e2884fe84a drm/amd: Make fence wait in suballocator uninterruptible
Commit c103a23f2f
("drm/amd: Convert amdgpu to use suballocation helper.")
made the fence wait in amdgpu_sa_bo_new() interruptible but there is no
code to handle an interrupt. This caused the kernel to randomly explode
in high-VRAM-pressure situations so make it uninterruptible again.

Signed-off-by: Simon Pilkington <simonp.git@gmail.com>
Fixes: c103a23f2f ("drm/amd: Convert amdgpu to use suballocation helper.")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2761
CC: stable@vger.kernel.org # 6.4+
2023-09-01 15:12:07 +02:00
Christian König 35588314e9 drm/amdgpu: fix amdgpu_cs_p1_user_fence
The offset is just 32bits here so this can potentially overflow if
somebody specifies a large value. Instead reduce the size to calculate
the last possible offset.

The error handling path incorrectly drops the reference to the user
fence BO resulting in potential reference count underflow.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-08-31 18:14:49 -04:00
Hawking Zhang 9f051d6ff1 drm/amdgpu: Free ras cmd input buffer properly
Do not access the pointer for ras input cmd buffer
if it is even not allocated.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:12:13 -04:00
Rajneesh Bhardwaj 2031c46b09 drm/amdgpu: Hide xcp partition sysfs under SRIOV
XCP partitions should not be visible for the VF for GFXIP 9.4.3.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:11:41 -04:00
Tao Zhou e23b10675a drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting
Instead of using direct update, avoid touching unrelated fields.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:11:09 -04:00
André Almeida 6d1b345548 drm/amdgpu: Allocate coredump memory in a nonblocking way
During a GPU reset, a normal memory reclaim could block to reclaim
memory. Giving that coredump is a best effort mechanism, it shouldn't
disturb the reset path. Change its memory allocation flag to a
nonblocking one.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:10:11 -04:00
Hawking Zhang e0c5c387ac drm/amdgpu: Support query ecc cap for aqua_vanjaram
Driver queries umc_info v4_0 to identify ecc cap
for aqua_vanjaram

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:09:45 -04:00
Lijo Lazar 05347402d1 drm/amdgpu: Add SMU v13.0.6 default reset methods
For APUs with SMU v13.0.6, mode-2 reset is kept as default and for
others mode-1 is the default reset method.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:05:43 -04:00
Lijo Lazar 7c2949c12e drm/amdgpu: Add bootloader wait for PSP v13
Implement the wait for bootloader call back for PSP v13.0 ASICs. Only
for ASICs with PSP v13.0.6, it needs an additional check for VBIOS
mailbox status.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:04:10 -04:00
Hamza Mahfooz 0a611560f5 drm/amdgpu: register a dirty framebuffer callback for fbcon
fbcon requires that we implement &drm_framebuffer_funcs.dirty.
Otherwise, the framebuffer might take a while to flush (which would
manifest as noticeable lag). However, we can't enable this callback for
non-fbcon cases since it may cause too many atomic commits to be made at
once. So, implement amdgpu_dirtyfb() and only enable it for fbcon
framebuffers (we can use the "struct drm_file file" parameter in the
callback to check for this since it is only NULL when called by fbcon,
at least in the mainline kernel) on devices that support atomic KMS.

Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org # 6.1+
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:03:48 -04:00
Mangesh Gadre 7b9f623530 drm/amdgpu: Updated TCP/UTCL1 programming
Update TCP/UTCL1 thrashing control settings

v2: updated rev_id check

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:02:49 -04:00
Hawking Zhang 7d4424373d drm/amdgpu: Fix the return for gpu mode1_reset
amdgpu_device_mode1_reset will return gpu mode1_reset
succeed (ret = 0) as long as wait_for_bootloader call
succeed, regardless of the status reported by smu or
psp firmware. This results to driver continue executing
recovery even smu or psp fail to perform mode1 reset.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 18:02:10 -04:00
Mangesh Gadre 3f16096795 drm/amdgpu: Remove SRAM clock gater override by driver
rlc firmware does required setting, driver need not do it.

Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:59:17 -04:00
Lijo Lazar 7656168a8a drm/amdgpu: Add bootloader status check
Add a function to wait till bootloader has reached steady state.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:58:58 -04:00
Horace Chen 8c97e87c13 drm/amdkfd: use correct method to get clock under SRIOV
[What]
Current SRIOV still using adev->clock.default_XX which gets from
atomfirmware. But these fields are abandoned in atomfirmware long ago.
Which may cause function to return a 0 value.

[How]
We don't need to check whether SR-IOV. For SR-IOV one-vf-mode,
pm is enabled and VF is able to read dpm clock
from pmfw, so we can use dpm clock interface directly. For
multi-VF mode, VF pm is disabled, so driver can just react as pm
disabled. One-vf-mode is introduced from GFX9 so it shall not have
any backward compatibility issue.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:58:29 -04:00
Lijo Lazar 8f1778939b drm/amdgpu: Unset baco dummy mode on nbio v7.9
BACO dummy mode could be set under reset conditions and that affects
framebuffer access. Check If baco dummy mode is set, unset it if so.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:58:23 -04:00
YiPeng Chai e81c455685 drm/amdgpu: Enable ras for mp0 v13_0_6 sriov
Enable ras for mp0 v13_0_6 sriov

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:55:02 -04:00
Samir Dhume bae44a8fcb drm/amdgpu/jpeg - skip change of power-gating state for sriov
Powergating is handled in the host driver.

Reviewed-by: Zhigang Luo <zhigang.luo@amd.com>
Signed-off-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:54:18 -04:00
Le Ma 46b55e25c9 drm/amdgpu: update gc_info v2_1 from discovery
Several new fields are exposed in gc_info v2_1

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:53:19 -04:00
Le Ma d4f6425a56 drm/amdgpu: update mall info v2 from discovery
Mall info v2 is introduced in ip discovery

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:53:08 -04:00
Candice Li 4b721ed87e drm/amdgpu: Only support RAS EEPROM on dGPU platform
RAS EEPROM device is only supported on dGPU platform for smu v13_0_6.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:52:49 -04:00
Lang Yu 983ac45a06 drm/amdgpu: update SET_HW_RESOURCES definition for UMSCH
Align with FW changes.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu eebb06d121 drm/amdgpu: add amdgpu_umsch_mm module parameter
Enable Multi Media User Mode Scheduler
(0 = disabled (default), 1 = enabled).

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu 822f780829 drm/amdgpu/discovery: enable UMSCH 4.0 in IP discovery
Enable UMSCH to support VPE and VCN user queues.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu 4f94903332 drm/amdgpu: add PSP loading support for UMSCH
Add front door loading support.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu 40748f9a0a drm/amdgpu: reserve mmhub engine 3 for UMSCH FW
UMSCH FW uses mmhub engine 3 for invalidation.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Acked-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu d591ae0c9f drm/amdgpu: add VPE queue submission test
Submit a fence command through indirect buffer.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu 5d5eac7e83 drm/amdgpu: add selftest framework for UMSCH
Prepare for VPE and VCN queue submission test.

v2: rebase on drm_exec (Alex)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 17:14:21 -04:00
Lang Yu dc6f3d6ff2 drm/amdgpu: enable UMSCH scheduling for VPE
Add VPE into UMSCH hw resourses,
set vmid mask to 0xf00,
set hqd mask to 0xfe,
then UMSCH can schedule VPE queues.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:40:56 -04:00
Lang Yu 3488c79bea drm/amdgpu: add initial support for UMSCH
Add basic data structure, dummy ring functions
and ip functions for UMSCH.

Implement sw_init(ring_init and init_microcodede) and
hw_init(load_microcode), UMSCH can boot up now.

Implement hw_init(ring_start) and hw_fini(ring_stop),
UMSCH is ready for command submission now.

Implement set_hw_resources and add/remove_queue,
UMSCH is ready for scheduling now.

Aggregated doorbell is used to notify UMSCH FW that
there is unmapped queue with corresponding priority level
(e.g., AGDB[0] for Real time band, etc.) is updating its job.

v2: squash together initial patches to avoid breaking the
    build (Alex)

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:40:53 -04:00
Lang Yu 9c852a42a9 drm/amdgpu: add UMSCH firmware header definition
Add firmware header definition for UMSCH.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:40:45 -04:00
Lang Yu 1a29f36781 drm/amdgpu: add UMSCH RING TYPE definition
Add RING TYPE definition for Multi Mdeia User Mode Scheduler.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:40:40 -04:00
Saleemkhan Jamadar 1cf36599b9 drm/amdgpu/jpeg: initialize number of jpeg ring
Initialize number of jpeg ring for vcn 4.0.5.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:39:41 -04:00
Christian König a5492fe27f drm/amdgpu: fix amdgpu_cs_p1_user_fence
The offset is just 32bits here so this can potentially overflow if
somebody specifies a large value. Instead reduce the size to calculate
the last possible offset.

The error handling path incorrectly drops the reference to the user
fence BO resulting in potential reference count underflow.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:39:28 -04:00
Evan Quan 90bcb9b595 drm/amdgpu: revise the device initialization sequences
By placing the sysfs interfaces creation after `.late_int`. Since some
operations performed during `.late_init` may affect how the sysfs
interfaces should be created.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:35:33 -04:00
Evan Quan 3e38b634f9 drm/amd/pm: introduce a new set of OD interfaces
There will be multiple interfaces(sysfs files) exposed with each representing
a single OD functionality. And all those interface will be arranged in a tree
liked hierarchy with the top dir as "gpu_od". Meanwhile all functionalities
for the same component will be arranged under the same directory.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:35:26 -04:00
Saleemkhan Jamadar 6be6e74b7d drm/amdgpu: enable PG flags for VCN
Enable PG flags for VCN and Jpeg on IP 11_5_0

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:35:02 -04:00
Saleemkhan Jamadar 844d8dd5b9 drm/amdgpu/discovery: add VCN 4.0.5 Support
Enable VCN 4.0.5 on gc 11_5_0.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:58 -04:00
Saleemkhan Jamadar c64f389506 drm/amdgpu/soc21: Add video cap query support for VCN_4_0_5
Added the video capability query support for VCN version 4_0_5

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:52 -04:00
Saleemkhan Jamadar cc308acc9b drm/amdgpu:enable CG and PG flags for VCN
Enable CG and PG flags for VCN on IP 11_5_0

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:48 -04:00
Saleemkhan Jamadar 1827b37582 drm/amdgpu: add VCN_4_0_5 firmware support
Add VCN_4_0_5 firmware support

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:43 -04:00
Saleemkhan Jamadar 8f98a715da drm/amdgpu/jpeg: add jpeg support for VCN4_0_5
Add jpeg support for VCN4_0_5

v2 - update license year (Leo Liu)

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:36 -04:00
Saleemkhan Jamadar 547aad32ed drm/amdgpu: add VCN4 ip block support
Add VCN 4.0.5 initialization and decoder/encoder ring functions.

v2 - update license year (Leo Liu)

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:32 -04:00
Lang Yu f9ecae9a4e drm/amdgpu: fix VPE front door loading issue
Implement proper front door loading for vpe 6.1.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:22 -04:00
Lang Yu 5f6e9cdc83 drm/amdgpu: add VPE FW version query support
Add support to query VPE FW version.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:19 -04:00
Lang Yu 3ee8fb7005 drm/amdgpu: enable VPE for VPE 6.1.0
Enable Video Processing Engine on SoCs
that contain VPE 6.1.0.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:16 -04:00
Lang Yu 523c12802d drm/amdgpu: add user space CS support for VPE
Enable command submission to VPE from user space.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:14 -04:00
Lang Yu c5d67a0ec3 drm/amdgpu: add PSP loading support for VPE
Add PSP loading support for Video Processing Engine.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:10 -04:00
Lang Yu 9d4346bdbc drm/amdgpu: add VPE 6.1.0 support
Add skeleton driver code. (Ray)
Add initial support for Video Processing Engine. (Lang)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:34:05 -04:00
Lang Yu 5861e47731 drm/amdgpu: add nbio 7.11 callback for VPE
Add nbio callback to configure doorbell settings.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:56 -04:00
Lang Yu 75fdd738ff drm/amdgpu: add nbio callback for VPE
Add nbio callback to configure doorbell settings.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:53 -04:00
Lang Yu 964a36d7a4 drm/amdgpu: add PSP FW TYPE for VPE
Add PSP FW TYPE for Video Processing Engine.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:50 -04:00
Lang Yu 4c63735fa8 drm/amdgpu: add UCODE ID for VPE
Add UCODE ID for Video Processing Engine.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:47 -04:00
Lang Yu ce7b59c1e6 drm/amdgpu: add support for VPE firmware name decoding
Add decoding VPE firmware name support.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:43 -04:00
Lang Yu 2f3916bedb drm/amdgpu: add doorbell index for VPE
Add doorbell index for Video Processing Engine.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:40 -04:00
Lang Yu 0b233357a6 drm/amdgpu: add HWID for VPE
Add HWID for Video Processing Engine.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:31 -04:00
Lang Yu b0fa855cab drm/amdgpu: add VPE firmware interface
Add initial firmware interface. (Ray)
Add more opcodes and rename to vpe_v6_1. (Lang)

v2: Update copyright date (Alex)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:27 -04:00
Lang Yu 878fe05116 drm/amdgpu: add VPE firmware header definition
Add firmware header definition for Video Processing Engine.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:25 -04:00
Huang Rui 5b28f1c720 drm/amdgpu: add VPE HW IP BLOCK definition
Add HW IP BLOCK for Video Processing Engine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:21 -04:00
Huang Rui 2d6ea3b07c drm/amdgpu: add VPE RING TYPE definition
Add RING TYPE for Video Processing Engine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-31 16:33:09 -04:00
Linus Torvalds b6f6167ea8 pci-v6.6-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmTvfQgUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vyDKA//UBxniXTyxvN8L/agMZngFJd9jLkE
 p2lnk5eTW6y/aJp1g+ujc7IJEmHG/B1Flp0b5mK8XL7S6OBtAGlPwnuPPpXb0ZxV
 ofSuQpYoNZGpkYrQMYvATfdLnH2WF3Yj3WCqh5jd2EldPEyqhMV68l7NMzf6+td2
 KWJPli1XO8e60JAzbhpXH9vn1I0T8e6Qx8z/ulcydfiOH3PGDPnVrEo8gw9CvJOr
 aDqSPW7uhTk2SjjUJcAlQVpTGclE4yBxOOhEbuSGc7L6Ab04Y6D0XKx1589AUK6Z
 W2dQFK3cFYNQQ9aS/2DMUG88H09ca5t8kgUf7Iz3uan1soPzSYK8SLNBgxAPs11S
 1jY093rDXXoaCJqxWUwDc/JUpWq6T3g4m445SNvFIOMcSwmMOIfAwfug4UexE1zC
 Ie8u3Um35Mp25o0o6V1J2EjdBsUsm0p//CsslfoAAIWi85W02Z/46bLLcITchkCe
 bP05H+c55ZN6maRJiaeghcpY+iWO4XCRCKS9mF1v9yn7FOhNxhBcwgTNPyGBVrYz
 T9w3ynTHAmuwNqtd6jhpTR/b1902up/Qv9I8uHhBDMqJAXfHocGEXHZblNuZMgfE
 bu9cjcbFghUPdrhUHYmbEqAzhdlL2SFuMYfn8D4QV4A6x+32xCdwsi39I0Effm5V
 wl0HmemjKjTYbLw=
 =iFFM
 -----END PGP SIGNATURE-----

Merge tag 'pci-v6.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:
   - Add locking to read/modify/write PCIe Capability Register accessors
     for Link Control and Root Control
   - Use pci_dev_id() when possible instead of manually composing ID
     from dev->bus->number and dev->devfn

  Resource management:
   - Move prototypes for __weak sysfs resource files to linux/pci.h to
     fix 'no previous prototype' warnings
   - Make more I/O port accesses depend on HAS_IOPORT
   - Use devm_platform_get_and_ioremap_resource() instead of open-coding
     platform_get_resource() followed by devm_ioremap_resource()

  Power management:
   - Ensure devices are powered up while accessing VPD
   - If device is powered-up, keep it that way while polling for PME
   - Only read PCI_PM_CTRL register when available, to avoid reading the
     wrong register and corrupting dev->current_state

  Virtualization:
   - Avoid Secondary Bus Reset on NVIDIA T4 GPUs

  Error handling:
   - Remove unused pci_disable_pcie_error_reporting()
   - Unexport pci_enable_pcie_error_reporting(), used only by aer.c
   - Unexport pcie_port_bus_type, used only by PCI core

  VGA:
   - Simplify and clean up typos in VGA arbiter

  Apple PCIe controller driver:
   - Initialize pcie->nvecs (number of available MSIs) before use

  Broadcom iProc PCIe controller driver:
   - Use of_property_read_bool() instead of low-level accessors for
     boolean properties

  Broadcom STB PCIe controller driver:
   - Assert PERST# when probing BCM2711 because some bootloaders don't
     do it

  Freescale i.MX6 PCIe controller driver:
   - Add .host_deinit() callback so we can clean up things like
     regulators on probe failure or driver unload

  Freescale Layerscape PCIe controller driver:
   - Add support for link-down notification so the endpoint driver can
     process LINK_DOWN events
   - Add suspend/resume support, including manual
     PME_Turn_off/PME_TO_Ack handshake
   - Save Link Capabilities during probe so they can be restored when
     handling a link-up event, since the controller loses the Link Width
     and Link Speed values during reset

  Intel VMD host bridge driver:
   - Fix disable of bridge windows during domain reset; previously we
     cleared the base/limit registers, which actually left the windows
     enabled

  Marvell MVEBU PCIe controller driver:
   - Remove unused busn member

  Microchip PolarFlare PCIe controller driver:
   - Fix interrupt bit definitions so the SEC and DED interrupt handlers
     work correctly
   - Make driver buildable as a module
   - Read FPGA MSI configuration parameters from hardware instead of
     hard-coding them

  Microsoft Hyper-V host bridge driver:
   - To avoid a NULL pointer dereference, skip MSI restore after
     hibernate if MSI/MSI-X hasn't been enabled

  NVIDIA Tegra194 PCIe controller driver:
   - Revert 'PCI: tegra194: Enable support for 256 Byte payload' because
     Linux doesn't know how to reduce MPS from to 256 to 128 bytes for
     endpoints below a switch (because other devices below the switch
     might already be operating), which leads to 'Malformed TLP' errors

  Qualcomm PCIe controller driver:
   - Add DT and driver support for interconnect bandwidth voting for
     'pcie-mem' and 'cpu-pcie' interconnects
   - Fix broken SDX65 'compatible' DT property
   - Configure controller so MHI bus master clock will be switched off
     while in ASPM L1.x states
   - Use alignment restriction from EPF core in EPF MHI driver
   - Add Endpoint eDMA support
   - Add MHI eDMA support
   - Add Snapdragon SM8450 support to the EPF MHI driversupport
   - Add MHI eDMA support
   - Add Snapdragon SM8450 support to the EPF MHI driversupport
   - Add MHI eDMA support
   - Add Snapdragon SM8450 support to the EPF MHI driversupport
   - Add MHI eDMA support
   - Add Snapdragon SM8450 support to the EPF MHI driver
   - Use iATU for EPF MHI transfers smaller than 4K to avoid eDMA setup
     latency
   - Add sa8775p DT binding and driver support

  Rockchip PCIe controller driver:
   - Use 64-bit mask on MSI 64-bit PCI address to avoid zeroing out the
     upper 32 bits

  SiFive FU740 PCIe controller driver:
   - Set the supported number of MSI vectors so we can use all available
     MSI interrupts

  Synopsys DesignWare PCIe controller driver:
   - Add generic dwc suspend/resume APIs (dw_pcie_suspend_noirq() and
     dw_pcie_resume_noirq()) to be called by controller driver
     suspend/resume ops, and a controller callback to send PME_Turn_Off

  MicroSemi Switchtec management driver:
   - Add support for PCIe Gen5 devices

  Miscellaneous:
   - Reorder and compress to reduce size of struct pci_dev
   - Fix race in DOE destroy_work_on_stack()
   - Add stubs to avoid casts between incompatible function types
   - Explicitly include correct DT includes to untangle headers"

* tag 'pci-v6.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (96 commits)
  PCI: qcom-ep: Add ICC bandwidth voting support
  dt-bindings: PCI: qcom: ep: Add interconnects path
  PCI: qcom-ep: Treat unknown IRQ events as an error
  dt-bindings: PCI: qcom: Fix SDX65 compatible
  PCI: endpoint: Add kernel-doc for pci_epc_mem_init() API
  PCI: epf-mhi: Use iATU for small transfers
  PCI: epf-mhi: Add support for SM8450
  PCI: epf-mhi: Add eDMA support
  PCI: qcom-ep: Add eDMA support
  PCI: epf-mhi: Make use of the alignment restriction from EPF core
  PCI/PM: Only read PCI_PM_CTRL register when available
  PCI: qcom: Add support for sa8775p SoC
  dt-bindings: PCI: qcom: Add sa8775p compatible
  PCI: qcom-ep: Pass alignment restriction to the EPF core
  PCI: Simplify pcie_capability_clear_and_set_word() control flow
  PCI: Tidy config space save/restore messages
  PCI: Fix code formatting inconsistencies
  PCI: Fix typos in docs and comments
  PCI: Fix pci_bus_resetable(), pci_slot_resetable() name typos
  PCI: Simplify pci_dev_driver()
  ...
2023-08-30 20:23:07 -07:00
Srinivasan Shanmugam 8254e05c82 drm/amdgpu: Fix printk_ratelimit() with DRM_ERROR_RATELIMITED in 'amdgpu_cs_ioctl'
Replaced printk_ratelimit() with its DRM equivalent to avoid flooding of
dmesg logs & hence fixes the following:

WARNING: Prefer printk_ratelimited or pr_<level>_ratelimited to printk_ratelimit
+               if (printk_ratelimit())

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30 15:51:17 -04:00
Srinivasan Shanmugam bf227a4f05 drm/amdgpu: Use READ_ONCE() when reading the values in 'sdma_v4_4_2_ring_get_rptr'
Use READ_ONCE() instead of declaring the pointer volatile. To prevent
the compiler from refetching or reordering the read, so that the read
value is always consistent.

Link: https://lwn.net/Articles/624126/
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Le Ma <le.ma@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30 15:51:17 -04:00
Yifan Zhang 4d5dc6260c drm/amdgpu: remove unused parameter in amdgpu_vmid_grab_idle
amdgpu_vm is not used in amdgpu_vmid_grab_idle.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30 15:51:17 -04:00
Hawking Zhang bf7aa8bea9 drm/amdgpu: Free ras cmd input buffer properly
Do not access the pointer for ras input cmd buffer
if it is even not allocated.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30 15:51:16 -04:00
Ma Jun 8f9a9a09af drm/amd: Simplify the bo size check funciton
Simplify the code logic of size check function amdgpu_bo_validate_size

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30 15:51:16 -04:00