Commit Graph

14427 Commits

Author SHA1 Message Date
ZhenGuo Yin 5659b0c93a drm/amdgpu: reset vm state machine after gpu reset(vram lost)
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.

[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.

v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 47c0388b05)
2024-07-24 17:30:49 -04:00
Tim Huang fab1ead0ae drm/amdgpu: add missed harvest check for VCN IP v4/v5
To prevent below probe failure, add a check for models with VCN
IP v4.0.6 where VCN1 may be harvested.

v2:
Apply the same check to VCN IP v4.0 and v5.0.

[   54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
[   54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
[   54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
0000000000000000
[   54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
ffff99a82f680000
[   54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
0000000000000000
[   54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
0000000000000000
[   54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
0000000000000001
[   54.075400] FS:  00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
knlGS:0000000000000000
[   54.075988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
0000000000750ef0
[   54.076927] PKRU: 55555554
[   54.077132] Call Trace:
[   54.077319]  <TASK>
[   54.077484]  ? show_regs+0x69/0x80
[   54.077747]  ? __die+0x28/0x70
[   54.077979]  ? page_fault_oops+0x180/0x4b0
[   54.078286]  ? do_user_addr_fault+0x2d2/0x680
[   54.078610]  ? exc_page_fault+0x84/0x190
[   54.078910]  ? asm_exc_page_fault+0x2b/0x30
[   54.079224]  ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.079941]  ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
[   54.080617]  vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
[   54.081316]  amdgpu_device_ip_set_powergating_state+0x64/0xc0
[amdgpu]
[   54.082057]  amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
[   54.082727]  amdgpu_ring_alloc+0x44/0x70 [amdgpu]
[   54.083351]  amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
[   54.084054]  amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
[   54.084698]  vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
[   54.085307]  amdgpu_device_init+0x1f96/0x2780 [amdgpu]
[   54.085951]  amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
[   54.086591]  amdgpu_pci_probe+0x19f/0x550 [amdgpu]
[   54.087215]  local_pci_probe+0x48/0xa0
[   54.087509]  pci_device_probe+0xc9/0x250
[   54.087812]  really_probe+0x1a4/0x3f0
[   54.088101]  __driver_probe_device+0x7d/0x170
[   54.088443]  driver_probe_device+0x24/0xa0
[   54.088765]  __driver_attach+0xdd/0x1d0
[   54.089068]  ? __pfx___driver_attach+0x10/0x10
[   54.089417]  bus_for_each_dev+0x8e/0xe0
[   54.089718]  driver_attach+0x22/0x30
[   54.090000]  bus_add_driver+0x120/0x220
[   54.090303]  driver_register+0x62/0x120
[   54.090606]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   54.091255]  __pci_register_driver+0x62/0x70
[   54.091593]  amdgpu_init+0x67/0xff0 [amdgpu]
[   54.092190]  do_one_initcall+0x5f/0x330
[   54.092495]  do_init_module+0x68/0x240
[   54.092794]  load_module+0x201c/0x2110
[   54.093093]  init_module_from_file+0x97/0xd0
[   54.093428]  ? init_module_from_file+0x97/0xd0
[   54.093777]  idempotent_init_module+0x11c/0x2a0
[   54.094134]  __x64_sys_finit_module+0x64/0xc0
[   54.094476]  do_syscall_64+0x58/0x120
[   54.094767]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 0b071245dd)
2024-07-24 17:30:23 -04:00
Stanley.Yang 1a8825259a drm/amdgpu: Fix eeprom max record count
The eeprom table is empty before initializing,
set eeprom table version first before initializing.

Changed from V1:
	Reuse amdgpu_ras_set_eeprom_table_version function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 015b8a2fdf)
2024-07-24 17:30:23 -04:00
YiPeng Chai afac8c6554 drm/amdgpu: fix ras UE error injection failure issue
The ras command shared memory is allocated from
VRAM and the response status of the command
buffer will not be zero due to gpu being in
fatal error state after ras UE error injection.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8284951a6e)
2024-07-24 17:30:23 -04:00
Jane Jian 6728f55590 drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF
For VCN/JPEG 4.0.3, use only the local addressing scheme.

- Mask bit higher than AID0 range

v2
remain the case for mmhub use master XCC

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit caaf576292)
2024-07-24 17:30:23 -04:00
Lijo Lazar 485432d090 drm/amdgpu: Add empty HDP flush function to VCN v4.0.3
VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch
will do the HDP flush.

This change is necessary for VCN v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 49cfaebe48)
2024-07-24 17:30:23 -04:00
Lijo Lazar 23df34997d drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3
JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead,
mmsch fw will do the flush.

This change is necessary for JPEG v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 585e3fdb36)
2024-07-24 17:30:23 -04:00
Ma Ke df65aabef3 drm/amd/amdgpu: Fix uninitialized variable warnings
Return 0 to avoid returning an uninitialized variable r.

Cc: stable@vger.kernel.org
Fixes: 230dd6bb61 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6472de66c0)
2024-07-24 17:30:23 -04:00
David Belanger 73048bda46 drm/amdgpu: Fix atomics on GFX12
If PCIe supports atomics, configure register to prevent DF from
breaking atomics in separate load/store operations.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 666f14cab2)
2024-07-24 17:30:23 -04:00
Alex Deucher a03ebf1163 drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().

Enable this workaround while we are root causing the issue with
the HW team.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440
Tested-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit f2ac526349)
2024-07-24 17:29:57 -04:00
YiPeng Chai b3fb79cda5 drm/amdgpu: add mutex to protect ras shared memory
Add mutex to protect ras shared memory.

v2:
  Add TA_RAS_COMMAND__TRIGGER_ERROR command call
  status check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:30 -04:00
Boyuan Zhang 7d75ef3736 drm/amdgpu/vcn: not pause dpg for unified queue
For unified queue, DPG pause for encoding is done inside VCN firmware,
so there is no need to pause dpg based on ring type in kernel.

For VCN3 and below, pausing DPG for encoding in kernel is still needed.

v2: add more comments
v3: update commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:04 -04:00
Boyuan Zhang ecfa23c8df drm/amdgpu/vcn: identify unified queue in sw init
Determine whether VCN using unified queue in sw_init, instead of calling
functions later on.

v2: fix coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:43:48 -04:00
Aurabindo Pillai 28814be882 drm/amd: Bump KMS_DRIVER_MINOR version
Increase the KMS minor version to indicate GFX12 DCC support since this
contains a major change in how DCC is managed across IPs like GFX, DCN
etc. This will be used mainly by userspace like Mesa to figure out
DCC support on GFX12 hardware.

v2: fix version number (Alex)

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-15 17:41:23 -04:00
Alex Deucher 1cff1010be drm/amdgpu/mes12: add missing opcode string
Fixes the indexing of the string array.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher 478cb8badf drm/amdgpu/mes11: update opcode strings
Add new packet.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher 8030f6533e drm/amdgpu: remove exp hw support check for gfx12
Enable it by default.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 13:45:57 -04:00
YiPeng Chai e23300dfff drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
The problem case is as follows:
1. GPU A triggers a gpu ras reset, and GPU A drives
   GPU B to also perform a gpu ras reset.
2. After gpu B ras reset started, gpu B queried a DE
   data. Since the DE data was queried in the ras reset
   thread instead of the page retirement thread, bad
   page retirement work would not be triggered. Then
   even if all gpu resets are completed, the bad pages
   will be cached in RAM until GPU B's bad page retirement
   work is triggered again and then saved to eeprom.

This patch can save the bad pages to eeprom in time after gpu
ras reset is completed.

v2:
  1. Add the above description to code comments.
  2. Reuse existing function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:41 -04:00
YiPeng Chai c04706914d drm/amdgpu: flush all cached ras bad pages to eeprom
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.

v2:
  Put the same code into a function and reuse the function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:35 -04:00
Sunil Khatri c39385710c drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX12.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:22 -04:00
Sunil Khatri a85cc86cce drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX11.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:04 -04:00
Alex Deucher 7d570f56f1 drm/amdgpu/job: Replace DRM_INFO/ERROR logging
Use the dev_info/err variants so we get per device logging
in multi-GPU cases.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:55 -04:00
Sunil Khatri 948f2828a6 drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX10.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:48 -04:00
Lijo Lazar d02ddefc7e drm/amdgpu: Initialize VF partition mode
For SOCs with GFX v9.4.3, a VF may have multiple compute partitions.
Fetch the partition information during init and initialize partition
nodes. There is no support to switch partition mode in VF mode, hence
disable the same.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:28 -04:00
Gavin Wan 5d64af40e3 drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.
sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have
sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For
Even numbered vfs, the sdma2 & sdma3 (irq srouce id
CLIENTID_SDMA2 and CLIENTID_SDMA3) should map to irq seq 0 & 1.

Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:12:19 -04:00
Zhigang Luo ee98fb71ba drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1
to avoid reading wrong WPTR from doorbell in sriov vf, set
CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD.

Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:27 -04:00
Yang Wang 59f488be76 drm/amdgpu: add ras event state device attribute support
add amdgpu ras 'event_state' sysfs device attribute support

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:56:13 -04:00
Yang Wang 12b435a40c drm/amdgpu: add ras POSION_CONSUMPTION event id support
add amdgpu ras POSION_CONSUMPTION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:37 -04:00
Yang Wang 5b9de2596f drm/amdgpu: add ras POSION_CREATION event id support
add amdgpu ras POSION_CREATION event id support.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:18 -04:00
Yang Wang 75ac6a2506 drm/amdgpu: refine amdgpu ras event id core code
v1:
- use unified event id to manage ras events
- add a new function amdgpu_ras_query_error_status_with_event() to accept
  event type as parameter.

v2:
add a warn log to show the location of function failure
when calling amdgpu_ras_mark_event(). (Tao Zhou)

v3:
change RAS_EVENT_TYPE_ISR to RAS_EVENT_TYPE_FATAL.

v4:
rename amdgpu_ras_get_recovery_event() to
amdgpu_ras_get_fatal_error_event().

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:55:11 -04:00
Christian König 320debca1b drm/amdgpu: reject gang submit on reserved VMIDs
A gang submit won't work if the VMID is reserved and we can't flush out
VM changes from multiple engines at the same time.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:53 -04:00
Saleemkhan Jamadar b6ad109166 drm/amdgpu: enable dpg for vcn and jpeg on GC 11_5_2
DPG mode is enabled for vcn and jpeg on VCN v4_0_5

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:50:40 -04:00
Yang Wang 332210c13a drm/amdgpu: remove redundant semicolons in RAS_EVENT_LOG
remove redundant semicolons in RAS_EVENT_LOG to avoid
code format check warning.

Fixes: b712d7c201 ("drm/amdgpu: fix compiler 'side-effect' check issue for RAS_EVENT_LOG()")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:43 -04:00
Frank Min 54837bd2be drm/amdgpu: restore dcc bo tilling configs while moving
While moving buffer which has dcc tiling config, it is needed to restore
its original dcc tiling.

1. extend copy flag to cover tiling bits
2. add logic to restore original dcc tiling config

Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:27 -04:00
Sunil Khatri c8714ac982 drm/amdgpu: add gfx queue support for gfx12 ipdump
Add support of all the CP GFX queues for gfx12 ipdump
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:22 -04:00
Sunil Khatri 495e6173a4 drm/amdgpu: add cp queue registers for gfx12 ipdump
Add gfx12 support of CP queue registers for all queues
to be used by devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:15 -04:00
Sunil Khatri f0c6b79bfc drm/amdgpu: enable redirection of irq's for IH v7.0
Enable redirection of irq for pagefaults for specific
clients to avoid overflow without dropping interrupts.

So here we redirect the interrupts to another IH ring
i.e ring1 where only these interrupts are processed.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:09 -04:00
Sunil Khatri 906219ec94 drm:amdgpu: enable IH ring1 for IH v7.0
We need IH ring1 for handling the pagefault
interrupts which over flow in default
ring for specific usecases.

Enable ring1 allows software to redirect
high interrupts to ring1 from default IH
ring.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:47:02 -04:00
Yifan Zha 33f23fc315 drm/amdgpu: Set no_hw_access when VF request full GPU fails
[Why]
If VF request full GPU access and the request failed,
the VF driver can get stuck accessing registers for an extended period during
the unload of KMS.

[How]
Set no_hw_access flag when VF request for full GPU access fails
This prevents further hardware access attempts, avoiding the prolonged
stuck state.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:56 -04:00
Sunil Khatri 2262acad0a drm/amdgpu: add print support for gfx12 ipdump
Add support of gfx12 ipdump print so devcoredump
could trigger it to dump the captured registers
in devcoredump.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:49 -04:00
Sunil Khatri fbbbb62112 drm/amdgpu: add gfx12 register support in ipdump
Add general registers of gfx12 in ipdump for
devcoredump support.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:43 -04:00
Frank Min ffcc5745ed drm/amdgpu: update gfxhub client id for gfx12
update gfxhub client id for gfx12

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:36 -04:00
Tim Huang 064d92436b drm/amd/pm: avoid to load smu firmware for APUs
Certain call paths still load the SMU firmware for APUs,
which needs to be skipped.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:30 -04:00
YiPeng Chai 78347b651a drm/amdgpu: sysfs node disable query error count during gpu reset
Sysfs node disable query error count during gpu reset.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08 16:46:14 -04:00
Daniel Vetter 6be146cf57 amd-drm-next-6.11-2024-07-03:
amdgpu:
 - Use vmalloc for dc_state
 - Replay fixes
 - Freesync fixes
 - DCN 4.0.1 fixes
 - DML fixes
 - DCC updates
 - Misc code cleanups and bug fixes
 - 8K display fixes
 - DCN 3.5 fixes
 - Restructure DIO code
 - DML1 fixes
 - DML2 fixes
 - GFX11 fix
 - GFX12 updates
 - GFX12 modifiers fixes
 - RAS fixes
 - IP dump fixes
 - Add some updated IP version checks
 _ Silence UBSAN warning
 
 radeon:
 - GPUVM fix
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZoW9RgAKCRC93/aFa7yZ
 2AfjAQDJH95ijhdClk4S7t0ofjam7UPhVkJiUl8u73BNbuul/QEAk7RadnWw8SwC
 wvSXMCyHY3XszMJ1AdWG+4uwq5axPQE=
 =aWkn
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-07-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-07-03:

amdgpu:
- Use vmalloc for dc_state
- Replay fixes
- Freesync fixes
- DCN 4.0.1 fixes
- DML fixes
- DCC updates
- Misc code cleanups and bug fixes
- 8K display fixes
- DCN 3.5 fixes
- Restructure DIO code
- DML1 fixes
- DML2 fixes
- GFX11 fix
- GFX12 updates
- GFX12 modifiers fixes
- RAS fixes
- IP dump fixes
- Add some updated IP version checks
_ Silence UBSAN warning

radeon:
- GPUVM fix

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703211314.2041893-1-alexander.deucher@amd.com
2024-07-05 12:02:12 +02:00
Daniel Vetter 71e9f407fd amd-drm-next-6.11-2024-06-28:
amdgpu:
 - JPEG 5.x fixes
 - More FW loading cleanups
 - Misc code cleanups
 - GC 12.x fixes
 - ASPM fix
 - DCN 4.0.1 updates
 - SR-IOV fixes
 - HDCP fix
 - USB4 fixes
 - Silence UBSAN warnings
 - MES submission fixes
 - Update documentation for new products
 - DCC updates
 - Initial ISP 4.x plumbing
 - RAS fixes
 - Misc small fixes
 
 amdkfd:
 - Fix missing unlock in error path for adding queues
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZn8qfAAKCRC93/aFa7yZ
 2H9QAQCZZsu8vps+HVUY5vihNXNL9TdUbrrroQBEyk+DsJzE9QD8CJW8/oiP8/fa
 /EsAWXNZUH+mPxRaVWTxp0j2P941pww=
 =uZbG
 -----END PGP SIGNATURE-----

Merge tag 'amd-drm-next-6.11-2024-06-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.11-2024-06-28:

amdgpu:
- JPEG 5.x fixes
- More FW loading cleanups
- Misc code cleanups
- GC 12.x fixes
- ASPM fix
- DCN 4.0.1 updates
- SR-IOV fixes
- HDCP fix
- USB4 fixes
- Silence UBSAN warnings
- MES submission fixes
- Update documentation for new products
- DCC updates
- Initial ISP 4.x plumbing
- RAS fixes
- Misc small fixes

amdkfd:
- Fix missing unlock in error path for adding queues

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240628213135.427214-1-alexander.deucher@amd.com
2024-07-05 11:39:23 +02:00
Daniel Vetter 86634fa4e6 Linux 6.10-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmaB0NweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkvwH/36UJRk/o6wvXnyH
 E6QjCSWo2226APyWks22NjtC3I/8Iqdvkneuh6wG0qL2sXAB078EMjUq5R81bF8H
 wWFBJwetjYTp8GEyLioMEb2wCH/J3R29dLFC4UYTplafXRGP6//xcpJaKmTxcgdR
 31IzvTPXbApZ7L3k1U6rA2bK9PNKcFCOvZlrNMUCuwMrabymHsDfOUt1DqXyg2xp
 zjqiWYBwlklozmgawSWt/mdEgkWuTcAbg+KyqDVQF59s9aj/OOwZ0j+HACq5V8CM
 quTPIAYL6CC9p7uxa69lGr/sgC0Is/BZLPX7RTZAwCgarGvnX+1HUsjDcaFCtrVg
 O6fPUV8=
 =pgUx
 -----END PGP SIGNATURE-----

Merge v6.10-rc6 into drm-next

The exynos-next pull is based on a newer -rc than drm-next. hence
backmerge first to make sure the unrelated conflicts we accumulated
don't end up randomly in the exynos merge pull, but are separated out.

Conflicts are all benign: Adjacent changes in amdgpu and fbdev-dma
code, and cherry-pick conflict in xe.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-07-05 10:47:28 +02:00
Sunil Khatri ea67deb03c drm/amdgpu: fix out of bounds access in gfx11 during ip dump
During ip dump in gfx11 the index variable is reused but is
not reinitialized to 0 and this causes the index calculation
to be wrong and access out of bound access.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:16 -04:00
Tim Huang 94845ea057 drm/amdgpu: add firmware for PSP IP v14.0.4
This patch is to add firmware for PSP 14.0.4.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:10 -04:00
Tim Huang b709f949f0 drm/amdgpu: enable mode2 reset for SMU IP v14.0.4
Set the default reset method to mode2 for SMU 14.0.4.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02 18:07:02 -04:00