Commit Graph

14456 Commits

Author SHA1 Message Date
Likun Gao 6d5064c379 drm/amdgpu: support for gc_info table v1.3
Add gc_info table v1.3 for IP discovery.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 875ff9a7ee)
2024-08-28 10:05:54 -04:00
Alex Deucher 959fc102ff drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
This needs to be set to 1 to avoid a potential deadlock in
the GC 10.x and newer.  On GC 9.x and older, this needs
to be set to 0. This can lead to hangs in some mixed
graphics and compute workloads.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 40318a2406)
2024-08-28 10:04:53 -04:00
Alex Deucher 9cead81eff drm/amdgpu: fix eGPU hotplug regression
The driver needs to wait for the on board firmware
to finish its initialization before probing the card.
Commit 959056982a ("drm/amdgpu: Fix discovery initialization failure during pci rescan")
switched from using msleep() to using usleep_range() which
seems to have caused init failures on some navi1x boards. Switch
back to msleep().

Fixes: 959056982a ("drm/amdgpu: Fix discovery initialization failure during pci rescan")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Ma Jun <Jun.Ma2@amd.com>
(cherry picked from commit c69b07f7bb)
Cc: stable@vger.kernel.org # 6.10.x
2024-08-20 23:07:11 -04:00
Candice Li c99769bcea drm/amdgpu: Validate TA binary size
Add TA binary size validation to avoid OOB write.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c0a04e3570)
Cc: stable@vger.kernel.org
2024-08-20 23:04:17 -04:00
Alex Deucher e3e4bf58ba drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
The workaround seems to cause stability issues on other
SDMA 5.2.x IPs.

Fixes: a03ebf1163 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556
Acked-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2dc3851ef7)
Cc: stable@vger.kernel.org
2024-08-20 22:51:37 -04:00
Yang Wang 0b43312902 drm/amdgpu: fixing rlc firmware loading failure issue
Skip rlc firmware validation to ignore firmware header size mismatch issues.
This restores the workaround added in
commit 849e133c97 ("drm/amdgpu: Fix the null pointer when load rlc firmware")

Fixes: 3af2c80ae2 ("drm/amdgpu: refine gfx10 firmware loading")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 89ec85d16e)
2024-08-20 22:51:31 -04:00
Kenneth Feng 23acd1f344 drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
add HDP_SD support on gc 12.0.0/1

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 61cffacb3a)
2024-08-13 13:20:43 -04:00
Yinjie Yao 507a2286c0 drm/amdgpu: Update kmd_fw_shared for VCN5
kmd_fw_shared changed in VCN5

Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa02486fb1)
2024-08-13 13:20:36 -04:00
David (Ming Qiang) Wu 470516c292 drm/amd/amdgpu: command submission parser for JPEG
Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a7f670d5d8)
Cc: stable@vger.kernel.org
2024-08-13 13:17:36 -04:00
Jack Xiao 4246b1077f drm/amdgpu/mes12: fix suspend issue
Use mes pipe to unmap kcq and kgq.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f7fb9d677f)
2024-08-13 13:17:36 -04:00
Jack Xiao af401543df drm/amdgpu/mes12: sw/hw fini for unified mes
Free memory for two pipes and unmap pipe0 via pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98cae695a8)
2024-08-13 13:17:36 -04:00
Jack Xiao 7254027e1e drm/amdgpu/mes12: configure two pipes hardware resources
Configure two pipes with different hardware resources.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ea5d6db17a)
2024-08-13 13:17:36 -04:00
Jack Xiao 1097727d6d drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
Adjust mes12 sw/hw initiailization for both pipe0 and pipe1
enablement. The two pipes are almost identical pipe. Pipe0
behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa539da8af)
2024-08-13 13:17:36 -04:00
Jack Xiao 3738a7f0dd drm/amdgpu/mes12: add mes pipe switch support
Add mes pipe switch to let caller choose pipe
to submit packet.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b2dee0837a)
2024-08-13 13:17:36 -04:00
Jack Xiao a13d91bf3c drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1
Enable unified mes firmware to load on pipe0 and pipe1.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e69c2dd753)
2024-08-13 13:17:36 -04:00
Jack Xiao 2029b3d7e1 drm/amdgpu/mes: add multiple mes ring instances support
Add multiple mes ring instances in mes structure to support
multiple mes pipes.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c7d4355648)
2024-08-13 13:04:48 -04:00
Bas Nieuwenhuizen 0573a1e2ea drm/amdgpu: Actually check flags for all context ops.
Missing validation ...

Checked libdrm and it clears all the structs, so we should be
safe to just check everything.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c6b86421f1)
Cc: stable@vger.kernel.org
2024-08-13 13:03:57 -04:00
Alex Deucher e6c6bd6253 drm/amdgpu/jpeg4: properly set atomics vmid field
This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c6c2e8b6a4)
Cc: stable@vger.kernel.org
2024-08-13 13:03:11 -04:00
Alex Deucher e414a304f2 drm/amdgpu/jpeg2: properly set atomics vmid field
This needs to be set as well if the IB uses atomics.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 35c628774e)
Cc: stable@vger.kernel.org
2024-08-13 13:02:48 -04:00
Jack Xiao 11752c013f drm/amdgpu/mes: fix mes ring buffer overflow
wait memory room until enough before writing mes packets
to avoid ring buffer overflow.

v2: squash in sched_hw_submission fix

Fixes: de32462541 ("drm/amdgpu: cleanup MES11 command submission")
Fixes: fffe347e14 ("drm/amdgpu: cleanup MES12 command submission")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 34e087e892)
Cc: stable@vger.kernel.org
2024-08-13 12:50:01 -04:00
Arunpravin Paneer Selvam 6ad9dafba1 drm/amdgpu: Add DCC GFX12 flag to enable address alignment
We require this flag AMDGPU_GEM_CREATE_GFX12_DCC or any other
kernel level GFX12 DCC flag to differentiate the DCC buffers and other
pinned display buffers(which has TTM_PL_FLAG_CONTIGUOUS enabled).

If we use the TTM_PL_FLAG_CONTIGUOUS flag for DCC buffers, we may over
allocate for all the pinned display buffers unnecessarily that leads to
memory allocation failure.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 46142cc1b9)
2024-08-07 18:23:59 -04:00
Frank Min 7fc5f252c0 drm/amdgpu: correct sdma7 max dw
correct sdma7 max dw into 8

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 86598c3819)
2024-08-07 18:23:49 -04:00
Arunpravin Paneer Selvam 4a5ad08f53 drm/amdgpu: Add address alignment support to DCC buffers
Add address alignment support to the DCC VRAM buffers.

v2:
  - adjust size based on the max_texture_channel_caches values
    only for GFX12 DCC buffers.
  - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
    for DCC buffers.
  - roundup non power of two DCC buffer adjusted size to nearest
    power of two number as the buddy allocator does not support non
    power of two alignments. This applies only to the contiguous
    DCC buffers.

v3:(Alex)
  - rewrite the max texture channel caches comparison code in an
    algorithmic way to determine the alignment size.

v4:(Alex)
  - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c
    and add a new gmc func callback for dcc alignment. If the callback
    is non-NULL, call it to get the alignment, otherwise, use the default.

v5:(Alex)
  - Set the Alignment to a default value if the callback doesn't exist.
  - Add the callback to amdgpu_gmc_funcs.

v6:
  - Fix checkpatch warning reported by Intel CI.

v7:(Christian)
  - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that
    checks the BO pinning and for a specific hw generation.

v8:(Christian)
  - move this check into gmc_v12_0_get_dcc_alignment.

v9:
  - Fix 32bit build errors

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa94b623cb)
2024-08-07 18:23:42 -04:00
Frank Min 5d687a67fd drm/amdgpu: change non-dcc buffer copy configuration
Without setting cpv bit and 7th ib dw, non-dcc buffer copy will have
random corruption

So set the cpv bit and clear the 7th ib dw for copy non-dcc buffers

Signed-off-by: Frank Min <Frank.Min@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5aacf8917f)
2024-08-07 18:21:05 -04:00
Joshua Ashton 829798c789 drm/amdgpu: Forward soft recovery errors to userspace
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.

1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 434967aadb)
Cc: stable@vger.kernel.org
2024-08-07 18:20:07 -04:00
Likun Gao 8ff3bb44cc drm/amdgpu: add golden setting for gc v12
Adding Manual GDB golden setting for gc v12
revision 0 ASIC.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c9875d0a78)
2024-08-07 18:19:38 -04:00
Likun Gao aa5c9701eb drm/amdgpu: force to use legacy inv in mmhub
MMHUB v4.1.0 only support fixed cache mode, so
only use legacy invalidation accordingly.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9192c7613c)
2024-08-07 18:16:38 -04:00
Michael Chen 9038e25c80 drm/amdgpu: increase mes log buffer size for gfx12
MES firmware requires larger log buffer for gfx12. Allocate
proper buffer respectively for gfx11 and gfx12.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 739d0f3e1f)
2024-07-27 18:10:12 -04:00
Christian König f3572db3c0 drm/amdgpu: fix contiguous handling for IB parsing v2
Otherwise we won't get correct access to the IB.

v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in
    the VRAM backend.

Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501
Fixes: e362b7c8f8 ("drm/amdgpu: Modify the contiguous flags behaviour")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fbfb5f0342)
2024-07-27 18:09:38 -04:00
ZhenGuo Yin 5659b0c93a drm/amdgpu: reset vm state machine after gpu reset(vram lost)
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.

[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.

v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 47c0388b05)
2024-07-24 17:30:49 -04:00
Tim Huang fab1ead0ae drm/amdgpu: add missed harvest check for VCN IP v4/v5
To prevent below probe failure, add a check for models with VCN
IP v4.0.6 where VCN1 may be harvested.

v2:
Apply the same check to VCN IP v4.0 and v5.0.

[   54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
[   54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
[   54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
0000000000000000
[   54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
ffff99a82f680000
[   54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
0000000000000000
[   54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
0000000000000000
[   54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
0000000000000001
[   54.075400] FS:  00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
knlGS:0000000000000000
[   54.075988] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
0000000000750ef0
[   54.076927] PKRU: 55555554
[   54.077132] Call Trace:
[   54.077319]  <TASK>
[   54.077484]  ? show_regs+0x69/0x80
[   54.077747]  ? __die+0x28/0x70
[   54.077979]  ? page_fault_oops+0x180/0x4b0
[   54.078286]  ? do_user_addr_fault+0x2d2/0x680
[   54.078610]  ? exc_page_fault+0x84/0x190
[   54.078910]  ? asm_exc_page_fault+0x2b/0x30
[   54.079224]  ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[   54.079941]  ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
[   54.080617]  vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
[   54.081316]  amdgpu_device_ip_set_powergating_state+0x64/0xc0
[amdgpu]
[   54.082057]  amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
[   54.082727]  amdgpu_ring_alloc+0x44/0x70 [amdgpu]
[   54.083351]  amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
[   54.084054]  amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
[   54.084698]  vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
[   54.085307]  amdgpu_device_init+0x1f96/0x2780 [amdgpu]
[   54.085951]  amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
[   54.086591]  amdgpu_pci_probe+0x19f/0x550 [amdgpu]
[   54.087215]  local_pci_probe+0x48/0xa0
[   54.087509]  pci_device_probe+0xc9/0x250
[   54.087812]  really_probe+0x1a4/0x3f0
[   54.088101]  __driver_probe_device+0x7d/0x170
[   54.088443]  driver_probe_device+0x24/0xa0
[   54.088765]  __driver_attach+0xdd/0x1d0
[   54.089068]  ? __pfx___driver_attach+0x10/0x10
[   54.089417]  bus_for_each_dev+0x8e/0xe0
[   54.089718]  driver_attach+0x22/0x30
[   54.090000]  bus_add_driver+0x120/0x220
[   54.090303]  driver_register+0x62/0x120
[   54.090606]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   54.091255]  __pci_register_driver+0x62/0x70
[   54.091593]  amdgpu_init+0x67/0xff0 [amdgpu]
[   54.092190]  do_one_initcall+0x5f/0x330
[   54.092495]  do_init_module+0x68/0x240
[   54.092794]  load_module+0x201c/0x2110
[   54.093093]  init_module_from_file+0x97/0xd0
[   54.093428]  ? init_module_from_file+0x97/0xd0
[   54.093777]  idempotent_init_module+0x11c/0x2a0
[   54.094134]  __x64_sys_finit_module+0x64/0xc0
[   54.094476]  do_syscall_64+0x58/0x120
[   54.094767]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 0b071245dd)
2024-07-24 17:30:23 -04:00
Stanley.Yang 1a8825259a drm/amdgpu: Fix eeprom max record count
The eeprom table is empty before initializing,
set eeprom table version first before initializing.

Changed from V1:
	Reuse amdgpu_ras_set_eeprom_table_version function

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 015b8a2fdf)
2024-07-24 17:30:23 -04:00
YiPeng Chai afac8c6554 drm/amdgpu: fix ras UE error injection failure issue
The ras command shared memory is allocated from
VRAM and the response status of the command
buffer will not be zero due to gpu being in
fatal error state after ras UE error injection.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8284951a6e)
2024-07-24 17:30:23 -04:00
Jane Jian 6728f55590 drm/amdgpu/vcn: Use offsets local to VCN/JPEG in VF
For VCN/JPEG 4.0.3, use only the local addressing scheme.

- Mask bit higher than AID0 range

v2
remain the case for mmhub use master XCC

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit caaf576292)
2024-07-24 17:30:23 -04:00
Lijo Lazar 485432d090 drm/amdgpu: Add empty HDP flush function to VCN v4.0.3
VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch
will do the HDP flush.

This change is necessary for VCN v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 49cfaebe48)
2024-07-24 17:30:23 -04:00
Lijo Lazar 23df34997d drm/amdgpu: Add empty HDP flush function to JPEG v4.0.3
JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead,
mmsch fw will do the flush.

This change is necessary for JPEG v4.0.3, no need for backward compatibility

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 585e3fdb36)
2024-07-24 17:30:23 -04:00
Ma Ke df65aabef3 drm/amd/amdgpu: Fix uninitialized variable warnings
Return 0 to avoid returning an uninitialized variable r.

Cc: stable@vger.kernel.org
Fixes: 230dd6bb61 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6472de66c0)
2024-07-24 17:30:23 -04:00
David Belanger 73048bda46 drm/amdgpu: Fix atomics on GFX12
If PCIe supports atomics, configure register to prevent DF from
breaking atomics in separate load/store operations.

Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 666f14cab2)
2024-07-24 17:30:23 -04:00
Alex Deucher a03ebf1163 drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().

Enable this workaround while we are root causing the issue with
the HW team.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440
Tested-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit f2ac526349)
2024-07-24 17:29:57 -04:00
YiPeng Chai b3fb79cda5 drm/amdgpu: add mutex to protect ras shared memory
Add mutex to protect ras shared memory.

v2:
  Add TA_RAS_COMMAND__TRIGGER_ERROR command call
  status check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:30 -04:00
Boyuan Zhang 7d75ef3736 drm/amdgpu/vcn: not pause dpg for unified queue
For unified queue, DPG pause for encoding is done inside VCN firmware,
so there is no need to pause dpg based on ring type in kernel.

For VCN3 and below, pausing DPG for encoding in kernel is still needed.

v2: add more comments
v3: update commit message

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:44:04 -04:00
Boyuan Zhang ecfa23c8df drm/amdgpu/vcn: identify unified queue in sw init
Determine whether VCN using unified queue in sw_init, instead of calling
functions later on.

v2: fix coding style

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-16 11:43:48 -04:00
Aurabindo Pillai 28814be882 drm/amd: Bump KMS_DRIVER_MINOR version
Increase the KMS minor version to indicate GFX12 DCC support since this
contains a major change in how DCC is managed across IPs like GFX, DCN
etc. This will be used mainly by userspace like Mesa to figure out
DCC support on GFX12 hardware.

v2: fix version number (Alex)

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-15 17:41:23 -04:00
Alex Deucher 1cff1010be drm/amdgpu/mes12: add missing opcode string
Fixes the indexing of the string array.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher 478cb8badf drm/amdgpu/mes11: update opcode strings
Add new packet.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-12 11:46:46 -04:00
Alex Deucher 8030f6533e drm/amdgpu: remove exp hw support check for gfx12
Enable it by default.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 13:45:57 -04:00
YiPeng Chai e23300dfff drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
The problem case is as follows:
1. GPU A triggers a gpu ras reset, and GPU A drives
   GPU B to also perform a gpu ras reset.
2. After gpu B ras reset started, gpu B queried a DE
   data. Since the DE data was queried in the ras reset
   thread instead of the page retirement thread, bad
   page retirement work would not be triggered. Then
   even if all gpu resets are completed, the bad pages
   will be cached in RAM until GPU B's bad page retirement
   work is triggered again and then saved to eeprom.

This patch can save the bad pages to eeprom in time after gpu
ras reset is completed.

v2:
  1. Add the above description to code comments.
  2. Reuse existing function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:41 -04:00
YiPeng Chai c04706914d drm/amdgpu: flush all cached ras bad pages to eeprom
Before uninstalling gpu driver, flush all cached ras
bad pages to eeprom.

v2:
  Put the same code into a function and reuse the function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:35 -04:00
Sunil Khatri c39385710c drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX12.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:22 -04:00
Sunil Khatri a85cc86cce drm/amdgpu: select compute ME engines dynamically
GFX ME right now is one but this could change in
future SOC's. Use no of ME for GFX as start point
for ME for compute for GFX11.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10 10:13:04 -04:00