Commit Graph

11690 Commits

Author SHA1 Message Date
Yushan Zhou 5a2a19b7ac drm/amdgpu: remove redundant NULL check
release_firmware() checks whether firmware pointer is NULL.
Remove the redundant NULL check in psp_sw_fini().

Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Veerabadhran Gopalakrishnan 65009bf2b4 amdgpu/nv.c: Corrected typo in the video capabilities resolution
Corrected the typo in the 4K resolution parameters.

Fixes: b3a24461f9 ("amdgpu/nv.c - Added codec query for Beige Goby")
Fixes: 9075096b09 ("amdgpu/nv.c - Optimize code for video codec support structure")
Fixes: 9ac0edaa0f ("drm/amdgpu: add vcn_4_0_0 video codec query")

Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Liang He dfd0287bd3 drm/amdgpu: Fix potential double free and null pointer dereference
In amdgpu_get_xgmi_hive(), we should not call kfree() after
kobject_put() as the PUT will call kfree().

In amdgpu_device_ip_init(), we need to check the returned *hive*
which can be NULL before we dereference it.

Signed-off-by: Liang He <windhl@126.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Tim Huang 2c83e3fd92 drm/amdgpu: enable PSP IP v13.0.11 support
Enable PSP FW loading for PSP IP v13.0.11

Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Saleemkhan Jamadar 2a0fe2ca6e drm/amdgpu: Enable pg/cg flags on GC11_0_4 for VCN
This enable VCN PG, CG and JPEG PG, CG

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Yifan Zhang 7308ceb446 drm/amdgpu/discovery: enable nbio support for NBIO v7.7.1
this patch is to enable nbio support for NBIO v7.7.1.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Tim Huang 18ad18853c drm/amdgpu/soc21: add mode2 asic reset for SMU IP v13.0.11
Set the default reset method to mode2 for SMU IP v13.0.11

Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Yifan Zhang 51e7a21687 drm/amdgpu: add smu 13 support for smu 13.0.11
this patch to add smu 13 support for smu 13.0.11.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:37 -05:00
Yifan Zhang d0ca824899 drm/amdgpu: add gmc v11 support for GC 11.0.4
Add gmc v11 support for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang 311d52367d drm/amdgpu: add soc21 common ip block support for GC 11.0.4
Add common soc21 ip block support for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang 1763cb65e8 drm/amdgpu: add gfx support for GC 11.0.4
this patch to add GC 11.0.4 gfx support to gfx11 implementation.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang dd2d9c7fd7 drm/amdgpu/discovery: set the APU flag for GC 11.0.4
Set the APU flag appropriately for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang 94ab706858 drm/amdgpu: set GC 11.0.4 family
this patch is to set GC 11.0.4 family.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Tim Huang 7c1389f1b1 drm/amdgpu/discovery: add PSP IP v13.0.11 support
Add PSP IP v13.0.11 ip discovery support.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang 6a6af77570 drm/amdgpu/discovery: enable mes support for GC v11.0.4
this patch is to enable mes for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang b952d6b3d3 drm/amdgpu/discovery: enable gfx v11 for GC 11.0.4
Enable gfx v11 for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang d5fd8c89ed drm/amdgpu/discovery: enable gmc v11 for GC 11.0.4
Enable gmc (graphic memory controller) v11 for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Yifan Zhang 69dc98bbd4 drm/amdgpu/discovery: enable soc21 common for GC 11.0.4
Enable soc21 common for GC 11.0.4.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Likun Gao c0924ad731 drm/amdgpu: skip vram reserve on firmware_v2_2 for bare-metal
vram_usagebyfirmware v2_2 is only used in SRIOV case, skip the related
settings in bare-metal case currently.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Guchun Chen f4b09c295e drm/amdgpu: add printing to indicate rpm completeness
Add an explicit printing to tell when finishing rpm execution
in amdgpu.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:36 -05:00
Tong Liu01 6d96ced760 drm/amdgpu: add drv_vram_usage_va for virt data exchange
For vram_usagebyfirmware_v2_2, fw_vram_reserve is not used. So
fw_vram_usage_va is NULL, and cannot do virt data exchange
anymore. Should add drv_vram_usage_va to do virt data exchange
in vram_usagebyfirmware_v2_2 case. And refine some code style
checks in pre add vram reservation logic patch

Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:35 -05:00
James Zhu c1420a5dd4 drm/amdgpu: fix stall on CPU when allocate large system memory
-v2: 1. rename variable to redue confuse
     2. optimize the code
-v3: move new define out of the middle of the code
-v4: squash in minmax error fix (Luben)

When applications try to allocate large system (more than > 128GB),
"stall cpu" is reported.

for such large system memory, walk_page_range takes more than 20s usually.
The warning message can be removed when splitting hmm range into smaller
ones which is not more 64GB for each walk_page_range.

[  164.437617] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1753: amdgpu: 	create BO VA 0x7f63c7a00000 size 0x2f16000000 domain CPU
[  164.488847] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1785: amdgpu: creating userptr BO for user_addr = 7f63c7a00000
[  185.439116] rcu: INFO: rcu_sched self-detected stall on CPU
[  185.439125] rcu: 	8-....: (20999 ticks this GP) idle=e22/1/0x4000000000000000 softirq=2242/2242 fqs=5249
[  185.439137] 	(t=21000 jiffies g=6325 q=1215)
[  185.439141] NMI backtrace for cpu 8
[  185.439143] CPU: 8 PID: 3470 Comm: kfdtest Kdump: loaded Tainted: G           O      5.12.0-0_fbk5_zion_rc1_5697_g2c723fb88626 #1
[  185.439147] Hardware name: HPE ProLiant XL675d Gen10 Plus/ProLiant XL675d Gen10 Plus, BIOS A47 11/06/2020
[  185.439150] Call Trace:
[  185.439153]  <IRQ>
[  185.439157]  dump_stack+0x64/0x7c
[  185.439163]  nmi_cpu_backtrace.cold.7+0x30/0x65
[  185.439165]  ? lapic_can_unplug_cpu+0x70/0x70
[  185.439170]  nmi_trigger_cpumask_backtrace+0xf9/0x100
[  185.439174]  rcu_dump_cpu_stacks+0xc5/0xf5
[  185.439178]  rcu_sched_clock_irq.cold.97+0x112/0x38c
[  185.439182]  ? tick_sched_handle.isra.21+0x50/0x50
[  185.439185]  update_process_times+0x8c/0xc0
[  185.439189]  tick_sched_timer+0x63/0x70
[  185.439192]  __hrtimer_run_queues+0xff/0x250
[  185.439195]  hrtimer_interrupt+0xf4/0x200
[  185.439199]  __sysvec_apic_timer_interrupt+0x51/0xd0
[  185.439201]  sysvec_apic_timer_interrupt+0x69/0x90
[  185.439206]  </IRQ>
[  185.439207]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  185.439211] RIP: 0010:clear_page_rep+0x7/0x10
[  185.439214] Code: e8 fe 7c 51 00 44 89 e2 48 89 ee 48 89 df e8 60 ff ff ff c6 03 00 5b 5d 41 5c c3 cc cc cc cc cc cc cc cc b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00
[  185.439218] RSP: 0018:ffffc9000f58f818 EFLAGS: 00000246
[  185.439220] RAX: 0000000000000000 RBX: 0000000000000881 RCX: 000000000000005c
[  185.439223] RDX: 0000000000100dca RSI: 0000000000000000 RDI: ffff88a59e0e5d20
[  185.439225] RBP: ffffea0096783940 R08: ffff888118c35280 R09: ffffea0096783940
[  185.439227] R10: ffff888000000000 R11: 0000160000000000 R12: ffffea0096783980
[  185.439228] R13: ffffea0096783940 R14: ffff88b07fdfdd00 R15: 0000000000000000
[  185.439232]  prep_new_page+0x81/0xc0
[  185.439236]  get_page_from_freelist+0x13be/0x16f0
[  185.439240]  ? release_pages+0x16a/0x4a0
[  185.439244]  __alloc_pages_nodemask+0x1ae/0x340
[  185.439247]  alloc_pages_vma+0x74/0x1e0
[  185.439251]  __handle_mm_fault+0xafe/0x1360
[  185.439255]  handle_mm_fault+0xc3/0x280
[  185.439257]  hmm_vma_fault.isra.22+0x49/0x90
[  185.439261]  __walk_page_range+0x692/0x9b0
[  185.439265]  walk_page_range+0x9b/0x120
[  185.439269]  hmm_range_fault+0x4f/0x90
[  185.439274]  amdgpu_hmm_range_get_pages+0x24f/0x260 [amdgpu]
[  185.439463]  amdgpu_ttm_tt_get_user_pages+0xc2/0x190 [amdgpu]
[  185.439603]  amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x49f/0x7a0 [amdgpu]
[  185.439774]  kfd_ioctl_alloc_memory_of_gpu+0xfb/0x410 [amdgpu]

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29 11:03:35 -05:00
Dave Airlie e57702069b Merge tag 'amd-drm-fixes-6.1-2022-11-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-11-23:

amdgpu:
- DCN 3.1.4 fixes
- DP MST DSC deadlock fixes
- HMM userptr fixes
- Fix Aldebaran CU occupancy reporting
- GFX11 fixes
- PSP suspend/resume fix
- DCE12 KASAN fix
- DCN 3.2.x fixes
- Rotated cursor fix
- SMU 13.x fix
- DELL platform suspend/resume fixes
- VCN4 SR-IOV fix
- Display regression fix for polled connectors

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221123143453.8977-1-alexander.deucher@amd.com
2022-11-25 10:55:23 +10:00
Dave Airlie 9e2c5c651a Merge tag 'drm-misc-fixes-2022-11-24' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc7:
- Another amdgpu gang submit fix.
- Use dma_fence_unwrap_for_each when importing sync files.
- Fix race in dma_heap_add().
- Fix use of uninitialized memory in logo.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a5721505-4823-98ef-7d6f-0ea478221391@linux.intel.com
2022-11-25 09:21:11 +10:00
Dave Airlie d47f958083 Linux 6.1-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN
 Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS
 FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i
 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g
 qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k
 WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I
 1GfXkXs=
 =gt6P
 -----END PGP SIGNATURE-----

Backmerge tag 'v6.1-rc6' into drm-next

Linux 6.1-rc6

This is needed for drm-misc-next and tegra.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-24 11:05:43 +10:00
Alex Deucher 10d2d1fc05 drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read"
This partially reverts 20543be93c.

Calling drm_connector_update_edid_property() in
amdgpu_connector_free_edid() causes a noticeable pause in
the system every 10 seconds on polled outputs so revert this
part of the change.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2257
Cc: Claudio Suarez <cssk@net-c.es>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 10:31:31 -05:00
Tao Zhou 244b8edbdf drm/amdgpu: enable RAS poison for VCN 2.6
Configure related settings to enable it.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 10:31:31 -05:00
Shikang Fan 3c22c1ead6 drm/amdgpu: fix for suspend/resume kiq fence fallback under sriov
- in device_resume, sriov configure interrupt should be in full access,
  so release_full_gpu should be done after kfd_resume.
- remove the previous workaround solution for sriov.

Fixes: ec4927d463 ("drm/amdgpu: fix for suspend/resume sequence under sriov")
Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 10:31:31 -05:00
Ren Zhijie b7a3260c73 drm/amdgpu: fix unused-function error
If CONFIG_DRM_AMDGPU=y and CONFIG_DRM_AMD_DC is not set,
gcc complained about unused-function :

drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1705:13: error: ‘amdgpu_discovery_set_sriov_display’ defined but not used [-Werror=unused-function]
 static void amdgpu_discovery_set_sriov_display(struct amdgpu_device *adev)
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

To fix this error, use CONFIG_DRM_AMD_DC to wrap
the definition of amdgpu_discovery_set_sriov_display().

Fixes: 25263da376 ("drm/amdgpu: rework SR-IOV virtual display handling")
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 10:31:30 -05:00
Stanley.Yang 0317d73954 drm/amdgpu: fix use-after-free during gpu recovery
[Why]
    [  754.862560] refcount_t: underflow; use-after-free.
    [  754.862898] Call Trace:
    [  754.862903]  <TASK>
    [  754.862913]  amdgpu_job_free_cb+0xc2/0xe1 [amdgpu]
    [  754.863543]  drm_sched_main.cold+0x34/0x39 [amd_sched]

[How]
    The fw_fence may be not init, check whether dma_fence_init
    is performed before job free

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 10:31:25 -05:00
ZhenGuo Yin 2a599113b2 drm/amdgpu: update documentation of parameter amdgpu_gtt_size
Fixes: f7ba887f60 ("drm/amdgpu: Adjust logic around GTT size (v3)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 09:47:15 -05:00
Jane Jian 98928baeb9 drm/amdgpu/vcn: re-use original vcn0 doorbell value
root cause that S2A need to use deduct offset flag.
after setting this flag, vcn0 doorbell value works.
so return it as before

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 09:47:15 -05:00
Yang Yingliang b85e285e3d drm/amdgpu: fix pci device refcount leak
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().

So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.

Fixes: 3f12acc8d6 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 09:47:13 -05:00
Alex Deucher 96e1a88faf drm/amdgpu/psp: don't free PSP buffers on suspend
We can reuse the same buffers on resume.

v2: squash in S4 fix from Shikai

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-11-23 09:47:13 -05:00
Jack Xiao a2b1df9296 drm/amd/amdgpu: reserve vm invalidation engine for firmware
If mes enabled, reserve VM invalidation engine 5 for firmware.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 09:47:12 -05:00
Jane Jian ecb41b71ef drm/amdgpu/vcn: re-use original vcn0 doorbell value
root cause that S2A need to use deduct offset flag.
after setting this flag, vcn0 doorbell value works.
so return it as before

Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 09:01:54 -05:00
Alex Deucher 602ad43c3c drm/amdgpu: Partially revert "drm/amdgpu: update drm_display_info correctly when the edid is read"
This partially reverts 20543be93c.

Calling drm_connector_update_edid_property() in
amdgpu_connector_free_edid() causes a noticeable pause in
the system every 10 seconds on polled outputs so revert this
part of the change.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2257
Cc: Claudio Suarez <cssk@net-c.es>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-11-23 09:01:53 -05:00
Stanley.Yang 3cb93f3904 drm/amdgpu: fix use-after-free during gpu recovery
[Why]
    [  754.862560] refcount_t: underflow; use-after-free.
    [  754.862898] Call Trace:
    [  754.862903]  <TASK>
    [  754.862913]  amdgpu_job_free_cb+0xc2/0xe1 [amdgpu]
    [  754.863543]  drm_sched_main.cold+0x34/0x39 [amd_sched]

[How]
    The fw_fence may be not init, check whether dma_fence_init
    is performed before job free

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23 09:01:53 -05:00
Alex Deucher 4f2bea62cf drm/amdgpu/psp: don't free PSP buffers on suspend
We can reuse the same buffers on resume.

v2: squash in S4 fix from Shikai

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-11-23 09:01:53 -05:00
Dave Airlie fc58764bbf Merge tag 'amd-drm-next-6.2-2022-11-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.2-2022-11-18:

amdgpu:
- SR-IOV fixes
- Clean up DC checks
- DCN 3.2.x fixes
- DCN 3.1.x fixes
- Don't enable degamma on asics which don't support it
- IP discovery fixes
- BACO fixes
- Fix vbios allocation handling when vkms is enabled
- Drop buggy tdr advanced mode GPU reset handling
- Fix the build when DCN is not set in kconfig
- MST DSC fixes
- Userptr fixes
- FRU and RAS EEPROM fixes
- VCN 4.x RAS support
- Aldrebaran CU occupancy reporting fix
- PSP ring cleanup

amdkfd:
- Memory limit fix
- Enable cooperative launch on gfx 10.3

amd-drm-next-6.2-2022-11-11:

amdgpu:
- SMU 13.x updates
- GPUVM TLB race fix
- DCN 3.1.4 updates
- DCN 3.2.x updates
- PSR fixes
- Kerneldoc fix
- Vega10 fan fix
- GPUVM locking fixes in error pathes
- BACO fix for Beige Goby
- EEPROM I2C address cleanup
- GFXOFF fix
- Fix DC memory leak in error pathes
- Flexible array updates
- Mtype fix for GPUVM PTEs
- Move Kconfig into amdgpu directory
- SR-IOV updates
- Fix possible memory leak in CS IOCTL error path

amdkfd:
- Fix possible memory overrun
- CRIU fixes

radeon:
- ACPI ref count fix
- HDA audio notifier support
- Move Kconfig into radeon directory

UAPI:
- Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI.
  These are used internally in the driver to align location based memory coherency
  requirements from memory allocated in the KFD with how we manage GPUVM PTEs.  They
  are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now.
  They are just used internally in the kernel driver for now for existing KFD memory
  allocations. So a change to the UAPI header, but no functional change in the UAPI.

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221118170807.6505-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-22 13:41:11 +10:00
Jack Xiao 91abf28a63 drm/amd/amdgpu: reserve vm invalidation engine for firmware
If mes enabled, reserve VM invalidation engine 5 for firmware.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0.x
2022-11-21 16:43:13 -05:00
Ramesh Errabolu b9ab82da88 drm/amdgpu: Enable Aldebaran devices to report CU Occupancy
Allow user to know number of compute units (CU) that are in use at any
given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy
that computes CU occupancy.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-11-21 16:41:51 -05:00
Christian König 4458da0bb0 drm/amdgpu: fix userptr HMM range handling v2
The basic problem here is that it's not allowed to page fault while
holding the reservation lock.

So it can happen that multiple processes try to validate an userptr
at the same time.

Work around that by putting the HMM range object into the mutex
protected bo list for now.

v2: make sure range is set to NULL in case of an error

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21 16:40:30 -05:00
Christian König b39df63b16 drm/amdgpu: always register an MMU notifier for userptr
Since switching to HMM we always need that because we no longer grab
references to the pages.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21 16:40:04 -05:00
Christian König b09d6acba1 drm/amdgpu: handle gang submit before VMID
Otherwise it can happen that not all gang members can get a VMID
assigned and we deadlock.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 68ce8b2422 ("drm/amdgpu: add gang submit backend v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221118153023.312582-1-christian.koenig@amd.com
2022-11-18 17:52:15 +01:00
Dave Airlie 585f2bc8fe Merge tag 'amd-drm-fixes-6.1-2022-11-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.1-2022-11-16:

amdgpu:
- Fix a possible memory leak in ganng submit error path
- DP tunneling fixes
- DCN 3.1 page flip fix
- DCN 3.2.x fixes
- DCN 3.1.4 fixes
- Don't expose degamma on hardware that doesn't support it
- BACO fixes for SMU 11.x
- BACO fixes for SMU 13.x
- Virtual display fix for devices with no display hardware

amdkfd:
- Memory limit regression fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221117040416.6100-1-alexander.deucher@amd.com
2022-11-18 11:09:04 +10:00
Alex Deucher aec3bb3a01 drm/amdgpu: make psp_ring_init common
All of the IP specific versions are the same now, so
we can just use a common function.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:09:16 -05:00
Alex Deucher 54037bdfb1 drm/amdgpu/psp12: move ih_reroute into ring_create
This matches what we do for psp 3.1 and makes ring_init
common for all PSP versions.

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:09:11 -05:00
Ramesh Errabolu cff7d3b7df drm/amdgpu: Enable Aldebaran devices to report CU Occupancy
Allow user to know number of compute units (CU) that are in use at any
given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy
that computes CU occupancy.

Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:52 -05:00
Tao Zhou 86e8255f94 drm/amdgpu: add JPEG 4.0 RAS poison consumption handling
Register related irq handler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:46 -05:00
Tao Zhou ea5309de73 drm/amdgpu: add VCN 4.0 RAS poison consumption handling
Register irq handler.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:40 -05:00
Tao Zhou 5331745801 drm/amdgpu: add RAS error query for JPEG 4.0
Initialize JPEG RAS structure and add error query interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:27 -05:00
Tao Zhou 377d02215f drm/amdgpu: add RAS query support for VCN 4.0
Initialize VCN RAS structure and add RAS status query function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:20 -05:00
Tao Zhou 214c76427d drm/amdgpu: define common jpeg_set_ras_funcs
Make the code reusable.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:14 -05:00
Tao Zhou 2ddb629be2 drm/amdgpu: define common vcn_set_ras_funcs
So the code can be reused.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:08:05 -05:00
Tao Zhou 07615da1bf drm/amdgpu: enable RAS for VCN/JPEG v4.0
Set support flag for VCN/JPEG 4.0.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:07:58 -05:00
YiPeng Chai 1a11a65d53 drm/amdgpu: Enable mode-1 reset for RAS recovery in fatal error mode
The patch is enabling mode-1 reset for RAS recovery in fatal error mode.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:07:52 -05:00
Luben Tuikov 64a3dbb06a drm/amdgpu: Add support for RAS table at 0x40000
Add support for RAS table at I2C EEPROM address of 0x40000, since on some
ASICs it is not at 0, but at 0x40000.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Kent Russell <kent.russell@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Tested-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:07:33 -05:00
Luben Tuikov 0dbf2c5626 drm/amdgpu: Interpret IPMI data for product information (v2)
Don't assume FRU MCU memory locations for the FRU data fields, or their sizes,
instead read and interpret the IPMI data, as stipulated in the IPMI spec
version 1.0 rev 1.2.

Extract the Product Name, Product Part/Model Number, and the Product Serial
Number by interpreting the IPMI data.

Check the checksums of the stored IPMI data to make sure we don't read and
give corrupted data back the the user.

Eliminate small I2C reads, and instead read the whole Product Info Area in one
go, and then extract the information we're seeking from it.

Eliminates a whole function, making this file smaller.

v2: Clarify changes in the commit message.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Kent Russell <kent.russell@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Tested-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:07:26 -05:00
Luben Tuikov afbe5d1e4b drm/amdgpu: Bug-fix: Reading I2C FRU data on newer ASICs
Set the new correct default FRU MCU I2C address for newer ASICs, so that we
can correctly read the Product Name, Product Part/Model Number and Serial
Number.

On newer ASICs, the FRU MCU was moved to I2C address 0x58.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Kent Russell <kent.russell@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Tested-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:07:19 -05:00
Luben Tuikov 6a4a745c42 drm/amdgpu: Allow non-standard EEPROM I2C address
Allow non-standard EEPROM I2C address of 0x58, where the Device Type
Identifier is 1011b, where we form 1011000b = 0x58 I2C address, as on some
ASICs the FRU data lives there.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Kent Russell <kent.russell@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Tested-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 18:06:58 -05:00
Dave Airlie 5fa8813878 Merge tag 'drm-misc-fixes-2022-11-17' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc6:
- Fix error handling in vc4_atomic_commit_tail()
- Set bpc for logictechno panels.
- Fix potential memory leak in drm_dev_init()
- Fix potential null-ptr-deref in drm_vblank_destroy_worker()
- Set lima's clkname corrrectly when regulator is missing.
- Small amdgpu fix to gang submission.
- Revert hiding unregistered connectors from userspace, as it breaks on DP-MST.
- Add workaround for DP++ dual mode adaptors that don't support
  i2c subaddressing.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c7d02936-c550-199b-6cb7-cbf6cf104e4a@linux.intel.com
2022-11-18 07:08:57 +10:00
Christian König d4cbff464d drm/amdgpu: cleanup amdgpu_hmm_range_get_pages
Remove unused parameters and cleanup dead code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 00:23:43 -05:00
Christian König d9483ecd32 drm/amdgpu: rename the files for HMM handling
Clean that up a bit, no functional change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 00:23:36 -05:00
Christian König fec8fdb54e drm/amdgpu: fix userptr HMM range handling v2
The basic problem here is that it's not allowed to page fault while
holding the reservation lock.

So it can happen that multiple processes try to validate an userptr
at the same time.

Work around that by putting the HMM range object into the mutex
protected bo list for now.

v2: make sure range is set to NULL in case of an error

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 00:23:30 -05:00
Christian König 631945e04e drm/amdgpu: always register an MMU notifier for userptr
Since switching to HMM we always need that because we no longer grab
references to the pages.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17 00:23:20 -05:00
Dave Airlie 4e291f2f58 drm-misc-next for 6.2:
UAPI Changes:
 
 Cross-subsystem Changes:
 
 Core Changes:
 - atomic-helper: Add begin_fb_access and end_fb_access hooks
 - fb-helper: Rework to move fb emulation into helpers
 - scheduler: rework entity flush, kill and fini
 - ttm: Optimize pool allocations
 
 Driver Changes:
 - amdgpu: scheduler rework
 - hdlcd: Switch to DRM-managed resources
 - ingenic: Fix registration error path
 - lcdif: FIFO threshold tuning
 - meson: Fix return type of cvbs' mode_valid
 - ofdrm: multiple fixes (kconfig, types, endianness)
 - sun4i: A100 and D1 support
 - panel:
   - New Panel: Jadard JD9365DA-H3
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCY2y3vQAKCRDj7w1vZxhR
 xQu+AP9CzbNI2s12aNS8DskEZggo0lqUyRiEBaRJ2jrWZdGr0gEA1+Lc06HaKmGC
 2WBD4nw2I7ch0NUN6VFjQ9ATofevPwA=
 =AY3a
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-2022-11-10-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for 6.2:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
- atomic-helper: Add begin_fb_access and end_fb_access hooks
- fb-helper: Rework to move fb emulation into helpers
- scheduler: rework entity flush, kill and fini
- ttm: Optimize pool allocations

Driver Changes:
- amdgpu: scheduler rework
- hdlcd: Switch to DRM-managed resources
- ingenic: Fix registration error path
- lcdif: FIFO threshold tuning
- meson: Fix return type of cvbs' mode_valid
- ofdrm: multiple fixes (kconfig, types, endianness)
- sun4i: A100 and D1 support
- panel:
  - New Panel: Jadard JD9365DA-H3

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20221110083612.g63eaocoaa554soh@houat
2022-11-16 07:17:32 +10:00
Christian König 0788a47e7c drm/amdgpu: stop resubmittting jobs in amdgpu_pci_resume
The state of VRAM is unreliable due to a PCI event like AER, link reset
or DPC.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 15:25:45 -05:00
Christian König 6868a2c465 drm/amdgpu: stop resubmitting jobs for GPU reset v2
Re-submitting IBs by the kernel has many problems because pre-
requisite state is not automatically re-created as well. In
other words neither binary semaphores nor things like ring
buffer pointers are in the state they should be when the
hardware starts to work on the IBs again.

Additional to that even after more than 5 years of
developing this feature it is still not stable and we have
massively problems getting the reference counts right.

As discussed with user space developers this behavior is not
helpful in the first place. For graphics and multimedia
workloads it makes much more sense to either completely
re-create the context or at least re-submitting the IBs
from userspace.

For compute use cases re-submitting is also not very
helpful since userspace must rely on the accuracy of
the result.

Because of this we stop this practice and instead just
properly note that the fence submission was canceled. The
only use case we keep the re-submission for now is SRIOV
and function level resets.

v2: as suggested by Sshaoyun stop resubmitting jobs even for SRIOV

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 15:25:37 -05:00
Christian König 06a2d7cc3f drm/amdgpu: revert "implement tdr advanced mode"
This reverts commit e6c6338f39.

This feature basically re-submits one job after another to
figure out which one was the one causing a hang.

This is obviously incompatible with gang-submit which requires
that multiple jobs run at the same time. It's also absolutely
not helpful to crash the hardware multiple times if a clean
recovery is desired.

For testing and debugging environments we should rather disable
recovery alltogether to be able to inspect the state with a hw
debugger.

Additional to that the sw implementation is clearly buggy and causes
reference count issues for the hardware fence.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 15:25:22 -05:00
YiPeng Chai d293470e10 drm/amdgpu: Fixed the problem that ras error can't be queried after gpu recovery is completed
Amdgpu_ras_set_error_query_ready is called at the start of
amdgpu_device_gpu_recover to disable query ras error, but the
code behind only enables query ras error in full reset path,
but not in soft reset path, emergency restart path and skip
the hardware reset path.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:35:16 -05:00
Alex Deucher 220c8cc855 drm/amdgpu: there is no vbios fb on devices with no display hw (v2)
If we enable virtual display functionality on parts with
no display hardware we can end up trying to check for and
reserve the vbios FB area on devices where it doesn't exist.
Check if display hardware is actually present on the hardware
before trying to reserve the memory.

v2: move the check into common code

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:35:16 -05:00
Eric Huang 48fb1ecf93 drm/amdkfd: Fix a memory limit issue
It is to resolve a regression, which fails to allocate
VRAM due to no free memory in application, the reason
is we add check of vram_pin_size for memory limit, and
application is pinning the memory for Peerdirect, KFD
should not count it in memory limit. So removing
vram_pin_size will resolve it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:35:16 -05:00
YiPeng Chai b6da3c5837 drm/amdgpu: Add umc channel index mapping table for umc_v8_10
Add umc channel index mapping table for umc_v8_10.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:35:15 -05:00
Perry Yuan 469694e91f drm/amdgpu: remove the DID of Vangogh from pciidlist
change the vangogh family to use IP discovery path to initialize IP
list, this needs to remove the DID from the PCI ID list to allow the IP
discovery path to set all the IP versions correctly.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:35:15 -05:00
Perry Yuan 5e0f4c041c drm/amdgpu: add Vangogh APU flag to IP discovery path
Add the missing apu flag for Vangogh when using IP discovery code path
to initialize IPs

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:35:15 -05:00
Alex Deucher f8794f31ab drm/amdgpu: there is no vbios fb on devices with no display hw (v2)
If we enable virtual display functionality on parts with
no display hardware we can end up trying to check for and
reserve the vbios FB area on devices where it doesn't exist.
Check if display hardware is actually present on the hardware
before trying to reserve the memory.

v2: move the check into common code

Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:24:40 -05:00
Eric Huang 6f9eea4392 drm/amdkfd: Fix a memory limit issue
It is to resolve a regression, which fails to allocate
VRAM due to no free memory in application, the reason
is we add check of vram_pin_size for memory limit, and
application is pinning the memory for Peerdirect, KFD
should not count it in memory limit. So removing
vram_pin_size will resolve it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:24:32 -05:00
Candice Li 2c2911e09f drm/amdgpu: Add psp_13_0_10_ta firmware to modinfo
TA firmware loaded on psp v13_0_10, but it is missing in modinfo.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:10:17 -05:00
Dong Chenchen 5428672d39 drm/amdgpu: Fix memory leak in amdgpu_cs_pass1
When p->gang_size equals 0, amdgpu_cs_pass1() will return directly
without freeing chunk_array, which will cause a memory leak issue,
this patch fixes it.

Fixes: 4624459c84 ("drm/amdgpu: add gang submit frontend v6")
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 13:03:59 -05:00
Candice Li be590ba8e2 drm/amdgpu: Add psp_13_0_10_ta firmware to modinfo
TA firmware loaded on psp v13_0_10, but it is missing in modinfo.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:52:31 -05:00
Alex Deucher d09ef24303 drm/amdgpu: clarify DC checks
There are several places where we don't want to check
if a particular asic could support DC, but rather, if
DC is enabled.  Set a flag if DC is enabled and check
for that rather than if a device supports DC or not.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:51:45 -05:00
Alex Deucher a347ca9720 drm/amdgpu: move non-DC vblank handling out of irq code
Move it into the DCE code for each generation. This avoids
confusion with the different display paths.

v2: no need for a hotplug worker for vkms

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:51:36 -05:00
Alex Deucher 25263da376 drm/amdgpu: rework SR-IOV virtual display handling
virtual display is enabled unconditionally in SR-IOV, but
without specifying the virtual_display module, the number
of crtcs defaults to 0.  Set a single display by default
for SR-IOV if the virtual_display parameter is not set.
Only enable virtual display by default on SR-IOV on asics
which actually have display hardware.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:51:32 -05:00
Alex Deucher 96ce96f877 drm/amdgpu: use proper DC check in amdgpu_display_supported_domains()
amdgpu_device_asic_has_dc_support() just checks the asic itself.
amdgpu_device_has_dc_support() is a runtime check which not
only checks the asic, but also other things in the driver
like whether virtual display is enabled.  We want the latter
here.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:51:25 -05:00
Yifan Zha 17a37fa122 drm/amdgpu: Remove programming GCMC_VM_FB_LOCATION* on gfxhub_v3_0_3 in VF
[Why]
GCMC_VM related registers should be programmed by PSP on host side.
L1 and RLCG will block these regisers on VF.

[How]
Remove programming GCMC_VM_FB_LOCATION_BASE/TOP on gfxhub_v3_0_3 under SRIOV VF.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:51:15 -05:00
YuBiao Wang f3416dc88a drm/amdgpu: Stop clearing kiq position during unload
Do not clear kiq position in RLC_CP_SCHEDULER so that CP could perform
IDLE-SAVE after VF fini. CPG also needs to be active in save command.

v2: drop unused variable (Alex)

Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15 11:51:15 -05:00
Christian König eca13f3c67 drm/amdgpu: use the last IB as gang leader v2
It turned out that not the last IB specified is the gang leader,
but instead the last job allocated.

This is a bit unfortunate and not very intuitive for the CS
interface, so try to fix this.

Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221115094206.6181-1-christian.koenig@amd.com
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 4624459c84 ("drm/amdgpu: add gang submit frontend v6")
2022-11-15 15:27:18 +01:00
Dave Airlie 2e4b294576 Merge tag 'drm-misc-fixes-2022-11-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.1-rc5:
- HDMI fixes to vc4.
- Make panfrost's uapi header compile with C++.
- Add rotation quirks for 2 panels.
- Fix s/r in amdgpu_vram_mgr_new
- Handle 1 gb boundary correctly in panfrost mmu code.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e02de501-4b85-28a0-3f6e-751ca13f5f9d@linux.intel.com
2022-11-11 06:31:38 +10:00
Dong Chenchen 2ebf61f2cf drm/amdgpu: Fix memory leak in amdgpu_cs_pass1
When p->gang_size equals 0, amdgpu_cs_pass1() will return directly
without freeing chunk_array, which will cause a memory leak issue,
this patch fixes it.

Fixes: 4624459c84 ("drm/amdgpu: add gang submit frontend v6")
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-10 15:30:34 -05:00
Tong Liu01 4864f2ee9e drm/amdgpu: add vram reservation based on vram_usagebyfirmware_v2_2
Move TMR region from top of FB to 2MB for FFBM, so we need to
reserve TMR region firstly to make sure TMR can be allocated at 2MB

Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-10 15:29:40 -05:00
Tao Zhou 3e93136809 drm/amdgpu: remove the check of init status in psp_ras_initialize
The initialized status indicates RAS TA is loaded, but in some cases
(such as RAS fatal error) RAS TA could be destroyed although it's not
unloaded. Hence we load RAS TA unconditionally here.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-10 15:29:32 -05:00
Yifan Zhang 7701d10a1e drm/amdgpu: force read discovery file if set discovery=2
If discovery is set to 2 in module parameters explicitly, the
intention is to use the discovery file in FW rather than the one in
BIOS, usually because the latter is incorrect. This patch to force
read discovery file if set discovery=2.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-10 15:29:05 -05:00
Andrew Davis cb20d6508b drm: Move radeon and amdgpu Kconfig options into their directories
Most Kconfig options to enable a driver are in the Kconfig file
inside the relevant directory, move these two to the same.

Signed-off-by: Andrew Davis <afd@ti.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-10 15:29:00 -05:00
Philip Yang e034a0d9aa drm/amdgpu: Drop eviction lock when allocating PT BO
Re-take the eviction lock immediately again after the allocation is
completed, to fix circular locking warning with drm_buddy allocator.

Move amdgpu_vm_eviction_lock/unlock/trylock to amdgpu_vm.h as they are
called from multiple files.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 18:06:04 -05:00
Philip Yang 64f65135c4 drm/amdgpu: Unlock bo_list_mutex after error handling
Get below kernel WARNING backtrace when pressing ctrl-C to kill kfdtest
application.

If amdgpu_cs_parser_bos returns error after taking bo_list_mutex, as
caller amdgpu_cs_ioctl will not unlock bo_list_mutex, this generates the
kernel WARNING.

Add unlock bo_list_mutex after amdgpu_cs_parser_bos error handling to
cleanup bo_list userptr bo.

 WARNING: kfdtest/2930 still has locks held!
 1 lock held by kfdtest/2930:
  (&list->bo_list_mutex){+.+.}-{3:3}, at: amdgpu_cs_ioctl+0xce5/0x1f10 [amdgpu]
  stack backtrace:
   dump_stack_lvl+0x44/0x57
   get_signal+0x79f/0xd00
   arch_do_signal_or_restart+0x36/0x7b0
   exit_to_user_mode_prepare+0xfd/0x1b0
   syscall_exit_to_user_mode+0x19/0x40
   do_syscall_64+0x40/0x80

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 18:05:16 -05:00
Christian König 77c092e054 drm/amdgpu: workaround for TLB seq race
It can happen that we query the sequence value before the callback
had a chance to run.

Workaround that by grabbing the fence lock and releasing it again.
Should be replaced by hw handling soon.

Signed-off-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org # 5.19+
Fixes: 5255e146c9 ("drm/amdgpu: rework TLB flushing")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Stefan Springer <stefanspr94@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:58:17 -05:00
Felix Kuehling d1a372af1c drm/amdgpu: Set MTYPE in PTE based on BO flags
The same BO may need different MTYPEs and SNOOP flags in PTEs depending
on its current location relative to the mapping GPU. Setting MTYPEs from
clients ahead of time is not practical for coherent memory sharing.
Instead determine the correct MTYPE for the desired coherence model and
current BO location when updating the page tables.

To maintain backwards compatibility with MTYPE-selection in
AMDGPU_VA_OP_MAP, the coherence-model-based MTYPE selection is only
applied if it chooses an MTYPE other than MTYPE_NC (the default).

Add two AMDGPU_GEM_CREATE_... flags to indicate the coherence model. The
default if no flag is specified is non-coherent (i.e. coarse-grained
coherent at dispatch boundaries).

Update amdgpu_amdkfd_gpuvm.c to use this new method to choose the
correct MTYPE depending on the current memory location.

v2:
* check that bo is not NULL (e.g. PRT mappings)
* Fix missing ~ bitmask in gmc_v11_0.c
v3:
* squash in "drm/amdgpu: Inherit coherence flags on dmabuf import"

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:42 -05:00
Harsh Jain 4b31b92b14 drm/amdgpu: complete gfxoff allow signal during suspend without delay
change guarantees that gfxoff is allowed before moving further in
s2idle sequence to add more reliablity about gfxoff in amdgpu IP's
suspend flow

Signed-off-by: Harsh Jain <harsh.jain@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:42 -05:00
Luben Tuikov 3b8164f808 drm/amdgpu: Decouple RAS EEPROM addresses from chips
Abstract RAS I2C EEPROM addresses from chip names, and set their macro
definition names to the address they set, not the chip they attach
to. Since most chips either use I2C EEPROM address 0 or 40000h for the RAS
table start offset, this leaves us with only two macro definitions as
opposed to five, and removes the redundancy of four.

Cc: Candice Li <candice.li@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:42 -05:00