Commit Graph

11170 Commits

Author SHA1 Message Date
Yifan Zhang a16161a869 drm/amdgpu: correct RLC_RLCS_BOOTLOAD_STATUS offset and index
This patch corrects RLC_RLCS_BOOTLOAD_STATUS offset and index for
GC 11.0.1.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-29 15:24:21 -04:00
Jonathan Kim 1ff186ff32 drm/amdgpu: fix hive reference leak when reflecting psp topology info
Hives that require psp topology info to be reflected will leak hive
reference so fix it.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Jack Xiao ed67f7292b drm/amdgpu: move mes self test after drm sched re-started
mes self test rely on vm mapping, move it after
drm sched re-started so that vm mapping can work
during gpu reset.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-and-tested-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Evan Quan 0da0def770 drm/amdgpu: drop non-necessary call trace dump
This extra call trace dump comes out in every gpu reset.
And it gives people a wrong impression that something
went wrong. Although actually there was not.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Sonny Jiang 47231d5e39 drm/amdgpu: enable VCN cg and JPEG cg/pg
Not enable VCN pg because encode issue

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Sonny Jiang 1c0a903648 drm/amdgpu: vcn_4_0_2 video codec query
Enable support for vcn_4_0_2 video codec

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Sonny Jiang cbe93a234b drm/amdgpu: add VCN_4_0_2 firmware support
Add VCN_4_0_2 firmware support

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Sonny Jiang 4ac77cce84 drm/amdgpu: add VCN function in NBIO v7.7
Add function to support VCN_4_0_2 doorbell

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Sonny Jiang 736f7308d3 drm/amdgpu: fix a vcn4 boot poll bug in emulation mode
The return value should be set in vcn4 boot poll.

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:54 -04:00
Chengming Gui 6fdd2077ec drm/amd/amdgpu: add memory training support for PSP_V13
Add PSP_V13 memory training support funcs.

v2: replace DRM_{DEBUG/ERROR} with dev_{dbg/err}. (Hawking)
v3: fix checkpatch error (Alex)

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:28:28 -04:00
Lang Yu e22ec18750 drm/amdkfd: remove an unnecessary amdgpu_bo_ref
No need to reference the BO here, dmabuf framework will handle that.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:26:01 -04:00
Chengming Gui 7c8e4a2572 drm/amd/amdgpu: add additional page fault settings for gfx11
Add three additional page fault settings.

V2: move reg offset definition to header file. (Alex)
V3: add all shift/mask definitions of used reg. (Hawking)

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:19:59 -04:00
Vijendar Mukunda b2065fb21d drm/amdgpu: fix i2s_pdata out of bound array access
Fixed following Smatch static checker warning:

    drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:393 acp_hw_init()
    error: buffer overflow 'i2s_pdata' 3 <= 3
    drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:396 acp_hw_init()
    error: buffer overflow 'i2s_pdata' 3 <= 3

Fixes: 4c33e5179f ("drm/amdgpu: create I2S platform devices for Jadeite platform")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:16 -04:00
Lang Yu 0dc204bc3f drm/amdkfd: fix kgd_mem memory leak when importing dmabuf
The kgd_mem memory allocated in amdgpu_amdkfd_gpuvm_import_dmabuf()
is not freed properly.

Explicitly free it in amdgpu_amdkfd_gpuvm_free_memory_of_gpu()
under condition "mem->bo->kfd_bo != mem".

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:16 -04:00
Alex Sierra 3d2af401cf drm/amdgpu: add debugfs for kfd system and ttm mem used
This keeps track of kfd system mem used and kfd ttm mem used.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:16 -04:00
Alex Sierra f9af3c16bf drm/amdkfd: track unified memory reservation with xnack off
[WHY]
Unified memory with xnack off should be tracked, as userptr mappings
and legacy allocations do. To avoid oversuscribe system memory when
xnack off.
[How]
Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
API and call them on every prange creation and free.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:16 -04:00
Philip Yang 79b2c54f19 drm/amdgpu: Allow TTM to evict svm bo from same process
To support SVM range VRAM overcommitment, TTM should be able to evict
svm bo of same process to system memory, to get space to alloc new svm
bo.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:14 -04:00
Roy Sun 1f83db6be3 drm/amdgpu: Fix the incomplete product number
The comments say that the product number is a 16-digit HEX string so the
buffer needs to be at least 17 characters to hold the NUL terminator. Expand
the buffer size to 20 to avoid the alignment issues.

The comment:Product number should only be 16 characters. Any
more,and something could be wrong. Cap it at 16 to be safe

Signed-off-by: Roy Sun <Roy.Sun@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:14 -04:00
Guchun Chen 8585732baa drm/amdgpu: use adev_to_drm for consistency
Keep code consistency when accessing drm_device from amdgpu driver.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-28 16:05:14 -04:00
Dave Airlie ee8b1ef9a6 Merge tag 'amd-drm-next-5.20-2022-07-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amdgpu:
- VCN4 fixes
- RAS support for UMC 8.10
- ACP support for jadeite platforms
- NBIO HDP flush fixes
- Misc spelling and grammar fixes
- Runtime PM fixes
- Non-DC HPD fix
- Clean up amdgpu DM code
- DSC fixes
- Expose some additional GFXOFF data via debugfs
- More FP clean up for new DCN blocks
- PPC DC FP fixes
- DCN 3.1.4 fixes
- DC DML stack usage fixes
- GMC fixes
- SPM fixes for RDNA2

amdkfd:
- MMU notifier fix
- Mutex fix

UAPI:
- Add a comment about VCN4 unified queues
- IP version information for UMDs
  Proposed mesa change: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17411/diffs?commit_id=c8a63590dfd0d64e6e6a634dcfed993f135dd075

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220726181536.5759-1-alexander.deucher@amd.com
2022-07-27 09:33:45 +10:00
Aaron Liu 4c5aa59492 drm/amdgpu: enable swiotlb for gmc 11.0
Enable swiotlb for gmc 11.0.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 17:14:35 -04:00
Aaron Liu 3616d49da5 drm/amdgpu: enable swiotlb for gmc 10.0 (V2)
Enable swiotlb for gmc 10.0.
v2: include drm_cache.h to use the function ‘drm_need_swiotlb’

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 17:14:19 -04:00
Slark Xiao 9dd4545f65 drm/amd: Fix typo 'the the' in comment
Replace 'the the' with 'the' in the comment.

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:05 -04:00
Rajneesh Bhardwaj 44998fbdcd drm/amdgpu: Refactor code to handle non coherent and uncached
This simplifies existing coherence handling for Arcturus and Aldabaran
to account for !coherent && uncached scenarios.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:04 -04:00
Chengming Gui 2207efdd83 drm/amd/amdgpu: add TAP_DELAYS upload support for gfx10
Support {GLOBAL/SE0/SE1/SE2/SE3}_TAP_DELAYS uploading.

v2: upload TAP_DELAYS before RLC autoload was triggered. (Hawking)

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:04 -04:00
Leo Li 792a0cdde3 drm/amd/display: Add visualconfirm module parameter
[Why]

Being able to configure visual confirm at boot or in cmdline is helpful
when debugging.

[How]

Add a module parameter to configure DC visual confirm, which works the
same way as the equivalent debugfs entry.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Alex Deucher 465576ca48 drm/amdgpu: bump driver version for IP discovery info in HW INFO
So userspace knows when it is available.

Proposed mesa patch:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17411/diffs?commit_id=c8a63590dfd0d64e6e6a634dcfed993f135dd075

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Alex Deucher af14e7c2fc drm/amdgpu: add the IP discovery IP versions for HW INFO data
Use the former pad element to store the IP versions from the
IP discovery table.  This allows userspace to get the IP
version from the kernel to better align with hardware IP
versions.

Proposed mesa patch:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17411/diffs?commit_id=c8a63590dfd0d64e6e6a634dcfed993f135dd075

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Jason Wang c19a23fadd drm/amdgpu: Fix comment typo
The double `to' is duplicated in the comment, remove one.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:03 -04:00
Roman Li 869b10ac8d drm/amdgpu: add dm ip block for dcn 3.1.4
Adding dm ip block to enable display on dcn 3.1.4.

Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:02 -04:00
André Almeida 4686177f7d drm/amd/debugfs: Expose GFXOFF state to userspace
GFXOFF has two different "state" values: one to define if the GPU is
allowed/disallowed to enter GFXOFF, usually called state; and another
one to define if currently GFXOFF is being used, usually called status.
Even when GFXOFF is allowed, GPU firmware can decide to not used it
accordingly to the GPU load.

Userspace can allow/disallow GPUs to enter into GFXOFF via debugfs. The
kernel maintains a counter of requests for GFXOFF (gfx_off_req_count)
that should be decreased to allow GFXOFF and increased to disallow.

The issue with this interface is that userspace can't be sure if GFXOFF
is currently allowed. Even by checking amdgpu_gfxoff file, one might get
an ambiguous 2, that means that GPU is currently out of GFXOFF, but that
can be either because it's currently disallowed or because it's allowed
but given the current GPU load it's enabled. Then, userspace needs to
rely on the fact that GFXOFF is enabled by default on boot and to track
this information.

To make userspace life easier and GFXOFF more reliable, return the
current state of GFXOFF to userspace when reading amdgpu_gfxoff with the
same semantics of writing: 0 means not allowed, not 0 means allowed.

Expose the current status of GFXOFF through a new file,
amdgpu_gfxoff_status.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25 09:31:02 -04:00
Dave Airlie cb6b81b21b Short summary of fixes pull:
* amdgpu: Fix for drm buddy memory corruption
  * nouveau: PM fixes; DP fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmLY+rcACgkQaA3BHVML
 eiNuGwf/e5tnPLnJMYbkrGy3UxqoAxA/LsJ0ZhUfznsb8yKE8FyIcINJQ3BTtjrG
 9wEiVoUfitKttPtnkFBPrhR+8VDhVOd8qi953PEO+EwLNDmfv+hkKAONyc4oMdLG
 3WucvtvZDYhvWitSH3nJQoPF/W8Ctj6WGsq5JlhQ5AvePt7Fvw7mjYRi+20mLmFS
 cMUEF+R75iW94RfK2aQ8JMz6IKhnh/nGzTdIz1qiO9lK06OAfkhaJbYyJVYt1s7d
 PkL9DcqlMjh1u5uNllTN1RK+UNNgj12DXeFWpUfLfPP1Jd4ywcGgV61tcyHw3D8G
 r34vxdiuaDuFMv8yYJMvlnQSqjp0Hg==
 =vspz
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Short summary of fixes pull:

 * amdgpu: Fix for drm buddy memory corruption
 * nouveau: PM fixes; DP fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Ytj65+PdAJs4jIEO@linux-uq9g
2022-07-22 13:44:07 +10:00
Maíra Canal 40835624ef drm/amdgpu: Write masked value to control register
On the dce_v6_0 and dce_v8_0 hpd tear down callback, the tmp variable
should be written into the control register instead of 0.

Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-20 16:03:59 -04:00
Gavin Wan dc2b9c70eb drm/amdgpu: fix scratch register access method in SRIOV
The scratch register should be accessed through MMIO instead of RLCG
in SRIOV, since it being used in RLCG register access function.

Fixes: d54762cc3e ("drm/amdgpu: nuke dynamic gfx scratch reg allocation")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-20 16:03:53 -04:00
Alex Sierra 86bd6706c4 drm/amdgpu: remove acc_size from reserve/unreserve mem
TTM used to track the "acc_size" of all BOs internally. We needed to
keep track of it in our memory reservation to avoid TTM running out
of memory in its own accounting. However, that "acc_size" accounting
has since been removed from TTM. Therefore we don't really need to
track it any more.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-20 16:03:49 -04:00
Luben Tuikov 5df79aeb6e drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2
Protect the struct amdgpu_bo_list with a mutex. This is used during command
submission in order to avoid buffer object corruption as recorded in
the link below.

v2 (chk): Keep the mutex looked for the whole CS to avoid using the
	  list from multiple CS threads at the same time.

Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Vitaly Prosyak <Vitaly.Prosyak@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2048
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-20 16:03:41 -04:00
Kenneth Feng a53bc32182 drm/amd/pm: enable mode1 reset for smu_v13_0_7
enable mode1 reset for smu_v13_0_7 since it's missing.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Hawking Zhang 5877b7ddbc drm/amdgpu: correct the PSP_BL_CMD enum
To match with the enum defined in trusted os

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Guchun Chen 9c913f3803 drm/amdgpu: drop runpm from amdgpu_device structure
It's redundant, as now switching to rpm_mode to indicate
runtime power management mode.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Guchun Chen 75a9ad8c1b drm/amdgpu: drop runtime pm disablement quirk on several sienna cichlid cards
This quirk is not needed any more as it's fixed by bypassing
SMU FW reloading in runtime resume.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Guchun Chen f746556aa9 drm/amdgpu: skip SMU FW reloading in runpm BACO case
SMU is always alive, so it's fine to skip SMU FW reloading
when runpm resumed from BACO, this can avoid some race issues
when resuming SMU.

Suggested-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Guchun Chen 50fe04d46a drm/amdgpu: introduce runtime pm mode
It can benefit code consistency in future.

Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
André Almeida 133dc89c64 drm/amdgpu: Clarify asics naming in Kconfig options
Clarify which architecture those asics acronyms refers to.

Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:39 -04:00
Alex Deucher 958afce98c drm/amdgpu: restore original stable pstate on ctx fini
Save the original stable pstate on ctx init and restore
it on ctx fini so that we restore a manually selected
stable pstate on ctx exit.

v2: fix init order (Alex)
v3: don't add new variable to ctx struct (Evan)

Fixes: c65b364c52 ("drm/amdgpu/ctx: only reset stable pstate if the user changed it (v2)")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:33 -04:00
Alex Deucher 98a90f1f0f drm/amdgpu: use the same HDP flush registers for all nbio 2.3.x
Align RDNA2.x with other asics.  One HDP bit per SDMA instance,
aligned with firmware.  This is effectively a revert of
commit 369b7d04ba ("drm/amdgpu/nbio2.3: don't use GPU_HDP_FLUSH bit 12").
On further discussions with the relevant hardware teams,
re-align the bits for SDMA.

Fixes: 369b7d04ba ("drm/amdgpu/nbio2.3: don't use GPU_HDP_FLUSH bit 12")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:42:18 -04:00
Alex Deucher 912db6a587 drm/amdgpu: use the same HDP flush registers for all nbio 7.4.x
Align aldebaran with all other asics.  One HDP bit per
SDMA instance, aligned with firmware.  This is effectively
a revert of
commit a0f9f85466 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12").
On further discussions with the relevant hardware teams,
re-align the bits for SDMA.

Fixes: a0f9f85466 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:41:55 -04:00
Vijendar Mukunda 4c33e5179f drm/amdgpu: create I2S platform devices for Jadeite platform
Jadeite platform uses I2S MICSP instance.
Create platform devices for DMA controller and I2S controller for
Jadeite platform.

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:38:25 -04:00
Vijendar Mukunda 49062ee374 drm/amdgpu: add dmi check for jadeite platform
DMI check is required to distinguish Jadeite platform from
Stoney base variant.
Add DMI check logic for Jadeite platform.

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:38:19 -04:00
Vijendar Mukunda 604d3a3f0d drm/amdgpu: fix for coding style issues
Fixed below checkpatch warnings and errors

drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:131: CHECK: Comparison to NULL could be written "apd"
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:150: CHECK: Comparison to NULL could be written "apd"
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:196: CHECK: Prefer kernel type 'u64' over 'uint64_t'
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:224: CHECK: Please don't use multiple blank lines
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:226: CHECK: Comparison to NULL could be written "!adev->acp.acp_genpd"
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:233: CHECK: Please don't use multiple blank lines
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:239: CHECK: Alignment should match open parenthesis
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:241: CHECK: Comparison to NULL could be written "!adev->acp.acp_cell"
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:247: CHECK: Comparison to NULL could be written "!adev->acp.acp_res"
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:253: CHECK: Comparison to NULL could be written "!i2s_pdata"
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:350: CHECK: Alignment should match open parenthesis
drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:550: ERROR: that open brace { should be on the previous line

Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:38:03 -04:00
YiPeng Chai e4b1edf48f drm/amdgpu: add umc ras functions for umc v8_10_0
1. Support query umc ras error counter.
2. Support ras umc ue error address remapping.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:37:47 -04:00
Andrey Grodzovsky f6a3f66063 drm/amdgpu: Get rid of amdgpu_job->external_hw_fence
This is a follow-up cleanup to [1]. See bellow refcount balancing
for calling amdgpu_job_submit_direct after this cleanup as far
as I calculated.

amdgpu_fence_emit
	dma_fence_init 1
	dma_fence_get(fence) 2
	rcu_assign_pointer(*ptr, dma_fence_get(fence) 3

---> amdgpu_job_submit_direct completes before fence signaled
			amdgpu_sa_bo_free
				(*sa_bo)->fence = dma_fence_get(fence) 4

			amdgpu_job_free
				dma_fence_put 3

			amdgpu_vcn_enc_get_destroy_msg
				*fence = dma_fence_get(f) 4
				dma_fence_put(f); 3

			amdgpu_vcn_enc_ring_test_ib
				dma_fence_put(fence) 2

			amdgpu_fence_process
				dma_fence_put 1

			amdgpu_sa_bo_remove_locked
				dma_fence_put 0

---> amdgpu_job_submit_direct completes after fence signaled
			amdgpu_fence_process
				dma_fence_put 2

			amdgpu_job_free
				dma_fence_put 1

			amdgpu_vcn_enc_get_destroy_msg
				*fence = dma_fence_get(f) 2
				dma_fence_put(f); 1

			amdgpu_vcn_enc_ring_test_ib
				dma_fence_put(fence) 0

[1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzovsky@amd.com/

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:37:25 -04:00
Sonny Jiang 0b15205c73 drm/amdgpu: limiting AV1 to first instance on VCN4 decode
AV1 is only supported on first instance.

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18 16:37:09 -04:00
Arunpravin Paneer Selvam 6f2c8d5f16 drm/amdgpu: Fix for drm buddy memory corruption
User reported gpu page fault when running graphics applications
and in some cases garbaged graphics are observed as soon as X
starts. This patch fixes all the issues.

Fixed the typecast issue for fpfn and lpfn variables, thus
preventing the overflow problem which resolves the memory
corruption.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20220714101214.7620-1-Arunpravin.PaneerSelvam@amd.com
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
2022-07-15 15:41:51 +02:00
Dave Airlie 60693e3a38 Merge tag 'amd-drm-next-5.20-2022-07-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.20-2022-07-14:

amdgpu:
- DCN3.2 updates
- DC SubVP support
- DP MST fixes
- Audio fixes
- DC code cleanup
- SMU13 updates
- Adjust GART size on newer APUs for S/G display
- Soft reset for GFX 11
- Soft reset for SDMA 6
- Add gfxoff status query for vangogh
- Improve BO domain pinning
- Fix timestamps for cursor only commits
- MES fixes
- DCN 3.1.4 support
- Misc fixes
- Misc code cleanup

amdkfd:
- Simplify GPUVM validation
- Unified memory for CWSR save/restore area
- fix possible list corruption on queue failure

radeon:
- Fix bogus power of two warning

UAPI:
- Unified memory for CWSR save/restore area for KFD
  Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080952.html

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220714214716.8203-1-alexander.deucher@amd.com
2022-07-15 15:07:26 +10:00
Leo Li f5ba140436 drm/amdgpu: Check BO's requested pinning domains against its preferred_domains
When pinning a buffer, we should check to see if there are any
additional restrictions imposed by bo->preferred_domains. This will
prevent the BO from being moved to an invalid domain when pinning.

For example, this can happen if the user requests to create a BO in GTT
domain for display scanout. amdgpu_dm will allow pinning to either VRAM
or GTT domains, since DCN can scanout from either or. However, in
amdgpu_bo_pin_restricted(), pinning to VRAM is preferred if there is
adequate carveout. This can lead to pinning to VRAM despite the user
requesting GTT placement for the BO.

v2: Allow the kernel to override the domain, which can happen when
    exporting a BO to a V4L camera (for example).

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2022-07-13 20:56:44 -04:00
Jack Xiao af019bef6d drm/amdgpu/gfx11: add aggregated doorbell support
Port aggregated doorbell support to gfx11.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:17 -04:00
Jack Xiao 86ef6eae08 drm/amdgpu/sdma6: add aggregated doorbell support
Port aggregated doorbell support to sdma6.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:17 -04:00
Le Ma 2d7a1f7183 drm/amdgpu/mes: ring aggregatged doorbell when mes queue is unmapped
Ring aggregated doorbel to make unmapped queue scheduled in mes firmware.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:17 -04:00
Jack Xiao b7320117b3 drm/amdgpu/mes11: initialize aggregated doorbell
Allocate and enable aggregated doorbell.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:17 -04:00
Le Ma 0fe6906203 drm/amdgpu/mes: init aggregated doorbell
Allocate and enable aggregated doorbell.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:17 -04:00
Likun Gao f1549c09c5 drm/amdgpu: support reset flag set for gpu reset
Move reset_context out of gpu recover function to make it configurable
for different reset purpose.
For the reset way of call gpu_recovery sysfs, force to use full reset
method. Otherwise, try soft reset by default if the related ASIC
supportted, if soft reset failed, will use full reset.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:17 -04:00
Likun Gao 58e969b60d drm/amdgpu: support SDMA soft recovery for sdma v6
Support SDMA soft reset for SDMA v6.

V3: use ib test to check soft reset.
V4: squash in unused variable fix (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:16 -04:00
Likun Gao c0ff84cb58 drm/amdgpu: enable soft reset for gfx 11
Enable soft reset for gfx 11.
V2: enable both gfx v11.0.0 and gfx v11.0.2.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:16 -04:00
Likun Gao a84e43b81e drm/amdgpu: support gfx soft reset for gfx v11
Support GFX soft reset for gfx v11.

V3: use ib test check soft reset.
V4: squash in unused variable fix (Alex)

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-13 11:25:16 -04:00
Jack Xiao 636774860a drm/amdgpu/mes: set correct mes ring ready flag
Set corresponding ready flag for mes ring when enable or disable
mes ring.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-12 15:33:17 -04:00
xinhui pan ac9257f0f5 drm/amdgpu: Remove one duplicated ef removal
That has been done in BO release notify.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-12 15:33:17 -04:00
Alex Deucher 88c775bbeb drm/amdgpu/gmc10: adjust gart size for parts that support S/G display
For GMC 10 parts which support scatter/gather display (display
from system memory), we should allocate a larger gart size
to better handler larger displays.  This mirrors what we already
do for GMC 9 parts.

v2: fix typo (Alex)

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-12 10:03:20 -04:00
Jack Xiao 737dad0b5d drm/amdgpu/mes: fix bo va unmap issue in mes
Need reserve buffers before unmap mes ctx bo va.

v2: fix removal of dma_resv_excl_fence() (Alex)
v3: fix dma_resv_usage (Alex)

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-12 10:03:20 -04:00
Dave Airlie 344feb7ccf Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.20-2022-07-05:

amdgpu:
- Various spelling and grammer fixes
- Various eDP fixes
- Various DMCUB fixes
- VCN fixes
- GMC 11 fixes
- RAS fixes
- TMZ support for GC 10.3.7
- GPUVM TLB flush fixes
- SMU 13.0.x updates
- DCN 3.2 Support
- DCN 3.2.1 Support
- MES updates
- GFX11 modifiers support
- USB-C fixes
- MMHUB 3.0.1 support
- SDMA 6.0 doorbell fixes
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Enable GPU reset for SMU 13.0.4
- OLED display fixes
- MPO fixes
- DC frame size fixes
- ASPM support for PCIE 7.4/7.6
- GPU reset support for SMU 13.0.0
- GFX11 updates
- VCN JPEG fix
- BACO support for SMU 13.0.7
- VCN instance handling fix
- GFX8 GPUVM TLB flush fix
- GPU reset rework
- VCN 4.0.2 support
- GTT size fixes
- DP link training fixes
- LSDMA 6.0.1 support
- Various backlight fixes
- Color encoding fixes
- Backlight config cleanup
- VCN 4.x unified queue cleanup

amdkfd:
- MMU notifier fixes
- Updates for GC 10.3.6 and 10.3.7
- P2P DMA support using dma-buf
- Add available memory IOCTL
- SDMA 6.0.1 fix
- MES fixes
- HMM profiler support

radeon:
- License fix
- Backlight config cleanup

UAPI:
- Add available memory IOCTL to amdkfd
  Proposed userspace: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html
- HMM profiler support for amdkfd
  Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080805.html

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220705212633.6037-1-alexander.deucher@amd.com
2022-07-12 11:07:32 +10:00
Jack Xiao 35ba8850b6 drm/amdgpu/mes: fix mes submission in atomic context
For some cases (accessing registers, unmap legacy queue), it needs
access mes in atomic context. Use spinlock to protect agaist mes
ring buffer race condition.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-08 18:25:56 -04:00
Alex Deucher fc25fd602b drm/amdgpu/display: disable prefer_shadow for generic fb helpers
Seems to break hibernation.  Disable for now until we can root
cause it.

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=216119
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:19:30 -04:00
Alex Deucher 89e2b4373a drm/amdgpu: keep fbdev buffers pinned during suspend
Was dropped when we converted to the generic helpers.

Fixes: 087451f372 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:18:36 -04:00
André Almeida edadd6fc28 drm/amdpgu/debugfs: Simplify some exit paths
To avoid code repetition, unify the function exit path when possible. No
functional changes.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:18:14 -04:00
Jianglei Nie c3c483391b drm/amdgpu/mes: Fix an error handling path in amdgpu_mes_self_test()
if amdgpu_mes_ctx_alloc_meta_data() fails, we should call amdgpu_vm_fini()
to handle amdgpu_vm_init().

Add a new lable before amdgpu_vm_init() and goto this lable when
amdgpu_mes_ctx_alloc_meta_data() fails.

Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:18:07 -04:00
Jack Xiao 7acd7ab029 drm/amdgpu/mes11: fix to unmap legacy queue
MES fw updated to support unmapping legacy gfx/compute queue.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:17:13 -04:00
Stanley.Yang e0e146d556 drm/amdgpu: skip whole ras bad page framework on sriov
It should not init whole ras bad page framework on sriov guest side
due to it is handled on host side.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:10:20 -04:00
Stanley.Yang 26093ce14b drm/amdgpu: Only send ras feature for gfx block
GFX is the only IP block that RAS TA needs to program
the hardware when receiving enable_feature command.

Changed from V1:
    remove amdgpu_ras_need_send_ras_feature inline function,
    use GFX RAS block check directly.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:10:13 -04:00
Lang Yu 4bdb9d6501 drm/amdkfd: simplify vm_validate_pt_pd_bos
We don't need to validate and map root PD specially here,
it would be validated and mapped by amdgpu_vm_validate_pt_bos
if it is evicted.

The special case is when turning a GFX VM to a compute VM,
if vm_update_mode changed, we should make sure root PD gets
mapped. So just map root PD after updating vm->update_funcs
in amdgpu_vm_make_compute whether the vm_update_mode changed
or not.

v3:
 - Add some comments suggested by Christian.

v2:
 - Don't rename vm_validate_pt_pd_bos and make it public.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-05 16:09:55 -04:00
Philip Yang c7f21978fa drm/amdkfd: Add user queue eviction restore SMI event
Output user queue eviction and restore event. User queue eviction may be
triggered by svm or userptr MMU notifier, TTM eviction, device suspend
and CRIU checkpoint and restore.

User queue restore may be rescheduled if eviction happens again while
restore.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:31:14 -04:00
Jack Xiao 395ece6f14 Revert "drm/amdgpu/gmc11: avoid cpu accessing registers to flush VM"
This reverts commit 8748de873f
since drv enabled mes to access registers.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:28:31 -04:00
Jack Xiao cf60672900 drm/amdgpu: enable mes to access registers v2
Enable mes to access registers.

v2: squash mes sched ring enablement flag

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:28:24 -04:00
Jack Xiao adc0e6ab0d drm/amdgpu/mes: add mes register access interface
Add mes register access routines:
1. read register
2. write register
3. wait register
4. write and wait register

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:28:18 -04:00
Jack Xiao 7d4705b33c drm/amdgpu/mes11: add mes11 misc op
Add misc op commands in mes11.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:28:11 -04:00
Jack Xiao 6a4a1f6054 drm/amdgpu: add common interface for mes misc op
Add common interface for mes misc op, including accessing register
interface.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30 15:27:57 -04:00
Alex Deucher 6e9c65f71e drm/amdgpu: fix documentation warning
Fixes this issue:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:5094: warning: expecting prototype for amdgpu_device_gpu_recover_imp(). Prototype was for amdgpu_device_gpu_recover() instead

Fixes: cf72704414 ("drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover")
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29 22:03:51 -04:00
Aurabindo Pillai ff15cea338 drm/amd/display: expose additional modifier for DCN32/321
[Why&How]
Some userspace expect a backwards compatible modifier on DCN32/321. For
hardware with num_pipes more than 16, we expose the most efficient
modifier first. As a fall back method, we need to expose slightly inefficient
modifier AMD_FMT_MOD_TILE_GFX9_64K_R_X after the best option.

Also set the number of packers to fixed value as required per hardware
documentation. This value is cached during hardware initialization and
can be read through the base driver.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29 17:12:18 -04:00
Aurabindo Pillai 7268f0a9e8 drm/amd: Load TA firmware for DCN321/DCN32
[Why&How]
TA firmware is needed to enable HDCP.

Changes in v2:

Load separate firmware for PSP 13.0.0

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29 17:12:08 -04:00
Kent Russell d193b12b2f drm/amdgpu: Fix typos in amdgpu_stop_pending_resets
Change amdggpu to amdgpu and pedning to pending

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29 17:10:21 -04:00
Andrey Grodzovsky 9ae55f030d drm/amdgpu: Follow up change to previous drm scheduler change.
Align refcount behaviour for amdgpu_job embedded HW fence with
classic pointer style HW fences by increasing refcount each
time emit is called so amdgpu code doesn't need to make workarounds
using amdgpu_job.job_run_counter to keep the HW fence refcount balanced.

Also since in the previous patch we resumed setting s_fence->parent to NULL
in drm_sched_stop switch to directly checking if job->hw_fence is
signaled to short circuit reset if already signed.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Tested-by: Yiqing Yao <yiqing.yao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28 11:24:41 -04:00
Andrey Grodzovsky 9e225fb9e6 drm/amdgpu: Prevent race between late signaled fences and GPU reset.
Problem:
After we start handling timed out jobs we assume there fences won't be
signaled but we cannot be sure and sometimes they fire late. We need
to prevent concurrent accesses to fence array from
amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process
from a late EOP interrupt.

Fix:
Before accessing fence array in GPU disable EOP interrupt and flush
all pending interrupt handlers for amdgpu device's interrupt line.

v2: Switch from irq_get/put to full enable/disable_irq for amdgpu

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28 11:24:24 -04:00
Andrey Grodzovsky dd70748eda drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
This function should drop the fence refcount when it extracts the
fence from the fence array, just as it's done in amdgpu_fence_process.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28 11:24:18 -04:00
Leslie Shi 5c4904ac34 drm/amdgpu: Remove useless amdgpu_display_freesync_ioctl() declaration
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28 11:24:12 -04:00
Jack Xiao fe4e9ff987 drm/amdgpu: add mc wptr addr support for mes
MES requires mc wptr address for usermode queues.
Export bo gart address for mc wptr address.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28 11:24:05 -04:00
Evan Quan d7f0c8aff0 drm/amdgpu: update GFX11 cs settings
Update GFX11 cs related settings.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-28 11:23:32 -04:00
Colin Ian King 5027605aed drm/amdkfd: Fix spelling mistake "mechanim" -> "mechanism"
There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23 17:23:59 -04:00
Alex Deucher f64e6e0b6a Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
This reverts commit 92020e81dd.

This causes stuttering and timeouts with DMCUB for some users
so revert it until we understand why and safely enable it
to save power.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1887
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
2022-06-23 17:23:17 -04:00
Jiang Jian 50ef0cacc4 drm/amdgpu: drop unexpected word 'for' in comments
there is an unexpected word 'for' in the comments that need to be dropped

file - drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
line - 245

 * position and also advance the position for for Vega10

changed to:

 * position and also advance the position for Vega10

Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23 17:23:13 -04:00
Graham Sider a957995618 drm/amdgpu: Update mes_v11_api_def.h
Update MES API to support oversubscription without aggregated doorbell
for usermode queues.

v2: Change oversubscription_no_aggregated_en to is_kfd_process (align
with MES)

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23 17:22:31 -04:00
Graham Sider e77a541f5d drm/amdkfd: Enable GFX11 usermode queue oversubscription
Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to
GART for usermode queues in order to support oversubscription. In the
case that work is submitted to an unmapped queue, MES must have a GART
wptr address to determine whether the queue should be mapped.

This change is accompanied with changes in MES and is applicable for
MES_API_VERSION >= 2.

v3:
- Use amdgpu_vm_bo_lookup_mapping for wptr_bo mapping lookup
- Move wptr_bo refcount increment to amdgpu_amdkfd_map_gtt_bo_to_gart
- Remove list_del_init from amdgpu_amdkfd_map_gtt_bo_to_gart
- Cleanup/fix create_queue wptr_bo error handling
v4:
- Add MES version shift/mask defines to amdgpu_mes.h
- Change version check from MES_VERSION to MES_API_VERSION
- Add check in kfd_ioctl_create_queue before wptr bo pin/GART map to
ensure bo is a single page.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23 17:22:12 -04:00
Graham Sider ff83e6e7ab drm/amdgpu: Fetch MES scheduler/KIQ versions
Store MES scheduler and MES KIQ version numbers in amdgpu_mes for GFX11.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-23 17:22:05 -04:00