Commit Graph

15798 Commits

Author SHA1 Message Date
Sunil Khatri 71353c1a4f drm/amdgpu: change DRM_DBG_DRIVER to drm_dbg_driver
update the functions in amdgpu_userqueues.c from
DRM_DBG_DRIVER to drm_dbg_driver so multi gpu instance
can be logged in.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:38 -04:00
Sunil Khatri c46a37628a drm/amdgpu: change DRM_ERROR to drm_file_err in amdgpu_userq.c
change the DRM_ERROR and drm_err to drm_file_err
to add process name and pid to the logging.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:33 -04:00
Sunil Khatri 8c97cdb1a6 drm/amdgpu: use drm_file_err in fence timeouts
use drm_file_err instead of DRM_ERROR which adds
process and pid information in the userqueue error
logging.

Sample log:

[   19.802315] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: ibus-x11 pid: 2055 client: Unset ... Couldn't unmap all the queues
[   19.802319] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: ibus-x11 pid: 2055 client: Unset ... Failed to evict userqueue
[   19.838432] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: systemd-logind pid: 1042 client: Unset ... Couldn't unmap all the queues
[   19.838436] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: systemd-logind pid: 1042 client: Unset ... Failed to evict userqueue

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:25 -04:00
Sunil Khatri 30ff75809d drm/amdgpu: add drm_file reference in userq_mgr
drm_file will be used in usermode queues code to
enable better process information in logging and hence
add drm_file part of the userq_mgr struct.

update the drm_file pointer in userq_mgr for each
amdgpu_driver_open_kms.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-05 13:29:18 -04:00
Dan Carpenter 97c39b4da6 drm/amdgpu/userq: remove unnecessary NULL check
The "ticket" pointer points to in the middle of the &exec struct so it
can't be NULL.  Remove the check.

Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:18:52 -04:00
Dan Carpenter d6c6d5ec66 drm/amdgpu/userq: Call unreserve on error in amdgpu_userq_fence_read_wptr()
This error path should call amdgpu_bo_unreserve() before returning.

Fixes: d8675102ba ("drm/amdgpu: add vm root BO lock before accessing the vm")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:17:42 -04:00
Alex Deucher aded8b3c36 drm/amdgpu: properly handle GC vs MM in amdgpu_vmid_mgr_init()
When kernel queues are disabled, all GC vmids are available
for the scheduler.  MM vmids are still managed by the driver
so make all 16 available.

Also fix gmc 10 vs 11 mix up in
commit 1f61fc28b9 ("drm/amdgpu/mes: make more vmids available when disable_kq=1")

v2: Properly handle pre-GC 10 hardware

Fixes: 1f61fc28b9 ("drm/amdgpu/mes: make more vmids available when disable_kq=1")
Cc: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:16:53 -04:00
Alex Deucher 2e828a25f8 drm/amdgpu/mes: use correct MES pipe for resets
Use the KIQ pipe for kernel queues and the SCHED pipe for
user queues.

Fixes: 2408b0272b ("drm/amdgpu/mes: consolidate on a single mes reset callback")
Cc: Michael Chen <Michael.Chen@amd.com>
Cc: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Michael Chen <michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:16:14 -04:00
Alex Deucher 2408b0272b drm/amdgpu/mes: consolidate on a single mes reset callback
Use the legacy one as it covers both kernel queues and
user queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:16:07 -04:00
Alex Deucher 6535348a3e drm/amdgpu/mes: remove more unused functions
These were leftover from mes bring up and are unused.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:15:57 -04:00
Bagas Sanjaya 4e24c6bb5f drm/amdgpu/userq: fix user_queue parameters list
Sphinx reports htmldocs warning:

Documentation/gpu/amdgpu/module-parameters:7: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:1119: ERROR: Unexpected indentation. [docutils]

Fix the warning by using reST bullet list syntax for user_queue
parameter options, separated from preceding paragraph by a blank
line.

Fixes: fb20954c97 ("drm/amdgpu/userq: rework driver parameter")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250422202956.176fb590@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:15:15 -04:00
Lijo Lazar 0105725e2d drm/amdgpu: Fix comment style
Fix code comment style

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504271826.xy2fFO28-lkp@intel.com/
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:15:10 -04:00
Lijo Lazar cf1fcdeec4 drm/amdgpu: Print bootloader status for long waits
If it needs a long wait for completion of bootloader execution, report
the status in between. That helps to know if there is some issue during
bootloader execution.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:14:58 -04:00
Yifan Zha 161949dd71 drm/amdgpu: refine MES register print for devices of hive
[Why]
Register access print missed device info.

[How]
Using dev_xxx instead of DRM_xxx to indicate which device
of a hive is the message for.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:14:47 -04:00
Lijo Lazar 3805e6959c drm/amdgpu: Fix query order of XGMI v6.4.1 status
Keep the register offsets as per link order for querying XGMI v6.4.1
link status.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Fixes: 6dee64e765 ("drm/amdgpu: Fix xgmi v6.4.1 link status reporting")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:13:44 -04:00
Jesse.Zhang ad7c088e31 drm/amdgpu: Fix API status offset for MES queue reset
The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were
using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset
for api_status. Since these functions handle queue reset operations, they
should use MESAPI__RESET union instead.

This fixes the polling of API status during hardware queue reset operations
in the MES for both v11 and v12 versions.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:10:06 -04:00
Alex Deucher 482d485332 drm/amdgpu/userq: take the userq_mgr lock in enforce isolation
Add the missing locking.

Fixes: 94976e7e5e ("drm/amdgpu/userq: add helpers to start/stop scheduling")
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:09:26 -04:00
Alex Deucher c5e02d6588 drm/amdgpu/userq: take the userq_mgr lock in suspend/resume
Add the missing locking.

Fixes: 73e12e98ec ("drm/amdgpu/userq: add suspend and resume helpers")
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:08:24 -04:00
Sonny Jiang 3e5f86c14c drm/amdgpu: Add DPG pause for VCN v5.0.1
For vcn5.0.1 only, enable DPG PAUSE to avoid DPG resets.

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:08:21 -04:00
Arvind Yadav 56801cb83c drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ
DRM_AMDGPU_NAVI3X_USERQ config support is not required for
usermode queue.

v2: rebase.

Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:06:00 -04:00
Rodrigo Siqueira ffc7e11c10 drm/amdgpu: Add documentation associated with CSB
Add a description for the get_csb_buffer callback, update the glossary,
and add some extra information about RB, which is associated with CSB
configuration.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:41 -04:00
Rodrigo Siqueira e7164c7ade drm/amdgpu/gfx: Use CSB helpers in gfx_v6_0_get_csb_buffer
Remove duplications from gfx_v6_0_get_csb_buffer by using CSB helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:38 -04:00
Rodrigo Siqueira aff78a6172 drm/amdgpu/gfx: Fix gfx_v7_0_get_csb_buffer to use rb_config
Instead of having the hardcoded values for the CSB buffer in
gfx_v7_0_get_csb_buffer, use the values calculated in previous steps by
accessing raster_config and raster_config_1.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:34 -04:00
Prike Liang e125a6e8ce drm/amdgpu: set the evf name to identify the userq case
The evf fence name can clearly identify the userq usage.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:27 -04:00
Lijo Lazar d8116a32cd drm/amdgpu: Fix offset for HDP remap in nbio v7.11
APUs in passthrough mode use HDP flush. 0x7F000 offset used for
remapping HDP flush is mapped to VPE space which could get power gated.
Use another unused offset in BIF space.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:14 -04:00
Prike Liang 9d40b05d6d drm/amdgpu: add the evf attached gem obj resv dump
This debug dump will help on debugging the evf attached gem obj fence
related issue.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:07 -04:00
Felix Kuehling dbe4c63689 drm/amdgpu: Fail DMABUF map of XGMI-accessible memory
If peer memory is XGMI-accessible, we should never access it through PCIe
P2P DMA mappings. PCIe P2P is slower, has different coherence behaviour,
limited or no support for atomics, or may not work at all. Fail with a
warning if DMABUF mappings of such memory are attempted.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:05:02 -04:00
Rodrigo Siqueira 9bd5a47ee2 drm/amdgpu/gfx: Use CSB helpers in gfx_v7_0_get_csb_buffer
Use CSB helpers to remove code duplication from gfx_v7_0_get_csb_buffer.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:48 -04:00
Rodrigo Siqueira b990cb5234 drm/amdgpu/gfx: Use CSB helpers in gfx_v8_0_get_csb_buffer
Remove code duplication from gfx_v8_0_get_csb_buffer by using CSB
helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:45 -04:00
Rodrigo Siqueira 9fec2e92fa drm/amdgpu/gfx: Use CSB helpers in gfx_v9_0_get_csb_buffer
Eliminate code duplication in gfx_v9_0_get_csb_buffer by using CSB
helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:42 -04:00
Rodrigo Siqueira 0eef0e36ba drm/amdgpu/gfx: Use CSB helpers in gfx_v10_0_get_csb_buffer
Remove duplicate code by using CSB helpers.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:38 -04:00
Rodrigo Siqueira 106172df6e drm/amdgpu/gfx: Use CSB helpers in gfx_v11_0_get_csb_buffer
Part of the code in gfx_v11_0_get_csb_buffer can be removed in favor of
some GFX CSB helpers. This commit removes the duplicated part for the
GFX 11 CSB function.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:34 -04:00
Rodrigo Siqueira 9718f7457d drm/amdgpu/gfx: Introduce helpers handling CSB manipulation
From GFX6 to GFX11, there is a function for getting the CSB buffer to be
put into the hardware. Three common parts are duplicated in all of these
GFX functions:

1. Prepare the CSB preamble.
2. Parser the CS data.
3. End the CSB preamble.

This commit creates helpers to be used from GFX6 to GFX11.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:29 -04:00
Colin Ian King 5a2658fda4 drm/amdgpu: Fix spelling mistake "rounter" -> "router"
There is a spelling mistake with the array utcl2_rounter_str, it
appears it should be utcl2_router_str. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:23 -04:00
Kees Cook e5f7e4e0a4 drm/amdgpu/atom: Work around vbios NULL offset false positive
GCC really does not want to consider NULL (or near-NULL) addresses as
valid, so calculations based off of NULL end up getting range-tracked into
being an offset wthin a 0 byte array. It gets especially mad about this:

                if (vbios_str == NULL)
                        vbios_str += sizeof(BIOS_ATOM_PREFIX) - 1;
	...
        if (vbios_str != NULL && *vbios_str == 0)
                vbios_str++;

It sees this as being "sizeof(BIOS_ATOM_PREFIX) - 1" byte offset from
NULL, when building with -Warray-bounds (and the coming
-fdiagnostic-details flag):

In function 'atom_get_vbios_pn',
    inlined from 'amdgpu_atom_parse' at drivers/gpu/drm/amd/amdgpu/atom.c:1553:2:
drivers/gpu/drm/amd/amdgpu/atom.c:1447:34: error: array subscript 0 is outside array bounds of 'unsigned char[0]' [-Werror=array-bounds=]
 1447 |         if (vbios_str != NULL && *vbios_str == 0)
      |                                  ^~~~~~~~~~
  'amdgpu_atom_parse': events 1-2
 1444 |                 if (vbios_str == NULL)
      |                    ^
      |                    |
      |                    (1) when the condition is evaluated to true
......
 1447 |         if (vbios_str != NULL && *vbios_str == 0)
      |                                  ~~~~~~~~~~
      |                                  |
      |                                  (2) out of array bounds here
In function 'amdgpu_atom_parse':
cc1: note: source object is likely at address zero

As there isn't a sane way to convince it otherwise, hide vbios_str from
GCC's optimizer to avoid the warning so we can get closer to enabling
-Warray-bounds globally.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:04:12 -04:00
Lijo Lazar 75f138db48 drm/amdgpu: Disallow partition query during reset
Reject queries to get current partition modes during reset. Also, don't
accept sysfs interface requests to switch compute partition mode while
in reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:03:02 -04:00
Sunil Khatri 127e612bf1 drm/amdgpu: update fence ptr with context:seqno
log context:seqno of the fence during timeout rather
than logging fence pointer.

Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Arvind Yadav cade59abaa drm/amdgpu/gfx12: Add fw minimum version check for usermode queue
This patch is load usermode queue based on FW support for gfx12.
CP Ucode FW Vesion: [PFP = 2840, ME = 2780, MEC = 3050, MES = 123]

v2: Addressed review comments from Alex
   - Just check the firmware versions directly.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Arvind Yadav 61ca97e959 drm/amdgpu/gfx11: Add fw minimum version check for usermode queue
This patch is load usermode queue based on FW support for gfx11.
CP Ucode FW version: [PFP = 2530, ME = 2390, MEC = 2600, MES = 120]

v2: Addressed review comments from Alex.
    - Just check the firmware versions directly.
v3: Firmware version checks only for Navi3x(by Alex).

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher 42a6667780 drm/amdgpu/userq: use consistent function naming
s/userqueue/userq/

1. remove the mix of amdgpu_userqueue and amdgpu_userq
2. to be consistent with other amdgpu_userq_fence.c
3. it's shorter

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher 4fdbe3a623 drm/amdgpu/userq: rename eviction helpers
suspend/resume -> evict/restore

Rename to avoid confusion with the system suspend
and resume helpers.

v2: update error messages

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:46 -04:00
Alex Deucher d13e95967e drm/amdgpu/userq: move waiting for last fence before umap
Need to wait for the last fence before unmapping.  This
also fixes a memory leak in amdgpu_userqueue_cleanup()
when the fence isn't signalled.

Fixes: b0db33c8c5 ("drm/amdgpu/userq: rework front end call sequence")
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher 36b0bc1731 drm/amdgpu/userq: unmap queues amdgpu_userq_mgr_fini()
This was missed when the map and unmap were split out
of the mqd create and destroy functions.

Fixes: b0db33c8c5 ("drm/amdgpu/userq: rework front end call sequence")
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher e67b95f0cd drm/amdgpu: switch from queue_active to queue state
Track the state of the queue rather than simple active vs
not.  This is needed for other states (hung, preempted, etc.).
While we are at it, move the state tracking into the user
queue front end code.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher ba324ffb25 drm/amdgpu/userq: optimize enforce isolation and s/r
If user queues are disabled for all IPs in the case
of suspend and resume and for gfx/compute in the case
of enforce isolation, we can return early.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Xiang Liu 696e8fa354 drm/amdgpu: Print kernel message when error logged by scrub
Print a kernel message when the scrub bit of status register is set to
indicate that errors are being logged by the scrub.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Alex Deucher 11772eb73b drm/amdgpu/userq: add a helper to check which IPs are enabled
Add a helper to get a mask of IPs which support user queues.
Use this in the INFO IOCTL to get the IP mask to replace
the current code.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:45 -04:00
Arunpravin Paneer Selvam 4b27406380 drm/amdgpu: Add queue id support to the user queue wait IOCTL
Add queue id support to the user queue wait IOCTL
drm_amdgpu_userq_wait structure.

This is required to retrieve the wait user queue and maintain
the fence driver references in it so that the user queue in
the same context releases their reference to the fence drivers
at some point before queue destruction.

Otherwise, we would gather those references until we
don't have any more space left and crash.

v2: Modify the UAPI comment as per the mesa and libdrm UAPI comment.

Libdrm MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/408
Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34493

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Alex Deucher 4ec2141d23 drm/amdgpu/userq: enable support for secure queues
Enable users to create secure GFX/compute queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Tested-by: Jesse.Zhang <Jesse.zhang@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00
Alex Deucher 87ceff6136 drm/amdgpu/userq/mes: pass the secure flag to mqd init
So that we initialize the MQD as a secure queue.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22 08:51:44 -04:00