linux

Commit Graph

Author	SHA1	Message	Date
Christian König	bd22e44ad4	drm/amdgpu: rework how isolation is enforced v2 Limiting the number of available VMIDs to enforce isolation causes some issues with gang submit and applying certain HW workarounds which require multiple VMIDs to work correctly. So instead start to track all submissions to the relevant engines in a per partition data structure and use the dma_fences of the submissions to enforce isolation similar to what a VMID limit does. v2: use ~0l for jobs without isolation to distinct it from kernel submissions which uses NULL for the owner. Add some warning when we are OOM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:16:34 -04:00
Christian König	7f11c59e07	drm/amdgpu: overwrite signaled fence in amdgpu_sync This allows using amdgpu_sync even without peeking into the fences for a long time. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Christian König	16590745b5	drm/amdgpu: use GFP_NOWAIT for memory allocations In the critical submission path memory allocations can't wait for reclaim since that can potentially wait for submissions to finish. Finally clean that up and mark most memory allocations in the critical path with GFP_NOWAIT. The only exception left is the dma_fence_array() used when no VMID is available, but that will be cleaned up later on. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Kenneth Feng	a67f0094c9	drm/amd/amdgpu: Revert "drm/amd/amdgpu: shorten the gfx idle worker timeout" This reverts commit `55ff973fe1`. Reason for revert: this causes some tests fail with call trace. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Ahmad Rehman	cfdf8b34b9	drm/amdgpu/sdam: Skip SDMA queue reset for SRIOV For SRIOV, skip the SDMA queue reset and return error. The engine/queue reset failure will trigger FLR in the sequence. v2: do not add queue reset support mask for sriov Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Ahmad Rehman	0a59fbd5d9	drm/amdgpu: Add support to load PSP TA v13.0.12 for SRIOV Add case for 13.0.12. Signed-off-by: Ahmad Rehman <ahrehman@amd.com> Reviewed-by: Vignesh.Chander@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Ellen Pan	d7f5c13e45	drm/amdgpu: Enable amdgpu_ras_resume for gfx 9.5.0 This enables ras to be resumed after gpu recovery on mi350 sriov. Signed-off-by: Ellen Pan <yunru.pan@amd.com> Reviewed-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-21 12:15:08 -04:00
Victor Skvortsov	9c05636ca7	drm/amdgpu: Skip pcie_replay_count sysfs creation for VF VFs cannot read the NAK_COUNTER register. This information is only available through PMFW metrics. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:27 -04:00
Candice Li	a5f7e90fe0	drm/amdgpu: Add active_umc_mask to ras init_flags Add active_umc_mask to ras init_flags. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:23 -04:00
Lijo Lazar	a7818b15cf	Documentation/amdgpu: Add debug_mask documentation Add description for debug_mask bit options. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:19 -04:00
Lijo Lazar	ab6893402a	drm/amd/pm: Add debug bit for smu pool allocation In certain cases, it's desirable to avoid PMFW log transactions to system memory. Add a mask bit to decide whether to allocate smu pool in device memory or system memory. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:13 -04:00
Alex Deucher	3b669df92c	drm/amdgpu/vcn: adjust workload profile handling No need to make the workload profile setup dependent on the results of cancelling the delayed work thread. We have all of the necessary checking in place for the workload profile reference counting, so separate the two. As it is now, we can theoretically end up with the call from begin_use happening while the worker thread is executing which would result in the profile not getting set for that submission. It should not affect the reference counting. v2: bail early if the the profile is already active (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:09 -04:00
Alex Deucher	9e34d8d1a1	drm/amdgpu/gfx: adjust workload profile handling No need to make the workload profile setup dependent on the results of cancelling the delayed work thread. We have all of the necessary checking in place for the workload profile reference counting, so separate the two. As it is now, we can theoretically end up with the call from begin_use happening while the worker thread is executing which would result in the profile not getting set for that submission. It should not affect the reference counting. v2: bail early if the the profile is already active (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:04 -04:00
Candice Li	5762f9dcf7	drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12 Add EEPROM I2C address support for smu v13_0_12. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:56:00 -04:00
Alex Deucher	ca6575a32a	drm/amdgpu/vcn: fix ref counting for ring based profile handling We need to make sure the workload profile ref counts are balanced. This isn't currently the case because we can increment the count on submissions, but the decrement may be delayed as work comes in. Track when we enable the workload profile so the references are balanced. v2: switch to a mutex and active flag v3: fix mutex init Fixes: `1443dd3c67` ("drm/amd/pm: fix and simplify workload handling") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:54:36 -04:00
Alex Deucher	553673a3e1	drm/amdgpu/gfx: fix ref counting for ring based profile handling We need to make sure the workload profile ref counts are balanced. This isn't currently the case because we can increment the count on submissions, but the decrement may be delayed as work comes in. Track when we enable the workload profile so the references are balanced. v2: switch to a mutex and active flag v3: fix mutex init Fixes: `8fdb3958e3` ("drm/amdgpu/gfx: add ring helpers for setting workload profile") Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Tested-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:52:56 -04:00
Christian König	0d9a95099d	drm/amdgpu: grab an additional reference on the gang fence v2 We keep the gang submission fence around in adev, make sure that it stays alive. v2: fix memory leak on retry Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-19 15:51:31 -04:00
David Belanger	35b6162bb7	drm/amdgpu: Restore uncached behaviour on GFX12 Always use MTYPE_UC if UNCACHED flag is specified. This makes kernarg region uncached and it restores usermode cache disable debug flag functionality. Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by shader code. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `eb6cdfb807`) Cc: stable@vger.kernel.org # 6.12.x	2025-03-18 16:29:52 -04:00
Wentao Liang	86730b5261	drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini() In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini() can only release 'pfp' field of 'gfx'. The release function of 'me' field should be gfx_v12_0_me_fini(). Fixes: `52cb80c12e` ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ebdc52607a`) Cc: stable@vger.kernel.org # 6.12.x	2025-03-18 16:29:16 -04:00
David Rosca	7fc0765208	drm/amdgpu: Remove JPEG from vega and carrizo video caps JPEG is only supported for VCN1+. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0a6e7b06bd`) Cc: stable@vger.kernel.org	2025-03-18 16:26:12 -04:00
David Rosca	ec33964d9d	drm/amdgpu: Fix JPEG video caps max size for navi1x and raven 8192x8192 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6e0d2fde3a`) Cc: stable@vger.kernel.org	2025-03-18 16:25:46 -04:00
David Rosca	f0105e1731	drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size 1920x1088 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1a0807feb9`) Cc: stable@vger.kernel.org	2025-03-18 16:25:16 -04:00
Lijo Lazar	6c11d4a87d	drm/amdgpu: Use wafl version for xgmi XGMI and WAFL share the same versions. Use WAFL version if XGMI version is not present in discovery. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:47 -04:00
Jesse.zhang@amd.com	cc63bcfd14	drm/amdgpu: Fix SDMA engine reset logic The scheduler should restart only if the reset operation succeeds This ensures that new tasks are only submitted to the queues after a successful reset. Fixes: `4c02f73016` ("drm/amdgpu: Introduce conditional user queue suspension for SDMA resets") Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse.Zhang <Jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:47 -04:00
Flora Cui	b5aaa82e2b	drm/amdgpu: release xcp_mgr on exit Free on driver cleanup. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:47 -04:00
Xiang Liu	d6f9bbce18	drm/amdgpu: Fix computation for remain size of CPER ring The mistake of computation for remain size of CPER ring will cause unbreakable while cycle when CPER ring overflow. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:46 -04:00
Kenneth Feng	55ff973fe1	drm/amd/amdgpu: shorten the gfx idle worker timeout Shorten the gfx idle worker timeout. This is to sync with DAL when there is no activity on the screen. Original 1 second can not sync with DAL, so DAL can not apply MALL when the workload type is not bootup default. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Tao Zhou	05d50ea3ea	drm/amdgpu: format old RAS eeprom data into V3 version Clear old data and save it in V3 format. v2: only format eeprom data for new ASICs. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Alex Deucher	9deacd6c55	drm/amdgpu: don't free conflicting apertures for non-display devices PCI_CLASS_ACCELERATOR_PROCESSING devices won't ever be the sysfb, so there is no need to free conflicting apertures. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Alex Deucher	e00e5c2238	drm/amdgpu: adjust drm_firmware_drivers_only() handling Move to probe so we can check the PCI device type and only apply the drm_firmware_drivers_only() check for PCI DISPLAY classes. Also add a module parameter to override the nomodeset kernel parameter as a workaround for platforms that have this hardcoded on their kernel command lines. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:38 -04:00
Alex Deucher	4db4c82d4d	drm/amdgpu: drop drm_firmware_drivers_only() There are a number of systems and cloud providers out there that have nomodeset hardcoded in their kernel parameters to block nouveau for the nvidia driver. This prevents the amdgpu driver from loading. Unfortunately the end user cannot easily change this. The preferred way to block modules from loading is to use modprobe.blacklist=<driver>. That is what providers should be using to block specific drivers. Drop the check to allow the driver to load even when nomodeset is specified on the kernel command line. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-18 14:03:37 -04:00
David Belanger	eb6cdfb807	drm/amdgpu: Restore uncached behaviour on GFX12 Always use MTYPE_UC if UNCACHED flag is specified. This makes kernarg region uncached and it restores usermode cache disable debug flag functionality. Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by shader code. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:18:02 -04:00
Wentao Liang	ebdc52607a	drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini() In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is incorrectly used to free 'me' field of 'gfx', since gfx_v12_0_pfp_fini() can only release 'pfp' field of 'gfx'. The release function of 'me' field should be gfx_v12_0_me_fini(). Fixes: `52cb80c12e` ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:17:24 -04:00
Shaoyun Liu	f81cd79311	drm/amd/amdgpu: Fix MES init sequence When MES is been used , the set_hw_resource_1 API is required to initialize MES internal context correctly Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:16:08 -04:00
Xiang Liu	13c13bdd1b	drm/amdgpu: Enable ACA by default for psp v13_0_6/v13_0_14 Enable ACA by default for psp v13_0_6/v13_0_14. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:15:58 -04:00
ganglxie	a4b6e990d7	drm/amdgpu: Save PA of bad pages for old asics for old asics that do not support mca translating, we just save PA for them Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:13:08 -04:00
Emily Deng	2da3af5f0b	drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf. In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:13:02 -04:00
Emily Deng	8d5e70ba5d	drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev). Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:12:52 -04:00
Lijo Lazar	357506799b	drm/amdgpu: Calculate IP specific xgmi bandwidth Use IP version specific xgmi speed/width for bandwidth calculation. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:11:46 -04:00
Harish Kasiviswanathan	8a7820c072	drm/amdgpu: Reduce dequeue retry timeout for gfx9 family Dequeue retry timeout controls the interval between checks for unmet conditions. On MI series, reduce this from 0x40 to 0x1 (~ 1 uS). The cost of additional bandwidth consumed by CP when polling memory shouldn't be substantial. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:38 -04:00
Alex Deucher	fc3c139cf0	drm/amdgpu/gfx12: don't read registers in mqd init Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:30 -04:00
Alex Deucher	e27b36ea6b	drm/amdgpu/gfx11: don't read registers in mqd init Just use the default values. There's not need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:25 -04:00
Lijo Lazar	b9e75bcb2b	drm/amdgpu: Remove unsupported xgmi versions XGMI v4.8.0 is not used in any SOCs. Remove the associated functions. Also, ensure get_xgmi_info callback pointer is not NULL before calling the function. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:10:08 -04:00
David Rosca	19478f2011	drm/amdgpu: Update SRIOV video codec caps There have been multiple fixes to the video caps that are missing for SRIOV. Update the SRIOV caps with correct values. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:08:51 -04:00
David Rosca	0a6e7b06bd	drm/amdgpu: Remove JPEG from vega and carrizo video caps JPEG is only supported for VCN1+. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:08:25 -04:00
David Rosca	6e0d2fde3a	drm/amdgpu: Fix JPEG video caps max size for navi1x and raven 8192x8192 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:08:14 -04:00
David Rosca	1a0807feb9	drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size 1920x1088 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:07:52 -04:00
Natalie Vock	6cc30748e1	drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags PRT BOs may not have any backing store, so bo->tbo.resource will be NULL. Check for that before dereferencing. Fixes: `0cce5f285d` ("drm/amdkfd: Check correct memory types for is_system variable") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3e3fcd29b5`) Cc: stable@vger.kernel.org # 6.12.x	2025-03-12 14:59:21 -04:00
Natalie Vock	3e3fcd29b5	drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags PRT BOs may not have any backing store, so bo->tbo.resource will be NULL. Check for that before dereferencing. Fixes: `0cce5f285d` ("drm/amdkfd: Check correct memory types for is_system variable") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:37:10 -04:00
Alexandre Demers	37c890d831	drm/amdgpu: finish wiring up sid.h in DCE6 For coherence with DCE8 et DCE10, add or move some values under sid.h and remove duplicated from si_enums.h. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:37:07 -04:00
Alexandre Demers	760632fa2e	drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6 By wiring up sid.h in GFX6, we end up with a few duplicated defines such as the golden registers. Let's clean this up. [TAHITI,VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined both in sid.h and under si_enums.h, with different values. Keep the values used under radeon and move them under gfx_v6_0.c where they are used (as it is done under cik) Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:36:48 -04:00
Alexandre Demers	20fb56dfd8	drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10 Let's begin the cleanup in sid.h to prevent warnings and errors when wiring sid.h into dce_v6_0.c. This is a bigger cleanup. Many defines found under sid.h have already been properly moved into the different "_d.h" and "_sh_mask.h", so they should have been already removed from sid.h and properly linked in where needed. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:36:42 -04:00
Dan Carpenter	0ee560d71f	drm/amdgpu/gfx: delete stray tabs These lines are indented one tab too far. Delete the extra tabs. Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-11 12:36:29 -04:00
André Almeida	099f273eff	drm/amdgpu: Trigger a wedged event for ring reset Instead of only triggering a wedged event for complete GPU resets, trigger for ring resets. Regardless of the reset, it's useful for userspace to know that it happened because the kernel will reject further submissions from that app. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 14:56:25 -04:00
Alex Deucher	ded6ad4c6e	drm/amdgpu/vce2: fix ip block reference Need to use the correct IP block type. VCE vs VCN. Fixes mclk issues on Hawaii. Suggested by selendym. Fixes: `82ae6619a4` ("drm/amdgpu: update the handle ptr in wait_for_idle") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Cc: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `02438acd25`) Cc: stable@vger.kernel.org	2025-03-10 14:18:04 -04:00
Mario Limonciello	4afacc9948	drm/amd: Keep display off while going into S4 When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `68bfdc8dc0`)	2025-03-10 14:09:09 -04:00
Alex Deucher	02438acd25	drm/amdgpu/vce2: fix ip block reference Need to use the correct IP block type. VCE vs VCN. Fixes mclk issues on Hawaii. Suggested by selendym. Fixes: `82ae6619a4` ("drm/amdgpu: update the handle ptr in wait_for_idle") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Cc: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 13:36:21 -04:00
Alex Deucher	2c01befe4a	drm/amdgpu/vcn: fix idle work handler for VCN 2.5 VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) v3: switch to its own idle work handler v4: fix logic in DPG handling Fixes: `4ce4fe2720` ("drm/amdgpu/vcn: use per instance callbacks for idle work handler") Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 13:22:05 -04:00
Marek Olšák	3855f1d925	drm/amd/display: allow 256B DCC max compressed block sizes on gfx12 The hw supports it. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-10 13:22:01 -04:00
Greg Kroah-Hartman	993a47bd7b	Linux 6.14-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmfOKBUeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG1aQH/iC+Oyij4VxAjBek BOXIT/p6CwlIXb8ObiWWcRjDPizlcxb3RaV8J2RO+IqaQ2wltxpFANq2G7Re2FPm SNcEpIURAOVcxHGedcfFA91srO5F4FzNTO8LVp7MIbcgMYy3pdk+dbZmi6A691R+ t9pb74m+MAnF1o/MUx7pUlhAT/4ymuuR0F7WCSg4h0Xwe5m0nlJY89kJBC7PCjyd n3mdhsz3rDSLmt/z/T7HGD89r8sYSvm9cOKtL3ELgGTrm7boQV8ii9Y9w04DI8PQ JmIernugcCxmhH36mVUAHgJf2+/T388xFUh/D5+skeUOUZpaJZG866rnb32WpsHc eWLFUeg= =Wypt -----END PGP SIGNATURE----- Merge 6.14-rc6 into driver-core-next We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-10 17:37:25 +01:00
Dave Airlie	236f475d29	amdgpu: - Fix spelling typos - RAS updates - VCN 5.0.1 updates - SubVP fixes - DCN 4.0.1 fixes - MSO DPCD fixes - DIO encoder refactor - PCON fixes - Misc cleanups - DMCUB fixes - USB4 DP fixes - DM cleanups - Backlight cleanups and fixes - Support platform backlight curves - Misc code cleanups - SMU 14 fixes - JPEG 4.0.3 reset updates - SR-IOV fixes - SVM fixes - GC 12 DCC fixes - DC DCE 6.x fix - Hiberation fix amdkfd: - Fix possible NULL pointer in queue validation - Remove unnecessary CP domain validation - SDMA queue reset support - Add per process flags radeon: - Fix spelling typos - RS400 hyperZ fix UAPI: - Add KFD per process flags for setting precision Proposed user space: `2a64fa5e06` -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ8tfmgAKCRC93/aFa7yZ 2CmkAP4xoy09aHecz6WOdNN9VBJiSMSZ3UOyBL9Ll2i5nI3YegEA7Co+AmQpXOPn HKeD6Vy1tt1XE/Liuur5dbLUKsVYSAI= =IQ45 -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.15-2025-03-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - Fix spelling typos - RAS updates - VCN 5.0.1 updates - SubVP fixes - DCN 4.0.1 fixes - MSO DPCD fixes - DIO encoder refactor - PCON fixes - Misc cleanups - DMCUB fixes - USB4 DP fixes - DM cleanups - Backlight cleanups and fixes - Support platform backlight curves - Misc code cleanups - SMU 14 fixes - JPEG 4.0.3 reset updates - SR-IOV fixes - SVM fixes - GC 12 DCC fixes - DC DCE 6.x fix - Hiberation fix amdkfd: - Fix possible NULL pointer in queue validation - Remove unnecessary CP domain validation - SDMA queue reset support - Add per process flags radeon: - Fix spelling typos - RS400 hyperZ fix UAPI: - Add KFD per process flags for setting precision Proposed user space: `2a64fa5e06` Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250307211051.1880472-1-alexander.deucher@amd.com	2025-03-10 09:04:52 +10:00
Mario Limonciello	68bfdc8dc0	drm/amd: Keep display off while going into S4 When userspace invokes S4 the flow is: 1) amdgpu_pmops_prepare() 2) amdgpu_pmops_freeze() 3) Create hibernation image 4) amdgpu_pmops_thaw() 5) Write out image to disk 6) Turn off system Then on resume amdgpu_pmops_restore() is called. This flow has a problem that because amdgpu_pmops_thaw() is called it will call amdgpu_device_resume() which will resume all of the GPU. This includes turning the display hardware back on and discovering connectors again. This is an unexpected experience for the display to turn back on. Adjust the flow so that during the S4 sequence display hardware is not turned back on. Reported-by: Xaver Hugl <xaver.hugl@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2038 Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Link: https://lore.kernel.org/r/20250306185124.44780-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 15:33:49 -05:00
Alexandre Demers	092da9fb25	drm/amdgpu: add defines for pin_offsets in DCE8 Define pin_offsets values in the same way it is done in DCE8 Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:56:08 -05:00
Srinivasan Shanmugam	9c551ca3db	drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust function Updated description for the 'other_mode' parameter. This parameter is used to determine the display mode of another display controller that may be sharing the line buffer. Cc: Ken Wang <Qingqing.Wang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:56:04 -05:00
Xiang Liu	148084bbb1	drm/amdgpu: Use unique CPER record id across devices Encode socket id to CPER record id to be unique across devices. v2: add pointer check for adev->smuio.funcs->get_socket_id v2: set 0 if adev->smuio.funcs->get_socket_id is NULL Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:54:08 -05:00
Shiwu Zhang	216be476f1	drm/amdgpu: fix the gb_addr_config_fields init value mismatch For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten in gfx hw_init it is read out to popluate the gb_addr_config_fields in the sw_init stage, which causes mismatch. Fix it by using the golden value in sw_init as well. v2: This is a driver-set golden reg and keep as it is (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:58 -05:00
Shiwu Zhang	3bc7bc73af	drm/amdgpu: retire ip init code specific for A0 rev For aqua_vanjaram, A0 HW is retired so remove the code specific for it in gfx ip init. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:43 -05:00
Tao Zhou	334dc5fcc3	drm/amdgpu: increase RAS bad page threshold For default policy, driver will issue an RMA event when the number of bad pages is greater than 8 physical rows, rather than reaches 8 physical rows, don't rely on threshold configurable parameters in default mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:39 -05:00
Emily Deng	fe2fa3be3d	drm/amdgpu: Fix missing drain retry fault the last entry While the entry get in svm_range_unmap_from_cpu is the last entry, and the entry is page fault, it also need to be dropped. So for equal case, it also need to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Xiaogang Chen<xiaogang.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:30 -05:00
Victor Lu	94b0908b85	drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV Aldebaran SRIOV VF cannot access the power brake feature regs. The accesses can be skipped to avoid a dmesg warning. v2: Remove redundant asic type check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:24 -05:00
Charles Han	571d36837c	drm/amdgpu: fix inconsistent indenting warning Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting Signed-off-by: Charles Han <hanchunchao@inspur.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:53:06 -05:00
Victor Lu	3646cc65e2	drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV Aldebaran SRIOV VF does not have write permissions to GRBM_CTNL. This access can be skipped to avoid a dmesg warning. v2: Use GC IP version check instead of asic check Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-07 12:52:29 -05:00
Sathishkumar S	a29936bcd2	drm/amdgpu: Fix core reset sequence for JPEG5_0_1 For cores 1 through 9 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:36 -05:00
Jonathan Kim	bac38ca8c4	drm/amdkfd: implement per queue sdma reset for gfx 9.4+ To reset hung SDMA queues on GFX 9.4+ for the GFX9 family, a soft reset must be issued through SMU. Since soft resets will reset an entire SDMA engine, use a common KGD call to do the reset as the KGD will handle avoiding a reset of in flight GFX and paging queues on that engine. In addition, create a common call for all reset types to simplify the handling of module parameter settings that block gpu resets. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:26 -05:00
Victor Lu	057fef20b8	drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:21 -05:00
Sathishkumar S	20c34e5c4a	drm/amdgpu: Fix core reset sequence for JPEG4_0_3 For cores 1 through 7 repair the core reset sequence by adjusting offsets to access the expected registers. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:07 -05:00
Tony Yi	a91d91b600	drm/amdgpu: Add support for CPERs on virtualization Add support for CPERs on VFs. VFs do not receive PMFW messages directly; as such, they need to query them from the host. To avoid hitting host event guard, CPER queries need to be rate limited. CPER queries share the same RAS telemetry buffer as error count query, so a mutex protecting the shared buffer was added as well. For readability, the amdgpu_detect_virtualization was refactored into multiple individual functions. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:47:03 -05:00
James Zhu	ca17c8e149	drm/amdkfd: remove unnecessary cpu domain validation before move to GTT domain. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:55 -05:00
Tony Yi	d4c60219ac	drm/amdgpu: Update headers for CPER support on SRIOV Update amdgv_sriovmsg.h and mxgpu_nv.h to add new definitions for CPER support on VFs. PMFW ACA messages are not available on VFs, and VFs must query CPERs from host. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:40 -05:00
Lijo Lazar	6ef5ccaad7	drm/amdgpu: Reinit FW shared flags on VCN v5.0.1 After a full device reset, shared memory region will clear out and it's not possible to reliably save the region in case of RAS errors. Reinitialize the flags if required. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:33 -05:00
Lijo Lazar	6e09402098	drm/amdgpu: Use the right struct for VCN v5.0.1 VCN IP versions >= 5.0 uses VCN5 fw shared struct. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:46:26 -05:00
Alexandre Demers	ab23db6d08	drm/amdgpu: add dce_v6_0_soft_reset() to DCE6 DCE6 was missing soft reset, but it was easily identifiable under radeon. This should be it, pretty much as it is done under DCE8 and DCE10. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:45:30 -05:00
Alexandre Demers	5f6021d52b	drm/amdgpu: fix style in DCE6 Whitespace cleanups. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:45:27 -05:00
Alexandre Demers	029ab8cabd	drm/amdgpu: add some comments in DCE6 Add some comments. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:45:23 -05:00
Asad Kamal	8df5f03be5	drm/amdgpu: Set PG state to gating for vcn_v_5_0_1 For vcn_v_5_0_1, set power state to gating during hw fini. Also there may be scenario where VCN engine hangs during a job execution, then it's not safe to assume that set_pg_state works fine during hw_fini to put the state to gated. After a reset, we can assume that it's in the default state, therefore reset the driver maintained state. Put the default state as gated during reset as per this assumption. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:44:56 -05:00
Mario Limonciello	f729e63743	drm/amd: Pass luminance data to amdgpu_dm_backlight_caps The ATIF method on some systems will provide a backlight curve. Pass this curve into amdgpu_dm add it to the structures. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:42:42 -05:00
Mario Limonciello	1c79b5fcdf	drm/amd: Copy entire structure in amdgpu_acpi_get_backlight_caps() As new members are introduced to the structure copying the entire structure will help avoid missing them. Reviewed-by: Alex Hung <alex.hung@amd.com> Link: https://lore.kernel.org/r/20250228185145.186319-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:42:37 -05:00
Lijo Lazar	a734a717dc	drm/amdgpu: Avoid HDP flush on JPEG v5.0.1 Similar to JPEG v4.0.3, HDP flush shouldn't be performed by JPEG engine. Keep it empty. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:38:01 -05:00
Lijo Lazar	6fcfaac604	drm/amdgpu: Initialize RRMT status on JPEG v5.0.1 Initialize RRMT enablement status from register. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:56 -05:00
Jesse.zhang@amd.com	77bd621d14	drm/amdgpu: Update SDMA scheduler mask handling to include page queue This patch updates the SDMA scheduler mask handling to include the page queue if it exists. The scheduler mask is calculated based on the number of SDMA instances and the presence of the page queue. The mask is updated to reflect the state of both the SDMA gfx ring and the page queue. Changes: - Add handling for the SDMA page queue in `amdgpu_debugfs_sdma_sched_mask_set`. - Update scheduler mask calculations to include the page queue. - Modify `amdgpu_debugfs_sdma_sched_mask_get` to return the correct mask value. This change is necessary to verify multiple queues (SDMA gfx queue + page queue) and ensure proper scheduling and state management for SDMA instances. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:42 -05:00
Lijo Lazar	0b9647d40e	drm/amdgpu: Add offset normalization in VCN v5.0.1 VCN v5.0.1 also will need register offset normalization. Reuse the logic from VCN v4.0.3. Also, avoid HDP flush similar to VCN v4.0.3 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:35 -05:00
Lijo Lazar	b5a3fc54e8	drm/amdgpu: Initialize RRMT status on VCN v5.0.1 Initialize RRMT status from register. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:31 -05:00
Xiang Liu	677ae51f49	drm/amdgpu: Free CPER entry after committing to ring Free CPER entry when it's committed to CPER ring to avoid memory leak. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:27 -05:00
Alexandre Demers	899634a57a	drm/amdgpu: fix spelling typos in SI Fix typos Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:23 -05:00
Alexandre Demers	ce43abd7ec	drm/amdgpu: fix spelling typos Found some typos while exploring amdgpu code. Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-05 10:37:13 -05:00
Dave Airlie	e21cba7047	drm-misc-next for v6.15: Cross-subsystem Changes: bus: - mhi: Avoid access to uninitialized field Core Changes: - Fix docmentation dp: - Add helpers for LTTPR transparent mode sched: - Improve job peek/pop operations - Optimize layout of struct drm_sched_job Driver Changes: arc: - Convert to devm_platform_ioremap_resource() aspeed: - Convert to devm_platform_ioremap_resource() bridge: - ti-sn65dsi86: Support CONFIG_PWM tristate i915: - dp: Use helpers for LTTPR transparent mode mediatek: - Convert to devm_platform_ioremap_resource() msm: - dp: Use helpers for LTTPR transparent mode nouveau: - dp: Use helpers for LTTPR transparent mode panel: - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 repaper: - Fix integer overflows stm: - Convert to devm_platform_ioremap_resource() vc4: - Convert to devm_platform_ioremap_resource() -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmfAMnwACgkQaA3BHVML eiOszgf+NCaYdCCKXurnKy2DVHZ61ar+Fg/FHYs1oThFVqknRT9CBGY93g3I7VGs gYUATrH2fYt4daIF94ap4YLYuMrZSKXSd0duzr8lQzCVAb/UgvUap9yjGwJsCOZA bE2jxlKvo6cUJbq19+ATgrAQ7y31/w30PCPyBjMGR0t4C3bBbbCHpxFyKzDWCtFT JS6pNbCPblZCCvtFPmQRqQqLSE0K5x4SE49ZpHM6hhA4MavlwoN4ZkWincliSzF2 XVCo7QHpVy3SItshhOB7ch7pIdkbm6nUvW2poR3UdXe+B+OSKRIz0C+2E98bYAra vwkyWoUP2A9nKZBsNy2d5026DeUDwg== =KURF -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-02-27' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: Cross-subsystem Changes: bus: - mhi: Avoid access to uninitialized field Core Changes: - Fix docmentation dp: - Add helpers for LTTPR transparent mode sched: - Improve job peek/pop operations - Optimize layout of struct drm_sched_job Driver Changes: arc: - Convert to devm_platform_ioremap_resource() aspeed: - Convert to devm_platform_ioremap_resource() bridge: - ti-sn65dsi86: Support CONFIG_PWM tristate i915: - dp: Use helpers for LTTPR transparent mode mediatek: - Convert to devm_platform_ioremap_resource() msm: - dp: Use helpers for LTTPR transparent mode nouveau: - dp: Use helpers for LTTPR transparent mode panel: - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 repaper: - Fix integer overflows stm: - Convert to devm_platform_ioremap_resource() vc4: - Convert to devm_platform_ioremap_resource() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250227094041.GA114623@linux.fritz.box	2025-02-28 12:36:01 +10:00
Srinivasan Shanmugam	7d83c129a8	drm/amdgpu: Fix parameter annotation in vcn_v5_0_0_is_idle Update parameter description in the vcn_v5_0_0_is_idle function Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1231: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:05 -05:00
Srinivasan Shanmugam	0f3fda3117	drm/amdgpu: Fix parameter annotations for VCN clock gating functions The previous references to a non-existent `adev` parameter have been removed & corrected to reflect the use of the `vinst` pointer, which points to the VCN instance structure, in the below files: - vcn_v1_0.c - vcn_v2_0.c - vcn_v3_0.c Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Function parameter or struct member 'vinst' not described in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:624: warning: Excess function parameter 'adev' description in 'vcn_v1_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Function parameter or struct member 'vinst' not described in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:376: warning: Excess function parameter 'adev' description in 'vcn_v2_0_mc_resume' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'adev' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:776: warning: Excess function parameter 'inst' description in 'vcn_v3_0_disable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Function parameter or struct member 'vinst' not described in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'adev' description in 'vcn_v3_0_enable_clock_gating' drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:965: warning: Excess function parameter 'inst' description in 'vcn_v3_0_enable_clock_gating' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:05 -05:00
Asad Kamal	f9234217d0	drm/amd/amdgpu: Add support for xgmi_v6_4_1 Add support for xgmi_v6_4_1 and use it appropriate places Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
Lijo Lazar	485993e2f1	drm/amdgpu: Add xgmi speed/width related info Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI links with uniform width. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
Lijo Lazar	6f16d101da	drm/amdgpu: Move xgmi definitions to xgmi header Move definitions related to xgmi to amdgpu_xgmi header Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
André Almeida	9c696cc57c	drm/amdgpu: Create a debug option to disable ring reset Prior to the addition of ring reset, the debug option `debug_disable_soft_recovery` could be used to force a full device reset. Now that we have ring reset, create a debug option to disable them in amdgpu, forcing the driver to go with the full device reset path again when both options are combined. This option is useful for testing and debugging purposes when one wants to test the full reset from userspace. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:04 -05:00
Alex Deucher	1d72fc2e9e	drm/amdgpu/mes11: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls They are noops on GFX11 for most firmware versions. KFD already handles its own queues and they should already be unmapped at this point so even if this runs, it's not doing anything. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Colin Ian King	eaa3feb16d	drm/amdgpu: Fix spelling mistake "initiailize" -> "initialize" and grammar There is a spelling mistake and a grammatical error in a dev_err message. Fix it. Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Xiang Liu	00f85667fa	drm/amdgpu: Decode deferred error type in aca bank parser In the case of poison inband log, the error type need to be specified by checking the deferred or poison bit of status register. v2: check both deferred and poison bit Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Le Ma	5b5f01eff7	drm/amdgpu: add sdma page queue irq processing for sdma442 Add the trap irq processing for page queue of sdma442 Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by and Tested-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Xiang Liu	d4bd7a50ca	drm/amdgpu: Report generic instead of unknown boot time errors Change the DMESG reporting of unknown errors to "Boot Controller Generic Error" to align with the RAS SPEC and provide more clarity to customers. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Lijo Lazar	b965e42530	drm/amdgpu: Fix logic to fetch supported NPS modes Correct the logic to find supported NPS modes from firmware. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Ava Zhang <niandong.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: `30eb41f5d1` ("drm/amdgpu: Use firmware supported NPS modes") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Xiang Liu	906d2859e1	drm/amdgpu: Disable fru_id field in CPER section The fru_id field is disabled cause of mis-matching defination between CPER spec and driver. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 16:50:03 -05:00
Benjamin Chan	dce1b82398	drm/amdgpu: Add amdisp pinctrl MFD resource AMDISP GPIO control uses a dedicated pinctrl driver, and requires MFD hotadd GPIO resources. Co-developed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Benjamin Chan <benjamin.chan@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:49 -05:00
Alex Deucher	4343f814e5	drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls They are noops on GFX12. There is no suspend/resume all support in firmware so the function doesn't do anything. KFD already handles its own queues and they should already be unmapped at this point so even if this runs, it's not doing anything. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:43 -05:00
Pratap Nirujogi	a67e75beff	drm/amdgpu: Replace DRM_ERROR() with drm_err() DRM_ERROR() is no longer preferred. Replace DRM_ERROR() usage with drm_err() in isp driver. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:36 -05:00
Alex Deucher	4d1b653571	drm/amdgpu/vcn: use dev_info() for firmware information To properly handle multiple GPUs. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	c51aa7923e	drm/amdgpu/vcn: optimize firmware storage If each instance uses the same fw image, only store one copy in the driver. Acked-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	31a37dfc8f	drm/amdgpu/vcn5.0.1: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	9b648fa54c	drm/amdgpu/vcn5.0.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:32 -05:00
Alex Deucher	4bb5879322	drm/amdgpu/vcn4.0.5: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	1ee6b2bff2	drm/amdgpu/vcn4.0.3: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	8bdfa5756b	drm/amdgpu/vcn4.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	38c0d9882a	drm/amdgpu/vcn3.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	bd32af6faa	drm/amdgpu/vcn2.5: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	3389dd059f	drm/amdgpu/vcn2.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	cac3dc89f2	drm/amdgpu/vcn1.0: use generic set_power_gating_state helper No need for an IP specific version. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	a2cf2a883c	drm/amdgpu/vcn: add a generic helper for set_power_gating_state It's common for all VCN variants. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	4ce4fe2720	drm/amdgpu/vcn: use per instance callbacks for idle work handler Use the vcn instance power gating callbacks rather than the IP powergating callback. This limits power gating to only the instance in use rather than all of the instances. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	592846e3fe	drm/amdgpu/vcn5.0.1: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	f2eb0a66ca	drm/amdgpu/vcn5.0.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	f9993efed7	drm/amdgpu/vcn4.0.5: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	39fb77a8d3	drm/amdgpu/vcn4.0.3: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:31 -05:00
Alex Deucher	8b18f03142	drm/amdgpu/vcn4.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	bda37b68f6	drm/amdgpu/vcn3.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	307ce8bdc6	drm/amdgpu/vcn2.5: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	40c6d55806	drm/amdgpu/vcn2.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	c5ed3655cd	drm/amdgpu/vcn1.0: add set_pg_state callback Rework the code as a vcn instance callback. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	55945f08d9	drm/amdgpu/vcn: add new per instance callback for powergating This is per instance so add a new function pointer for it. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	64303b72de	drm/amdgpu/vcn: adjust pause_dpg_mode function signature Change it to take a vcn instance rather than adev to align with the vcn instance changes. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	0a3fb7338f	drm/amdgpu/vcn5.0.1: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	e3eb71cd69	drm/amdgpu/vcn5.0.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	c07c0c0df9	drm/amdgpu/vcn4.0.5: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:30 -05:00
Alex Deucher	4a23b9c670	drm/amdgpu/vcn4.0.3: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	259873561f	drm/amdgpu/vcn4.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	f1ab687040	drm/amdgpu/vcn2.5: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	38a404f8af	drm/amdgpu/vcn2.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	201fee333d	drm/amdgpu/vcn1.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	710151263c	drm/amdgpu/vcn3.0: convert internal functions to use vcn_inst Pass the vcn instance structure to these functions rather than adev and the instance number. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	f98675638f	drm/amdgpu/vcn: switch vcn helpers to be instance based Pass the instance to the helpers. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	cb10727168	drm/amdgpu/vcn: move more instanced data to vcn_instance Move more per instance data into the per instance structure. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) v3: fix typo on vcn 2.5 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> (v2) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	9bf9442051	drm/amdgpu/vcn: make powergating status per instance Store it per instance so we can track it per instance. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	bee48570cf	drm/amdgpu/vcn: switch work handler to be per instance Have a separate work handler for each VCN instance. This paves the way for per instance VCN power gating at runtime. v2: index instances directly on vcn1.0 and 2.0 to make it clear that they only support a single instance (Lijo) Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:29 -05:00
Alex Deucher	94629182f3	drm/amdgpu/vcn5.0.1: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	0797c54502	drm/amdgpu/vcn5.0.0: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	ecc9ab4e92	drm/amdgpu/vcn4.0.5: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	5826d5a5d5	drm/amdgpu/vcn4.0.3: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	f4cd7a85db	drm/amdgpu/vcn4.0: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	d39f1bb577	drm/amdgpu/vcn3.0: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. v2: squash in fix for stop() from Boyuan Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:28 -05:00
Alex Deucher	dae8700198	drm/amdgpu/vcn2.5: fix VCN stop logic Need to make sure we call amdgpu_dpm_enable_vcn() in vcn_v2_5_stop() at the end if there are errors or DPG is enabled. Fixes: `ebc25499de` ("drm/amdgpu/vcn2.5: split code along instances") Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Suggested-by: Boyuan Zhang <boyuan.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-27 15:52:11 -05:00
Dave Airlie	425b848175	amd-drm-next-6.15-2025-02-21: amdgpu: - Add OEM i2c support for RGB lights, etc. - Add support for GC 11.5.3 - Add support for GC 11.5.2 - Add support for SDMA 6.1.3 - Add support for NBIO 7.11.2 - Add support for NBIO 7.9.1 - Add support for MMHUB 3.3.2 - Add support for MMHUB 1.8.1 - Add support for SMU 14.0.5 - Add support for SMUIO 13.0.11 - Add support for PSP 14.0.5 - Add support for UMC 12.5.0 - Add support for DCN 3.6.0 - JPEG 4.0.3 updates - Add dynamic workload profile switching for GC 10-12 - support larger vbios sizes - GC 9.5.0 updates - SMU 13.0.12 updates - SMU 13.0.6 updates - IP discovery updates - GC 10 queue reset updates - DCN 4.0.1 updates - UHBR link rate fixes - Aborted suspend fix - Mark gttsize parameter as deprecated - GC 10 cleaner shader updates - PSR-SU fixes - Clean up PM4 headers - Cursor fixes - Enable devcoredump for JPEG - Misc cleanups - Runpm cleanups - MES updates - GC 9 gfxoff fixes - Vbios fetching cleanups - Documentation updates - Update secondary plane handling - DML2 updates - SDMA fixes for MI - Cleaner shader fixes for GC 11/12 - ACA updates - Initial JPEG queue reset support - RAS updates - Initial RAS CPER support - DCN/DCE panic screen handling cleanup - BT2020 fixes - SR-IOV fixes amdkfd: - synchronize pasid values between KGD and KFD - Misc cleanups - Improve GTT/VRAM handling for APUs - Topology updates - Fix user queue validation on GC 7/8 UAPI: - Enable "Broadcast RGB" drm property - Add INFO IOCTL query for virtualization mode Proposed userspace: `e663bed7d6` -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ7jwVwAKCRC93/aFa7yZ 2EBmAP42gCX0ESuLJlwLTrobhs9mALp5g6AClLTSBuxNcgfOJwEAvqq12R2db8F9 hFF3rq07C8YaL3bMIi3IhqIXVsnUBgo= =uvFs -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.15-2025-02-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.15-2025-02-21: amdgpu: - Add OEM i2c support for RGB lights, etc. - Add support for GC 11.5.3 - Add support for GC 11.5.2 - Add support for SDMA 6.1.3 - Add support for NBIO 7.11.2 - Add support for NBIO 7.9.1 - Add support for MMHUB 3.3.2 - Add support for MMHUB 1.8.1 - Add support for SMU 14.0.5 - Add support for SMUIO 13.0.11 - Add support for PSP 14.0.5 - Add support for UMC 12.5.0 - Add support for DCN 3.6.0 - JPEG 4.0.3 updates - Add dynamic workload profile switching for GC 10-12 - support larger vbios sizes - GC 9.5.0 updates - SMU 13.0.12 updates - SMU 13.0.6 updates - IP discovery updates - GC 10 queue reset updates - DCN 4.0.1 updates - UHBR link rate fixes - Aborted suspend fix - Mark gttsize parameter as deprecated - GC 10 cleaner shader updates - PSR-SU fixes - Clean up PM4 headers - Cursor fixes - Enable devcoredump for JPEG - Misc cleanups - Runpm cleanups - MES updates - GC 9 gfxoff fixes - Vbios fetching cleanups - Documentation updates - Update secondary plane handling - DML2 updates - SDMA fixes for MI - Cleaner shader fixes for GC 11/12 - ACA updates - Initial JPEG queue reset support - RAS updates - Initial RAS CPER support - DCN/DCE panic screen handling cleanup - BT2020 fixes - SR-IOV fixes amdkfd: - synchronize pasid values between KGD and KFD - Misc cleanups - Improve GTT/VRAM handling for APUs - Topology updates - Fix user queue validation on GC 7/8 UAPI: - Enable "Broadcast RGB" drm property - Add INFO IOCTL query for virtualization mode Proposed userspace: `e663bed7d6` From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250221213651.4176031-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-02-26 16:41:44 +10:00
Pierre-Eric Pelloux-Prayer	d3c7059b6a	drm/amdgpu: init return value in amdgpu_ttm_clear_buffer Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7c62aacc3b`) Cc: stable@vger.kernel.org	2025-02-25 12:29:12 -05:00
Alex Deucher	748a1f51bb	drm/amdgpu/mes: keep enforce isolation up to date Re-send the mes message on resume to make sure the mes state is up to date. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `27b7915147`)	2025-02-25 12:23:48 -05:00
Alex Deucher	e7ea88207c	drm/amdgpu/gfx: only call mes for enforce isolation if supported This should not be called on chips without MES so check if MES is enabled and if the cleaner shader is supported. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit `80513e3897`)	2025-02-25 12:22:56 -05:00
Alex Deucher	099bffc7ca	drm/amdgpu: disable BAR resize on Dell G5 SE There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: `907830b0fc` ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5235053f44`) Cc: stable@vger.kernel.org	2025-02-25 12:20:27 -05:00
Tao Zhou	dab993bf15	drm/amdgpu: increase AMDGPU_MAX_RINGS Increase it since a cper ring is introduced. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:13 -05:00
Srinivasan Shanmugam	59f9c2c9f6	drm/amdgpu: Fix correct parameter desc for VCN idle check functions Fixes the kdoc for the following VCN idle check functions by updating the parameter description from 'handle' to 'ip_block': - vcn_v4_0_is_idle - vcn_v4_0_3_is_idle - vcn_v4_0_5_is_idle - vcn_v5_0_1_is_idle Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:13 -05:00
Pierre-Eric Pelloux-Prayer	7c62aacc3b	drm/amdgpu: init return value in amdgpu_ttm_clear_buffer Otherwise an uninitialized value can be returned if amdgpu_res_cleared returns true for all regions. Possibly closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3812 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
ganglxie	a8f921a10a	drm/amdgpu: Change page/record number calculation based on nps save only one record to save eeprom space,and bad_page_num = pa_rec_num + mca_rec_num*16 Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
ganglxie	0153d27673	drm/amdgpu: Refine bad page adding bad page adding can be simpler with nps info Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Jesse.zhang@amd.com	e4e6ae41cc	drm/amdgpu: update SDMA sysfs reset mask in late_init - Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask. - update the sysfs reset mask to the `late_init` stage to ensure that the SMU initialization and capability setup are completed before checking the SDMA reset capability. - For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset. - Add a TODO comment for future support of per-queue reset for IP version 9.5.0. This change ensures that per-queue reset is only enabled when the MEC and PMFW support it. v2: fix ip version (9.5.4 -> 9.5.0)(Lijo) Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Xiang Liu	ff930483af	drm/amdgpu: Set CPER enabled flag after ring initiailized Setting cper.enabled to be true only after cper ring is successfully created. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
ganglxie	f2510355fb	drm/amdgpu: Save nps to eeprom nps info saved together with bad page makes bad page parsing more efficient Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Xiang Liu	ce615fe328	drm/amdgpu: Check if CPER enabled when generating CPER In the case of CPER disabled, generating CPER will cause kernel NULL pointer dereference without checking. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Jonathan Kim	9424a5bf08	drm/amdgpu: simplify xgmi peer info calls Deprecate KFD XGMI peer info calls in favour of calling directly from simplified XGMI peer info functions. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:45:12 -05:00
Dr. David Alan Gilbert	9d8af72fe7	drm/amdgpu: Remove unused nbif_v6_3_1_sriov_funcs The nbif_v6_3_1_sriov_funcs instance of amdgpu_nbio_funcs was added in commit `894c6d3522` ("drm/amdgpu: Add nbif v6_3_1 ip block support") but has remained unused. Alex has confirmed it wasn't needed. Remove it, together with the four unused stub functions: nbif_v6_3_1_sriov_ih_doorbell_range nbif_v6_3_1_sriov_gc_doorbell_init nbif_v6_3_1_sriov_vcn_doorbell_range nbif_v6_3_1_sriov_sdma_doorbell_range Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:01 -05:00
Sathishkumar S	62431979dd	drm/amdgpu: Add ring reset callback for JPEG5_0_1 Add ring reset function callback for JPEG5_0_1 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
André Almeida	b7fd6528b5	drm/amdgpu: Log after a successful ring reset When a ring reset happens, the kernel log shows only "amdgpu: Starting <ring name> ring reset", but when it finishes nothing appears in the log. Explicitly write in the log that the reset has finished correctly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
André Almeida	28d05f0836	drm/amdgpu: Log the creation of a coredump file After a GPU reset happens, the driver creates a coredump file. However, the user might not be aware of it. Log the file creation the user can find more information about the device and add the file to bug reports. This is similar to what the xe driver does. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Alex Deucher	27b7915147	drm/amdgpu/mes: keep enforce isolation up to date Re-send the mes message on resume to make sure the mes state is up to date. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Sathishkumar S	9b71be8785	drm/amdgpu: Add core reset registers for JPEG5_0_1 Add core reset control register definitions and align all prior register definitions to end at 100 column length for uniformity. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Sathishkumar S	da120ed561	drm/amdgpu: Per-instance init func for JPEG5_0_1 Add helper functions to handle per-instance and per-core initialization and deinitialization in JPEG5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Alex Deucher	5235053f44	drm/amdgpu: disable BAR resize on Dell G5 SE There was a quirk added to add a workaround for a Sapphire RX 5600 XT Pulse that didn't allow BAR resizing. However, the quirk caused a regression with runtime pm on Dell laptops using those chips, rather than narrowing the scope of the resizing quirk, add a quirk to prevent amdgpu from resizing the BAR on those Dell platforms unless runtime pm is disabled. v2: update commit message, add runpm check Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: `907830b0fc` ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Asad Kamal	25907304cf	drm/amd/pm: Fetch fru product info for smu_v13_0_12 Fetch fru product info for smu_v13_0_12 from static metrics table v2: Field by field copy for fru info(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:44:00 -05:00
Jesse.zhang@amd.com	c94943b086	drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty This patch updates the `amdgpu_job_timedout` function to check if the ring is actually guilty of causing the timeout. If not, it skips error handling and fence completion. v2: move the is_guilty check down into the queue reset area (Alex) v3: need to call is_guilty before reset (Alex) v4: squash in is_guilty logic fixes (Alex) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	8225254492	drm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ring This patch adds a reset function pointer to the SDMA v4.4.2 page ring functionality. The new function pointer `reset` is set to `sdma_v4_4_2_reset_queue`, which is responsible for resetting the SDMA queue. Changes: - Add `reset` function pointer to `sdma_v4_4_2_page_ring_funcs`. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	fdbfaaaae0	drm/amdgpu: Improve SDMA reset logic with guilty queue tracking This patch includes the remaining improvements to the SDMA reset logic: - Added `gfx_guilty` and `page_guilty` flags to track guilty queues. - Updated the reset and resume functions to handle the guilty state. - Cached the `rptr` before reset. v2: 1.replace the caller with a guilty bool. If the queue is the guilty one, set the rptr and wptr to the saved wptr value, else, set the rptr and wptr to the saved rptr value. (Alex) 2. cache the rptr before the reset. (Alex) v3: Keeping intermediate variables like u64 rwptr simplifies resotre rptr/wptr.(Lijo) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	0ad649321a	drm/amdgpu/sdma: Introduce is_guilty callbacks for sdma GFX and PAGE rings This patch introduces the `is_guilty` callbacks for the GFX and PAGE rings. These callbacks check if a ring is guilty of causing a timeout or error. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	4d3c4f4f7f	drm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ring This patch introduces the following changes: - Add `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset. - Add `is_guilty` callback to the `amdgpu_ring_funcs` structure to check if a ring is guilty of causing a timeout. Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Jesse.zhang@amd.com	4c02f73016	drm/amdgpu: Introduce conditional user queue suspension for SDMA resets - Modify the `amdgpu_sdma_reset_engine` function to accept a `suspend_user_queues` parameter. - This parameter allows the function to conditionally suspend and resume user queues during SDMA resets. - Ensure that user queues are suspended only when necessary to avoid unnecessary overhead and potential deadlocks. - Restart the scheduler's work queue for the GFX and page rings after the reset to allow new tasks to be submitted. This change improves synchronization between the KGD and the KFD during SDMA resets, ensuring proper handling of user queues and avoiding race conditions. V2: replace the ring_lock with the existed the scheduler locks for the queues (ring->sched) on the sdma engine.(Alex) v3: call drm_sched_wqueue_stop() rather than job_list_lock. If a GPU ring reset was already initiated for one ring at amdgpu_job_timedout, skip resetting that ring and call drm_sched_wqueue_stop() for the other rings (Alex) replace the common lock (sdma_reset_lock) with DQM lock to to resolve reset races between the two driver sections during KFD eviction.(Jon) Rename the caller to Reset_src and Change AMDGPU_RESET_SRC_SDMA_KGD/KFD to AMDGPU_RESET_SRC_SDMA_HWS/RING (Jon) v4: restart the wqueue if the reset was successful, or fall back to a full adapter reset. (Alex) move definition of reset source to enumeration AMDGPU_RESET_SRCS, and check reset src in amdgpu_sdma_reset_instance (Jon) v5: Call amdgpu_amdkfd_suspend/resume at the start/end of reset function respectively under !SRC_HWS conditions only (Jon) v6: replace the paramter src with a bool suspend_user_queues, remove the paramter src in pre/post func. (Jon) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Lijo Lazar	0ca5751560	drm/amdgpu: Remove redundant logic in GC v9.4.3 GFXOFF check is not needed for GC v9.4.3. Also, save/restore list is available by default. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:59 -05:00
Sathishkumar S	793ee232ee	drm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3 Update power gate setting to not poweroff UVDJ in JPEG4_0_3. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Jesse.zhang@amd.com	d6e6ea5efb	drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver to improve modularity and support shared usage between AMDGPU and KFD. The changes include: 1. Refactored SDMA Reset Logic: - Split the `sdma_v4_4_2_reset_queue` function into two separate functions: - `sdma_v4_4_2_stop_queue`: Stops the SDMA queue before reset. - `sdma_v4_4_2_restore_queue`: Restores the SDMA queue after reset. - These functions are now used as callbacks for the shared reset mechanism. 2. Added Callback Support: - Introduced a new structure `sdma_v4_4_2_reset_funcs` to hold the stop and restore callbacks. - Added `sdma_v4_4_2_set_reset_funcs` to register these callbacks with the shared reset mechanism using `amdgpu_set_on_reset_callbacks`. 3. Fixed Reset Queue Function: - Modified `sdma_v4_4_2_reset_queue` to use the shared `amdgpu_sdma_reset_queue` function, ensuring consistency across the driver. This patch ensures that SDMA reset functionality is more modular, reusable, and aligned with the shared reset mechanism between AMDGPU and KFD. v2: Renamed sdma_v4_4_2_set_reset_funcs to sdma_v4_4_2_set_engine_reset_funcs. Renamed sdma_v4_4_2_reset_funcs to sdma_v4_4_2_engine_reset_funcs.(Alex) Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Jesse.zhang@amd.com	f33044952c	drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support This patch introduces shared SDMA reset functionality between AMDGPU and KFD. The implementation includes the following key changes: 1. Added `amdgpu_sdma_reset_queue`: - Resets a specific SDMA queue by instance ID. - Invokes registered pre-reset and post-reset callbacks to allow KFD and AMDGPU to save/restore their state during the reset process. 2. Added `amdgpu_set_on_reset_callbacks`: - Allows KFD and AMDGPU to register callback functions for pre-reset and post-reset operations. - Callbacks are stored in a global linked list and invoked in the correct order during SDMA reset. This patch ensures that both AMDGPU and KFD can handle SDMA reset events gracefully, with proper state saving and restoration. It also provides a flexible callback mechanism for future extensions. v2: fix CamelCase and put the SDMA helper into amdgpu_sdma.c (Alex) v3: rename the `amdgpu_register_on_reset_callbacks` function to `amdgpu_sdma_register_on_reset_callbacks` move global reset_callback_list to struct amdgpu_sdma (Alex) v4: Update the reset callback function description and rename the reset function to amdgpu_sdma_reset_engine (Alex) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Likun Gao	71209c9663	drm/amdgpu: correct the name of mes_pipe structure Correct the structure name admgpu_mes_pipe to amdgpu_mes_pipe. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Sunil Khatri	7dc3405403	drm/amdgpu: update the handle ptr in is_idle Update the *handle to amdgpu_ip_block ptr for all functions pointers of is_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-25 11:43:58 -05:00
Thomas Zimmermann	130377304e	Merge drm/drm-next into drm-misc-next Backmerging to get fixes from v6.14-rc4. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-02-25 11:43:10 +01:00
Dave Airlie	fb51bf0255	Linux 6.14-rc4 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAme7hfkeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGU+IH/1bk6zIvAwXXS5yu KNsQ8dEkC3Xme6HqLtPsAhRLF+5YJf6MaGm1ip5dDMyIvasa2gwvCQQQoOpeMbKj 79VKT+m9t3szMHZaQYjlOuYHBmNSJ4cMCD2Qh6ktXHGPfTTWDFGf7fBwBOkVNeJU 1Ask+bxeop21aJMhfYXrUta3OYyerLBUR6jCiCM82A/GLtdv6oNGXBu3ygDt9Tjx ZHSl+CYjKpmGUP8JnMKwCBHVguEfqgzZ//dY1H16AvOLed9k2jkMFn8O5Vi3vjnx TWMMXoiJimuamGzbjxtCCqzxNlFFDT4gRpDqeJxb16W/gDTFmbRr9LDjNehCZe33 AigLZ6M= =Y/7F -----END PGP SIGNATURE----- Merge tag 'v6.14-rc4' into drm-next Backmerge Linux 6.14-rc4 at the request of tzimmermann so misc-next can base on rc4. Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-02-25 17:36:09 +10:00
Tvrtko Ursulin	80b6ef8ae2	drm/amdgpu: Pop jobs from the queue more robustly Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a newly added drm_sched_entity_queue_pop. This allows breaking the hidden dependency that queue_node has to be the first element in struct drm_sched_job. A comment is also added with a reference to the mailing list discussion explaining the copied helper will be removed when the whole broken amdgpu_job_stop_all_jobs_on_sched is removed. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Cc: Zhang, Hawking <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-3-tvrtko.ursulin@igalia.com	2025-02-24 10:17:40 +01:00
Christian König	cb0de06d1b	drm/amdgpu: remove all KFD fences from the BO on release Remove all KFD BOs from the private dma_resv object. This prevents the KFD from being evict unecessarily when an exported BO is released. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-21 10:41:49 -05:00
Thomas Weißschuh	2d0f5001b6	drm/amdgpu: Constify 'struct bin_attribute' The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241216-sysfs-const-bin_attr-drm-v1-4-210f2b36b9bf@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-21 09:20:31 +01:00
Sunil Khatri	3521276ad1	drm/amdgpu: update the handle ptr in get_clockgating_state Update the *handle to amdgpu_ip_block ptr for all functions pointers of get_clockgating_state. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:19:05 -05:00
Xiang Liu	2f94469cc0	drm/amdgpu: Remove redundant check of adev There is no need to check adev for sure. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:37 -05:00
Xiang Liu	663a87763b	drm/amdgpu: Check aca enabled inside cper init/fini func Move code about checking aca enabled to the cper init/fini function to make code clean. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:33 -05:00
Lijo Lazar	30eb41f5d1	drm/amdgpu: Use firmware supported NPS modes If firmware supported NPS modes are available through CAP register, use those values for supported NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:29 -05:00
Sathishkumar S	c4c3808feb	drm/amdgpu: Add ring reset callback for JPEG4_0_3 Add ring reset function callback for JPEG4_0_3 to recover from job timeouts without a full gpu reset. V2: - sched->ready flag shouldn't be modified by HW backend (Christian) V3: - Dont modifying sched/job-submission state from HW backend (Christian) - Implement per-core reset sequence V4: - Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo) Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:16:11 -05:00
Srinivasan Shanmugam	dc0297f319	drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV RLCG Register Access is a way for virtual functions to safely access GPU registers in a virtualized environment., including TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, it can lead to race conditions. By using the RLCG interface, the driver can serialize access to the registers. This means that only one thread can access the registers at a time, preventing conflicts and ensuring that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, ie., If a thread holding a spinlock tries to acquire a mutex, it can lead to priority inversion. register access in amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical. The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being called, which attempts to acquire the mutex. This function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] indicates that a thread is trying to acquire a mutex while it is in a context that does not allow it to sleep (like holding a spinlock). Fixes the below: [ 253.013423] ============================= [ 253.013434] [ BUG: Invalid wait context ] [ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE [ 253.013464] ----------------------------- [ 253.013475] kworker/0:1/10 is trying to lock: [ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.013815] other info that might help us debug this: [ 253.013827] context-{4:4} [ 253.013835] 3 locks held by kworker/0:1/10: [ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu] [ 253.014154] stack backtrace: [ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14 [ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024 [ 253.014224] Workqueue: events work_for_cpu_fn [ 253.014241] Call Trace: [ 253.014250] <TASK> [ 253.014260] dump_stack_lvl+0x9b/0xf0 [ 253.014275] dump_stack+0x10/0x20 [ 253.014287] __lock_acquire+0xa47/0x2810 [ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014321] lock_acquire+0xd1/0x300 [ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014562] ? __lock_acquire+0xa6b/0x2810 [ 253.014578] __mutex_lock+0x85/0xe20 [ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.014782] ? sched_clock_noinstr+0x9/0x10 [ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.014808] ? local_clock_noinstr+0xe/0xc0 [ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.015029] mutex_lock_nested+0x1b/0x30 [ 253.015044] ? mutex_lock_nested+0x1b/0x30 [ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu] [ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu] [ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu] [ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu] [ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu] [ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5 [ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu] [ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu] [ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu] [ 253.017057] amdgpu_pci_probe+0x1c2/0x660 [amdgpu] [ 253.017493] local_pci_probe+0x4b/0xb0 [ 253.017746] work_for_cpu_fn+0x1a/0x30 [ 253.017995] process_one_work+0x21e/0x680 [ 253.018248] worker_thread+0x190/0x330 [ 253.018500] ? __pfx_worker_thread+0x10/0x10 [ 253.018746] kthread+0xe7/0x120 [ 253.018988] ? __pfx_kthread+0x10/0x10 [ 253.019231] ret_from_fork+0x3c/0x60 [ 253.019468] ? __pfx_kthread+0x10/0x10 [ 253.019701] ret_from_fork_asm+0x1a/0x30 [ 253.019939] </TASK> v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian). Fixes: `e864180ee4` ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-19 15:14:38 -05:00
Nam Cao	690d59fee8	drm/amdgpu: Switch to use hrtimer_setup() hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Zack Rusin <zack.rusin@broadcom.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/all/2e3664eebb00d3a8c786ee7cc1fba8096bababc9.1738746904.git.namcao@linutronix.de	2025-02-18 11:19:05 +01:00
Thomas Zimmermann	8bd1a8e757	Merge drm/drm-next into drm-misc-next Backmerging to get bugfixes from v6.14-rc2. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-02-18 07:43:43 +01:00
Xiang Liu	f9d35b945c	drm/amdgpu: Generate bad page threshold cper records Generate CPER record when bad page threshold exceed and commit to CPER ring. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Xiang Liu	4058e7cbfd	drm/amdgpu: Commit CPER entry Commit the CPER entry to the ring buffer. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Tao Zhou	8652920d2c	drm/amdgpu: add mutex lock for cper ring Avoid the confliction between read and write of ring buffer. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Tao Zhou	a6d9d19290	drm/amdgpu: add data write function for CPER ring Old CPER data will be overwritten if ring buffer is full, and read pointer always points to CPER header. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:30 -05:00
Tao Zhou	5a14282429	drm/amdgpu: read CPER ring via debugfs We read CPER data from read pointer to write pointer without changing the pointers. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Tao Zhou	4d614ce8ff	drm/amdgpu: add RAS CPER ring buffer And initialize it, this is a pure software ring to store RAS CPER data. v2: change ring size to 0x100000 v2: update the initialization of count_dw of cper ring, it's dword variable v3: skip VM inv eng for cper v3: init/fini when aca enabled Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Xiang Liu	b3060f5bea	drm/amdgpu: Get timestamp from system time Get system local time and encode it to timestamp for CPER. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Alex Deucher	f3e10e1a0c	drm/amdgpu/mes12: allocate hw_resource_1 buffer once Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Alex Deucher	13d68ae651	drm/amdgpu/mes11: allocate hw_resource_1 buffer once Allocate the buffer at sw init time so we don't alloc and free it for every suspend/resume or reset cycle. Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	652e090230	drm/amdgpu: Generate cper records Encode the error information in CPER format and commit to the cper ring Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	ad97840f95	drm/amdgpu: Introduce funcs for generating cper record Introduce new functions that are used to generate cper ue or ce records. v2: return -ENOMEM instead of false v2: check return value of fill section function Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	56316ee91b	drm/amdgpu: Include ACA error type in aca bank ACA error types managed by driver a direct 1:1 correspondence with those managed by firmware. To address this, for each ACA bank, include both the ACA error type and the ACA SMU type. This addition is useful for creating CPER records. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <keivnyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Candice Li	76b1f8b32d	drm/amdgpu: Optimize the enablement of GECC Enable GECC only when the default memory ECC mode or the module parameter amdgpu_ras_enable is activated. v2: Add kernel message to remind users explicitly set amdgpu_ras_enable=1 before driver loading to enable GECC and set amdgpu_ras_enable=0 to disable GECC when GECC is currently enabled if needed. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Hawking Zhang	92d5d2a09d	drm/amdgpu: Introduce funcs for populating CPER Introduce utility functions designed to assist in populating CPER records. v2: call cper_init/fini in device_ip_init/fini. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:29 -05:00
Srinivasan Shanmugam	2012aff981	drm/amdgpu: Rename VCN clock gating function for consistency Change the function name from vcn_v2_5_enable_clock_gating_inst to vcn_v2_5_enable_clock_gating to ensure consistency in naming. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c:781: warning: expecting prototype for vcn_v2_5_enable_clock_gating_inst(). Prototype was for vcn_v2_5_enable_clock_gating() instead Cc: Leo Liu <leo.liu@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:28 -05:00
Alex Deucher	eda80f1c2a	drm/amdgpu/vcn4.0.3: drop dpm power helpers VCN 4.0.3 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:25 -05:00
Alex Deucher	7780239809	drm/amdgpu/vcn5.0.1: drop dpm power helpers VCN 5.0.1 doesn't support powergating so there is no need to call these. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:09:18 -05:00
Alex Deucher	0487f50310	drm/amdgpu/vcn5.0.1: use correct dpm helper The VCN and UVD helpers were split in commit `ff69bba05f` ("drm/amd/pm: add inst to dpm_set_powergating_by_smu") However, this happened in parallel to the vcn 5.0.1 development so it was missed there. Fixes: `346492f30c` ("drm/amdgpu: Add VCN_5_0_1 support") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sonny Jiang <sonjiang@amd.com> Cc: Boyuan Zhang <boyuan.zhang@amd.com>	2025-02-17 14:09:11 -05:00
Alex Deucher	56763be400	drm/amdgpu/umsch: tidy up the ucode name string handling Make the constant parts of the name part of the string we pass to amdgpu_ucode_request(). Only the version number varies from IP to IP. Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>	2025-02-17 14:09:02 -05:00
Alex Deucher	c917e39cbd	drm/amdgpu/umsch: fix ucode check Return an error if the IP version doesn't match otherwise we end up passing a NULL string to amdgpu_ucode_request. We should never hit this in practice today since we only enable the umsch code on the supported IP versions, but add a check to be safe. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502130406.iWQ0eBug-lkp@intel.com/ Fixes: `020620424b` ("drm/amd: Use a constant format string for amdgpu_ucode_request") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Lang Yu <Lang.Yu@amd.com>	2025-02-17 14:08:53 -05:00
Amber Lin	5183e69090	drm/amdgpu: Remove extra checks for CPX As far as the number of XCCs, the number of compute partitions, and the number of memory partitions qualify, CPX is valid. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:49 -05:00
Alex Deucher	fe652becdb	drm/amdgpu/umsch: declare umsch firmware Needed to be properly picked up for the initrd, etc. Fixes: `3488c79bea` ("drm/amdgpu: add initial support for UMSCH") Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lang Yu <Lang.Yu@amd.com>	2025-02-17 14:08:37 -05:00
Alex Deucher	80513e3897	drm/amdgpu/gfx: only call mes for enforce isolation if supported This should not be called on chips without MES so check if MES is enabled and if the cleaner shader is supported. Fixes: `8521e3c5f0` ("drm/amd/amdgpu: limit single process inside MES") Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Shaoyun Liu <shaoyun.liu@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	500c04d2a7	drm/amdgpu: Add ring reset callback for JPEG2_0_0 Add ring reset function callback for JPEG2_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	09e24a0b52	drm/amdgpu: Add ring reset callback for JPEG2_5_0 Add ring reset function callback for JPEG2_5_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	cb493aee4d	drm/amdgpu: Per-instance init func for JPEG2_5_0 Add helper functions to handle per-instance initialization and deinitialization in JPEG2_5_0. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:28 -05:00
Sathishkumar S	03399d0bff	drm/amdgpu: Add ring reset callback for JPEG3_0_0 Add ring reset function callback for JPEG3_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Sathishkumar S	74894ffc7d	drm/amdgpu: Add ring reset callback for JPEG4_0_0 Add ring reset function callback for JPEG4_0_0 to recover from job timeouts without a full gpu reset. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Sathishkumar S	abce7b4fc7	drm/amdgpu: Per-instance init func for JPEG4_0_3 Add helper functions to handle per-instance and per-core initialization and deinitialization in JPEG4_0_3. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Saleemkhan Jamadar	b0bebbe4ea	drm/amdgpu/umsch: remove vpe test from umsch current test is more intrusive for user queue test Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Suggested-by: Christian Koenig <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Candice Li	59af05d6a3	drm/amdgpu: Enable ACA by default for psp v13_0_12 Enable ACA by default for psp v13_0_12. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-17 14:08:27 -05:00
Dave Airlie	0ed1356af8	drm-misc-next for v6.15: UAPI Changes: fourcc: - Add modifiers for MediaTek tiled formats Cross-subsystem Changes: bus: - mhi: Enable image transfer via BHIe in PBL dma-buf: - Add fast-path for single-fence merging Core Changes: atomic helper: - Allow full modeset on connector changes - Clarify semantics of allow_modeset - Clarify semantics of drm_atomic_helper_check() buddy allocator: - Fix multi-root cleanup ci: - Update IGT display: - dp: Support Extendeds Wake Timeout - dp_mst: Fix RAD-to-string conversion panic: - Encode QR code according to Fido 2.2 probe helper: - Cleanups scheduler: - Cleanups ttm: - Refactor pool-allocation code - Cleanups Driver Changes: amdxdma: - Fix error handling - Cleanups ast: - Refactor detection of transmitter chips - Refactor support of VBIOS display-mode handling - astdp: Fix connection status; Filter unsupported display modes bridge: - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - sn65dsi86: Fix device IDs - Cleanups i915: - Enable Extendeds Wake Timeout imagination: - Check job dependencies with DRM-sched helper ivpu: - Improve command-queue handling - Use workqueue for IRQ handling - Add suport for HW fault injection - Locking fixes - Cleanups mgag200: - Add support for G200eH5 chips msm: - dpu: Add concurrent writeback support for DPU 10.x+ nouveau: - Move drm_slave_encoder interface into driver - nvkm: Refactor GSP RPC omapdrm: - Cleanups panel: - Convert several panels to multi-style functions to improve error handling - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Cleanups qaic: - Add support for AIC200 - Cleanups renesas: - Fix limits in DT bindings rockchip: - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - Cleanups vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 xlnx: - Set correct DMA segment size - Fix error handling - Fix docs -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmesYp4ACgkQaA3BHVML eiP0agf/ZsUXdVdJ3HMI3vAMbiUnW8ThQZx0M3Zkpf3EOCKgdibbjNkhgTzsp3b/ GqcP/pmqwmjCUgVADVDVJvRTD1rSadcxCYYTwI9fsjnXWmib9l4sjI3jPRJkIekj a5eeCFnMEKzfaxhFPJ/nu+esMZ19ZRE1iSO7WC7YC0SSR6zYP/QaNkGti8QA5TpL oBHd4jctoxXw9sw/ACEwzsSRdxL/opXQ5KbFJFmr2Q9/osaTUX7QtslDhnh+2lVK ZGxGdFIvFYKB+QrXpfjJU51Q01gqp/PWS+YIDrpr7bLC5YeFQVe3FDjqYIXSoE2t dFYtU2mSS/rwijcRc64sghXCdbKS1w== =iJGz -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-02-12' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.15: UAPI Changes: fourcc: - Add modifiers for MediaTek tiled formats Cross-subsystem Changes: bus: - mhi: Enable image transfer via BHIe in PBL dma-buf: - Add fast-path for single-fence merging Core Changes: atomic helper: - Allow full modeset on connector changes - Clarify semantics of allow_modeset - Clarify semantics of drm_atomic_helper_check() buddy allocator: - Fix multi-root cleanup ci: - Update IGT display: - dp: Support Extendeds Wake Timeout - dp_mst: Fix RAD-to-string conversion panic: - Encode QR code according to Fido 2.2 probe helper: - Cleanups scheduler: - Cleanups ttm: - Refactor pool-allocation code - Cleanups Driver Changes: amdxdma: - Fix error handling - Cleanups ast: - Refactor detection of transmitter chips - Refactor support of VBIOS display-mode handling - astdp: Fix connection status; Filter unsupported display modes bridge: - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - sn65dsi86: Fix device IDs - Cleanups i915: - Enable Extendeds Wake Timeout imagination: - Check job dependencies with DRM-sched helper ivpu: - Improve command-queue handling - Use workqueue for IRQ handling - Add suport for HW fault injection - Locking fixes - Cleanups mgag200: - Add support for G200eH5 chips msm: - dpu: Add concurrent writeback support for DPU 10.x+ nouveau: - Move drm_slave_encoder interface into driver - nvkm: Refactor GSP RPC omapdrm: - Cleanups panel: - Convert several panels to multi-style functions to improve error handling - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Cleanups qaic: - Add support for AIC200 - Cleanups renesas: - Fix limits in DT bindings rockchip: - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - Cleanups vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 xlnx: - Set correct DMA segment size - Fix error handling - Fix docs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250212090625.GA24865@linux.fritz.box	2025-02-14 10:24:02 +10:00
André Almeida	6fe52b63f5	drm/amdgpu: Use device wedged event Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250204070528.1919158-6-raag.jadav@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-02-13 12:15:44 -05:00
Alex Deucher	7845438718	drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX12 This commit introduces enhancements to the handling of the cleaner shader fence in the AMDGPU MES driver: - The MES (Microcode Execution Scheduler) now sends a PM4 packet to the KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring that requests are handled in a controlled manner and avoiding the race conditions. - The CP (Compute Processor) firmware has been updated to use a private bus for accessing specific registers, avoiding unnecessary operations that could lead to issues in VF (Virtual Function) mode. - The cleaner shader fence memory address is now set correctly in the `mes_set_hw_res_pkt` structure, allowing for proper synchronization of the cleaner shader execution. Cc: Christian König <christian.koenig@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:06:37 -05:00
Xiaogang Chen	10e08943ca	drm/amdkfd: Fix pasid value leak Curret kfd does not allocate pasid values, instead uses pasid value for each vm from graphic driver. So should not prevent graphic driver from releasing pasid values since the values are allocated by graphic driver, not kfd driver anymore. This patch does not stop graphic driver release pasid values. Fixes: `8544374c0f` ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver") Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:50 -05:00
Saleemkhan Jamadar	4b9a3117bb	drm/amdgpu/vcn: enable TMZ support for vcn 4_0_5 TMZ support is enabled for vcn on GC IP 11_5_0 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:50 -05:00
Srinivasan Shanmugam	be2560e4b8	drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX11 This commit introduces enhancements to the handling of the cleaner shader fence in the AMDGPU MES driver: - The MES (Microcode Execution Scheduler) now sends a PM4 packet to the KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring that requests are handled in a controlled manner and avoiding the race conditions. - The CP (Compute Processor) firmware has been updated to use a private bus for accessing specific registers, avoiding unnecessary operations that could lead to issues in VF (Virtual Function) mode. - The cleaner shader fence memory address is now set correctly in the `mes_set_hw_res_pkt` structure, allowing for proper synchronization of the cleaner shader execution. Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed by: Shaoyun.liu <Shaoyun.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Ying Li	16a5a8fe6f	drm/amd/amdgpu: add support for IP version 11.5.2 This initializes drm/amd/amdgpu version 11.5.2 Signed-off-by: YING LI <yingli12@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Philip Yang	23b645231e	drm/amdgpu: Unlocked unmap only clear page table leaves SVM migration unmap pages from GPU and then update mapping to GPU to recover page fault. Currently unmap clears the PDE entry for range length >= huge page and free PTB bo, update mapping to alloc new PT bo. There is race bug that the freed entry bo maybe still on the pt_free list, reused when updating mapping and then freed, leave invalid PDE entry and cause GPU page fault. By setting the update to clear only one PDE entry or clear PTB, to avoid unmap to free PTE bo. This fixes the race bug and improve the unmap and map to GPU performance. Update mapping to huge page will still free the PTB bo. With this change, the vm->pt_freed list and work is not needed. Add WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the PTB. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Alex Deucher	1350dd3691	drm/amdgpu/mes11: fix set_hw_resources_1 calculation It's GPU page size not CPU page size. In most cases they are the same, but not always. This can lead to overallocation on systems with larger pages. Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Alex Deucher	ebc25499de	drm/amdgpu/vcn2.5: split code along instances Split the code on a per instance basis. This will allow us to use the per instance functions in the future to handle more things per instance. Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Harish Kasiviswanathan	3394b1f76d	drm/amdgpu: Set snoop bit for SDMA for MI series SDMA writes has to probe invalidate RW lines. Set snoop bit in mmhub for this to happen. v2: Missed a few mmhub_v9_4. Added now. v3: Calculate hub offset once since it doesn't change inside the loop Modified function names based on review comments. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:05:49 -05:00
Tim Huang	76e0410fe0	drm/amdgpu: add discovery support for DCN IP version 3.6.0 Add discovery entry for DCN IP version 3.6.0. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:09 -05:00
Jiang Liu	e92f3f94ca	drm/amdgpu: reset psp->cmd to NULL after releasing the buffer Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini(). Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Asad Kamal	aafe181f7d	drm/amdgpu: Add flags to distinguish vf/pf/pt mode Add extra flag definition for ids_flag field to distinguish between vf/pf/pt modes v2: Updated kms driver minor version & removed pf check as default is 0 v3: Fix up version (Alex) v4: rebase (Alex) Proposed userspace: `e663bed7d6` Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	759e764f7d	drm/amdkfd: use GTT for VRAM on APUs only if GTT is larger If the user has configured a large carveout on a small APU, only use GTT for VRAM allocations if GTT is larger than VRAM. v2: fix reversed check (Philip) Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	8b0d068e7d	drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	cc0e91a755	drm/amdgpu: Make VBIOS image read optional Keep VBIOS image read optional for select SOCs in passthrough mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	6e8ca38ebc	drm/amdgpu: Add flag to make VBIOS read optional Certain SOCs may not need much data from VBIOS. Some data like VBIOS version used will be missed but it doesn't affect functionality. Add a flag to make VBIOS image optional. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	7e0aa70681	drm/amdgpu: Add VBIOS flags Instead of read_bios, use get_bios_flags to get various options around reading VBIOS. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	e986e89659	drm/amdgpu: Add wrapper for freeing vbios memory Use bios_release wrapper to release memory allocated for vbios image and reset the variables. v2: Use the same wrapper for clean up in sw_fini (Alex Deucher) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	15f00b073c	drm/amdgpu/gfx9: use amdgpu_gfx_off_ctrl_immediate() for PG Use amdgpu_gfx_off_ctrl_immediate() when powergating. There's no need for the delay in gfx off allow. The powergating is dynamically disabled/enabled as for RV/PCO on compute queues and allowing gfx off again as soon the job is submitted improves power savings. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:07 -05:00
Alex Deucher	250d9769ee	drm/amdgpu/gfx: add amdgpu_gfx_off_ctrl_immediate() Same as amdgpu_gfx_off_ctrl(), but without the delay for gfxoff disallow. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:07 -05:00
Lijo Lazar	a5219b41dd	drm/amdgpu: Clean up atom header file inclusion atom bios header files are not required in these files. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:06 -05:00
Alex Deucher	abab978127	drm/amdgpu/sdma4: drop gfxoff calls in dump ip state SDMA 4.x is not part of the GFX power domain so this is not necessary. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:06 -05:00
Sathishkumar S	64dc2f0029	drm/amdgpu: Enable devcoredump for JPEG5_0_0 Add register list and enable devcoredump for JPEG5_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	8ecd4ec6a5	drm/amdgpu: Enable devcoredump for JPEG2_5_0 Add register list and enable devcoredump for JPEG2_5_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	63d5f8db53	drm/amdgpu: Enable devcoredump for JPEG2_0_0 Add register list and enable devcoredump for JPEG2_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	d949e91b42	drm/amdgpu: Enable devcoredump for JPEG3_0_0 Add register list and enable devcoredump for JPEG3_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	2b0ccf3923	drm/amdgpu: Enable devcoredump for JPEG4_0_5 Add register list and enable devcoredump for JPEG4_0_5 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	c3dddd6029	drm/amdgpu: Enable devcoredump for JPEG4_0_0 Add register list and enable devcoredump for JPEG4_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	358b3774a0	drm/amdgpu: Enable devcoredump for JPEG5_0_1 Add register list and enable devcoredump for JPEG5_0_1 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	08527cb534	drm/amdgpu: Enable devcoredump for JPEG4_0_3 Add register list and enable devcoredump for JPEG4_0_3 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Sathishkumar S	df996b5eff	drm/amdgpu: Add helper funcs for jpeg devcoredump Add devcoredump helper functions that can be reused for all jpeg versions. V2: (Lijo) - add amdgpu_jpeg_reg_dump_init() and amdgpu_jpeg_reg_dump_fini() - use reg_list and reg_count from init() to dump and print registers - memory allocation and freeing is moved to the init() and fini() V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Shiwu Zhang	b3dd2903b0	drm/amdgpu: Enable IFWI update support with PSPv13.0.12 Make psp_vbflash_status and psp_vbflash available in sysfs Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	5caea7a589	drm/amdgpu: Add support for smuio 13.0.11 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	37971df806	drm/amdgpu: Add support for nbio 7.9.1 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	a03f5f8d56	drm/amdgpu: Add support for smu 13.0.12 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	05fd502e04	drm/amdgpu: Add support for umc 12.5.0/mmhub 1.8.1 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Sathishkumar S	580dac7437	drm/amdgpu: Add a func for core specific reg offset Add an inline function to calculate core specific register offsets for JPEG v4.0.3 and reuse it, makes code more readable and easier to align. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	2f9a32b589	drm/amdgpu: Clean up IP version checks in gmcv9.0 Clean up some IP version checks in gmcv9.0 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	a52e6cb06b	drm/amdgpu: Clean up GFX v9.4.3 IP version checks Remove unnecessary IP version checks for GFX 9.4.3 and similar variants. Wrap checks inside meaningful function. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	a01e934242	drm/amdgpu: Use version to figure out harvest info IP tables with version <=2 may use harvest bit. For version 3 and above, harvest bit is not applicable, instead uses harvest table. Fix the logic accordingly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	31f9ed5882	drm/amdgpu: Pass IP instance/hwid as parameters Use IP instance number and hwid as function args for validation checks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Srinivasan Shanmugam	17585c07c2	drm/amdgpu/gfx10: Enable cleaner shader for GFX10.1.1/10.1.2 GPUs Enable the cleaner shader for GFX10.1.1/10.1.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.1.1/10.1.2 GPUs, previously available for GFX10.1.10. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Alex Deucher	e818635a31	drm/amdgpu: update and cleanup PM4 headers Consolidate PM4 definitions. Most of these were previously only defined in UMDs. Add them here as well and sync with latest packets. Also no need to include soc15d.h on gfx10+. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Suggested-by: Saurabh Verma <saurabh.verma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Srinivasan Shanmugam	25961bad92	drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10 This commit adds the cleaner shader microcode for GFX10.1.0 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_10_1_0_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Victor Skvortsov	0489339776	drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks VFs are not able to query error counts for all RAS blocks. Rather than returning error for queries on these blocks, skip sysfs the creation all together. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Hawking Zhang	16b85a0942	drm/amdgpu: Update usage for bad page threshold The driver's behavior varies based on the configuration of amdgpu_bad_page_threshold setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Mario Limonciello	50e30e3a0e	drm/amd: Mark amdgpu.gttsize parameter as deprecated and show warnings on use When not set `gttsize` module parameter by default will get the value to use for the GTT pool from the TTM page limit, which is set by a separate module parameter. This inevitably leads to people not sure which one to set when they want more addressable memory for the GPU, and you'll end up seeing instructions online saying to set both. Add some messages to try to guide people both who are using or misusing the parameters and mark the parameter as deprecated with the plan to drop it after the next LTS kernel release. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:58 -05:00
Jiang Liu	38e8ca3e4b	amdgpu/soc15: enable asic reset for dGPU in case of suspend abort When GPU suspend is aborted, do the same for dGPU as APU to reset soc15 asic. Otherwise it may cause following errors: [ 547.229463] amdgpu 0001:81:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_0.2.1.0 test failed (-110) [ 555.126827] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_0.2.1.0 test failed (-110) [ 555.126901] [drm:amdgpu_gfx_enable_kcq [amdgpu]] ERROR KCQ enable failed [ 555.126957] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <gfx_v9_4_3> failed -110 [ 555.126959] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). [ 555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110 [ 555.126966] PM: Device 0000:0a:00.0 failed to resume async: error -110 This fix has been tested on Mi308X. Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Tested-by: Shuo Liu <shuox.liu@linux.alibaba.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/2462b4b12eb9d025e82525178d568cbaa4c223ff.1736739303.git.gerry@linux.alibaba.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:58 -05:00
Jesse.zhang@amd.com	30f7f53a5b	drm/amdgpu/gfx10: implement gfx queue reset via MMIO Using mmio to do queue reset v2: Alignment the function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:57 -05:00
Jesse.zhang@amd.com	ffdd7a7b28	drm/amdgpu/gfx10: implement queue reset via MMIO Using mmio to do queue reset. v2: Alignment this function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:57 -05:00
Lijo Lazar	f7a594e405	drm/amdgpu: Use active umc info from discovery There could be configs where some UMC instances are harvested. This information is obtained through discovery data and populated in umc.active_mask. Avoid reassigning this as AID mask, instead use the mask directly while iterating through umc instances. This is to avoid accesses to harvested UMC instances. v2: fix warning (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Amber Lin	46d0436a3e	drm/amdgpu: Set noretry default for GC 9.5.0 Set GC 9.5.0 noretry default as 1 for better performance. It can be changed by the administrator using amdgpu.noretry=0 or by the user using HSA_XNACK=1 environment variable. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviwanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Le Ma	23cb207751	drm/amdgpu: read harvest info from harvest table for gfx950 Harvest table is applied for gfx950. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Shiwu Zhang	667b96134c	drm/amdgpu: enlarge the VBIOS binary size limit Some chips have a larger VBIOS file so raise the size limit to support the flashing tool. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	5f95a15495	drm/amdgpu: add dynamic workload profile switching for gfx12 Enable dynamic workload profile switching for gfx12. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	963537ca23	drm/amdgpu: add dynamic workload profile switching for gfx11 Enable dynamic workload profile switching for gfx11. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	b9467983b7	drm/amdgpu: add dynamic workload profile switching for gfx10 Enable dynamic workload profile switching for gfx10. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	8fdb3958e3	drm/amdgpu/gfx: add ring helpers for setting workload profile Add helpers to switch the workload profile dynamically when commands are submitted. This allows us to switch to the FULLSCREEN3D or COMPUTE profile when work is submitted. Add a delayed work handler to delay switching out of the selected profile if additional work comes in. This works the same as the VIDEO profile for VCN. This lets dynamically enable workload profiles on the fly and then move back to the default when there is no work. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Xiaogang Chen	8544374c0f	drm/amdkfd: Have kfd driver use same PASID values from graphic driver Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Lijo Lazar	ca44922107	drm/amdgpu: Check RRMT status for JPEG v4.0.3 RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Lijo Lazar	485380f7fe	drm/amdgpu: Check RRMT status for VCN v4.0.3 RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e55565f880	drm/amdgpu: add support for PSP IP version 14.0.5 This initializes PSP IP version 14.0.5. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e7704d7c72	drm/amdgpu: add support for SMU IP version 14.0.5 This initializes SMU IP version 14.0.5. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	6d437d5203	drm/amdgpu: enable VCN/JPEG CGPG for GC IP version 11.5.3 Enable VCN/JPEG CGPG for ASIC with GFX version 11.5.3. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	6bde08d317	drm/amdgpu: add support for MMHUB IP version 3.3.2 This initializes MMHUB IP version 3.3.2. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e659c9eb87	drm/amdgpu: add support for NBIO IP version 7.11.2 This initializes NBIO IP version 7.11.2. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	b2e5a04147	drm/amdgpu: add support for SDMA IP version 6.1.3 This initializes SDMA IP version 6.1.3. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	b784faeba2	drm/amdgpu: add support for GC IP version 11.5.3 This initializes GC IP version 11.5.3. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Alex Deucher	20f48be63d	drm/amdgpu: add OEM i2c bus for polaris chips It uses the VGADCC bus. DC doesn't use this bus, so it is safe to add it here. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	1c0b144bf7	drm/amdgpu: rework i2c init and fini No functional change. Rework the code to allow for adding some additional i2c buses in conjunction with DC in the future. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	ba7f8eb7e4	drm/amdgpu/atombios: drop empty function This was leftover from when amdgpu was forked from radeon. The function is empty so drop it. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	b217105acb	drm/amd/display/dm: handle OEM i2c buses in i2c functions Allow the creation of an OEM i2c bus and use the proper DC helpers for that case. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Sathishkumar S	8064ca6e93	drm/amdgpu: increase amdgpu max rings limit increase max rings to 132 to support all JPEG5_0_1 cores, else ring_init fails due to ring count exceeding maximum limit. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Jiang Liu	a0a455b4bc	drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Fixes: `07dbfc6b10` ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 19:47:15 -05:00
Alex Deucher	55ed2b1b50	drm/amdgpu: bump version for RV/PCO compute fix Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-02-12 19:47:15 -05:00
Alex Deucher	b35eb9128e	drm/amdgpu/gfx9: manually control gfxoff for CS on RV When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Suggested-by: Sergey Kovalenko <seryoga.engineering@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-02-12 19:47:01 -05:00
Philipp Stanner	796a9f55a8	drm/sched: Use struct for drm_sched_init() params drm_sched_init() has a great many parameters and upcoming new functionality for the scheduler might add even more. Generally, the great number of parameters reduces readability and has already caused one missnaming, addressed in: commit `6f1cacf4eb` ("drm/nouveau: Improve variable name in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org	2025-02-12 11:59:52 +01:00
Maxime Ripard	93c7dd1b39	Merge drm/drm-next into drm-misc-next Bring rc1 to start the new release dev. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-02-06 13:47:32 +01:00
Marek Olšák	2255b40cac	drm/amdgpu: add a BO metadata flag to disable write compression for Vulkan Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-03 12:11:36 -05:00
Kenneth Feng	5cda56bd86	drm/amd/amdgpu: change the config of cgcg on gfx12 change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-01-28 16:22:39 -05:00
Shaoyun Liu	335acfb64e	drm/amd/amdgpu: Enable scratch data dump for mes 12 MES internal will check CP_MES_MSCRATCH_LO/HI register to set scratch data location during ucode start, driver side need to start the MES one by one with different setting for each pipe Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:13 -05:00
Mario Limonciello	7e4cb7dea2	drm/amd: Clarify kdoc for amdgpu.gttsize Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled by what the TTM page limit is set to. Clarify the kdoc. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:08 -05:00
Srinivasan Shanmugam	dc915275ea	drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and width, ensuring that the function can still determine these capabilities from the device itself. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth() error: we previously assumed 'parent' could be null (see line 6180) drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device adev, 6171 enum pci_bus_speed speed, 6172 enum pcie_link_width width) 6173 { 6174 struct pci_dev parent = adev->pdev; 6175 6176 if (!speed \|\| !width) 6177 return; 6178 6179 parent = pci_upstream_bridge(parent); 6180 if (parent && parent->vendor == PCI_VENDOR_ID_ATI) { ^^^^^^ If parent is NULL 6181 /* use the upstream/downstream switches internal to dGPU / 6182 speed = pcie_get_speed_cap(parent); 6183 width = pcie_get_width_cap(parent); 6184 while ((parent = pci_upstream_bridge(parent))) { 6185 if (parent->vendor == PCI_VENDOR_ID_ATI) { 6186 / use the upstream/downstream switches internal to dGPU / 6187 speed = pcie_get_speed_cap(parent); 6188 width = pcie_get_width_cap(parent); 6189 } 6190 } 6191 } else { 6192 / use the device itself / --> 6193 speed = pcie_get_speed_cap(parent); ^^^^^^ Then we are toasted here. 6194 *width = pcie_get_width_cap(parent); 6195 } 6196 } Fixes: `757e8b951c` ("drm/amdgpu: cache gpu pcie link width") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:26 -05:00
Lin.Cao	b529093999	drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment commit `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") set job->vm as NULL if there is no fence. It will cause emit switch buffer be skippen if job->vm set as NULL. Check job rather than vm could solve this problem. Fixes: `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:04 -05:00
Alex Deucher	64314e3f9c	drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL Combine the platform and GPU caps like we do for PCIe Gen. This aligns properly with expectations and documentation for the interface. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:53:30 -05:00
Alex Deucher	757e8b951c	drm/amdgpu: cache gpu pcie link width Get the PCIe link with of the device itself (or it's integrated upstream bridge) and cache that. v2: fix typo Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:53:24 -05:00
Lijo Lazar	a0db1ea0dd	drm/amdgpu: Refine ip detection log message 'add ip block' causes a confusion if the blocks are disabled later with ip_block_mask. Instead change to 'detected' and also add device context. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:52:58 -05:00
Lijo Lazar	b1df8050e7	drm/amdgpu: Add handler for SDMA context empty Context empty interrupt is enabled for SDMA 4.4.2. Add a handler for context empty interrupt so that it is disposed of fast, and not propagated to KFD layer. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:52:43 -05:00
Srinivasan Shanmugam	19b7f7c721	drm/amdgpu/gfx12: Add Cleaner Shader Support for GFX12.0 GPUs This commit enables the cleaner shader feature for GFX12.0 and GFX12.0.1 GPUs. The cleaner shader is important for clearing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs) between workloads. - This feature ensures that GPU resources are reset between workloads, preventing data leaks and ensuring accurate computation. By enabling the cleaner shader, this update enhances the security and reliability of GPU operations on GFX12.0 hardware. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Kent Russell	f2935a3019	drm/amdgpu: Mark debug KFD module params as unsafe Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Gui Chengming	62952a38d9	drm/amdgpu: fix fw attestation for MP0_14_0_{2/3} FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Christian König	def59436fb	drm/amdgpu: always sync the GFX pipe on ctx switch That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Christian König	177b76a8d8	drm/amdgpu: mark a bunch of module parameters unsafe We sometimes have people trying to use debugging options in production environments. Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Tvrtko Ursulin	e996127ec1	drm/amdgpu: Use DRM scheduler API in amdgpu_xcp_release_sched Lets use the existing helper instead of peeking into the structure directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Kenneth Feng	2affe2bbc9	drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Srinivasan Shanmugam	0b6b2dd383	drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] * DEADLOCK * [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: `afefd6f245` ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 10:38:33 -05:00
Dave Airlie	c3d590f8ba	amd-drm-next-6.14-2025-01-10: amdgpu: - Fix max surface handling in DC - clang fixes - DCN 3.5 fixes - DCN 4.0.1 fixes - DC CRC fixes - DML updates - DSC fixes - PSR fixes - DC add some divide by 0 checks - SMU13 updates - SR-IOV fixes - RAS fixes - Cleaner shader support for gfx10.3 dGPUs - fix drm buddy trim handling - SDMA engine reset updates _ Fix RB bitmap setup - Fix doorbell ttm cleanup - Add CEC notifier support - DPIA updates - MST fixes amdkfd: - Shader debugger fixes - Trap handler cleanup - Cleanup includes - Eviction fence wq fix -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ4FXWgAKCRC93/aFa7yZ 2MudAQCzmzUNF9W29JOcset09IcS24Xe5liXrJWzHIPaHhQ25QD/ZU4JHb1947/8 EnS3P7vraGPuCCet2aKmiWgtay7zggE= =5nDZ -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.14-2025-01-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.14-2025-01-10: amdgpu: - Fix max surface handling in DC - clang fixes - DCN 3.5 fixes - DCN 4.0.1 fixes - DC CRC fixes - DML updates - DSC fixes - PSR fixes - DC add some divide by 0 checks - SMU13 updates - SR-IOV fixes - RAS fixes - Cleaner shader support for gfx10.3 dGPUs - fix drm buddy trim handling - SDMA engine reset updates _ Fix RB bitmap setup - Fix doorbell ttm cleanup - Add CEC notifier support - DPIA updates - MST fixes amdkfd: - Shader debugger fixes - Trap handler cleanup - Cleanup includes - Eviction fence wq fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110172731.2960668-1-alexander.deucher@amd.com	2025-01-13 11:13:13 +10:00
Victor Zhao	85b73415fd	drm/amdgpu: fill the ucode bo during psp resume for SRIOV refill the ucode bo during psp resume for SRIOV, otherwise ucode load will fail after VM hibernation and fb clean. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Srinivasan Shanmugam	9814626751	drm/amdgpu/gfx10: Enable cleaner shader for GFX10.3.2/10.3.4/10.3.5 GPUs Enable the cleaner shader for GFX10.3.2/10.3.4/10.3.5 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.3.2/10.3.4/10.3.5 GPUs, previously available for GFX10.3.0. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Jonathan Kim	86bde64cb7	drm/amdgpu: fix gpu recovery disable with per queue reset Per queue reset should be bypassed when gpu recovery is disabled with module parameter. Fixes: `ee0a469cf9` ("drm/amdkfd: support per-queue reset on gfx9") Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Jiang Liu	edec9b0690	drm/amdgpu: wrong array index to get ip block for PSP The adev->ip_blocks array is not indexed by AMD_IP_BLOCK_TYPE_xxx, instead we should call amdgpu_device_ip_get_ip_block() to get the corresponding IP block oject. Fix some checkpatch issues (Alex) Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Jiang Liu	60a2c0c12b	drm/amdgpu: tear down ttm range manager for doorbell in amdgpu_ttm_fini() Tear down ttm range manager for doorbell in function amdgpu_ttm_fini(), to avoid memory leakage. Fixes: `792b84fb90` ("drm/amdgpu: initialize ttm for doorbells") Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Tim Huang	6b34d0328b	drm/amdgpu: fix incorrect number of active RBs for gfx12 The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Tim Huang	4a60c55b3b	drm/amdgpu: fix incorrect active RB bitmap in setup RBs The RB bitmap width per SA may be 0x1 for some ASICs. Use the actual bitmap of SA instead of 0x3 to determine the active RB bitmap. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Dan Carpenter	6ec6cd9acb	drm/amdgpu: Fix shift type in amdgpu_debugfs_sdma_sched_mask_set() The "mask" and "val" variables are type u64. The problem is that the BIT() macros are type unsigned long which is just 32 bits on 32bit systems. It's unlikely that people will be using this driver on 32bit kernels and even if they did we only use the lower AMDGPU_MAX_SDMA_INSTANCES (16) bits. So this bug does not affect anything in real life. Still, for correctness sake, u64 bit masks should use BIT_ULL(). Fixes: `d2e3961ae3` ("drm/amdgpu: add amdgpu_sdma_sched_mask debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/d39a9325-87a4-4543-b6ec-1c61fca3a6fc@stanley.mountain Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Jesse Zhang	f7e672e6f8	drm/amdgpu: enable gfx12 queue reset flag Enable the kgq and kcq queue reset flag Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:47 -05:00
Jesse Zhang	39b0fa29f6	drm/amdgpu/sdma4.4.2: add apu support in sdma queue reset Remove apu check in sdma queue reset. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:01:29 -05:00
Dave Airlie	0739b8ba82	drm-misc-next for 6.14: UAPI Changes: - Clarify drm memory stats documentation Cross-subsystem Changes: Core Changes: - sched: Documentation fixes, Driver Changes: - amdgpu: Track BO memory stats at runtime - amdxdna: Various fixes - hisilicon: New HIBMC driver - bridges: - Provide default implementation of atomic_check for HDMI bridges - it605: HDCP improvements, MCCS Support -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZ3uZUQAKCRAnX84Zoj2+ dgM8AX4ur9y2eXLVQPS2IhCouWpFsYgSRnCysdVG43vszZ2kcObvlj4UV8nrrv7j W6x9FZYBfRLm6ctAUnu05ppm3zSbSdmsocadu1mfoDbShy31Pc5xklnB1u6M3Asw 3mOtO5dxkA== =QZ6P -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2025-01-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.14: UAPI Changes: - Clarify drm memory stats documentation Cross-subsystem Changes: Core Changes: - sched: Documentation fixes, Driver Changes: - amdgpu: Track BO memory stats at runtime - amdxdna: Various fixes - hisilicon: New HIBMC driver - bridges: - Provide default implementation of atomic_check for HDMI bridges - it605: HDCP improvements, MCCS Support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106-augmented-kakapo-of-action-0cf000@houat	2025-01-09 15:48:50 +10:00
Dmitry Baryshkov	26d6fd8191	drm/connector: make mode_valid take a const struct drm_display_mode The mode_valid() callbacks of drm_encoder, drm_crtc and drm_bridge take a const struct drm_display_mode argument. Change the mode_valid callback of drm_connector to also take a const argument. Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241214-drm-connector-mode-valid-const-v2-5-4f9498a4c822@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2025-01-07 12:45:19 +02:00
Kent Russell	6c9c97387b	drm/amdgpu: Remove unnecessary NULL check container_of cannot return NULL, so it is unnecessary to check for NULL after gem_to_amdgpu_bo, which is just a container_of call Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Arunpravin Paneer Selvam	3318ba94e5	drm/amdgpu: Add a lock when accessing the buddy trim function When running YouTube videos and Steam games simultaneously, the tester found a system hang / race condition issue with the multi-display configuration setting. Adding a lock to the buddy allocator's trim function would be the solution. <log snip> [ 7197.250436] general protection fault, probably for non-canonical address 0xdead000000000108 [ 7197.250447] RIP: 0010:__alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250470] Call Trace: [ 7197.250472] <TASK> [ 7197.250475] ? show_regs+0x6d/0x80 [ 7197.250481] ? die_addr+0x37/0xa0 [ 7197.250483] ? exc_general_protection+0x1db/0x480 [ 7197.250488] ? drm_suballoc_new+0x13c/0x93d [drm_suballoc_helper] [ 7197.250493] ? asm_exc_general_protection+0x27/0x30 [ 7197.250498] ? __alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250501] ? __alloc_range+0x109/0x340 [amddrm_buddy] [ 7197.250506] amddrm_buddy_block_trim+0x1b5/0x260 [amddrm_buddy] [ 7197.250511] amdgpu_vram_mgr_new+0x4f5/0x590 [amdgpu] [ 7197.250682] amdttm_resource_alloc+0x46/0xb0 [amdttm] [ 7197.250689] ttm_bo_alloc_resource+0xe4/0x370 [amdttm] [ 7197.250696] amdttm_bo_validate+0x9d/0x180 [amdttm] [ 7197.250701] amdgpu_bo_pin+0x15a/0x2f0 [amdgpu] [ 7197.250831] amdgpu_dm_plane_helper_prepare_fb+0xb2/0x360 [amdgpu] [ 7197.251025] ? try_wait_for_completion+0x59/0x70 [ 7197.251030] drm_atomic_helper_prepare_planes.part.0+0x2f/0x1e0 [ 7197.251035] drm_atomic_helper_prepare_planes+0x5d/0x70 [ 7197.251037] drm_atomic_helper_commit+0x84/0x160 [ 7197.251040] drm_atomic_nonblocking_commit+0x59/0x70 [ 7197.251043] drm_mode_atomic_ioctl+0x720/0x850 [ 7197.251047] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251049] drm_ioctl_kernel+0xb9/0x120 [ 7197.251053] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251056] drm_ioctl+0x2d4/0x550 [ 7197.251058] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251063] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu] [ 7197.251186] __x64_sys_ioctl+0xa0/0xf0 [ 7197.251190] x64_sys_call+0x143b/0x25c0 [ 7197.251193] do_syscall_64+0x7f/0x180 [ 7197.251197] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251199] ? amdgpu_display_user_framebuffer_create+0x215/0x320 [amdgpu] [ 7197.251329] ? drm_internal_framebuffer_create+0xb7/0x1a0 [ 7197.251332] ? srso_alias_return_thunk+0x5/0xfbef5 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Fixes: `4a5ad08f53` ("drm/amdgpu: Add address alignment support to DCC buffers") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Prike Liang	2b11179e18	drm/amdgpu: reduce RLC safe mode request for gfx clock gating The driver can only request one time for the power safe mode instead of polling and disabling the power feature each time prior to program the GFX clock gating control registers. This update will reduce the latency on the GFX clock gating entry. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Srinivasan Shanmugam	8b248b9045	drm/amdgpu/gfx10: Add cleaner shader for GFX10.3.0 This commit adds the cleaner shader microcode for GFX10.3.0 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_10_3_0_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:28 -05:00
Srinivasan Shanmugam	9095567bc3	drm/amdgpu: Fix error handling in amdgpu_ras_add_bad_pages It ensures that appropriate error codes are returned when an error condition is detected Fixes the below; drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2849 amdgpu_ras_add_bad_pages() warn: missing error code here? 'amdgpu_umc_pages_in_a_row()' failed. drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2884 amdgpu_ras_add_bad_pages() warn: missing error code here? 'amdgpu_ras_mca2pa()' failed. v2: s/-EIO/-EINVAL, retained the use of -EINVAL from amdgpu_umc_pages_in_a_row & and amdgpu_ras_mca2pa_by_idx, when the RAS context is not initialized or the convert_ras_err_addr function is unavailable. (Thomas) V3: Returning 0 as the absence of eh_data is acceptable. (Tao) Fixes: `a8d133e625` ("drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: YiPeng Chai <yipeng.chai@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:28 -05:00
yfeng1	62bf9fe6fa	drm/amdgpu: Fix for MEC SJT FW Load Fail on VF Users might switch to ROCM build does not include MEC SJT FW and driver needs to consider this case.w Signed-off-by: yfeng1 <yfeng1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:28 -05:00
Dave Airlie	8368e9719d	amd-drm-next-6.14-2024-12-18: amdgpu: - RAS updates - ISP updates - SDMA queue reset support - Rework DPM powergating interfaces - Documentation updates and cleanups - Panel replay fixes - DCN 3.5 updates - DP tunneling fixes - Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate - Add debugfs interfaces for forcing scheduling to specific engine instances - GG 9.5 updates - IH 4.4 updates - Make missing optional firmware less noisy - PSP 13.x updates - SMU 13.x updates - VCN 5.x updates - JPEG 5.x updates - Misc cleanups - GC 12.x updates - DRM panic support - DC FAMS updates - DSC fixes - job handling fixes amdkfd: - GG 9.5 updates - Logging improvements - Misc cleanups - Various Optimizations -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ2MsKgAKCRC93/aFa7yZ 2D4tAP9YZI2TMu8hMjNKPRp1GDvA/GptRzZNRg3AMTK0HLhQzwEAocsJ72GnZL6e t3+c4i72+b0JBi/jzSy5PsVZsqG+6gg= =MFJT -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.14-2024-12-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.14-2024-12-18: amdgpu: - RAS updates - ISP updates - SDMA queue reset support - Rework DPM powergating interfaces - Documentation updates and cleanups - Panel replay fixes - DCN 3.5 updates - DP tunneling fixes - Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate - Add debugfs interfaces for forcing scheduling to specific engine instances - GG 9.5 updates - IH 4.4 updates - Make missing optional firmware less noisy - PSP 13.x updates - SMU 13.x updates - VCN 5.x updates - JPEG 5.x updates - Misc cleanups - GC 12.x updates - DRM panic support - DC FAMS updates - DSC fixes - job handling fixes amdkfd: - GG 9.5 updates - Logging improvements - Misc cleanups - Various Optimizations Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218201758.2580723-1-alexander.deucher@amd.com	2024-12-20 07:57:01 +10:00
Yunxiang Li	74ef9527bd	drm/amdgpu: track bo memory stats at runtime Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we track the BOs as they change states. This way when the fdinfo is queried we only need to take the status lock and copy out the usage stats with minimal impact to the runtime performance. With this new approach however, we would no longer be able to track active buffers. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-6-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-12-19 16:56:28 +01:00
Yunxiang Li	a541a6e865	drm/amdgpu: remove unused function parameter amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all callers have a reference to adev handy, so remove it for cleanliness. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-5-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-12-19 16:56:25 +01:00
Yunxiang Li	bebf2ebd70	drm: make drm-active- stats optional When memory stats is generated fresh everytime by going though all the BOs, their active information is quite easy to get. But if the stats are tracked with BO's state this becomes harder since the job scheduling part doesn't really deal with individual buffers. Make drm-active- optional to enable amdgpu to switch to the second method. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-3-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-12-19 16:56:17 +01:00
Dave Airlie	38e961097e	Linux 6.13-rc3 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmdfbR8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGOOIH/j788VAvWIM+0RdL PcKlFdfTucdHHS4P19WT9FvK3CQw025xyliY0YFyXreCXwIu/lGGC6OW+tV7aC8c EWJPqP+kJuLStfm3vpXbEnPql1K1IOW8CfeKhLgbPW2K+BFSX1BFbm9lJ6DQo3JS GMvgn9NE7ssJ6o4Do6mICHkcOxGq39fFYxhdlB/QVro3dPWfYffDivlGhycePBle ZXWWebLlEA9pMd7xw4LJImb1huhxt1rdBzQtmInpIGP2J6CDJ4nXJdiLib/TprWO Wg1HRDrIIegMEm/xC6M0D9RmH3NGJ5AB54qnzG6nXDEPMurcBkTR6EcnQitYmp9c 7O1z9JI= =Go2H -----END PGP SIGNATURE----- Merge tag 'v6.13-rc3' into drm-next Backmerge linux 6.13-rc3 as amd next has some dependencies on fixes in it. Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-12-19 12:00:02 +10:00
Michel Dänzer	695c2c745e	drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update Third time's the charm, I hope? Fixes: `d3116756a7` ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:08 -05:00
Mirsad Todorovac	a21ab06b8c	drm/admgpu: replace kmalloc() and memcpy() with kmemdup() The static analyser tool gave the following advice: ./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup → 1266 tmp = kmalloc(used_size, GFP_KERNEL); 1267 if (!tmp) 1268 return -ENOMEM; 1269 → 1270 memcpy(tmp, &host_telemetry->body.error_count, used_size); Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics. Original code works without fault, so this is not a bug fix but proposed improvement. Link: https://lwn.net/Articles/198928/ Fixes: `84a2947ecc` ("drm/amdgpu: Implement virt req_ras_err_count") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Zhigang Luo <Zhigang.Luo@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Jack Xiao <Jack.Xiao@amd.com> Cc: Vignesh Chander <Vignesh.Chander@amd.com> Cc: Danijel Slivka <danijel.slivka@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:08 -05:00
Philip Yang	e37ccf44ac	drm/amdgpu: Show warning message if IH ring overflow If IH primary ring and KFD ih fifo overflows, we may miss CP, SDMA interrupts and cause application soft hang. Show warning message with ring name if overflow happens. Add function to get ih ring name to avoid duplicating it. To keep warning message consistent between GPU generations, change all *_ih.c except ASICs older than Vega which has only one ih ring. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Philip Yang	1b00143231	drm/amdgpu: Optimize gfx v9 GPU page fault handling After GPU page fault, there are lots of page fault interrupts generated at short period even with CAM filter enabled because the fault address is different. Each page fault copy to KFD ih fifo to send event to user space by KFD interrupt worker, this could cause KFD ih fifo overflow while other processes generate events at same time. KFD process is aborted after GPU page fault, we only need one GPU page fault interrupt sent to KFD ih fifo to send memory exception event to user space. Incease KFD ih fifo size to 2 times of IH primary ring size, to handle the burst events case. This patch handle the gfx v9 path, cover retry on/off and CAM filter on/off cases. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Christian König	11815bb0e3	drm/amdgpu: partially revert "reduce reset time" This partially reverts commit `194eb174cb`. This commit introduced a new state variable into adev without even remotely worrying about CPU barriers. Since we already have the amdgpu_in_reset() function exactly for this use case partially revert that. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Christian König	26c95e838e	drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare As soon as the prepare phase is completed the VM might be released, better set it to NULL. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Christian König	57f812d171	drm/amdgpu: fix amdgpu_coredump The VM pointer might already be outdated when that function is called. Use the PASID instead to gather the information instead. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Candice Li	d1ebe307b4	drm/amdgpu: Enable psp v14_0_3 RAS support for non-SRIOV configurations. Enable psp v14_0_3 RAS support for non-SRIOV configurations. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Philip Yang	b4b7271e5c	drm/amdgpu: Don't enable sdma 4.4.5 CTXEMPTY interrupt The sdma context empty interrupt is dropped in amdgpu_irq_dispatch as unregistered interrupt src_id 243, this interrupt accounts to 1/3 of total interrupts and causes IH primary ring overflow when running stressful benchmark application. Disable this interrupt has no side effect found. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Karol Przybylski	34c4eb7d4e	drm/amdgpu: Fix potential integer overflow in scheduler mask calculations The use of 1 << i in scheduler mask calculations can result in an unintentional integer overflow due to the expression being evaluated as a 32-bit signed integer. This patch replaces 1 << i with 1ULL << i to ensure the operation is performed as a 64-bit unsigned integer, preventing overflow Discovered in coverity scan, CID 1636393, 1636175, 1636007, 1635853 Fixes: `c5c63d9cb5` ("drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs") Signed-off-by: Karol Przybylski <karprzy7@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:38:42 -05:00
Alex Deucher	f1fd1d0f40	drm/amdgpu/gfx12: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:24 -05:00
Alex Deucher	63bfd24088	drm/amdgpu/mmhub4.1: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:22 -05:00
Alex Deucher	2c8eeaaa0f	drm/amdgpu/nbio7.11: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:19 -05:00
Alex Deucher	0ec43fbece	drm/amdgpu/nbio7.0: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:16 -05:00
Alex Deucher	22b9555bc9	drm/amdgpu/nbio7.7: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:03 -05:00
Andrew Martin	357ef5b3b7	drm/amdgpu: Failed to check various return code Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:18:20 -05:00
Lijo Lazar	635c659fce	drm/amdgpu: Use dbg level for VBIOS check messages Driver has different ways to fetch VBIOS. If one of the methods doesn't find an authentic one, it will show misleading info messages eventhough a subsequent method finds a valid VBIOS. Keep the message level at debug and add device context. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:18:04 -05:00
Pierre-Eric Pelloux-Prayer	54a1b36d4b	drm/amdgpu: remove useless init from amdgpu_job_alloc This init is useless because base.sched will be cleared to 0 in drm_sched_job_init because of commit `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()"). Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:17:46 -05:00
Pierre-Eric Pelloux-Prayer	0014952b17	drm/amdgpu: drop the amdgpu_device argument from amdgpu_ib_free It's unused. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:17:32 -05:00
Pierre-Eric Pelloux-Prayer	2ae520cb12	drm/amdgpu: don't access invalid sched Since `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()") accessing job->base.sched can produce unexpected results as the initialisation of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the memset. This commit fixes an issue when a CS would fail validation and would be rejected after job->num_ibs is incremented. In this case, amdgpu_ib_free(ring->adev, ...) will be called, which would crash the machine because the ring value is bogus. To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this because the device is actually not used in this function. The next commit will remove the ring argument completely. Fixes: `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:16:35 -05:00
Dheeraj Reddy Jonnalagadda	69b54d7c7c	drm/amdgpu: simplify return statement in amdgpu_ras_eeprom_init Remove the logically dead code in the last return statement of amdgpu_ras_eeprom_init. The condition res < 0 is redundant since res is already checked for a negative value earlier. Replace return res < 0 ? res : 0; with return 0 to improve clarity. Fixes: `63d4c081a5` ("drm/amdgpu: Optimize EEPROM RAS table I/O") Closes: https://scan7.scan.coverity.com/#/project-view/52337/11354?selectedIssue=1602413 Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:16:05 -05:00
Alex Deucher	7e50642d41	drm/amd/display: add non-DC drm_panic support Add support for the drm_panic module, which displays a pretty user friendly message on the screen when a Linux kernel panic occurs. Adapt Lu Yao's code to use common helpers derived from Jocelyn's patch. This extends the non-DC code to enable access to non-CPU accessible VRAM and adds support for other DCE versions. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lu Yao <yaolu@kylinos.cn> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Harry Wentland <harry.wentland@amd.com>	2024-12-18 12:16:01 -05:00
Mario Limonciello	1ad5bdc28b	drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO If the kernel hasn't been compiled with PCIe hotplug support this can lead to problems with dGPUs that use BOCO because they effectively drop off the bus. To prevent issues, disable BOCO support when compiled without PCIe hotplug. Reported-by: Gabriel Marcano <gabemarcano@yahoo.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:15:40 -05:00
Bokun Zhang	3676f37a88	drm/amdgpu/vcn: reset fw_shared under SRIOV - The previous patch only considered the case for baremetal and is not applicable for SRIOV code path. We also need to init fw_share for SRIOV VF Fixes: `928cd772e1` ("drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:14:16 -05:00
Alex Deucher	fe151ed7af	drm/amdgpu: add generic display panic helper code Pull this out of Jocelyn's patch and make it generic. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lu Yao <yaolu@kylinos.cn> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Harry Wentland <harry.wentland@amd.com>	2024-12-18 12:13:49 -05:00
Dave Airlie	d172ea67db	amd-drm-fixes-6.13-2024-12-11: amdgpu: - ISP hw init fix - SR-IOV fixes - Fix contiguous VRAM mapping for UVD on older GPUs - Fix some regressions due to drm scheduler changes - Workload profile fixes - Cleaner shader fix amdkfd: - Fix DMA map direction for migration - Fix a potential null pointer dereference - Cacheline size fixes - Runtime PM fix -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ1oJLgAKCRC93/aFa7yZ 2JoeAQCdBc9+GR9hLY7R6rjSNfwW0qQ7CovcKD/95BCvy9AtPAD9E/m7ULHPdQ6r cSwuvVzccsRM0Qnz74imXXARW+26rQ0= =Tb2v -----END PGP SIGNATURE----- Merge tag 'amd-drm-fixes-6.13-2024-12-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.13-2024-12-11: amdgpu: - ISP hw init fix - SR-IOV fixes - Fix contiguous VRAM mapping for UVD on older GPUs - Fix some regressions due to drm scheduler changes - Workload profile fixes - Cleaner shader fix amdkfd: - Fix DMA map direction for migration - Fix a potential null pointer dereference - Cacheline size fixes - Runtime PM fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211215449.741848-1-alexander.deucher@amd.com	2024-12-13 09:43:20 +10:00
Dave Airlie	c7d6cb4c43	drm-misc-next for 6.14: UAPI Changes: Cross-subsystem Changes: Core Changes: - Remove driver date from drm_driver Driver Changes: - amdxdna: New driver! - ivpu: Fix qemu crash when using passthrough - nouveau: expose GSP-RM logging buffers via debugfs - panfrost: Add MT8188 Mali-G57 MC3 support - panthor: misc improvements, - rockchip: Gamma LUT support - tidss: Misc improvements - virtio: convert to helpers, add prime support for scanout buffers - v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL - vc4: Add support for BCM2712 - vkms: Improvements all across the board - panels: - Introduce backlight quirks infrastructure - New panels: KDB KD116N2130B12 -----BEGIN PGP SIGNATURE----- iJUEABMJAB0WIQTkHFbLp4ejekA/qfgnX84Zoj2+dgUCZ1G6igAKCRAnX84Zoj2+ dpx8AX4m4lM6bo7/I/SDqR6Dw6zDX2AgbupW9NzFoJmlC+X/XOLgKEoCwam+j+09 hZKYTwcBfRwVa1UDccjHNdWA0IUxUYFQUeiVk59xlBhZZs5vFKorX7r7eMQNl3S1 gcnSrwy6OQ== =/dK/ -----END PGP SIGNATURE----- Merge tag 'drm-misc-next-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next [airlied: handle module ns conflict] drm-misc-next for 6.14: UAPI Changes: Cross-subsystem Changes: Core Changes: - Remove driver date from drm_driver Driver Changes: - amdxdna: New driver! - ivpu: Fix qemu crash when using passthrough - nouveau: expose GSP-RM logging buffers via debugfs - panfrost: Add MT8188 Mali-G57 MC3 support - panthor: misc improvements, - rockchip: Gamma LUT support - tidss: Misc improvements - virtio: convert to helpers, add prime support for scanout buffers - v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL - vc4: Add support for BCM2712 - vkms: Improvements all across the board - panels: - Introduce backlight quirks infrastructure - New panels: KDB KD116N2130B12 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241205-agile-straight-pegasus-aca7f4@houat	2024-12-13 08:48:09 +10:00
Alex Deucher	e70ba46795	drm/amdgpu/jpeg5.0.1: use num_jpeg_inst for SR-IOV They should be the same, but use the proper variable. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:26 -05:00
Alex Deucher	f53758bc34	drm/amdgpu/jpeg4.0.3: use num_jpeg_inst for SR-IOV They should be the same, but use the proper variable. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:23 -05:00
Alex Deucher	4b842c852f	drm/amdgpu: add sysfs reset mask for vcn 5.0.1 Add the calls to the vcn 5.0.1 code. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:19 -05:00
Alex Deucher	40253e36e0	drm/amdgpu: add ip_dump support for vcn 5.0.1 Shared with vcn 5.0.0. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:13 -05:00
Mario Limonciello	3f6f237b9d	drm/amd: Update strapping for NBIO 2.5.0 This helps to avoid a spurious PME event on hotplug to Azalia. Cc: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Reported-and-tested-by: ionut_n2001@yahoo.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215884 Tested-by: Gabriel Marcano <gabemarcano@yahoo.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211024414.7840-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:36:56 -05:00
Jesse.zhang@amd.com	bcc263dea6	drm/amdgpu/gfx11: clean up kcq reset code Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:36:50 -05:00
Jesse.zhang@amd.com	0c0dec8207	drm/amdgpu/gfx12: clean up kcq reset code Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:35:46 -05:00
Jesse.zhang@amd.com	11974b7eac	drm/amdgpu/sdma7: Add queue reset sysfs for sdmav7 sdmv7 queue reset already supports by mmio, add its sys file. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:35:21 -05:00
Jesse.zhang@amd.com	a73a83241e	drm/amdgpu/mes12: Implement reset gfx/compute queue function by mmio Reset gfx/compute queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:32:13 -05:00
Jesse.zhang@amd.com	0f8666138f	drm/amdgpu/mes12: Implement reset sdmav7 queue function by mmio Reset sdma queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:32:06 -05:00
Lijo Lazar	fccb446f82	drm/amdgpu: Avoid VF for RAS recovery source check VF device sets the RAS flag when mailbox data can't be read properly. There is no conclusive way to tell if the real source is RAS error. Therefore VF schedules a KFD based reset which doesn't set RAS source. SKip checking RAS source for any VF scheduled recovery. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Vojislav Tomasevic <vojislav.tomasevic@amd.com> Reviewed-by: Yiqing Yao <yiqing.yao@amd.com> Tested-by: Yiqing Yao <yiqing.yao@amd.com> Fixes: `e1ee2111ca` ("drm/amdgpu: Prefer RAS recovery for scheduler hang") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:30:59 -05:00
Jesse.zhang@amd.com	f4d583cd3f	drm/amdgpu/sdma7: implement queue reset callback for sdma7 Implement sdma queue reset callback by mes_reset_queue_mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:30:52 -05:00
Jesse.zhang@amd.com	8a4c6fc826	drm/amdgpu/sdma7: Implement resume function for each instance Extracts the resume sequence for per sdma instance from sdma_v7_0_gfx_resume. This function can be used in start or restart scenarios of specific instances. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:30:39 -05:00
Candice Li	ecd1191e12	drm/amdgpu: Support nbif v6_3_1 fatal error handling Add nbif v6_3_1 fatal error handling support. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:31:00 -05:00
Sonny Jiang	178ad3a9d1	drm/amdgpu: Enable VCN_5_0_1 IP block Add VCN_5_0_1 IP block to kernel boot Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:50 -05:00
Sonny Jiang	346492f30c	drm/amdgpu: Add VCN_5_0_1 support Add vcn support for VCN_5_0_1 v2: rebase, squash in fixes (Alex) Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:46 -05:00
Sathishkumar S	c406fca4b5	drm/amdgpu: enable JPEG5_0_1 ip block enable JPEG5_0_1 ip block Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:42 -05:00
Sathishkumar S	b8f57b6994	drm/amdgpu: Add JPEG5_0_1 support add support for JPEG5_0_1 v2: squash in updates, rebase on IP instance changes Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:24 -05:00
Sonny Jiang	4e4b1a1b80	drm/amdgpu: Add VCN_5_0_1 codec query Support VCN_5_0_1 codec query v2: squash in updates Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:21 -05:00
Sonny Jiang	fdce10ff8f	drm/amdgpu: Add VCN_5_0_1 firmware Add vcn_5_0_1 firmware support Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:18 -05:00
Sathishkumar S	20a3029227	drm/amdgpu: update macro for maximum jpeg rings Update the macro to accomdate more rings. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:13 -05:00
Alex Deucher	b1d0286c81	drm/amdgpu: update irq sec header for vcn 5.0.0 No functional change. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:01 -05:00
Alex Deucher	26893116c3	drm/amdgpu: update irq sec header for jpeg 5.0.0 No functional change. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:55 -05:00
Candice Li	33f1aa210a	drm/amdgpu: Add umc v8_14 ras functions Add umc v8_14 ras functions. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:39 -05:00
Candice Li	2c2b84f193	drm/amdgpu: Add psp v14_0_3 ras support Add psp v14_0_3 ras support. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:21 -05:00
Srinivasan Shanmugam	55f4139b65	drm/amd/amdgpu: Add Annotations to Process Isolation functions This update adds explanations to key functions that manage how the Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the GPU. amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period for KFD to ensure it takes turns with KGD in using the GPU. It uses a mutex to safely manage shared data, like timing and state, and tracks when KFD starts and stops waiting. amdgpu_gfx_enforce_isolation_ring_begin_use: Ensures KFD has enough time to run before new tasks are submitted to the GPU ring. It uses a mutex to synchronize access and may adjust the KFD scheduler. amdgpu_gfx_enforce_isolation_ring_end_use: Handles cleanup and state updates when finishing the use of a GPU ring. It may also adjust the KFD scheduler, using a mutex to manage shared data access. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:13 -05:00
Hawking Zhang	57bcfa89fe	drm/amdgpu: Init mmhub v1_8_1 ras func reuse mmhub v1_8 ras functuion Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:09 -05:00
Shiwu Zhang	bd18b11f2d	drm/amdgpu: Enable xgmi for gfx v9_5_0 Enable xgmi for gfx v9_5_0 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:58 -05:00
Asad Kamal	f79cfbac5c	drm/amdgpu: Fetch refclock for SMU v13.0.12 Add support to fetch refclock value for SMU v13.0.12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:54 -05:00
Asad Kamal	100350c373	drm/amd/pm: Add mode2 support for SMU v13.0.12 Add mode2 reset support for smu version 13.0.12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:50 -05:00
Srinivasan Shanmugam	a69f4cc278	drm/amd/amdgpu: Add Descriptions to Process Isolation and Cleaner Shader Sysfs Functions This update adds explanations to key functions related to process isolation and cleaner shader execution sysfs interfaces. - `amdgpu_gfx_set_run_cleaner_shader`: Describes how to manually run a cleaner shader, which clears the Local Data Store (LDS) and General Purpose Registers (GPRs) to ensure data isolation between GPU workloads. - `amdgpu_gfx_get_enforce_isolation`: Describes how to query the current settings of the 'enforce_isolation' feature for each GPU partition. - `amdgpu_gfx_set_enforce_isolation`: Describes how to enable or disable process isolation for GPU partitions through the sysfs interface. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:31 -05:00
Hawking Zhang	9a826c4af8	drm/amdgpu: Enable RAS for psp v13_0_12 Enable RAS Cap check and initialize RAS funcs for psp v13_0_12 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:28 -05:00
Hawking Zhang	98230feb55	drm/amdgpu: Load spdm_drv for psp v13_0_12 spdm_drv is a firmware that needs to be loaded in driver initialization phase. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:23 -05:00
Hawking Zhang	3516d35f81	drm/amdgpu: Add psp v13_0_12 firmware specifiers Add psp v13_0_12 firmware specifiers for sos and ta Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:19 -05:00
Le Ma	2d2f1622c8	drm/amdgpu: add psp 13_0_12 version support Add support for new psp 13_0_12 version Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:08 -05:00
Mario Limonciello	b6e6871a56	drm/amd: Show an info message about optional firmware missing With the warning from the core about missing firmware gone, users still may be notified of missing optional firmware by a more friendly message to clarify it's optional. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Yang Wang	2a50d94b11	drm/amdgpu: add ACA support for jpeg v4.0.3 Add ACA support for jpeg v4.0.3. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Yang Wang	3748c439bb	drm/amdgpu: add ACA support for vcn v4.0.3 v1: Add ACA support for vcn v4.0.3. v2: - split VCN ACA(v1) to 2 parts: vcn and jpeg. - move mmSMNAID_AID0_MCA_SMU to amdgpu_aca.h file. v3: - split JPEG ACA to another patch. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Yang Wang	abfcf95607	drm/amdgpu: move common ACA ipid defines into amdgpu_aca.h move common ACA ipid defines into amdgpu_aca.h file. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Alex Sierra	1a3d4abd54	drm/amdgpu: add ih cam support for IH 4.4.4 Same as IH 4.4.2. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Le Ma	968e3811c3	drm/amdgpu: add initial support for sdma444 add sdma444 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Lijo Lazar	fd0c6bd82d	drm/amdgpu: Increase FRU File Id buffer size Some boards use longer File Ids. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Tao Zhou	ae756cd853	drm/amdgpu: correct the calculation of RAS bad page After the introduction of NPS RAS, one bad page record on eeprom may be related to 1 or 16 bad pages, so the bad page record and bad page are two different concepts, define a new variable to store bad page number. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Tao Zhou	1f06e7f344	drm/amdgpu: split ras_eeprom_init into init and check functions Init function is for ras table header read and check function is responsible for the validation of the header. Call them in different stages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Mario Limonciello	ea5d493498	drm/amd: Add the capability to mark certain firmware as "required" Some of the firmware that is loaded by amdgpu is not actually required. For example the ISP firmware on some SoCs is optional, and if it's not present the ISP IP block just won't be initialized. The firmware loader core however will show a warning when this happens like this: ``` Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2 ``` To avoid confusion for non-required firmware, adjust the amd-ucode helper to take an extra argument indicating if the firmware is required or optional. On optional firmware use firmware_request_nowarn() instead of request_firmware() to avoid the warnings. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Hawking Zhang	0ca6d97596	drm/amdgpu: Apply gc v9_5_0 golden settings Apply gc v9_5_0 golden settings. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Alex Sierra	dad0c70507	drm/amd: update mtype flags for gfx 9.5.0 Update mtype flags to meet gfx 9.5.0 requirements for remote GPU memory and system memory. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Le Ma	0b58a55af5	drm/amdgpu: add initial support for gfx950 add gfx950 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Le Ma	ebc7d1acf3	drm/amdgpu/gfx: add gfx950 microcode Add firmware declarations. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Alex Sierra	9bfe4caa4e	drm/amd: define gc ip version local variable For better readability. Also leftover orphaned code. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Lijo Lazar	3f1e050c99	drm/amdgpu: Remove gfxoff usage GFXOFF is not valid for these IP versions. Also, SDMA v4.4.2 is not in GFX domain. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Prike Liang	d2382f29ce	drm/amdgpu: Avoid to release the FW twice in the validated error There will to release the FW twice when the FW validated error. Even if the release_firmware() will further validate the FW whether is empty, but that will be redundant and inefficient. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Randy Dunlap	a567db808e	drm/amdgpu: device: fix spellos and punctuation Make spelling and punctuation changes to ease reading of the comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Sathishkumar S	de258d06fd	drm/amdgpu: Add amdgpu_vcn_sched_mask debugfs Add debugfs entry to enable or disable job submission to specific vcn instances. The entry is created only when there is more than an instance and is unified queue type. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Jinzhou Su	9db3aed8ea	drm/amdgpu: return error when eeprom checksum failed Return eeprom table checksum error result, otherwise it might be overwritten by next call. V2: replace DRM_ERROR with dev_err Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Mario Limonciello	2965e6355d	drm/amd: Add Suspend/Hibernate notification callback support As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exist though where due to memory fragementation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Lijo Lazar	edd628ad17	drm/amdgpu: Simplify cleanup check for FRU sysfs FRU info is expected to be non-NULL if FRU sys files are created. Simplify the check. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Jinzhou Su	e1a34ed917	drm/amdgpu: Add secure display v2 command Add secure display v2 command to support multiple ROI instances per display. v2: fix typo and coding style issue Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Mario Limonciello	c2ee5c2f0e	drm/amd: Invert APU check for amdgpu_device_evict_resources() Resource eviction isn't needed for s3 or s2idle on APUs, but should be run for S4. As amdgpu_device_evict_resources() will be called by prepare notifier adjust logic so that APUs only cover S4. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Sunil Khatri	86fa54f349	drm/amdgpu: add "restore" missing variable comment add "restore" missing variable in the fucntions sdma_v4_4_2_page_resume and sdma_v4_4_2_inst_start. This fixes the warning: warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_page_resume' warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_inst_start' Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Sunil Khatri	093bbeb994	drm/amdgpu: Update the variable name to dma_buf Instead of fixing the warning for missing variable its better to update the variable name to match with the style followed in the code. This will fix the below mentioned warning: warning: Function parameter or struct member 'dbuf' not described in 'amdgpu_bo_create_isp_user' warning: Excess function parameter 'dma_buf' description in 'amdgpu_bo_create_isp_user' Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	ea8094abfb	drm/amdgpu: set UMC PA per NPS mode when PA is 0 The shift bit of PA varys according to NPS mode due to different address format. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	d08fb66370	drm/amdgpu: remove is_mca_add for ras_add_bad_pages Remove unnecessary variable and simplify the logic. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	a8d133e625	drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes All legacy RAS bad pages are generated in NPS1 mode, but new bad page can be generated in any NPS mode, so we can't use retired_page stored on eeprom directly in non-nps1 mode even for legacy data. We need to take different actions for different data, new data can be identified from old data by UMC_CHANNEL_IDX_V2 flag. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Shikang Fan	0859eb540f	drm/amdgpu: Check fence emitted count to identify bad jobs In SRIOV, when host driver performs MODE 1 reset and notifies FLR to guest driver, there is a small chance that there is no job running on hw but the driver has not updated the pending list yet, causing the driver not respond the FLR request. Modify the has_job_running function to make sure if there is still running job. v2: Use amdgpu_fence_count_emitted to determine job running status. v3: Remove the timeout wait in has_job_running Signed-off-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00

... 7 8 9 10 11 ...

15900 Commits