linux

Commit Graph

Author	SHA1	Message	Date
Sakari Ailus	ef4a4b8781	drm/amd: Remove redundant pm_runtime_mark_last_busy() calls pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(), pm_runtime_autosuspend() and pm_request_autosuspend() now include a call to pm_runtime_mark_last_busy(). Remove the now-redundant explicit call to pm_runtime_mark_last_busy(). Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-28 11:31:45 -04:00
Lijo Lazar	9e2096baab	drm/amdgpu: Add amdgpu_discovery_info Add amdgpu_discovery_info structure to keep all discovery related information. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:35 -04:00
Alex Deucher	db36632ea5	drm/amdgpu: clean up and unify hw fence handling Decouple the amdgpu fence from the amdgpu_job structure. This lets us clean up the separate fence ops for the embedded fence and other fences. This also allows us to allocate the vm fence up front when we allocate the job. v2: Additional cleanup suggested by Christian v3: Additional cleanups suggested by Christian v4: Additional cleanups suggested by David and vm fence fix v5: cast seqno (David) Cc: David.Wu3@amd.com Cc: christian.koenig@amd.com Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-13 14:14:35 -04:00
Sunil Khatri	476a4e10a3	drm/amdgpu: print root PD address in PDE format instead of GPU Print PD address of VM root instead of GPU address in the debugfs. On modern GPU's this is what UMR tool expects in the registers as well. Fixes: `719b378d37` ("drm/amdgpu: add debugfs support for VM pagetable per client") Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-05 17:38:41 -04:00
Sunil Khatri	7e0fc7b2b7	drm/amdgpu: add more information in debugfs to pagetable dump Add more information in the debugfs which is needed to dump a pagetable correctly for userqueues where vmid is not known in the kernel. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-05 16:06:52 -04:00
Sunil Khatri	03d5236014	drm/amdgpu: fix the logic to validate fpriv and root bo Fix the smatch warning, smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:2146 amdgpu_pt_info_read() error: we previously assumed 'fpriv' could be null (see line 2146) "if (!fpriv && !fpriv->vm.root.bo)", It has to be an OR condition rather than an AND which makes an NULL dereference in case fpriv is NULL. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507090525.9rDWGhz3-lkp@intel.com/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Link: https://lore.kernel.org/r/20250709071618.591866-1-sunil.khatri@amd.com Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>	2025-07-09 10:15:30 +02:00
Maarten Lankhorst	e21354aea4	Merge remote-tracking branch 'drm/drm-next' into drm-misc-next Pull in drm-intel-next for the updates to drm panic handling. Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2025-07-08 16:49:07 +02:00
Sunil Khatri	719b378d37	drm/amdgpu: add debugfs support for VM pagetable per client Add a debugfs file under the client directory which shares the root page table base address of the VM. This address could be used to dump the pagetable for debug memory issues. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704075548.1549849-4-sunil.khatri@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2025-07-04 16:00:06 +02:00
Dave Airlie	7e2818386a	amd-drm-next-6.17-2025-07-01: amdgpu: - FAMS2 fixes - OLED fixes - Misc cleanups - AUX fixes - DMCUB updates - SR-IOV hibernation support - RAS updates - DP tunneling fixes - DML2 fixes - Backlight improvements - Suspend improvements - Use scaling for non-native modes on eDP - SDMA 4.4.x fixes - PCIe DPM fixes - SDMA 5.x fixes - Cleaner shader updates for GC 9.x - Remove fence slab - ISP genpd support - Parition handling rework - SDMA FW checks for userq support - Add missing firmware declaration - Fix leak in amdgpu_ctx_mgr_entity_fini() - Freesync fix - Ring reset refactoring - Legacy dpm verbosity changes amdkfd: - GWS fix - mtype fix for ext coherent system memory - MMU notifier fix - gfx7/8 fix radeon: - CS validation support for additional GL extensions - Bump driver version for new CS validation checks -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCaGQ6hwAKCRC93/aFa7yZ 2JVJAQC/IpZoSmW22VrtXjBiQ3yoYt61oOM/WJVXPV11FmisyQEAqQ9nc+/lY9SM s5/RskaQGlfCGqafnaSGuoILkD/YDQ4= =i8mT -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.17-2025-07-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.17-2025-07-01: amdgpu: - FAMS2 fixes - OLED fixes - Misc cleanups - AUX fixes - DMCUB updates - SR-IOV hibernation support - RAS updates - DP tunneling fixes - DML2 fixes - Backlight improvements - Suspend improvements - Use scaling for non-native modes on eDP - SDMA 4.4.x fixes - PCIe DPM fixes - SDMA 5.x fixes - Cleaner shader updates for GC 9.x - Remove fence slab - ISP genpd support - Parition handling rework - SDMA FW checks for userq support - Add missing firmware declaration - Fix leak in amdgpu_ctx_mgr_entity_fini() - Freesync fix - Ring reset refactoring - Legacy dpm verbosity changes amdkfd: - GWS fix - mtype fix for ext coherent system memory - MMU notifier fix - gfx7/8 fix radeon: - CS validation support for additional GL extensions - Bump driver version for new CS validation checks From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250701194707.32905-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-07-04 10:06:29 +10:00
Alex Deucher	bf1cd14f9e	drm/amdgpu: switch job hw_fence to amdgpu_fence Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: `3f4c175d62` ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-06-18 12:19:21 -04:00
André Almeida	35dc4ce200	drm: amdgpu: Use struct drm_wedge_task_info inside of struct amdgpu_task_info To avoid a cast when calling drm_dev_wedged_event(), replace pid and task name inside of struct amdgpu_task_info with struct drm_wedge_task_info. Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250617124949.2151549-6-andrealmeid@igalia.com Signed-off-by: André Almeida <andrealmeid@igalia.com>	2025-06-17 11:32:47 -03:00
Shiwu Zhang	72ea78335e	drm/amdgpu: add debugfs for spirom IFWI dump Expose the debugfs file node for user space to dump the IFWI image on spirom. For one transaction between PSP and host, it will read out the images on both active and inactive partitions so a buffer with two times the size of maximum IFWI image (currently 16MByte) is needed. v2: move the vbios gfl macros to the common header and rename the bo triplet struct to spirom_bo for this specific usage (Hawking) v3: return directly the result of last command execution (Lijo) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-14 11:30:15 -04:00
Emily Deng	8d5e70ba5d	drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev). Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-03-13 23:12:52 -04:00
Sathishkumar S	de258d06fd	drm/amdgpu: Add amdgpu_vcn_sched_mask debugfs Add debugfs entry to enable or disable job submission to specific vcn instances. The entry is created only when there is more than an instance and is unified queue type. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Alex Deucher	f5d873f582	drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read() Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:59 -05:00
Alex Deucher	7ba9395430	drm/amdgpu: Adjust debugfs eviction and IB access permissions Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:56 -05:00
Alex Deucher	c0cfd2e652	drm/amdgpu: Adjust debugfs register access permissions Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:49 -05:00
Jesse Zhang	d2e3961ae3	drm/amdgpu: add amdgpu_sdma_sched_mask debugfs Userspace wants to run jobs on a specific sdma ring for verification purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two or more cores in the sdma ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:15 -05:00
Jesse Zhang	c5c63d9cb5	drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs compute/gfx may have multiple rings on some hardware. In some cases, userspace wants to run jobs on a specific ring for validation purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two or more cores in the gfx/compute ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:53 -05:00
Sathishkumar S	f0b19b84d3	drm/amdgpu: add amdgpu_jpeg_sched_mask debugfs JPEG_4_0_3 has up to 32 jpeg cores and a single mjpeg video decode will use all available cores on the hardware. This debugfs entry helps to disable or enable job submission to a cluster of cores or one specific core in the ip for debugging. The entry is populated only if there is at least two or more cores in the jpeg ip. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Sunil Khatri	17277da266	drm/amdgpu: Remove debugfs amdgpu_reset_dump_register_list There are some problem with existing amdgpu_reset_dump_register_list debugfs node. It is supposed to read a list of registers but there could be cases when the IP is not in correct power state. Register read in such cases could lead to more problems. We are taking care of all such power states in devcoredump and dumping the registers of need for debugging. So cleaning this code and we dont need this functionality via debugfs anymore. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-06 10:42:09 -04:00
Jesse Zhang	806e8c5579	drm/amdgpu: fix invadate operation for pg_flags Since the type of pg_flags is u32, adev->pg_flags >> 16 >> 16 is 0 regardless of the values of its operands. So removing the operations upper_32_bits and lower_32_bits. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-23 15:09:50 -04:00
Saleemkhan Jamadar	98a2e3a0d1	drm/amdgpu/umsch: add support to capture fw debug log Added support to capture unsch fw debug logs in debugfs. To enable set amdgpu_umschfw_log =1 in boot args. v1 - rename variable to umsch_mm_fwlog (Veera) Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-05-13 15:45:41 -04:00
Jesse Zhang	ea686fef54	drm/amdgpu: fix the warning about the expression (int)size - len Converting size from size_t to int may overflow. v2: keep reverse xmas tree order (Christian) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-04-26 17:22:45 -04:00
Shashank Sharma	b8f67b9ddf	drm/amdgpu: change vm->task_info handling This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm. This patch also does logistical changes required for existing usage of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini() V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-03-04 15:59:08 -05:00
Ma Jun	9749c86843	drm/amdgpu: Fix the warning info in mode1 reset Fix the warning info below during mode1 reset. [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000006] ? show_regs+0x6e/0x80 [ +0.000011] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000005] ? __warn+0x91/0x150 [ +0.000009] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000006] ? report_bug+0x19d/0x1b0 [ +0.000013] ? handle_bug+0x46/0x80 [ +0.000012] ? exc_invalid_op+0x1d/0x80 [ +0.000011] ? asm_exc_invalid_op+0x1f/0x30 [ +0.000014] ? __flush_work.isra.0+0x2e8/0x390 [ +0.000007] ? __flush_work.isra.0+0x208/0x390 [ +0.000007] ? _prb_read_valid+0x216/0x290 [ +0.000008] __cancel_work_timer+0x11d/0x1a0 [ +0.000007] ? try_to_grab_pending+0xe8/0x190 [ +0.000012] cancel_work_sync+0x14/0x20 [ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched] [ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu] This warning info was printed after applying the patch "drm/sched: Convert drm scheduler to use a work queue rather than kthread". The root cause is that amdgpu driver tries to use the uninitialized work_struct in the struct drm_gpu_scheduler v2: - Rename the function to amdgpu_ring_sched_ready and move it to amdgpu_ring.c (Alex) v3: - Fix a few more checks based on Vitaly's patch (Alex) v4: - squash in fix noticed by Bert in https://gitlab.freedesktop.org/drm/amd/-/issues/3139 Fixes: `11b3b9f461` ("drm/sched: Check scheduler ready before calling timeout handling") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-31 09:40:42 -05:00
Mangesh Gadre	5eb8094a9b	drm/amdgpu: Add register read/write debugfs support for AID's SMN address is larger than 32 bits for registers on different AID's Updating existing interface to support access to such registers. Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-01-03 10:30:19 -05:00
Alex Deucher	afe58346d5	drm/amdgpu/debugfs: fix error code when smc register accessors are NULL Should be -EOPNOTSUPP. Fixes: `5104fdf50d` ("drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-14 15:27:42 -05:00
shaoyunl	b2662d4cc4	drm/amdgpu: SW part of MES event log enablement This is the generic SW part, prepare the event log buffer and dump it through debugfs Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-12-07 17:43:13 -05:00
Dave Airlie	5edfd7d94b	amd-drm-next-6.8-2023-12-01: amdgpu: - Add new 64 bit sequence number infrastructure. This will ultimately be used for user queue synchronization. - GPUVM updates - Misc code cleanups - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9 updates - NBIO 7.11 updates - SubVP updates - DCN 3.1.4 fixes - ABM fixes - AGP aperture fix - DCN 3.1.5 fix - Fix some potential error path memory leaks - Enable PCIe PMEs - Add XGMI, PCIe state dumping for aqua vanjaram - GFX11 golden register updates - Misc display fixes amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend and reset - Fix possible memory leak in pqm_uninit() radeon: - Fix some possible overflows in command buffer checking - Check for errors in ring_lock -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZWohuwAKCRC93/aFa7yZ 2A6XAP9a6/3noDYxabnLmr3B+13wtweSkvEwD2bjOk7KinyS6AEAr8H3HZ5U0HMP zj35QhcNcYaTsjrip7e53XunpMBQSwQ= =DvId -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.8-2023-12-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.8-2023-12-01: amdgpu: - Add new 64 bit sequence number infrastructure. This will ultimately be used for user queue synchronization. - GPUVM updates - Misc code cleanups - RAS updates - DCN 3.5 updates - Rework PCIe link speed handling - Document GPU reset types - DMUB fixes - eDP fixes - NBIO 7.9 updates - NBIO 7.11 updates - SubVP updates - DCN 3.1.4 fixes - ABM fixes - AGP aperture fix - DCN 3.1.5 fix - Fix some potential error path memory leaks - Enable PCIe PMEs - Add XGMI, PCIe state dumping for aqua vanjaram - GFX11 golden register updates - Misc display fixes amdkfd: - Migrate TLB flushing logic to amdgpu - Trap handler fixes - Fix restore workers handling on suspend and reset - Fix possible memory leak in pqm_uninit() radeon: - Fix some possible overflows in command buffer checking - Check for errors in ring_lock From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231201181743.5313-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2023-12-05 12:11:41 +10:00
Lu Yao	7b194fdccb	drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init 'didt_rreg' and 'didt_wreg' to 'NULL'. But in func 'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT' lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use these two definitions won't be added for 'AMDGPU_FAMILY_SI'. So, add null pointer judgment before calling. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-11-29 16:49:23 -05:00
Maxime Ripard	3bf3e21c15	Merge drm/drm-next into drm-misc-next Let's kickstart the v6.8 release cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2023-11-15 10:56:44 +01:00
Matthew Brost	35963cf2cd	drm/sched: Add drm_sched_wqueue_* helpers Add scheduler wqueue ready, stop, and start helpers to hide the implementation details of the scheduler from the drivers. v2: - s/sched_wqueue/sched_wqueue (Luben) - Remove the extra white line after the return-statement (Luben) - update drm_sched_wqueue_ready comment (Luben) Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.com Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>	2023-11-01 17:29:20 -04:00
Qu Huang	5104fdf50d	drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log: 1. Navigate to the directory: /sys/kernel/debug/dri/0 2. Execute command: cat amdgpu_regs_smc 3. Exception Log:: [4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000 [4005007.702562] #PF: supervisor instruction fetch in kernel mode [4005007.702567] #PF: error_code(0x0010) - not-present page [4005007.702570] PGD 0 P4D 0 [4005007.702576] Oops: 0010 [#1] SMP NOPTI [4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u [4005007.702590] RIP: 0010:0x0 [4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 [4005007.702633] Call Trace: [4005007.702636] <TASK> [4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu] [4005007.703002] full_proxy_read+0x5c/0x80 [4005007.703011] vfs_read+0x9f/0x1a0 [4005007.703019] ksys_read+0x67/0xe0 [4005007.703023] __x64_sys_read+0x19/0x20 [4005007.703028] do_syscall_64+0x5c/0xc0 [4005007.703034] ? do_user_addr_fault+0x1e3/0x670 [4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0 [4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20 [4005007.703052] ? irqentry_exit+0x19/0x30 [4005007.703057] ? exc_page_fault+0x89/0x160 [4005007.703062] ? asm_exc_page_fault+0x8/0x30 [4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae [4005007.703075] RIP: 0033:0x7f5e07672992 [4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24 [4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992 [4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003 [4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010 [4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000 [4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [4005007.703105] </TASK> [4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca [4005007.703184] CR2: 0000000000000000 [4005007.703188] ---[ end trace ac65a538d240da39 ]--- [4005007.800865] RIP: 0010:0x0 [4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206 [4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68 [4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000 [4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980 [4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000 [4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000 [4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000 [4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0 Signed-off-by: Qu Huang <qu.huang@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-26 18:41:22 -04:00
André Almeida	2d6a2a28cd	drm/amdgpu: Encapsulate all device reset info To better organize struct amdgpu_device, keep all reset information related fields together in a separated struct. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-10-20 15:11:28 -04:00
Praful Swarnakar	ad19c200b1	drm/amdgpu: Fix style issues in amdgpu_debugfs.c Fixes the following to align to linux coding style: WARNING: Missing a blank line after declarations WARNING: sizeof rd should be sizeof(rd) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Praful Swarnakar <Praful.Swarnakar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-08-07 17:12:48 -04:00
Victor Lu	8ed49dd1d3	drm/amdgpu: Add RLCG interface driver implementation for gfx v9.4.3 (v3) Add RLCG interface support for gfx v9.4.3 and multiple XCCs. Do not enable it yet. v2: Fix amdgpu_rlcg_reg_access_ctrl init, add support for multiple XCCs in amdgpu_mm_wreg_mmio_rlc v3: Use GET_INST() when indexing amdgpu_rlcg_reg_access_ctrl Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-07-18 11:16:41 -04:00
Dan Carpenter	9c9d501b28	drm/amd/amdgpu: Fix up locking etc in amdgpu_debugfs_gprwave_ioctl() There are two bugs here. 1) Drop the lock if copy_from_user() fails. 2) If the copy fails then the correct error code is -EFAULT instead of -EINVAL. I also broke up the long line and changed "sizeof rd->id" to "sizeof(rd->id)". Fixes: `553f973a0d` ("drm/amd/amdgpu: Update debugfs for XCC support (v3)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 11:10:04 -04:00
Su Hui	109b4d8cfe	drm/amdgpu: remove unnecessary (void) conversions No need cast (void) to (struct amdgpu_device *). Signed-off-by: Su Hui <suhui@nfschina.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 10:40:12 -04:00
Tom St Denis	553f973a0d	drm/amd/amdgpu: Update debugfs for XCC support (v3) This patch updates the 'regs2' interface for MMIO registers to add a new IOCTL command for a 'v2' state data that includes the XCC ID. This patch then updates amdgpu_gfx_select_se_sh() and amdgpu_gfx_select_me_pipe_q() (and the implementations in the gfx drivers) to support an additional parameter. This patch then creates a new debugfs interface "gprwave" which is a merge of shader GPR and wave status access. This new inteface uses an IOCTL to select banks as well as XCC identity. (v2) Fix missing xcc_id in wave_ind function (v3) Fix pm runtime calls and mutex locking (v4) Fix bad label Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:48:22 -04:00
Srinivasan Shanmugam	8fa7635058	drm/amd/amdgpu: Fix style problems in amdgpu_debugfs.c Fix the following issues reported by checkpatch: WARNING: please, no space before tabs WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: sizeof rd should be sizeof(rd) WARNING: Missing a blank line after declarations WARNING: sizeof rd->id should be sizeof(rd->id) WARNING: static const char * array should probably be static const char * const WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. WARNING: Prefer seq_puts to seq_printf ERROR: space prohibited after that open parenthesis '(' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:25:54 -04:00
Le Ma	d51ac6d0a2	drm/amdgpu: add xcc index argument to select_sh_se function v2 v1: To support multiple XCD case (Le) v2: introduce xcc index to gfx_v11_0_select_sh_se (Hawking) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-04-18 16:28:55 -04:00
Christian König	cd3a8a5962	drm/ttm: remove ttm_bo_(un)lock_delayed_workqueue Those functions never worked correctly since it is still perfectly possible that a buffer object is released and the background worker restarted even after calling them. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-2-christian.koenig@amd.com	2022-12-06 10:28:12 +01:00
Alex Deucher	d09ef24303	drm/amdgpu: clarify DC checks There are several places where we don't want to check if a particular asic could support DC, but rather, if DC is enabled. Set a flag if DC is enabled and check for that rather than if a device supports DC or not. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-11-15 11:51:45 -05:00
Victor Zhao	afbaa15501	Revert "drm/amdgpu: add debugfs amdgpu_reset_level" This reverts commit `5bd8d53f6f`. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: `5bd8d53f6f` ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-10-18 22:08:25 -04:00
André Almeida	0ad7347a64	drm/amd: Add detailed GFXOFF stats to debugfs Add debugfs interface to log GFXOFF statistics: - Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the time of query since system power-up - Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop. Read it to get average GFXOFF residency % multiplied by 100 during the last logging interval. Both features are designed to be keep the values persistent between suspends. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-08-16 18:17:31 -04:00
Victor Zhao	5bd8d53f6f	drm/amdgpu: add debugfs amdgpu_reset_level Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-08-16 18:14:31 -04:00
Sebin Sebastian	ad2feebd71	drm/amdgpu: double free error and freeing uninitialized null pointer Fix a double free and an uninitialized pointer read error. Both tmp and new are pointing at same address and both are freed which leads to double free. Adding a check to verify if new and tmp are free in the error_free label fixes the double free issue. new is not initialized to null which also leads to a free on an uninitialized pointer. Reviewed-by: André Almeida <andrealmeid@igalia.com> Suggested by: S. Amaranath <Amaranath.Somalapuram@amd.com> Signed-off-by: Sebin Sebastian <mailmesebin00@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-08-10 15:41:23 -04:00
André Almeida	4686177f7d	drm/amd/debugfs: Expose GFXOFF state to userspace GFXOFF has two different "state" values: one to define if the GPU is allowed/disallowed to enter GFXOFF, usually called state; and another one to define if currently GFXOFF is being used, usually called status. Even when GFXOFF is allowed, GPU firmware can decide to not used it accordingly to the GPU load. Userspace can allow/disallow GPUs to enter into GFXOFF via debugfs. The kernel maintains a counter of requests for GFXOFF (gfx_off_req_count) that should be decreased to allow GFXOFF and increased to disallow. The issue with this interface is that userspace can't be sure if GFXOFF is currently allowed. Even by checking amdgpu_gfxoff file, one might get an ambiguous 2, that means that GPU is currently out of GFXOFF, but that can be either because it's currently disallowed or because it's allowed but given the current GPU load it's enabled. Then, userspace needs to rely on the fact that GFXOFF is enabled by default on boot and to track this information. To make userspace life easier and GFXOFF more reliable, return the current state of GFXOFF to userspace when reading amdgpu_gfxoff with the same semantics of writing: 0 means not allowed, not 0 means allowed. Expose the current status of GFXOFF through a new file, amdgpu_gfxoff_status. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-07-25 09:31:02 -04:00
André Almeida	edadd6fc28	drm/amdpgu/debugfs: Simplify some exit paths To avoid code repetition, unify the function exit path when possible. No functional changes. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2022-07-05 16:18:14 -04:00

1 2 3

149 Commits