linux

Commit Graph

Author	SHA1	Message	Date
Alex Deucher	759e764f7d	drm/amdkfd: use GTT for VRAM on APUs only if GTT is larger If the user has configured a large carveout on a small APU, only use GTT for VRAM allocations if GTT is larger than VRAM. v2: fix reversed check (Philip) Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	8b0d068e7d	drm/amdkfd: add a new flag to manage where VRAM allocations go On big and small APUs we send KFD VRAM allocations to GTT since the carve out is either non-existent or relatively small. However, if someone sets the carve out size to be relatively large, we may end up using GTT rather than VRAM. No change of logic with this patch, but it allows the driver to determine which logic to use based on the carve out size in the future. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	cc0e91a755	drm/amdgpu: Make VBIOS image read optional Keep VBIOS image read optional for select SOCs in passthrough mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	6e8ca38ebc	drm/amdgpu: Add flag to make VBIOS read optional Certain SOCs may not need much data from VBIOS. Some data like VBIOS version used will be missed but it doesn't affect functionality. Add a flag to make VBIOS image optional. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	7e0aa70681	drm/amdgpu: Add VBIOS flags Instead of read_bios, use get_bios_flags to get various options around reading VBIOS. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Lijo Lazar	e986e89659	drm/amdgpu: Add wrapper for freeing vbios memory Use bios_release wrapper to release memory allocated for vbios image and reset the variables. v2: Use the same wrapper for clean up in sw_fini (Alex Deucher) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:08 -05:00
Alex Deucher	15f00b073c	drm/amdgpu/gfx9: use amdgpu_gfx_off_ctrl_immediate() for PG Use amdgpu_gfx_off_ctrl_immediate() when powergating. There's no need for the delay in gfx off allow. The powergating is dynamically disabled/enabled as for RV/PCO on compute queues and allowing gfx off again as soon as the job is submitted improves power savings. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:07 -05:00
Alex Deucher	250d9769ee	drm/amdgpu/gfx: add amdgpu_gfx_off_ctrl_immediate() Same as amdgpu_gfx_off_ctrl(), but without the delay for gfxoff disallow. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:07 -05:00
Lijo Lazar	a5219b41dd	drm/amdgpu: Clean up atom header file inclusion atom bios header files are not required in these files. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:06 -05:00
Alex Deucher	abab978127	drm/amdgpu/sdma4: drop gfxoff calls in dump ip state SDMA 4.x is not part of the GFX power domain so this is not necessary. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:04:06 -05:00
Sathishkumar S	64dc2f0029	drm/amdgpu: Enable devcoredump for JPEG5_0_0 Add register list and enable devcoredump for JPEG5_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	8ecd4ec6a5	drm/amdgpu: Enable devcoredump for JPEG2_5_0 Add register list and enable devcoredump for JPEG2_5_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	63d5f8db53	drm/amdgpu: Enable devcoredump for JPEG2_0_0 Add register list and enable devcoredump for JPEG2_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	d949e91b42	drm/amdgpu: Enable devcoredump for JPEG3_0_0 Add register list and enable devcoredump for JPEG3_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	2b0ccf3923	drm/amdgpu: Enable devcoredump for JPEG4_0_5 Add register list and enable devcoredump for JPEG4_0_5 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	c3dddd6029	drm/amdgpu: Enable devcoredump for JPEG4_0_0 Add register list and enable devcoredump for JPEG4_0_0 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	358b3774a0	drm/amdgpu: Enable devcoredump for JPEG5_0_1 Add register list and enable devcoredump for JPEG5_0_1 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:02 -05:00
Sathishkumar S	08527cb534	drm/amdgpu: Enable devcoredump for JPEG4_0_3 Add register list and enable devcoredump for JPEG4_0_3 V2: (Lijo) - remove version specific callbacks and use simplified helper functions V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() and avoid the call here Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Sathishkumar S	df996b5eff	drm/amdgpu: Add helper funcs for jpeg devcoredump Add devcoredump helper functions that can be reused for all jpeg versions. V2: (Lijo) - add amdgpu_jpeg_reg_dump_init() and amdgpu_jpeg_reg_dump_fini() - use reg_list and reg_count from init() to dump and print registers - memory allocation and freeing is moved to the init() and fini() V3: (Lijo) - move amdgpu_jpeg_reg_dump_fini() to sw_fini() Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Shiwu Zhang	b3dd2903b0	drm/amdgpu: Enable IFWI update support with PSPv13.0.12 Make psp_vbflash_status and psp_vbflash available in sysfs Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	5caea7a589	drm/amdgpu: Add support for smuio 13.0.11 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	37971df806	drm/amdgpu: Add support for nbio 7.9.1 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	a03f5f8d56	drm/amdgpu: Add support for smu 13.0.12 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Mangesh Gadre	05fd502e04	drm/amdgpu: Add support for umc 12.5.0/mmhub 1.8.1 Add new IP version support Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:01 -05:00
Sathishkumar S	580dac7437	drm/amdgpu: Add a func for core specific reg offset Add an inline function to calculate core specific register offsets for JPEG v4.0.3 and reuse it, makes code more readable and easier to align. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	2f9a32b589	drm/amdgpu: Clean up IP version checks in gmcv9.0 Clean up some IP version checks in gmcv9.0 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	a52e6cb06b	drm/amdgpu: Clean up GFX v9.4.3 IP version checks Remove unnecessary IP version checks for GFX 9.4.3 and similar variants. Wrap checks inside meaningful function. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	a01e934242	drm/amdgpu: Use version to figure out harvest info IP tables with version <=2 may use harvest bit. For version 3 and above, harvest bit is not applicable, instead uses harvest table. Fix the logic accordingly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Lijo Lazar	31f9ed5882	drm/amdgpu: Pass IP instance/hwid as parameters Use IP instance number and hwid as function args for validation checks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Srinivasan Shanmugam	17585c07c2	drm/amdgpu/gfx10: Enable cleaner shader for GFX10.1.1/10.1.2 GPUs Enable the cleaner shader for GFX10.1.1/10.1.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.1.1/10.1.2 GPUs, previously available for GFX10.1.10. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Alex Deucher	e818635a31	drm/amdgpu: update and cleanup PM4 headers Consolidate PM4 definitions. Most of these were previously only defined in UMDs. Add them here as well and sync with latest packets. Also no need to include soc15d.h on gfx10+. Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Suggested-by: Saurabh Verma <saurabh.verma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:03:00 -05:00
Srinivasan Shanmugam	25961bad92	drm/amdgpu/gfx10: Add cleaner shader for GFX10.1.10 This commit adds the cleaner shader microcode for GFX10.1.0 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_10_1_0_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Victor Skvortsov	0489339776	drm/amdgpu: Skip err_count sysfs creation on VF unsupported RAS blocks VFs are not able to query error counts for all RAS blocks. Rather than returning an error for queries on these blocks, skip the sysfs creation altogether. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Hawking Zhang	16b85a0942	drm/amdgpu: Update usage for bad page threshold The driver's behavior varies based on the configuration of amdgpu_bad_page_threshold setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:59 -05:00
Mario Limonciello	50e30e3a0e	drm/amd: Mark amdgpu.gttsize parameter as deprecated and show warnings on use When not set `gttsize` module parameter by default will get the value to use for the GTT pool from the TTM page limit, which is set by a separate module parameter. This inevitably leads to people not sure which one to set when they want more addressable memory for the GPU, and you'll end up seeing instructions online saying to set both. Add some messages to try to guide people both who are using or misusing the parameters and mark the parameter as deprecated with the plan to drop it after the next LTS kernel release. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:58 -05:00
Jiang Liu	38e8ca3e4b	amdgpu/soc15: enable asic reset for dGPU in case of suspend abort When GPU suspend is aborted, do the same for dGPU as APU to reset soc15 asic. Otherwise it may cause following errors: [ 547.229463] amdgpu 0001:81:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_0.2.1.0 test failed (-110) [ 555.126827] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_0.2.1.0 test failed (-110) [ 555.126901] [drm:amdgpu_gfx_enable_kcq [amdgpu]] ERROR KCQ enable failed [ 555.126957] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] ERROR resume of IP block <gfx_v9_4_3> failed -110 [ 555.126959] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). [ 555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110 [ 555.126966] PM: Device 0000:0a:00.0 failed to resume async: error -110 This fix has been tested on Mi308X. Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Tested-by: Shuo Liu <shuox.liu@linux.alibaba.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/2462b4b12eb9d025e82525178d568cbaa4c223ff.1736739303.git.gerry@linux.alibaba.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:58 -05:00
Jesse.zhang@amd.com	30f7f53a5b	drm/amdgpu/gfx10: implement gfx queue reset via MMIO Using mmio to do queue reset. v2: Align the function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:57 -05:00
Jesse.zhang@amd.com	ffdd7a7b28	drm/amdgpu/gfx10: implement queue reset via MMIO Using mmio to do queue reset. v2: Align this function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:57 -05:00
Lijo Lazar	f7a594e405	drm/amdgpu: Use active umc info from discovery There could be configs where some UMC instances are harvested. This information is obtained through discovery data and populated in umc.active_mask. Avoid reassigning this as AID mask, instead use the mask directly while iterating through umc instances. This is to avoid accesses to harvested UMC instances. v2: fix warning (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Amber Lin	46d0436a3e	drm/amdgpu: Set noretry default for GC 9.5.0 Set GC 9.5.0 noretry default as 1 for better performance. It can be changed by the administrator using amdgpu.noretry=0 or by the user using HSA_XNACK=1 environment variable. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviwanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Le Ma	23cb207751	drm/amdgpu: read harvest info from harvest table for gfx950 Harvest table is applied for gfx950. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Shiwu Zhang	667b96134c	drm/amdgpu: enlarge the VBIOS binary size limit Some chips have a larger VBIOS file so raise the size limit to support the flashing tool. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	5f95a15495	drm/amdgpu: add dynamic workload profile switching for gfx12 Enable dynamic workload profile switching for gfx12. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	963537ca23	drm/amdgpu: add dynamic workload profile switching for gfx11 Enable dynamic workload profile switching for gfx11. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	b9467983b7	drm/amdgpu: add dynamic workload profile switching for gfx10 Enable dynamic workload profile switching for gfx10. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:56 -05:00
Alex Deucher	8fdb3958e3	drm/amdgpu/gfx: add ring helpers for setting workload profile Add helpers to switch the workload profile dynamically when commands are submitted. This allows us to switch to the FULLSCREEN3D or COMPUTE profile when work is submitted. Add a delayed work handler to delay switching out of the selected profile if additional work comes in. This works the same as the VIDEO profile for VCN. This lets us dynamically enable workload profiles on the fly and then move back to the default when there is no work. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Xiaogang Chen	8544374c0f	drm/amdkfd: Have kfd driver use same PASID values from graphic driver Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions, ex: adev in CPX mode. This patch has kfd driver use same pasid values that graphic driver generated which is per vm per pasid. These pasid values are passed to fw/hardware. We do not need change interrupt handler though more pasid values are used. Also, pasid values at log are replaced by user process pid; pasid values are not exposed to user. Users see their process pids that have meaning in user space. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Lijo Lazar	ca44922107	drm/amdgpu: Check RRMT status for JPEG v4.0.3 RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Lijo Lazar	485380f7fe	drm/amdgpu: Check RRMT status for VCN v4.0.3 RRMT could get dynamically enabled/disabled by PSP firmware. Read the status from register for reading RRMT status. For VFs, this is not accessible, hence assume that it's always disabled for now. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e55565f880	drm/amdgpu: add support for PSP IP version 14.0.5 This initializes PSP IP version 14.0.5. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e7704d7c72	drm/amdgpu: add support for SMU IP version 14.0.5 This initializes SMU IP version 14.0.5. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	6d437d5203	drm/amdgpu: enable VCN/JPEG CGPG for GC IP version 11.5.3 Enable VCN/JPEG CGPG for ASIC with GFX version 11.5.3. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	6bde08d317	drm/amdgpu: add support for MMHUB IP version 3.3.2 This initializes MMHUB IP version 3.3.2. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	e659c9eb87	drm/amdgpu: add support for NBIO IP version 7.11.2 This initializes NBIO IP version 7.11.2. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	b2e5a04147	drm/amdgpu: add support for SDMA IP version 6.1.3 This initializes SDMA IP version 6.1.3. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Tim Huang	b784faeba2	drm/amdgpu: add support for GC IP version 11.5.3 This initializes GC IP version 11.5.3. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:55 -05:00
Alex Deucher	20f48be63d	drm/amdgpu: add OEM i2c bus for polaris chips It uses the VGADCC bus. DC doesn't use this bus, so it is safe to add it here. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	1c0b144bf7	drm/amdgpu: rework i2c init and fini No functional change. Rework the code to allow for adding some additional i2c buses in conjunction with DC in the future. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	ba7f8eb7e4	drm/amdgpu/atombios: drop empty function This was leftover from when amdgpu was forked from radeon. The function is empty so drop it. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Alex Deucher	b217105acb	drm/amd/display/dm: handle OEM i2c buses in i2c functions Allow the creation of an OEM i2c bus and use the proper DC helpers for that case. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Sathishkumar S	8064ca6e93	drm/amdgpu: increase amdgpu max rings limit increase max rings to 132 to support all JPEG5_0_1 cores, else ring_init fails due to ring count exceeding maximum limit. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 21:02:54 -05:00
Jiang Liu	a0a455b4bc	drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode() In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Fixes: `07dbfc6b10` ("drm/amd: Use `amdgpu_ucode_*` helpers for PSP") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-12 19:47:15 -05:00
Alex Deucher	55ed2b1b50	drm/amdgpu: bump version for RV/PCO compute fix Bump the driver version for RV/PCO compute stability fix so mesa can use this check to enable compute queues on RV/PCO. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-02-12 19:47:15 -05:00
Alex Deucher	b35eb9128e	drm/amdgpu/gfx9: manually control gfxoff for CS on RV When mesa started using compute queues more often we started seeing additional hangs with compute queues. Disabling gfxoff seems to mitigate that. Manually control gfxoff and gfx pg with command submissions to avoid any issues related to gfxoff. KFD already does the same thing for these chips. v2: limit to compute v3: limit to APUs v4: limit to Raven/PCO v5: only update the compute ring_funcs v6: Disable GFX PG v7: adjust order Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Błażej Szczygieł <mumei6102@gmail.com> Suggested-by: Sergey Kovalenko <seryoga.engineering@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3861 Link: https://lists.freedesktop.org/archives/amd-gfx/2025-January/119116.html Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-02-12 19:47:01 -05:00
Philipp Stanner	796a9f55a8	drm/sched: Use struct for drm_sched_init() params drm_sched_init() has a great many parameters and upcoming new functionality for the scheduler might add even more. Generally, the great number of parameters reduces readability and has already caused one misnaming, addressed in: commit `6f1cacf4eb` ("drm/nouveau: Improve variable name in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org	2025-02-12 11:59:52 +01:00
Maxime Ripard	93c7dd1b39	Merge drm/drm-next into drm-misc-next Bring rc1 to start the new release dev. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-02-06 13:47:32 +01:00
Marek Olšák	2255b40cac	drm/amdgpu: add a BO metadata flag to disable write compression for Vulkan Vulkan can't support DCC and Z/S compression on GFX12 without WRITE_COMPRESS_DISABLE in this commit or a completely different DCC interface. AMDGPU_TILING_GFX12_SCANOUT is added because it's already used by userspace. Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-02-03 12:11:36 -05:00
Kenneth Feng	5cda56bd86	drm/amd/amdgpu: change the config of cgcg on gfx12 change the config of cgcg on gfx12 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x	2025-01-28 16:22:39 -05:00
Shaoyun Liu	335acfb64e	drm/amd/amdgpu: Enable scratch data dump for mes 12 MES internally checks the CP_MES_MSCRATCH_LO/HI registers to set the scratch data location during ucode start, so the driver needs to start the MES pipes one by one with a different setting for each pipe. Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:13 -05:00
Mario Limonciello	7e4cb7dea2	drm/amd: Clarify kdoc for amdgpu.gttsize Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled by what the TTM page limit is set to. Clarify the kdoc. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:56:08 -05:00
Srinivasan Shanmugam	dc915275ea	drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and width, ensuring that the function can still determine these capabilities from the device itself. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth() error: we previously assumed 'parent' could be null (see line 6180) drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device *adev, 6171 enum pci_bus_speed *speed, 6172 enum pcie_link_width *width) 6173 { 6174 struct pci_dev *parent = adev->pdev; 6175 6176 if (!speed || !width) 6177 return; 6178 6179 parent = pci_upstream_bridge(parent); 6180 if (parent && parent->vendor == PCI_VENDOR_ID_ATI) { ^^^^^^ If parent is NULL 6181 /* use the upstream/downstream switches internal to dGPU */ 6182 *speed = pcie_get_speed_cap(parent); 6183 *width = pcie_get_width_cap(parent); 6184 while ((parent = pci_upstream_bridge(parent))) { 6185 if (parent->vendor == PCI_VENDOR_ID_ATI) { 6186 /* use the upstream/downstream switches internal to dGPU */ 6187 *speed = pcie_get_speed_cap(parent); 6188 *width = pcie_get_width_cap(parent); 6189 } 6190 } 6191 } else { 6192 /* use the device itself */ --> 6193 *speed = pcie_get_speed_cap(parent); ^^^^^^ Then we are toasted here. 6194 *width = pcie_get_width_cap(parent); 6195 } 6196 } Fixes: `757e8b951c` ("drm/amdgpu: cache gpu pcie link width") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:26 -05:00
Lin.Cao	b529093999	drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment commit `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") sets job->vm to NULL if there is no fence. This causes the emit switch buffer step to be skipped when job->vm is NULL. Checking job rather than vm solves this problem. Fixes: `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:55:04 -05:00
Alex Deucher	64314e3f9c	drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL Combine the platform and GPU caps like we do for PCIe Gen. This aligns properly with expectations and documentation for the interface. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:53:30 -05:00
Alex Deucher	757e8b951c	drm/amdgpu: cache gpu pcie link width Get the PCIe link width of the device itself (or its integrated upstream bridge) and cache that. v2: fix typo Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:53:24 -05:00
Lijo Lazar	a0db1ea0dd	drm/amdgpu: Refine ip detection log message 'add ip block' causes a confusion if the blocks are disabled later with ip_block_mask. Instead change to 'detected' and also add device context. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:52:58 -05:00
Lijo Lazar	b1df8050e7	drm/amdgpu: Add handler for SDMA context empty Context empty interrupt is enabled for SDMA 4.4.2. Add a handler for context empty interrupt so that it is disposed of fast, and not propagated to KFD layer. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-24 09:52:43 -05:00
Srinivasan Shanmugam	19b7f7c721	drm/amdgpu/gfx12: Add Cleaner Shader Support for GFX12.0 GPUs This commit enables the cleaner shader feature for GFX12.0 and GFX12.0.1 GPUs. The cleaner shader is important for clearing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs) between workloads. - This feature ensures that GPU resources are reset between workloads, preventing data leaks and ensuring accurate computation. By enabling the cleaner shader, this update enhances the security and reliability of GPU operations on GFX12.0 hardware. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Kent Russell	f2935a3019	drm/amdgpu: Mark debug KFD module params as unsafe Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Gui Chengming	62952a38d9	drm/amdgpu: fix fw attestation for MP0_14_0_{2/3} FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Christian König	def59436fb	drm/amdgpu: always sync the GFX pipe on ctx switch That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Christian König	177b76a8d8	drm/amdgpu: mark a bunch of module parameters unsafe We sometimes have people trying to use debugging options in production environments. Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Tvrtko Ursulin	e996127ec1	drm/amdgpu: Use DRM scheduler API in amdgpu_xcp_release_sched Let's use the existing helper instead of peeking into the structure directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Kenneth Feng	2affe2bbc9	drm/amdgpu: disable gfxoff with the compute workload on gfx12 Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 11:06:50 -05:00
Srinivasan Shanmugam	0b6b2dd383	drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. 
[ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] 
other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] * DEADLOCK * [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? 
mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. 
Fixes: `afefd6f245` ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-14 10:38:33 -05:00
Dave Airlie	c3d590f8ba	Merge tag 'amd-drm-next-6.14-2025-01-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.14-2025-01-10: amdgpu: - Fix max surface handling in DC - clang fixes - DCN 3.5 fixes - DCN 4.0.1 fixes - DC CRC fixes - DML updates - DSC fixes - PSR fixes - DC add some divide by 0 checks - SMU13 updates - SR-IOV fixes - RAS fixes - Cleaner shader support for gfx10.3 dGPUs - fix drm buddy trim handling - SDMA engine reset updates - Fix RB bitmap setup - Fix doorbell ttm cleanup - Add CEC notifier support - DPIA updates - MST fixes amdkfd: - Shader debugger fixes - Trap handler cleanup - Cleanup includes - Eviction fence wq fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110172731.2960668-1-alexander.deucher@amd.com	2025-01-13 11:13:13 +10:00
Victor Zhao	85b73415fd	drm/amdgpu: fill the ucode bo during psp resume for SRIOV refill the ucode bo during psp resume for SRIOV, otherwise ucode load will fail after VM hibernation and fb clean. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Srinivasan Shanmugam	9814626751	drm/amdgpu/gfx10: Enable cleaner shader for GFX10.3.2/10.3.4/10.3.5 GPUs Enable the cleaner shader for GFX10.3.2/10.3.4/10.3.5 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX10.3.2/10.3.4/10.3.5 GPUs, previously available for GFX10.3.0. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Jonathan Kim	86bde64cb7	drm/amdgpu: fix gpu recovery disable with per queue reset Per queue reset should be bypassed when gpu recovery is disabled with module parameter. Fixes: `ee0a469cf9` ("drm/amdkfd: support per-queue reset on gfx9") Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:57 -05:00
Jiang Liu	edec9b0690	drm/amdgpu: wrong array index to get ip block for PSP The adev->ip_blocks array is not indexed by AMD_IP_BLOCK_TYPE_xxx, instead we should call amdgpu_device_ip_get_ip_block() to get the corresponding IP block object. Fix some checkpatch issues (Alex) Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Jiang Liu	60a2c0c12b	drm/amdgpu: tear down ttm range manager for doorbell in amdgpu_ttm_fini() Tear down ttm range manager for doorbell in function amdgpu_ttm_fini(), to avoid memory leakage. Fixes: `792b84fb90` ("drm/amdgpu: initialize ttm for doorbells") Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Tim Huang	6b34d0328b	drm/amdgpu: fix incorrect number of active RBs for gfx12 The RB bitmap should be global active RB bitmap & active RB bitmap based on active SA. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Tim Huang	4a60c55b3b	drm/amdgpu: fix incorrect active RB bitmap in setup RBs The RB bitmap width per SA may be 0x1 for some ASICs. Use the actual bitmap of SA instead of 0x3 to determine the active RB bitmap. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Dan Carpenter	6ec6cd9acb	drm/amdgpu: Fix shift type in amdgpu_debugfs_sdma_sched_mask_set() The "mask" and "val" variables are type u64. The problem is that the BIT() macros are type unsigned long which is just 32 bits on 32bit systems. It's unlikely that people will be using this driver on 32bit kernels and even if they did we only use the lower AMDGPU_MAX_SDMA_INSTANCES (16) bits. So this bug does not affect anything in real life. Still, for correctness sake, u64 bit masks should use BIT_ULL(). Fixes: `d2e3961ae3` ("drm/amdgpu: add amdgpu_sdma_sched_mask debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/d39a9325-87a4-4543-b6ec-1c61fca3a6fc@stanley.mountain Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:56 -05:00
Jesse Zhang	f7e672e6f8	drm/amdgpu: enable gfx12 queue reset flag Enable the kgq and kcq queue reset flag Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:02:47 -05:00
Jesse Zhang	39b0fa29f6	drm/amdgpu/sdma4.4.2: add apu support in sdma queue reset Remove apu check in sdma queue reset. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-09 16:01:29 -05:00
Dave Airlie	0739b8ba82	Merge tag 'drm-misc-next-2025-01-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for 6.14: UAPI Changes: - Clarify drm memory stats documentation Cross-subsystem Changes: Core Changes: - sched: Documentation fixes, Driver Changes: - amdgpu: Track BO memory stats at runtime - amdxdna: Various fixes - hisilicon: New HIBMC driver - bridges: - Provide default implementation of atomic_check for HDMI bridges - it605: HDCP improvements, MCCS Support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106-augmented-kakapo-of-action-0cf000@houat	2025-01-09 15:48:50 +10:00
Dmitry Baryshkov	26d6fd8191	drm/connector: make mode_valid take a const struct drm_display_mode The mode_valid() callbacks of drm_encoder, drm_crtc and drm_bridge take a const struct drm_display_mode argument. Change the mode_valid callback of drm_connector to also take a const argument. Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241214-drm-connector-mode-valid-const-v2-5-4f9498a4c822@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2025-01-07 12:45:19 +02:00
Kent Russell	6c9c97387b	drm/amdgpu: Remove unnecessary NULL check container_of cannot return NULL, so it is unnecessary to check for NULL after gem_to_amdgpu_bo, which is just a container_of call Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Arunpravin Paneer Selvam	3318ba94e5	drm/amdgpu: Add a lock when accessing the buddy trim function When running YouTube videos and Steam games simultaneously, the tester found a system hang / race condition issue with the multi-display configuration setting. Adding a lock to the buddy allocator's trim function would be the solution. <log snip> [ 7197.250436] general protection fault, probably for non-canonical address 0xdead000000000108 [ 7197.250447] RIP: 0010:__alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250470] Call Trace: [ 7197.250472] <TASK> [ 7197.250475] ? show_regs+0x6d/0x80 [ 7197.250481] ? die_addr+0x37/0xa0 [ 7197.250483] ? exc_general_protection+0x1db/0x480 [ 7197.250488] ? drm_suballoc_new+0x13c/0x93d [drm_suballoc_helper] [ 7197.250493] ? asm_exc_general_protection+0x27/0x30 [ 7197.250498] ? __alloc_range+0x8b/0x340 [amddrm_buddy] [ 7197.250501] ? __alloc_range+0x109/0x340 [amddrm_buddy] [ 7197.250506] amddrm_buddy_block_trim+0x1b5/0x260 [amddrm_buddy] [ 7197.250511] amdgpu_vram_mgr_new+0x4f5/0x590 [amdgpu] [ 7197.250682] amdttm_resource_alloc+0x46/0xb0 [amdttm] [ 7197.250689] ttm_bo_alloc_resource+0xe4/0x370 [amdttm] [ 7197.250696] amdttm_bo_validate+0x9d/0x180 [amdttm] [ 7197.250701] amdgpu_bo_pin+0x15a/0x2f0 [amdgpu] [ 7197.250831] amdgpu_dm_plane_helper_prepare_fb+0xb2/0x360 [amdgpu] [ 7197.251025] ? try_wait_for_completion+0x59/0x70 [ 7197.251030] drm_atomic_helper_prepare_planes.part.0+0x2f/0x1e0 [ 7197.251035] drm_atomic_helper_prepare_planes+0x5d/0x70 [ 7197.251037] drm_atomic_helper_commit+0x84/0x160 [ 7197.251040] drm_atomic_nonblocking_commit+0x59/0x70 [ 7197.251043] drm_mode_atomic_ioctl+0x720/0x850 [ 7197.251047] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251049] drm_ioctl_kernel+0xb9/0x120 [ 7197.251053] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251056] drm_ioctl+0x2d4/0x550 [ 7197.251058] ? 
__pfx_drm_mode_atomic_ioctl+0x10/0x10 [ 7197.251063] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu] [ 7197.251186] __x64_sys_ioctl+0xa0/0xf0 [ 7197.251190] x64_sys_call+0x143b/0x25c0 [ 7197.251193] do_syscall_64+0x7f/0x180 [ 7197.251197] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7197.251199] ? amdgpu_display_user_framebuffer_create+0x215/0x320 [amdgpu] [ 7197.251329] ? drm_internal_framebuffer_create+0xb7/0x1a0 [ 7197.251332] ? srso_alias_return_thunk+0x5/0xfbef5 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Fixes: `4a5ad08f53` ("drm/amdgpu: Add address alignment support to DCC buffers") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Prike Liang	2b11179e18	drm/amdgpu: reduce RLC safe mode request for gfx clock gating The driver only needs to request the power safe mode once, instead of polling and disabling the power feature each time before programming the GFX clock gating control registers. This update reduces the latency of GFX clock gating entry. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:29 -05:00
Srinivasan Shanmugam	8b248b9045	drm/amdgpu/gfx10: Add cleaner shader for GFX10.3.0 This commit adds the cleaner shader microcode for GFX10.3.0 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_10_3_0_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:28 -05:00
Srinivasan Shanmugam	9095567bc3	drm/amdgpu: Fix error handling in amdgpu_ras_add_bad_pages It ensures that appropriate error codes are returned when an error condition is detected. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2849 amdgpu_ras_add_bad_pages() warn: missing error code here? 'amdgpu_umc_pages_in_a_row()' failed. drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2884 amdgpu_ras_add_bad_pages() warn: missing error code here? 'amdgpu_ras_mca2pa()' failed. v2: s/-EIO/-EINVAL, retained the use of -EINVAL from amdgpu_umc_pages_in_a_row and amdgpu_ras_mca2pa_by_idx, when the RAS context is not initialized or the convert_ras_err_addr function is unavailable. (Thomas) V3: Returning 0 as the absence of eh_data is acceptable. (Tao) Fixes: `a8d133e625` ("drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: YiPeng Chai <yipeng.chai@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:28 -05:00
yfeng1	62bf9fe6fa	drm/amdgpu: Fix for MEC SJT FW Load Fail on VF Users might switch to a ROCM build that does not include MEC SJT FW, and the driver needs to handle this case. Signed-off-by: yfeng1 <yfeng1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-01-06 14:44:28 -05:00
Dave Airlie	8368e9719d	amd-drm-next-6.14-2024-12-18: amdgpu: - RAS updates - ISP updates - SDMA queue reset support - Rework DPM powergating interfaces - Documentation updates and cleanups - Panel replay fixes - DCN 3.5 updates - DP tunneling fixes - Use a pm notifier to more gracefully handle VRAM eviction on suspend or hibernate - Add debugfs interfaces for forcing scheduling to specific engine instances - GG 9.5 updates - IH 4.4 updates - Make missing optional firmware less noisy - PSP 13.x updates - SMU 13.x updates - VCN 5.x updates - JPEG 5.x updates - Misc cleanups - GC 12.x updates - DRM panic support - DC FAMS updates - DSC fixes - job handling fixes amdkfd: - GG 9.5 updates - Logging improvements - Misc cleanups - Various Optimizations Merge tag 'amd-drm-next-6.14-2024-12-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-next Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241218201758.2580723-1-alexander.deucher@amd.com	2024-12-20 07:57:01 +10:00
Yunxiang Li	74ef9527bd	drm/amdgpu: track bo memory stats at runtime Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we track the BOs as they change states. This way when the fdinfo is queried we only need to take the status lock and copy out the usage stats with minimal impact to the runtime performance. With this new approach however, we would no longer be able to track active buffers. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-6-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-12-19 16:56:28 +01:00
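The trade-off described in 74ef9527bd (update usage counters as buffers change state, so a query only copies totals out) can be sketched roughly as follows. This is a simplified user-space illustration, not the amdgpu implementation: the `placement` enum, `vm_stats`, `buffer`, and all three functions are hypothetical names, and the status lock the commit mentions is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory placements; the real driver tracks more states. */
enum placement { PLACE_SYSTEM, PLACE_GTT, PLACE_VRAM, PLACE_COUNT };

/* Running totals per placement, kept up to date on every state change. */
struct vm_stats {
    uint64_t bytes[PLACE_COUNT];
};

/* Hypothetical buffer object: just a size and a current placement. */
struct buffer {
    uint64_t size;
    enum placement where;
};

/* Account for a newly created buffer: O(1). */
static void buffer_add(struct vm_stats *st, const struct buffer *bo)
{
    assert(bo->where < PLACE_COUNT);
    st->bytes[bo->where] += bo->size;
}

/* Move a buffer and adjust the totals at the same time: O(1). */
static void buffer_move(struct vm_stats *st, struct buffer *bo,
                        enum placement to)
{
    assert(to < PLACE_COUNT);
    st->bytes[bo->where] -= bo->size;
    st->bytes[to] += bo->size;
    bo->where = to;
}

/* Query: copy out a snapshot instead of walking every buffer. */
static struct vm_stats stats_query(const struct vm_stats *st)
{
    return *st;
}
```

With this shape the cost of a query no longer depends on the number of buffers. Per-buffer "active" information is not available from the totals alone, which matches the limitation the commit message notes.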
Yunxiang Li	a541a6e865	drm/amdgpu: remove unused function parameter amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all callers have a reference to adev handy, so remove it for cleanliness. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-5-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-12-19 16:56:25 +01:00
Yunxiang Li	bebf2ebd70	drm: make drm-active- stats optional When memory stats are generated fresh every time by going through all the BOs, their active information is quite easy to get. But if the stats are tracked with the BO's state this becomes harder, since the job scheduling part doesn't really deal with individual buffers. Make drm-active- optional to enable amdgpu to switch to the second method. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219151411.1150-3-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>	2024-12-19 16:56:17 +01:00
Dave Airlie	38e961097e	Linux 6.13-rc3 Merge tag 'v6.13-rc3' into drm-next Backmerge linux 6.13-rc3 as amd next has some dependencies on fixes in it. Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-12-19 12:00:02 +10:00
Michel Dänzer	695c2c745e	drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update Third time's the charm, I hope? Fixes: `d3116756a7` ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:08 -05:00
Mirsad Todorovac	a21ab06b8c	drm/admgpu: replace kmalloc() and memcpy() with kmemdup() The static analyser tool gave the following advice: ./drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c:1266:7-14: WARNING opportunity for kmemdup → 1266 tmp = kmalloc(used_size, GFP_KERNEL); 1267 if (!tmp) 1268 return -ENOMEM; 1269 → 1270 memcpy(tmp, &host_telemetry->body.error_count, used_size); Replacing kmalloc() + memcpy() with kmemdup() doesn't change semantics. Original code works without fault, so this is not a bug fix but a proposed improvement. Link: https://lwn.net/Articles/198928/ Fixes: `84a2947ecc` ("drm/amdgpu: Implement virt req_ras_err_count") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Zhigang Luo <Zhigang.Luo@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Jack Xiao <Jack.Xiao@amd.com> Cc: Vignesh Chander <Vignesh.Chander@amd.com> Cc: Danijel Slivka <danijel.slivka@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:08 -05:00
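The allocate-then-copy pattern that a21ab06b8c folds into a single kmemdup() call can be illustrated in user space. The sketch below is an analogue, not kernel code: `sketch_memdup` is a hypothetical helper name, and malloc() stands in for kmalloc() with GFP_KERNEL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the kernel's kmemdup():
 * allocate len bytes and copy src into the new buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static void *sketch_memdup(const void *src, size_t len)
{
    void *dst;

    assert(src != NULL);

    dst = malloc(len);          /* was: tmp = kmalloc(used_size, ...) */
    if (!dst)
        return NULL;            /* was: return -ENOMEM at the call site */

    memcpy(dst, src, len);      /* was: memcpy(tmp, &..., used_size) */
    return dst;
}
```

A call site then shrinks to one allocation call plus one NULL check, which is why the commit message describes the change as semantics-preserving.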
Philip Yang	e37ccf44ac	drm/amdgpu: Show warning message if IH ring overflow If the IH primary ring and KFD ih fifo overflow, we may miss CP and SDMA interrupts and cause an application soft hang. Show a warning message with the ring name if an overflow happens. Add a function to get the ih ring name to avoid duplicating it. To keep the warning message consistent between GPU generations, change all *_ih.c except for ASICs older than Vega, which have only one ih ring. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Philip Yang	1b00143231	drm/amdgpu: Optimize gfx v9 GPU page fault handling After a GPU page fault, lots of page fault interrupts are generated in a short period even with the CAM filter enabled, because the fault addresses differ. Each page fault is copied to the KFD ih fifo to send an event to user space via the KFD interrupt worker; this could overflow the KFD ih fifo while other processes generate events at the same time. The KFD process is aborted after a GPU page fault, so only one GPU page fault interrupt needs to be sent to the KFD ih fifo to deliver the memory exception event to user space. Increase the KFD ih fifo size to 2 times the IH primary ring size to handle bursts of events. This patch handles the gfx v9 path and covers the retry on/off and CAM filter on/off cases. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Christian König	11815bb0e3	drm/amdgpu: partially revert "reduce reset time" This partially reverts commit `194eb174cb`. This commit introduced a new state variable into adev without even remotely worrying about CPU barriers. Since we already have the amdgpu_in_reset() function exactly for this use case partially revert that. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Christian König	26c95e838e	drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare As soon as the prepare phase is completed the VM might be released, better set it to NULL. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Christian König	57f812d171	drm/amdgpu: fix amdgpu_coredump The VM pointer might already be outdated when that function is called. Use the PASID instead to gather the information. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Candice Li	d1ebe307b4	drm/amdgpu: Enable psp v14_0_3 RAS support for non-SRIOV configurations. Enable psp v14_0_3 RAS support for non-SRIOV configurations. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Philip Yang	b4b7271e5c	drm/amdgpu: Don't enable sdma 4.4.5 CTXEMPTY interrupt The sdma context empty interrupt is dropped in amdgpu_irq_dispatch as unregistered interrupt src_id 243. This interrupt accounts for 1/3 of total interrupts and causes IH primary ring overflow when running a stressful benchmark application. Disabling this interrupt has no observed side effects. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:39:07 -05:00
Karol Przybylski	34c4eb7d4e	drm/amdgpu: Fix potential integer overflow in scheduler mask calculations The use of 1 << i in scheduler mask calculations can result in an unintentional integer overflow due to the expression being evaluated as a 32-bit signed integer. This patch replaces 1 << i with 1ULL << i to ensure the operation is performed as a 64-bit unsigned integer, preventing overflow Discovered in coverity scan, CID 1636393, 1636175, 1636007, 1635853 Fixes: `c5c63d9cb5` ("drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs") Signed-off-by: Karol Przybylski <karprzy7@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:38:42 -05:00
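The issue fixed by 34c4eb7d4e comes from C's integer typing: the literal `1` has type `int`, so `1 << i` is computed as a 32-bit signed value and cannot represent bit 31 or above, whereas `1ULL << i` is computed in a 64-bit unsigned type. A minimal sketch, assuming hypothetical helper names that are not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a 64-bit mask with only bit i set.
 * The ULL suffix makes the shift happen in unsigned long long,
 * so bits 31..63 are representable.
 */
static inline uint64_t sched_bit(unsigned int i)
{
    assert(i < 64);             /* shifting by >= 64 is undefined */
    return 1ULL << i;
}

/* Hypothetical helper: mask with the low n bits set (n <= 64). */
static inline uint64_t sched_mask_all(unsigned int n)
{
    assert(n <= 64);
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}
```

The `n == 64` special case is needed because shifting a 64-bit value by its full width is also undefined.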
Alex Deucher	f1fd1d0f40	drm/amdgpu/gfx12: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:24 -05:00
Alex Deucher	63bfd24088	drm/amdgpu/mmhub4.1: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:22 -05:00
Alex Deucher	2c8eeaaa0f	drm/amdgpu/nbio7.11: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:19 -05:00
Alex Deucher	0ec43fbece	drm/amdgpu/nbio7.0: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:16 -05:00
Alex Deucher	22b9555bc9	drm/amdgpu/nbio7.7: fix IP version check Use the helper function rather than reading it directly. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:22:03 -05:00
Andrew Martin	357ef5b3b7	drm/amdgpu: Failed to check various return code Clean up code to quiet the compiler on us failing to check the return code. Signed-off-by: Andrew Martin <Andrew.Martin@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:18:20 -05:00
Lijo Lazar	635c659fce	drm/amdgpu: Use dbg level for VBIOS check messages Driver has different ways to fetch VBIOS. If one of the methods doesn't find an authentic one, it will show misleading info messages even though a subsequent method finds a valid VBIOS. Keep the message level at debug and add device context. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:18:04 -05:00
Pierre-Eric Pelloux-Prayer	54a1b36d4b	drm/amdgpu: remove useless init from amdgpu_job_alloc This init is useless because base.sched will be cleared to 0 in drm_sched_job_init because of commit `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()"). Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:17:46 -05:00
Pierre-Eric Pelloux-Prayer	0014952b17	drm/amdgpu: drop the amdgpu_device argument from amdgpu_ib_free It's unused. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:17:32 -05:00
Pierre-Eric Pelloux-Prayer	2ae520cb12	drm/amdgpu: don't access invalid sched Since `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()") accessing job->base.sched can produce unexpected results as the initialisation of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the memset. This commit fixes an issue when a CS would fail validation and would be rejected after job->num_ibs is incremented. In this case, amdgpu_ib_free(ring->adev, ...) will be called, which would crash the machine because the ring value is bogus. To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this because the device is actually not used in this function. The next commit will remove the ring argument completely. Fixes: `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:16:35 -05:00
Dheeraj Reddy Jonnalagadda	69b54d7c7c	drm/amdgpu: simplify return statement in amdgpu_ras_eeprom_init Remove the logically dead code in the last return statement of amdgpu_ras_eeprom_init. The condition res < 0 is redundant since res is already checked for a negative value earlier. Replace return res < 0 ? res : 0; with return 0 to improve clarity. Fixes: `63d4c081a5` ("drm/amdgpu: Optimize EEPROM RAS table I/O") Closes: https://scan7.scan.coverity.com/#/project-view/52337/11354?selectedIssue=1602413 Signed-off-by: Dheeraj Reddy Jonnalagadda <dheeraj.linuxdev@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:16:05 -05:00
Alex Deucher	7e50642d41	drm/amd/display: add non-DC drm_panic support Add support for the drm_panic module, which displays a pretty user friendly message on the screen when a Linux kernel panic occurs. Adapt Lu Yao's code to use common helpers derived from Jocelyn's patch. This extends the non-DC code to enable access to non-CPU accessible VRAM and adds support for other DCE versions. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lu Yao <yaolu@kylinos.cn> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Harry Wentland <harry.wentland@amd.com>	2024-12-18 12:16:01 -05:00
Mario Limonciello	1ad5bdc28b	drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO If the kernel hasn't been compiled with PCIe hotplug support this can lead to problems with dGPUs that use BOCO because they effectively drop off the bus. To prevent issues, disable BOCO support when compiled without PCIe hotplug. Reported-by: Gabriel Marcano <gabemarcano@yahoo.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211155601.3585256-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:15:40 -05:00
Bokun Zhang	3676f37a88	drm/amdgpu/vcn: reset fw_shared under SRIOV - The previous patch only considered the case for baremetal and is not applicable for SRIOV code path. We also need to init fw_share for SRIOV VF Fixes: `928cd772e1` ("drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Bokun Zhang <bokun.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-18 12:14:16 -05:00
Alex Deucher	fe151ed7af	drm/amdgpu: add generic display panic helper code Pull this out of Jocelyn's patch and make it generic. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Lu Yao <yaolu@kylinos.cn> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Harry Wentland <harry.wentland@amd.com>	2024-12-18 12:13:49 -05:00
Dave Airlie	d172ea67db	amd-drm-fixes-6.13-2024-12-11: amdgpu: - ISP hw init fix - SR-IOV fixes - Fix contiguous VRAM mapping for UVD on older GPUs - Fix some regressions due to drm scheduler changes - Workload profile fixes - Cleaner shader fix amdkfd: - Fix DMA map direction for migration - Fix a potential null pointer dereference - Cacheline size fixes - Runtime PM fix Merge tag 'amd-drm-fixes-6.13-2024-12-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211215449.741848-1-alexander.deucher@amd.com	2024-12-13 09:43:20 +10:00
Dave Airlie	c7d6cb4c43	drm-misc-next for 6.14: UAPI Changes: Cross-subsystem Changes: Core Changes: - Remove driver date from drm_driver Driver Changes: - amdxdna: New driver! - ivpu: Fix qemu crash when using passthrough - nouveau: expose GSP-RM logging buffers via debugfs - panfrost: Add MT8188 Mali-G57 MC3 support - panthor: misc improvements - rockchip: Gamma LUT support - tidss: Misc improvements - virtio: convert to helpers, add prime support for scanout buffers - v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL - vc4: Add support for BCM2712 - vkms: Improvements all across the board - panels: - Introduce backlight quirks infrastructure - New panels: KDB KD116N2130B12 Merge tag 'drm-misc-next-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next [airlied: handle module ns conflict] Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241205-agile-straight-pegasus-aca7f4@houat	2024-12-13 08:48:09 +10:00
Alex Deucher	e70ba46795	drm/amdgpu/jpeg5.0.1: use num_jpeg_inst for SR-IOV They should be the same, but use the proper variable. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:26 -05:00
Alex Deucher	f53758bc34	drm/amdgpu/jpeg4.0.3: use num_jpeg_inst for SR-IOV They should be the same, but use the proper variable. Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:23 -05:00
Alex Deucher	4b842c852f	drm/amdgpu: add sysfs reset mask for vcn 5.0.1 Add the calls to the vcn 5.0.1 code. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:19 -05:00
Alex Deucher	40253e36e0	drm/amdgpu: add ip_dump support for vcn 5.0.1 Shared with vcn 5.0.0. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:37:13 -05:00
Mario Limonciello	3f6f237b9d	drm/amd: Update strapping for NBIO 2.5.0 This helps to avoid a spurious PME event on hotplug to Azalia. Cc: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Reported-and-tested-by: ionut_n2001@yahoo.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215884 Tested-by: Gabriel Marcano <gabemarcano@yahoo.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241211024414.7840-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:36:56 -05:00
Jesse.zhang@amd.com	bcc263dea6	drm/amdgpu/gfx11: clean up kcq reset code Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:36:50 -05:00
Jesse.zhang@amd.com	0c0dec8207	drm/amdgpu/gfx12: clean up kcq reset code Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:35:46 -05:00
Jesse.zhang@amd.com	11974b7eac	drm/amdgpu/sdma7: Add queue reset sysfs for sdmav7 sdmav7 queue reset is already supported via mmio; add its sysfs file. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:35:21 -05:00
Jesse.zhang@amd.com	a73a83241e	drm/amdgpu/mes12: Implement reset gfx/compute queue function by mmio Reset gfx/compute queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:32:13 -05:00
Jesse.zhang@amd.com	0f8666138f	drm/amdgpu/mes12: Implement reset sdmav7 queue function by mmio Reset sdma queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:32:06 -05:00
Lijo Lazar	fccb446f82	drm/amdgpu: Avoid VF for RAS recovery source check VF device sets the RAS flag when mailbox data can't be read properly. There is no conclusive way to tell if the real source is a RAS error. Therefore VF schedules a KFD based reset which doesn't set RAS source. Skip checking RAS source for any VF scheduled recovery. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Vojislav Tomasevic <vojislav.tomasevic@amd.com> Reviewed-by: Yiqing Yao <yiqing.yao@amd.com> Tested-by: Yiqing Yao <yiqing.yao@amd.com> Fixes: `e1ee2111ca` ("drm/amdgpu: Prefer RAS recovery for scheduler hang") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:30:59 -05:00
Jesse.zhang@amd.com	f4d583cd3f	drm/amdgpu/sdma7: implement queue reset callback for sdma7 Implement sdma queue reset callback by mes_reset_queue_mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:30:52 -05:00
Jesse.zhang@amd.com	8a4c6fc826	drm/amdgpu/sdma7: Implement resume function for each instance Extract the per-instance resume sequence from sdma_v7_0_gfx_resume. This function can be used to start or restart a specific instance. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-11 17:30:39 -05:00
Candice Li	ecd1191e12	drm/amdgpu: Support nbif v6_3_1 fatal error handling Add nbif v6_3_1 fatal error handling support. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:31:00 -05:00
Sonny Jiang	178ad3a9d1	drm/amdgpu: Enable VCN_5_0_1 IP block Add VCN_5_0_1 IP block to kernel boot Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:50 -05:00
Sonny Jiang	346492f30c	drm/amdgpu: Add VCN_5_0_1 support Add vcn support for VCN_5_0_1 v2: rebase, squash in fixes (Alex) Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:46 -05:00
Sathishkumar S	c406fca4b5	drm/amdgpu: enable JPEG5_0_1 ip block enable JPEG5_0_1 ip block Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:42 -05:00
Sathishkumar S	b8f57b6994	drm/amdgpu: Add JPEG5_0_1 support add support for JPEG5_0_1 v2: squash in updates, rebase on IP instance changes Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:24 -05:00
Sonny Jiang	4e4b1a1b80	drm/amdgpu: Add VCN_5_0_1 codec query Support VCN_5_0_1 codec query v2: squash in updates Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:21 -05:00
Sonny Jiang	fdce10ff8f	drm/amdgpu: Add VCN_5_0_1 firmware Add vcn_5_0_1 firmware support Signed-off-by: Sonny Jiang <sonjiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:18 -05:00
Sathishkumar S	20a3029227	drm/amdgpu: update macro for maximum jpeg rings Update the macro to accommodate more rings. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Sonny Jiang <sonjiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:13 -05:00
Alex Deucher	b1d0286c81	drm/amdgpu: update irq sec header for vcn 5.0.0 No functional change. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:29:01 -05:00
Alex Deucher	26893116c3	drm/amdgpu: update irq sec header for jpeg 5.0.0 No functional change. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:55 -05:00
Candice Li	33f1aa210a	drm/amdgpu: Add umc v8_14 ras functions Add umc v8_14 ras functions. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:39 -05:00
Candice Li	2c2b84f193	drm/amdgpu: Add psp v14_0_3 ras support Add psp v14_0_3 ras support. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:21 -05:00
Srinivasan Shanmugam	55f4139b65	drm/amd/amdgpu: Add Annotations to Process Isolation functions This update adds explanations to key functions that manage how the Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the GPU. amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period for KFD to ensure it takes turns with KGD in using the GPU. It uses a mutex to safely manage shared data, like timing and state, and tracks when KFD starts and stops waiting. amdgpu_gfx_enforce_isolation_ring_begin_use: Ensures KFD has enough time to run before new tasks are submitted to the GPU ring. It uses a mutex to synchronize access and may adjust the KFD scheduler. amdgpu_gfx_enforce_isolation_ring_end_use: Handles cleanup and state updates when finishing the use of a GPU ring. It may also adjust the KFD scheduler, using a mutex to manage shared data access. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:13 -05:00
Hawking Zhang	57bcfa89fe	drm/amdgpu: Init mmhub v1_8_1 ras func Reuse the mmhub v1_8 ras function. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:28:09 -05:00
Shiwu Zhang	bd18b11f2d	drm/amdgpu: Enable xgmi for gfx v9_5_0 Enable xgmi for gfx v9_5_0 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:58 -05:00
Asad Kamal	f79cfbac5c	drm/amdgpu: Fetch refclock for SMU v13.0.12 Add support to fetch refclock value for SMU v13.0.12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:54 -05:00
Asad Kamal	100350c373	drm/amd/pm: Add mode2 support for SMU v13.0.12 Add mode2 reset support for smu version 13.0.12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:50 -05:00
Srinivasan Shanmugam	a69f4cc278	drm/amd/amdgpu: Add Descriptions to Process Isolation and Cleaner Shader Sysfs Functions This update adds explanations to key functions related to process isolation and cleaner shader execution sysfs interfaces. - `amdgpu_gfx_set_run_cleaner_shader`: Describes how to manually run a cleaner shader, which clears the Local Data Store (LDS) and General Purpose Registers (GPRs) to ensure data isolation between GPU workloads. - `amdgpu_gfx_get_enforce_isolation`: Describes how to query the current settings of the 'enforce_isolation' feature for each GPU partition. - `amdgpu_gfx_set_enforce_isolation`: Describes how to enable or disable process isolation for GPU partitions through the sysfs interface. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:31 -05:00
Hawking Zhang	9a826c4af8	drm/amdgpu: Enable RAS for psp v13_0_12 Enable RAS Cap check and initialize RAS funcs for psp v13_0_12 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:28 -05:00
Hawking Zhang	98230feb55	drm/amdgpu: Load spdm_drv for psp v13_0_12 spdm_drv is a firmware that needs to be loaded in driver initialization phase. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:23 -05:00
Hawking Zhang	3516d35f81	drm/amdgpu: Add psp v13_0_12 firmware specifiers Add psp v13_0_12 firmware specifiers for sos and ta Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:19 -05:00
Le Ma	2d2f1622c8	drm/amdgpu: add psp 13_0_12 version support Add support for new psp 13_0_12 version Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:27:08 -05:00
Mario Limonciello	b6e6871a56	drm/amd: Show an info message about optional firmware missing With the warning from the core about missing firmware gone, users still may be notified of missing optional firmware by a more friendly message to clarify it's optional. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Yang Wang	2a50d94b11	drm/amdgpu: add ACA support for jpeg v4.0.3 Add ACA support for jpeg v4.0.3. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Yang Wang	3748c439bb	drm/amdgpu: add ACA support for vcn v4.0.3 v1: Add ACA support for vcn v4.0.3. v2: - split VCN ACA(v1) to 2 parts: vcn and jpeg. - move mmSMNAID_AID0_MCA_SMU to amdgpu_aca.h file. v3: - split JPEG ACA to another patch. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Yang Wang	abfcf95607	drm/amdgpu: move common ACA ipid defines into amdgpu_aca.h move common ACA ipid defines into amdgpu_aca.h file. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:52 -05:00
Alex Sierra	1a3d4abd54	drm/amdgpu: add ih cam support for IH 4.4.4 Same as IH 4.4.2. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Le Ma	968e3811c3	drm/amdgpu: add initial support for sdma444 add sdma444 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Lijo Lazar	fd0c6bd82d	drm/amdgpu: Increase FRU File Id buffer size Some boards use longer File Ids. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Tao Zhou	ae756cd853	drm/amdgpu: correct the calculation of RAS bad page After the introduction of NPS RAS, one bad page record on eeprom may be related to 1 or 16 bad pages, so the bad page record and bad page are two different concepts, define a new variable to store bad page number. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Tao Zhou	1f06e7f344	drm/amdgpu: split ras_eeprom_init into init and check functions Init function is for ras table header read and check function is responsible for the validation of the header. Call them in different stages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Mario Limonciello	ea5d493498	drm/amd: Add the capability to mark certain firmware as "required" Some of the firmware that is loaded by amdgpu is not actually required. For example the ISP firmware on some SoCs is optional, and if it's not present the ISP IP block just won't be initialized. The firmware loader core however will show a warning when this happens like this: ``` Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2 ``` To avoid confusion for non-required firmware, adjust the amd-ucode helper to take an extra argument indicating if the firmware is required or optional. On optional firmware use firmware_request_nowarn() instead of request_firmware() to avoid the warnings. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Hawking Zhang	0ca6d97596	drm/amdgpu: Apply gc v9_5_0 golden settings Apply gc v9_5_0 golden settings. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:51 -05:00
Alex Sierra	dad0c70507	drm/amd: update mtype flags for gfx 9.5.0 Update mtype flags to meet gfx 9.5.0 requirements for remote GPU memory and system memory. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Le Ma	0b58a55af5	drm/amdgpu: add initial support for gfx950 add gfx950 basic support Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Le Ma	ebc7d1acf3	drm/amdgpu/gfx: add gfx950 microcode Add firmware declarations. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Alex Sierra	9bfe4caa4e	drm/amd: define gc ip version local variable For better readability. Also leftover orphaned code. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Lijo Lazar	3f1e050c99	drm/amdgpu: Remove gfxoff usage GFXOFF is not valid for these IP versions. Also, SDMA v4.4.2 is not in GFX domain. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Prike Liang	d2382f29ce	drm/amdgpu: Avoid to release the FW twice in the validated error The FW is released twice when FW validation fails. Although release_firmware() checks whether the FW is empty, the second call is redundant and inefficient. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Randy Dunlap	a567db808e	drm/amdgpu: device: fix spellos and punctuation Make spelling and punctuation changes to ease reading of the comments. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Sathishkumar S	de258d06fd	drm/amdgpu: Add amdgpu_vcn_sched_mask debugfs Add debugfs entry to enable or disable job submission to specific vcn instances. The entry is created only when there is more than an instance and is unified queue type. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Jinzhou Su	9db3aed8ea	drm/amdgpu: return error when eeprom checksum failed Return eeprom table checksum error result, otherwise it might be overwritten by next call. V2: replace DRM_ERROR with dev_err Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Mario Limonciello	2965e6355d	drm/amd: Add Suspend/Hibernate notification callback support As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend. Another class of issues exists, though, where due to memory fragmentation there isn't a large enough contiguous space and swap isn't accessible. Add support for a suspend/hibernate notification callback that could evict VRAM before tasks are frozen. This should allow paging out to swap if necessary. Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3781 Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:50 -05:00
Lijo Lazar	edd628ad17	drm/amdgpu: Simplify cleanup check for FRU sysfs FRU info is expected to be non-NULL if FRU sys files are created. Simplify the check. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Jinzhou Su	e1a34ed917	drm/amdgpu: Add secure display v2 command Add secure display v2 command to support multiple ROI instances per display. v2: fix typo and coding style issue Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Mario Limonciello	c2ee5c2f0e	drm/amd: Invert APU check for amdgpu_device_evict_resources() Resource eviction isn't needed for s3 or s2idle on APUs, but should be run for S4. As amdgpu_device_evict_resources() will be called by prepare notifier adjust logic so that APUs only cover S4. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Link: https://lore.kernel.org/r/20241128032656.2090059-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Sunil Khatri	86fa54f349	drm/amdgpu: add "restore" missing variable comment add "restore" missing variable in the functions sdma_v4_4_2_page_resume and sdma_v4_4_2_inst_start. This fixes the warning: warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_page_resume' warning: Function parameter or struct member 'restore' not described in 'sdma_v4_4_2_inst_start' Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:49 -05:00
Sunil Khatri	093bbeb994	drm/amdgpu: Update the variable name to dma_buf Instead of fixing the warning for missing variable its better to update the variable name to match with the style followed in the code. This will fix the below mentioned warning: warning: Function parameter or struct member 'dbuf' not described in 'amdgpu_bo_create_isp_user' warning: Excess function parameter 'dma_buf' description in 'amdgpu_bo_create_isp_user' Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	ea8094abfb	drm/amdgpu: set UMC PA per NPS mode when PA is 0 The shift bit of PA varies according to NPS mode due to different address format. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	d08fb66370	drm/amdgpu: remove is_mca_add for ras_add_bad_pages Remove unnecessary variable and simplify the logic. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Tao Zhou	a8d133e625	drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes All legacy RAS bad pages are generated in NPS1 mode, but new bad page can be generated in any NPS mode, so we can't use retired_page stored on eeprom directly in non-nps1 mode even for legacy data. We need to take different actions for different data, new data can be identified from old data by UMC_CHANNEL_IDX_V2 flag. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Shikang Fan	0859eb540f	drm/amdgpu: Check fence emitted count to identify bad jobs In SRIOV, when host driver performs MODE 1 reset and notifies FLR to guest driver, there is a small chance that there is no job running on hw but the driver has not updated the pending list yet, causing the driver not respond the FLR request. Modify the has_job_running function to make sure if there is still running job. v2: Use amdgpu_fence_count_emitted to determine job running status. v3: Remove the timeout wait in has_job_running Signed-off-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Srinivasan Shanmugam	85b495bbbe	drm/amd/amdgpu/vcn: Fix kdoc entries for VCN clock/power gating functions This commit corrects the descriptors for the vcn_v4_0/v4_0_3/v4_0_5/v5_0_0 _set_clockgating_state and vcn_v4_0/v4_0_3/v4_0_5/v5_0_0 _set_powergating_state functions in the amdgpu driver. The parameter descriptors in the comments were mismatched with the actual function parameters. The non-existent 'handle' parameter has been replaced with the correct 'ip_block' parameter in the comments to accurately reflect the function signatures and to resolving the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1232: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1232: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1263: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1263: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2012: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2012: warning: Excess function parameter 'handle' description in 'vcn_v4_0_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2043: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:2043: warning: Excess function parameter 'handle' description in 'vcn_v4_0_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1505: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1505: warning: Excess function parameter 'handle' description 
in 'vcn_v4_0_5_set_clockgating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1536: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1536: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1629: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_set_powergating_state' drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1629: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_set_powergating_state' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:48 -05:00
Boyuan Zhang	cf1aa9ffd4	drm/amdgpu: move per inst variables to amdgpu_vcn_inst Move all per instance variables from amdgpu_vcn to amdgpu_vcn_inst. Move adev->vcn.fw[i] from amdgpu_vcn to amdgpu_vcn_inst. Move adev->vcn.vcn_config[i] from amdgpu_vcn to amdgpu_vcn_inst. Move adev->vcn.vcn_codec_disable_mask[i] from amdgpu_vcn to amdgpu_vcn_inst. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	f2ba8c3d51	drm/amdgpu: pass ip_block in set_clockgating_state Pass ip_block instead of adev in set_clockgating_state() callback functions. Modify set_clockgating_state() for all corresponding ip blocks. v2: remove all changes for is_idle(), remove type casting Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	80d8051124	drm/amdgpu: pass ip_block in set_powergating_state Pass ip_block instead of adev in set_powergating_state callback function. Modify set_powergating_state ip functions for all corresponding ip blocks. v2: fix a ip block index error. v3: remove type casting Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	393f026b16	drm/amdgpu: add inst to amdgpu_dpm_enable_vcn Add an instance parameter to amdgpu_dpm_enable_vcn() function, and change all calls from vcn ip functions to add instance argument. vcn generations with only one instance (v1.0, v2.0) always use 0 as instance number. vcn generations with multiple instances (v2.5, v3.0, v4.0, v4.0.3, v4.0.5, v5.0.0) use the actual instance number. v2: remove for-loop in amdgpu_dpm_enable_vcn(), and temporarily move it to vcn ip with multiple instances, in order to keep the exact same logic as before, until further separation in next patch. v3: fix missing prefix Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Boyuan Zhang	ff69bba05f	drm/amd/pm: add inst to dpm_set_powergating_by_smu Add an instance parameter to amdgpu_dpm_set_powergating_by_smu() function, and use the instance to call set_powergating_by_smu(). v2: remove duplicated functions. remove for-loop in amdgpu_dpm_set_powergating_by_smu(), and temporarily move it to amdgpu_dpm_enable_vcn(), in order to keep the exact same logic as before, until further separation in next patch. v3: drop SI logic in amdgpu_dpm_enable_vcn(). Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Tao Zhou	fcb600b078	drm/amdgpu: add interface to get die id from memory address And implement it for UMC v12_0. The die id is calculated from IPID register in bad page retirement flow, but we don't store it on eeprom and it can be also gotten from physical address. v2: get PA_C4 and PA_R13 from MCA address since they may be cleared in retired page. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:47 -05:00
Tao Zhou	2206daa1f9	drm/amdgpu: add a flag to indicate UMC channel index version v1 (legacy way): store channel index within a UMC instance in eeprom v2: store global channel index in eeprom V2: only save the flag on eeprom, clear it after saving. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	71a0e96300	drm/amdgpu: save UMC global channel index to eeprom Save the global channel index returned by RAS TA to eeprom. We can get memory physical address by MCA address and channel index. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	07dd49e1fc	drm/amdgpu: support to find RAS bad pages via old TA Old version of RAS TA doesn't support to convert MCA address stored on eeprom to physical address (PA), support to find all bad pages in one memory row by PA with old RAS TA. This approach is only suitable for nps1 mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	b02ef40772	drm/amdgpu: add function to find all memory pages in one physical row And the function can be reused across amdgpu driver. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	19d4b27aed	drm/amdgpu: retire RAS bad pages in different NPS modes There are some changes in format of memory normalized address per NPS mode, need to adjust bit mapping according to NPS mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	c3d4acf0c3	drm/amdgpu: store only one RAS bad page record for all pages in one row So eeprom space can be saved, compatible with legacy way. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Lijo Lazar	e1ee2111ca	drm/amdgpu: Prefer RAS recovery for scheduler hang Before scheduling a recovery due to scheduler/job hang, check if a RAS error is detected. If so, choose RAS recovery to handle the situation. A scheduler/job hang could be the side effect of a RAS error. In such cases, it is required to go through the RAS error recovery process. In certain cases, a RAS error recovery process could also avoid a full device reset. An error state is maintained in RAS context to detect the block affected. Fatal Error state uses unused block id. Set the block id when error is detected. If the interrupt handler detected a poison error, it's not required to look for a fatal error. Skip fatal error checking in such cases. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	0eecff79e4	drm/amdgpu: do RAS MCA2PA conversion in device init phase With NPS mode introduced, the memory physical address (PA) related to an MCA address varies per NPS mode. We need to rely on the MCA address and convert it into a PA according to NPS mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	772df3df80	drm/amdgpu: add flag to indicate the type of RAS eeprom record One UMC MCA address could map to multiple physical addresses (PA): AMDGPU_RAS_EEPROM_REC_PA: one record stores one PA; AMDGPU_RAS_EEPROM_REC_MCA: one record stores one MCA address, PA is not cared about. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	95024c714b	drm/amdgpu: add TA_RAS_INV_NODE value We can set UMC node instance to invalid state if we use global channel index, and RAS TA can choose UMC address conversion approach by checking node_inst value. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	f44a30583b	drm/amdgpu: add return value for convert_ras_err_addr So upper layer can return failure directly if address conversion fails. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	76723fbc5f	drm/amdgpu: reduce memory usage for umc_lookup_bad_pages_in_a_row The function handles one page in one time, allocating umc.retire_unit bad page records is enough. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:46 -05:00
Tao Zhou	4e7812e237	drm/amdgpu: make convert_ras_err_addr visible outside UMC block And change some UMC v12 specific functions to generic version, so the code can be shared. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Tao Zhou	3d60a30c85	drm/amdgpu: store PA with column bits cleared for RAS bad page So the code can be simplified, and no need to expose the detail of PA format outside address conversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Prike Liang	66f4f7d5aa	drm/amdgpu: reduce the mmio writes in kiq setting There's no need to perform the two MMIO writes in the KIQ Setting registers programmed period, and reducing the MMIO writes will save the driver loading time. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Jiadong Zhu	52b10d55c1	drm/amdgpu/sdma4.4.2: implement ring reset callback for sdma4.4.2 Implement sdma queue reset callback via SMU interface. v2: Leverage inst_stop/start functions in reset sequence. Use GET_INST for physical SDMA instance. Disable apu for sdma reset. v3: Rephrase error prints. v4: Remove redundant prints. Remove setting PREEMPT registers as soft reset handles it. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Tao Zhou	5c8baccc1e	drm/amdgpu: remove redundant RAS error address coversion code Only one interface is responsible for the conversion. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Pratap Nirujogi	ebbe34edc0	drm/amd/amdgpu: Add support for isp buffers Add support to create user BOs with MC address for isp using the dma-buf handle exported for the buffers allocated from system memory in isp driver. Export amdgpu_bo_create_kernel() and amdgpu_bo_free_kernel() as well for isp to allocate GTT internal buffers required for fw to run. Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Tao Zhou	150f6c9030	drm/amdgpu: simplify RAS page retirement in one memory row Take R13 and column bits as a whole for UMC v12. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:26:45 -05:00
Christian König	f4df208177	drm/amdgpu: fix when the cleaner shader is emitted Emitting the cleaner shader must come after the check if a VM switch is necessary or not. Otherwise we will emit the cleaner shader every time and not just when it is necessary because we switched between applications. This can otherwise crash on gang submit and probably decreases performance quite a bit. v2: squash in fix from Srini (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `ee7a846ea2` ("drm/amdgpu: Emit cleaner shader at end of IB submission") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-12-10 10:26:18 -05:00
Pratap Nirujogi	ee2003d5fd	drm/amdgpu: Fix ISP HW init issue ISP hw_init is not called with the recent changes related to hw init levels. AMDGPU_INIT_LEVEL_DEFAULT is ignoring the ISP IP block as AMDGPU_IP_BLK_MASK_ALL is derived using incorrect max number of IP blocks. Update AMDGPU_IP_BLK_MASK_ALL to use AMD_IP_BLOCK_TYPE_NUM instead of AMDGPU_MAX_IP_NUM to fix the issue. Fixes: `14f2fe34f5` ("drm/amdgpu: Add init levels") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-10 10:19:20 -05:00
Maarten Lankhorst	33f029af89	Merge remote-tracking branch 'drm/drm-next' into drm-misc-next The v6.13-rc2 release included a bunch of breaking changes, specifically the MODULE_IMPORT_NS commit. Backmerge in order to fix them before the next pull-request. Include the fix from Stephen Rothwell. Caused by commit `25c3fd1183` ("drm/virtio: Add a helper to map and note the dma addrs and lengths") Interacting with commit `cdd30ebb1b` ("module: Convert symbol namespace to string literal") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://patchwork.freedesktop.org/patch/msgid/20241209121717.2abe8026@canb.auug.org.au Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>	2024-12-09 16:35:21 +01:00
Linus Torvalds	c7cde621b2	drm fixes for -rc2, part 2 - amdgpu: mostly display fixes + jpeg vcn 1.0, sriov, dcn4.0 resume fixes - amdkfd fixes Merge tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Simona Vetter: "Due to mailing list unreliability we missed the amdgpu pull, hence part two with that now included: - amdgpu: mostly display fixes + jpeg vcn 1.0, sriov, dcn4.0 resume fixes - amdkfd fixes" * tag 'drm-fixes-2024-12-06' of https://gitlab.freedesktop.org/drm/kernel: drm/amdgpu: rework resume handling for display (v2) drm/amd/pm: fix and simplify workload handling Revert "drm/amd/pm: correct the workload setting" drm/amdgpu: fix sriov reinit late orders drm/amdgpu: Fix ISP hw init issue drm/amd/display: Add hblank borrowing support drm/amd/display: Limit VTotal range to max hw cap minus fp drm/amd/display: Correct prefetch calculation drm/amd/display: Add option to retrieve detile buffer size drm/amd/display: Add a left edge pixel if in YCbCr422 or YCbCr420 and odm drm/amdkfd: hard-code cacheline for gc943,gc944 drm/amdkfd: add MEC version that supports no PCIe atomics for GFX12 drm/amd/display: 
Fix programming backlight on OLED panels drm/amd: Sanity check the ACPI EDID drm/amdgpu/hdp7.0: do a posting read when flushing HDP drm/amdgpu/hdp6.0: do a posting read when flushing HDP drm/amdgpu/hdp5.2: do a posting read when flushing HDP drm/amdgpu/hdp5.0: do a posting read when flushing HDP drm/amdgpu/hdp4.0: do a posting read when flushing HDP drm/amdgpu/jpeg1.0: fix idle work handler	2024-12-06 13:16:41 -08:00
David (Ming Qiang) Wu	47f402a3e0	amdgpu/uvd: get ring reference from rq scheduler base.sched may not be set for each instance and should not be used for cases such as non-IB tests. Fixes: `2320c9e6a7` ("drm/sched: memset() 'job' in drm_sched_job_init()") Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-05 14:15:22 -05:00
Christian König	12f325bcd2	drm/amdgpu: fix UVD contiguous CS mapping problem When starting the mpv player, Radeon R9 users are observing the below error in dmesg. [drm:amdgpu_uvd_cs_pass2 [amdgpu]] ERROR msg/fb buffer ff00f7c000-ff00f7e000 out of 256MB segment! The patch tries to set the TTM_PL_FLAG_CONTIGUOUS for both user flag(AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS) set and not set cases. v2: Make the TTM_PL_FLAG_CONTIGUOUS mandatory for user BO's. v3: revert back to v1, but fix the check instead (chk). Closes:https://gitlab.freedesktop.org/drm/amd/-/issues/3599 Closes:https://gitlab.freedesktop.org/drm/amd/-/issues/3501 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.10+	2024-12-05 14:14:53 -05:00
Victor Zhao	9a4ab400f1	drm/amdgpu: use sjt mec fw on gfx943 for sriov Use the second jump table (sjt) in sriov for live migration or multiple VF support, so that different VFs can load different versions of MEC as long as they support sjt. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-05 14:14:40 -05:00
Pratap Nirujogi	9f4ddfdc2c	Revert "drm/amdgpu: Fix ISP hw init issue" This reverts commit `274e3f4596`. Additional review comments to address. Will resubmit. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-05 14:14:11 -05:00
Jani Nikula	cb2e1c2136	drm: remove driver date from struct drm_driver and all drivers We stopped using the driver initialized date in commit `7fb8af6798` ("drm: deprecate driver date") and (eventually) started returning "0" for drm_version ioctl instead. Finish the job, and remove the unused date member from struct drm_driver, its initialization from drivers, along with the common DRIVER_DATE macros. v2: Also update drivers/accel (kernel test robot) Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Simon Ser <contact@emersion.fr> Acked-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> # msm Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/1f2bf2543aed270a06f6c707fd6ed1b78bf16712.1733322525.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2024-12-05 12:35:42 +02:00
Alex Deucher	73dae652dc	drm/amdgpu: rework resume handling for display (v2) Split resume into a 3rd step to handle displays when DCC is enabled on DCN 4.0.1. Move display after the buffer funcs have been re-enabled so that the GPU will do the move and properly set the DCC metadata for DCN. v2: fix fence irq resume ordering Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-12-03 18:19:23 -05:00
Yiqing Yao	f3bb57b66d	drm/amdgpu: fix sriov reinit late orders Use found block to call correct init/resume function on the block. Set status.hw for resume and init. Print re-init result again. Change to use dev_info. Use amdgpu_device_ip_get_ip_block to get target block instead of loop. Fixes: `502d76308d` ("drm/amdgpu: validate resume before function call") Signed-off-by: Yiqing Yao <YiQing.Yao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-02 18:35:42 -05:00
Pratap Nirujogi	274e3f4596	drm/amdgpu: Fix ISP hw init issue ISP hw_init is not called with the recent changes related to hw init levels. AMDGPU_INIT_LEVEL_DEFAULT is ignoring the ISP IP block as AMDGPU_IP_BLK_MASK_ALL is derived using incorrect max number of IP blocks. Update AMDGPU_IP_BLK_MASK_ALL to use AMDGPU_MAX_IP_NUM instead of (AMDGPU_MAX_IP_NUM - 1) to fix the issue. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Fixes: `14f2fe34f5` ("drm/amdgpu: Add init levels") Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-02 18:35:36 -05:00
Alex Deucher	689275140c	drm/amdgpu/hdp7.0: do a posting read when flushing HDP Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-12-02 18:05:04 -05:00
Alex Deucher	abe1cbaec6	drm/amdgpu/hdp6.0: do a posting read when flushing HDP Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-12-02 17:38:47 -05:00
Alex Deucher	f756dbac1c	drm/amdgpu/hdp5.2: do a posting read when flushing HDP Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-12-02 17:38:41 -05:00
Alex Deucher	cf424020e0	drm/amdgpu/hdp5.0: do a posting read when flushing HDP Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-12-02 17:38:26 -05:00
Alex Deucher	c9b8dcabb5	drm/amdgpu/hdp4.0: do a posting read when flushing HDP Need to read back to make sure the write goes through. Cc: David Belanger <david.belanger@amd.com> Reviewed-by: Frank Min <frank.min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-12-02 17:38:05 -05:00
Alex Deucher	c6c2f66372	drm/amdgpu/jpeg1.0: fix idle work handler On VCN 1.0, VCN and JPEG use the same worker thread so cancel the vcn worker rather than jpeg. On VCN 2.0 and newer there are separate workers for each. Fixes: `93df748737` ("drm/amdgpu/jpeg: cancel the jpeg worker") Tested-by: George Zhang <george.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-12-02 17:37:31 -05:00
Peter Zijlstra	cdd30ebb1b	module: Convert symbol namespace to string literal Clean up the existing export namespace code along the same lines of commit `33def8498f` ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS \| while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-12-02 11:34:44 -08:00
Maxime Ripard	3aba2eba84	Merge drm/drm-next into drm-misc-next Kickstart 6.14 cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2024-12-02 12:44:18 +01:00
Linus Torvalds	2ba9f676d0	drm fixes for v6.13-rc1 i915: - hdcp: Fix when the first read and write are retried xe: - Wake up waiters after wait condition set to true - Mark the preempt fence workqueue as reclaim - Update xe2 graphics name string - Fix a couple of guc submit races - Fix pat index usage in migrate - Ensure non-cached migrate pagetable bo mappings - Take a PM ref in the delayed snapshot capture worker amdgpu: - SMU 13.0.6 fixes - XGMI fixes - SMU 13.0.7 fixes - Misc code cleanups - Plane refcount fixes - DCN 4.0.1 fixes - DC power fixes - DTO fixes - NBIO 7.11 fixes - SMU 14.0.x fixes - Reset fixes - Enable DC on LoongArch - Sysfs hotplug warning fix - Misc small fixes - VCN 4.0.3 fix - Slab usage fix - Jpeg delayed work fix amdkfd: - wptr handling fixes radeon: - Use ttm_bo_move_null() - Constify struct pci_device_id - Fix spurious hotplug - HPD fix rockchip - fix 32-bit build Merge tag 'drm-next-2024-11-29' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Merge window fixes, mostly amdgpu and xe, with a few other minor ones, all looks fairly normal, i915: - hdcp: Fix when the first read and write are 
retried xe: - Wake up waiters after wait condition set to true - Mark the preempt fence workqueue as reclaim - Update xe2 graphics name string - Fix a couple of guc submit races - Fix pat index usage in migrate - Ensure non-cached migrate pagetable bo mappings - Take a PM ref in the delayed snapshot capture worker amdgpu: - SMU 13.0.6 fixes - XGMI fixes - SMU 13.0.7 fixes - Misc code cleanups - Plane refcount fixes - DCN 4.0.1 fixes - DC power fixes - DTO fixes - NBIO 7.11 fixes - SMU 14.0.x fixes - Reset fixes - Enable DC on LoongArch - Sysfs hotplug warning fix - Misc small fixes - VCN 4.0.3 fix - Slab usage fix - Jpeg delayed work fix amdkfd: - wptr handling fixes radeon: - Use ttm_bo_move_null() - Constify struct pci_device_id - Fix spurious hotplug - HPD fix rockchip - fix 32-bit build" * tag 'drm-next-2024-11-29' of https://gitlab.freedesktop.org/drm/kernel: (48 commits) drm/xe: Take PM ref in delayed snapshot capture worker drm/xe/migrate: use XE_BO_FLAG_PAGETABLE drm/xe/migrate: fix pat index usage drm/xe/guc_submit: fix race around suspend_pending drm/xe/guc_submit: fix race around pending_disable drm/xe: Update xe2_graphics name string drm/rockchip: avoid 64-bit division Revert "drm/radeon: Delay Connector detecting when HPD singals is unstable" drm/amdgpu/jpeg: cancel the jpeg worker drm/amdgpu: fix usage slab after free drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3 drm/amdgpu: Fix sysfs warning when hotplugging drm/amdgpu: Add sysfs interface for vcn reset mask drm/amdgpu/gmc7: fix wait_for_idle callers drm/amd/pm: Remove arcturus min power limit drm/amd/pm: skip setting the power source on smu v14.0.2/3 drm/amd/pm: disable pcie speed switching on Intel platform for smu v14.0.2/3 drm/amdkfd: Use the correct wptr size drm/xe: Mark preempt fence workqueue as reclaim drm/xe/ufence: Wake up waiters after setting ufence->signalled ...	2024-11-29 13:06:06 -08:00
Linus Torvalds	55cb93fd24	Driver core changes for 6.13-rc1 Here is a small set of driver core changes for 6.13-rc1. Nothing major for this merge cycle, except for the 2 simple merge conflicts are here just to make life interesting. Included in here are: - sysfs core changes and preparations for more sysfs api cleanups that can come through all driver trees after -rc1 is out - fw_devlink fixes based on many reports and debugging sessions - list_for_each_reverse() removal, no one was using it! - last-minute seq_printf() format string bug found and fixed in many drivers all at once. - minor bugfixes and changes full details in the shortlog As mentioned above, there is 2 merge conflicts with your tree, one is where the file is removed (easy enough to resolve), the second is a build time error, that has been found in linux-next and the fix can be seen here: https://lore.kernel.org/r/20241107212645.41252436@canb.auug.org.au Other than that, the changes here have been in linux-next with no other reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core changes for 6.13-rc1. Nothing major for this merge cycle, except for the two simple merge conflicts are here just to make life interesting. Included in here are: - sysfs core changes and preparations for more sysfs api cleanups that can come through all driver trees after -rc1 is out - fw_devlink fixes based on many reports and debugging sessions - list_for_each_reverse() removal, no one was using it! - last-minute seq_printf() format string bug found and fixed in many drivers all at once. 
- minor bugfixes and changes full details in the shortlog" * tag 'driver-core-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (35 commits) Fix a potential abuse of seq_printf() format string in drivers cpu: Remove spurious NULL in attribute_group definition s390/con3215: Remove spurious NULL in attribute_group definition perf: arm-ni: Remove spurious NULL in attribute_group definition driver core: Constify bin_attribute definitions sysfs: attribute_group: allow registration of const bin_attribute firmware_loader: Fix possible resource leak in fw_log_firmware_info() drivers: core: fw_devlink: Fix excess parameter description in docstring driver core: class: Correct WARN() message in APIs class_(for_each\|find)_device() cacheinfo: Use of_property_present() for non-boolean properties cdx: Fix cdx_mmap_resource() after constifying attr in ->mmap() drivers: core: fw_devlink: Make the error message a bit more useful phy: tegra: xusb: Set fwnode for xusb port devices drm: display: Set fwnode for aux bus devices driver core: fw_devlink: Stop trying to optimize cycle detection logic driver core: Constify attribute arguments of binary attributes sysfs: bin_attribute: add const read/write callback variants sysfs: implement all BIN_ATTR_* macros in terms of __BIN_ATTR() sysfs: treewide: constify attribute callback of bin_attribute::llseek() sysfs: treewide: constify attribute callback of bin_attribute::mmap() ...	2024-11-29 11:43:29 -08:00
Linus Torvalds	28eb75e178	drm for 6.13-rc1 core: - split DSC helpers from DP helpers - clang build fixes for drm/mm test - drop simple pipeline support for gem vram - document submission error signaling - move drm_rect to drm core module from kms helper - add default client setup to most drivers - move to video aperture helpers instead of drm ones tests: - new framebuffer tests ttm: - remove swapped and pinned BOs from TTM lru panic: - fix uninit spinlock - add ABGR2101010 support bridge: - add TI TDP158 support - use standard PM OPS dma-fence: - use read_trylock instead of read_lock to help lockdep scheduler: - add errno to sched start to report different errors - add locking to drm_sched_entity_modify_sched - improve documentation xe: - add drm_line_printer - lots of refactoring - Enable Xe2 + PES disaggregation - add new ARL PCI ID - SRIOV development work - fix exec unnecessary implicit fence - define and parse OA sync props - forcewake refactoring i915: - Enable BMG/LNL ultra joiner - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+ - use DSB for plane/color mgmt - Arrow lake PCI IDs - lots of i915/xe display refactoring - enable PXP GuC autoteardown - Pantherlake (PTL) Xe3 LPD display enablement - Allow fastset HDR infoframe changes - write DP source OUI for non-eDP sinks - share PCI IDs between i915 and xe amdgpu: - SDMA queue reset support - SMU 13.0.6, JPEG 4.0.3 updates - Initial runtime repartitioning support - rework IP structs for multiple IP instances - Fetch EDID from _DDC if available - SMU13 zero rpm user control - lots of fixes/cleanups amdkfd: - Increase event FIFO size - add topology cap flag for per queue reset msm: - DPU: - SA8775P support - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support - Enable large framebuffer support - Drop MSM8998 and SDM845 - DP: - SA8775P support - GPU: - a7xx preemption support - Adreno A663 support ast: - warn about unsupported TX chips ivpu: - add coredump - add pantherlake support rockchip: 
- 4K@60Hz display enablement - generate pll programming tables panthor: - add timestamp query API - add realtime group priority - add fdinfo support etnaviv: - improve handling of DMA address limits - improve GPU hangcheck exynos: - Decon Exynos7870 support mediatek: - add OF graph support omap: - locking fixes bochs: - convert to gem/shmem from simpledrm v3d: - support big/super pages - add gemfs vc4: - BCM2712 support refactoring - add YUV444 format support udmabuf: - folio related fixes nouveau: - add panic support on nv50+ Merge tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "There's a lot of rework, the panic helper support is being added to more drivers, v3d gets support for HW superpages, scheduler documentation, drm client and video aperture reworks, some new MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel has some Pantherlake enablement and xe is getting some SRIOV bits, but just lots of stuff everywhere. 
core: - split DSC helpers from DP helpers - clang build fixes for drm/mm test - drop simple pipeline support for gem vram - document submission error signaling - move drm_rect to drm core module from kms helper - add default client setup to most drivers - move to video aperture helpers instead of drm ones tests: - new framebuffer tests ttm: - remove swapped and pinned BOs from TTM lru panic: - fix uninit spinlock - add ABGR2101010 support bridge: - add TI TDP158 support - use standard PM OPS dma-fence: - use read_trylock instead of read_lock to help lockdep scheduler: - add errno to sched start to report different errors - add locking to drm_sched_entity_modify_sched - improve documentation xe: - add drm_line_printer - lots of refactoring - Enable Xe2 + PES disaggregation - add new ARL PCI ID - SRIOV development work - fix exec unnecessary implicit fence - define and parse OA sync props - forcewake refactoring i915: - Enable BMG/LNL ultra joiner - Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+ - use DSB for plane/color mgmt - Arrow lake PCI IDs - lots of i915/xe display refactoring - enable PXP GuC autoteardown - Pantherlake (PTL) Xe3 LPD display enablement - Allow fastset HDR infoframe changes - write DP source OUI for non-eDP sinks - share PCI IDs between i915 and xe amdgpu: - SDMA queue reset support - SMU 13.0.6, JPEG 4.0.3 updates - Initial runtime repartitioning support - rework IP structs for multiple IP instances - Fetch EDID from _DDC if available - SMU13 zero rpm user control - lots of fixes/cleanups amdkfd: - Increase event FIFO size - add topology cap flag for per queue reset msm: - DPU: - SA8775P support - (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support - Enable large framebuffer support - Drop MSM8998 and SDM845 - DP: - SA8775P support - GPU: - a7xx preemption support - Adreno A663 support ast: - warn about unsupported TX chips ivpu: - add coredump - add pantherlake support rockchip: - 4K@60Hz display enablement - generate 
pll programming tables panthor: - add timestamp query API - add realtime group priority - add fdinfo support etnaviv: - improve handling of DMA address limits - improve GPU hangcheck exynos: - Decon Exynos7870 support mediatek: - add OF graph support omap: - locking fixes bochs: - convert to gem/shmem from simpledrm v3d: - support big/super pages - add gemfs vc4: - BCM2712 support refactoring - add YUV444 format support udmabuf: - folio related fixes nouveau: - add panic support on nv50+" * tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits) drm/xe/guc: Fix dereference before NULL check drm/amd: Fix initialization mistake for NBIO 7.7.0 Revert "drm/amd/display: parse umc_info or vram_info based on ASIC" drm/amd/display: Fix failure to read vram info due to static BP_RESULT drm/amdgpu: enable GTT fallback handling for dGPUs only drm/amd/amdgpu: limit single process inside MES drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X drm/amdgpu/mes12: correct kiq unmap latency drm/amdgpu: Support vcn and jpeg error info parsing drm/amd : Update MES API header file for v11 & v12 drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling drm/amdkfd: change kfd process kref count at creation drm/amdgpu: Cleanup shift coding style drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data drm/amdgpu: Implement virt req_ras_err_count drm/amdgpu: VF Query RAS Caps from Host if supported drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support drm/amd/display: 3.2.309 drm/amd/display: Adjust VSDB parser for replay feature ...	2024-11-21 14:56:17 -08:00
Alex Deucher	93df748737	drm/amdgpu/jpeg: cancel the jpeg worker Looks like these got missed when jpeg was split from vcn. Cancel the jpeg workers rather than vcn workers. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-21 15:56:23 -05:00
Vitaly Prosyak	b61badd20b	drm/amdgpu: fix usage slab after free [ +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147 [ +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1 [ +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000016] Call Trace: [ +0.000008] <TASK> [ +0.000009] dump_stack_lvl+0x76/0xa0 [ +0.000017] print_report+0xce/0x5f0 [ +0.000017] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] ? srso_return_thunk+0x5/0x5f [ +0.000015] ? kasan_complete_mode_report_info+0x72/0x200 [ +0.000016] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] kasan_report+0xbe/0x110 [ +0.000015] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000023] __asan_report_load8_noabort+0x14/0x30 [ +0.000014] drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000016] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? enable_work+0x124/0x220 [ +0.000015] ? __pfx_enable_work+0x10/0x10 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? free_large_kmalloc+0x85/0xf0 [ +0.000016] drm_sched_entity_destroy+0x18/0x30 [gpu_sched] [ +0.000020] amdgpu_vce_sw_fini+0x55/0x170 [amdgpu] [ +0.000735] ? __kasan_check_read+0x11/0x20 [ +0.000016] vce_v4_0_sw_fini+0x80/0x110 [amdgpu] [ +0.000726] amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu] [ +0.000679] ? mutex_unlock+0x80/0xe0 [ +0.000017] ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu] [ +0.000662] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? 
mutex_unlock+0x80/0xe0 [ +0.000016] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000663] drm_minor_release+0xc9/0x140 [drm] [ +0.000081] drm_release+0x1fd/0x390 [drm] [ +0.000082] __fput+0x36c/0xad0 [ +0.000018] __fput_sync+0x3c/0x50 [ +0.000014] __x64_sys_close+0x7d/0xe0 [ +0.000014] x64_sys_call+0x1bc6/0x2680 [ +0.000014] do_syscall_64+0x70/0x130 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit_to_user_mode+0x60/0x190 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit+0x43/0x50 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? exc_page_fault+0x7c/0x110 [ +0.000015] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] RIP: 0033:0x7ffff7b14f67 [ +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff [ +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67 [ +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003 [ +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000 [ +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8 [ +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040 [ +0.000020] </TASK> [ +0.000016] Allocated by task 383 on cpu 7 at 26.880319s: [ +0.000014] kasan_save_stack+0x28/0x60 [ +0.000008] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_alloc_info+0x38/0x60 [ +0.000007] __kasan_kmalloc+0xc1/0xd0 [ +0.000007] kmalloc_trace_noprof+0x180/0x380 [ +0.000007] drm_sched_init+0x411/0xec0 [gpu_sched] [ +0.000012] amdgpu_device_init+0x695f/0xa610 [amdgpu] [ +0.000658] amdgpu_driver_load_kms+0x1a/0x120 [amdgpu] [ +0.000662] amdgpu_pci_probe+0x361/0xf30 [amdgpu] [ +0.000651] local_pci_probe+0xe7/0x1b0 [ +0.000009] pci_device_probe+0x248/0x890 [ +0.000008] 
really_probe+0x1fd/0x950 [ +0.000008] __driver_probe_device+0x307/0x410 [ +0.000007] driver_probe_device+0x4e/0x150 [ +0.000007] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000007] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x2ac/0x5f0 [ +0.000006] driver_register+0x13d/0x490 [ +0.000008] __pci_register_driver+0x1ee/0x2b0 [ +0.000007] llc_sap_close+0xb0/0x160 [llc] [ +0.000009] do_one_initcall+0x9c/0x3e0 [ +0.000008] do_init_module+0x241/0x760 [ +0.000008] load_module+0x51ac/0x6c30 [ +0.000006] __do_sys_init_module+0x234/0x270 [ +0.000007] __x64_sys_init_module+0x73/0xc0 [ +0.000006] x64_sys_call+0xe3/0x2680 [ +0.000006] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000015] Freed by task 2147 on cpu 6 at 160.507651s: [ +0.000013] kasan_save_stack+0x28/0x60 [ +0.000007] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_free_info+0x3b/0x60 [ +0.000007] poison_slab_object+0x115/0x1c0 [ +0.000007] __kasan_slab_free+0x34/0x60 [ +0.000007] kfree+0xfa/0x2f0 [ +0.000007] drm_sched_fini+0x19d/0x410 [gpu_sched] [ +0.000012] amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu] [ +0.000662] amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu] [ +0.000653] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000655] drm_minor_release+0xc9/0x140 [drm] [ +0.000071] drm_release+0x1fd/0x390 [drm] [ +0.000071] __fput+0x36c/0xad0 [ +0.000008] __fput_sync+0x3c/0x50 [ +0.000007] __x64_sys_close+0x7d/0xe0 [ +0.000007] x64_sys_call+0x1bc6/0x2680 [ +0.000007] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] The buggy address belongs to the object at ffff8881b8605f80 which belongs to the cache kmalloc-64 of size 64 [ +0.000020] The buggy address is located 8 bytes inside of freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0) [ +0.000028] The buggy address belongs to the physical page: [ +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605 [ 
+0.000008] anon flags: 0x17ffffc0000000(node=0\|zone=2\|lastcpupid=0x1fffff) [ +0.000007] page_type: 0xffffefff(slab) [ +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001 [ +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000 [ +0.000006] page dumped because: kasan: bad access detected [ +0.000012] Memory state around the buggy address: [ +0.000011] ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000015] ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000013] ^ [ +0.000011] ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [ +0.000014] ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb [ +0.000013] ================================================================== The issue reproduced on VG20 during the IGT pci_unplug test. The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill. In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by each entity within the run queue, leading to invalid memory access. To resolve this, the order of cleanup calls is updated: Before: amdgpu_fence_driver_sw_fini amdgpu_device_ip_fini After: amdgpu_device_ip_fini amdgpu_fence_driver_sw_fini This updated order ensures that all entities in the IPs are cleaned up first, followed by proper cleanup of the schedulers. Additional Investigation: During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo buffer must be freed only as the final step in the cleanup process to prevent any premature access during earlier cleanup stages. v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini. 
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-11-21 15:56:22 -05:00
Xiang Liu	928cd772e1	drm/amdgpu/vcn: reset fw_shared when VCPU buffers corrupted on vcn v4.0.3 It is not necessarily corrupted. When there is RAS fatal error, device memory access is blocked. Hence vcpu bo cannot be saved to system memory as in a regular suspend sequence before going for reset. In other full device reset cases, that gets saved and restored during resume. v2: Remove redundant code like vcn_v4_0 did v2: Refine commit message v3: Drop the volatile v3: Refine commit message Signed-off-by: Xiang Liu <xiang.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-21 15:56:22 -05:00
Jesse.zhang@amd.com	2f1b13521d	drm/amdgpu: Fix sysfs warning when hotplugging Fix the similar warning when hotplugging: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp llc overlay nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd amdgpu kvm_amd kvm ipmi_ssif amdxcp rapl drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_ttm_helper ttm pcspkr drm_display_helper acpi_cpufreq drm_kms_helper video wmi k10temp i2c_piix4 acpi_ipmi ipmi_si drm zram ip_tables loop squashfs dm_multipath crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 sp5100_tco ixgbe rfkill ccp dca sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_devintf ipmi_msghandler fuse [ 155.685224] systemd-journald[1354]: Compressed data object 957 -> 524 using ZSTD [ 155.685687] CPU: 3 PID: 6960 Comm: amd_pci_unplug Not tainted 6.10.0-1148853.1.zuul.164395107d6642bdb451071313e9378d #1 [ 155.704149] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 155.712383] RIP: 0010:kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.717805] Code: a0 00 48 89 ef e8 37 96 c7 ff 5b b8 fe ff ff ff 5d 41 5c 41 5d e9 f7 96 a0 00 0f 0b eb ab 48 c7 c7 48 ba 7e 8f e8 f7 66 bf ff <0f> 0b eb dc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 [ 155.736766] RSP: 0018:ffffb1685d7a3e20 EFLAGS: 00010296 [ 155.742108] RAX: 0000000000000038 RBX: ffff929e94c80000 RCX: 0000000000000000 [ 155.749363] RDX: ffff928e1efaf200 
RSI: ffff928e1efa18c0 RDI: ffff928e1efa18c0 [ 155.756612] RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000003 [ 155.763855] R10: ffffb1685d7a3cd8 R11: ffffffff8fb3e1c8 R12: ffffffffc1ef5341 [ 155.771104] R13: ffff929e94cc5530 R14: 0000000000000000 R15: 0000000000000000 [ 155.778357] FS: 00007fd9dd8d9c40(0000) GS:ffff928e1ef80000(0000) knlGS:0000000000000000 [ 155.786594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155.792450] CR2: 0000561245ceee38 CR3: 0000000113018000 CR4: 00000000003506f0 [ 155.799702] Call Trace: [ 155.802254] <TASK> [ 155.804460] ? __warn+0x80/0x120 [ 155.807798] ? kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.812617] ? report_bug+0x164/0x190 [ 155.816393] ? handle_bug+0x3c/0x80 [ 155.819994] ? exc_invalid_op+0x17/0x70 [ 155.823939] ? asm_exc_invalid_op+0x1a/0x20 [ 155.828235] ? kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.833058] amdgpu_gfx_sysfs_fini+0x59/0xd0 [amdgpu] [ 155.838637] gfx_v9_0_sw_fini+0x123/0x1c0 [amdgpu] [ 155.843887] amdgpu_device_fini_sw+0xbc/0x3e0 [amdgpu] [ 155.849432] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] [ 155.855235] drm_dev_put.part.0+0x3c/0x60 [drm] [ 155.859914] drm_release+0x8b/0xc0 [drm] [ 155.863978] __fput+0xf1/0x2c0 [ 155.867141] __x64_sys_close+0x3c/0x80 [ 155.870998] do_syscall_64+0x64/0x170 V2: Add details in comments (Tim) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reported-by: Andy Dong <andy.dong@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-21 15:56:22 -05:00
Jesse.zhang@amd.com	fb9898243a	drm/amdgpu: Add sysfs interface for vcn reset mask Add the sysfs interface for vcn: vcn_reset_mask The interface is read-only and shows the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) v4: s/sdma/vcn/ in the reset mask setup Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-21 15:56:22 -05:00
Alex Deucher	4c28e645aa	drm/amdgpu/gmc7: fix wait_for_idle callers The wait_for_idle signature was changed, but the callers were not. Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reported-by: Michel Dänzer <michel@daenzer.net> Fixes: `82ae6619a4` ("drm/amdgpu: update the handle ptr in wait_for_idle") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sunil Khatri <sunil.khatri@amd.com>	2024-11-21 15:56:22 -05:00
Thomas Weißschuh	c2753b2471	drm/amd/display: Add support for minimum backlight quirk Not all platforms provide the full range of PWM backlight capabilities supported by the hardware through ATIF. Use the generic drm panel minimum backlight quirk infrastructure to override the capabilities where necessary. Testing the backlight quirk together with the "panel_power_savings" sysfs file has not shown any negative impact. One quirk seems to be that 0% at panel_power_savings=0 seems to be slightly darker than at panel_power_savings=4. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Tested-by: Dustin L. Howett <dustin@howett.net> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241111-amdgpu-min-backlight-quirk-v7-2-f662851fda69@weissschuh.net	2024-11-21 09:28:13 -06:00
Lijo Lazar	e283f4fb08	drm/amdgpu: Use reset recovery state checks Some in_reset checks are in fact checking whether the state is reinitialization after reset. Replace with reset_in_recovery calls to identify that it's really checking for recovery stage after reset. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-20 10:03:05 -05:00
Lijo Lazar	a86e0c0e94	drm/amdgpu: Add init level for post reset reinit When device needs to be reset before initialization, it's not required for all IPs to be initialized before a reset. In such cases, it needs to identify whether the IP/feature is initialized for the first time or whether it's reinitialized after a reset. Add RESET_RECOVERY init level to identify post reset reinitialization phase. This only provides a device level identification, IP/features may choose to track their state independently also. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-20 10:03:05 -05:00
Mario Limonciello	349af06a3a	drm/amd: Fix initialization mistake for NBIO 7.11 devices There is a strapping issue on NBIO 7.11.x that can lead to spurious PME events while in the D0 state. Cc: stable@vger.kernel.org Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241118174611.10700-2-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-20 10:03:05 -05:00
Asad Kamal	466a59abac	drm/amd/pm: Get xgmi link status for XGMI_v_6_4_0 Get XGMI_v_6_4_0 link status and populate it to metrics v1_7 for SMU_v_13_0_6 v2: Get link status register value for each soc from separate function (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-20 09:36:48 -05:00
Linus Torvalds	0f25f0e4ef	the bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdikAAKCRBZ7Krx/gZQ 69nJAQCmbQHK3TGUbQhOw6MJXOK9ezpyEDN3FZb4jsu38vTIdgEA6OxAYDO2m2g9 CN18glYmD3wRyU6Bwl4vGODouSJvDgA= =gVH3 -----END PGP SIGNATURE----- Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...	2024-11-18 12:24:06 -08:00
Thomas Zimmermann	b86711c6d6	drm/client: Move public client header to clients/ subdirectory Move the public header file drm_client_setup.h to the clients/ subdirectory and update all drivers. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241108154600.126162-3-tzimmermann@suse.de	2024-11-15 09:42:13 +01:00
Vijendar Mukunda	7013a8268d	drm/amd: Fix initialization mistake for NBIO 7.7.0 There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME events while in the D0 state. Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `447a54a0f7`) Cc: stable@vger.kernel.org	2024-11-12 17:37:39 -05:00
Christian König	5a67c31669	drm/amdgpu: enable GTT fallback handling for dGPUs only That is just a waste of time on APUs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704 Fixes: `216c1282dd` ("drm/amdgpu: use GTT only as fallback for VRAM\|GTT") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e8fc090d32`) Cc: stable@vger.kernel.org	2024-11-12 17:37:38 -05:00
Vijendar Mukunda	447a54a0f7	drm/amd: Fix initialization mistake for NBIO 7.7.0 There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME events while in the D0 state. Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-12 17:10:40 -05:00
Christian König	e8fc090d32	drm/amdgpu: enable GTT fallback handling for dGPUs only That is just a waste of time on APUs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704 Fixes: `216c1282dd` ("drm/amdgpu: use GTT only as fallback for VRAM\|GTT") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-12 17:10:05 -05:00
Shaoyun Liu	8521e3c5f0	drm/amd/amdgpu: limit single process inside MES This is for MES to limit only one process for the user queues Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-12 17:02:04 -05:00
Jack Xiao	79365ea707	drm/amdgpu/mes12: correct kiq unmap latency Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `cfe98204a0`) Cc: stable@vger.kernel.org # 6.11.x	2024-11-11 14:05:51 -05:00
Christian König	0e5ac88fb9	drm/amdgpu: fix check in gmc_v9_0_get_vm_pte() The coherency flags can only be determined when the BO is locked and that in turn is only guaranteed when the mapping is validated. Fix the check, move the resource check into the function and add an assert that the BO is locked. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `d1a372af1c` ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1b4ca8546f`) Cc: stable@vger.kernel.org	2024-11-11 14:05:44 -05:00
David Rosca	d641a151fc	drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size H264 supports 4096x4096 starting from Polaris. HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352 is supported. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `69e9a9e65b`) Cc: stable@vger.kernel.org	2024-11-11 14:05:36 -05:00
Jack Xiao	cfe98204a0	drm/amdgpu/mes12: correct kiq unmap latency Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 12:22:58 -05:00
Advait Dhamorikar	408d208127	drm/amdgpu: Cleanup shift coding style Improves the coding style by updating bit-shift operations in the amdgpu_jpeg.c driver file. It ensures consistency and avoids potential issues by explicitly using 1U and 1ULL for unsigned and unsigned long long shifts in all relevant instances. Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 11:56:03 -05:00
shaoyunl	92fd1714ee	drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data MES internal scratch data is useful for MES debugging. It can only be located in VRAM, so change the allocation type and increase the size for MES 11. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Feifei Xu <Feifei.Xu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 11:55:49 -05:00
Victor Skvortsov	84a2947ecc	drm/amdgpu: Implement virt req_ras_err_count Enable RAS late init if VF RAS Telemetry is supported. When enabled, the VF can use this interface to query total RAS error counts from the host. The VF FB access may abruptly end due to a fatal error, therefore the VF must cache and sanitize the input. The Host allows 15 Telemetry messages every 60 seconds, after which the host will ignore any further incoming telemetry messages. The VF will rate limit its msg calling to once every 5 seconds (12 times in 60 seconds). While the VF is rate limited, it will continue to report the last good cached data. v2: Flip generate report & update statistics order for VF Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 11:55:42 -05:00
Victor Skvortsov	907fec2dfd	drm/amdgpu: VF Query RAS Caps from Host if supported If VF RAS Capability support is enabled, guest is able to retrieve the real RAS support from the host. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 11:55:36 -05:00
Victor Skvortsov	9928509dfc	drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry Add message handlers for RAS telemetry. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 11:55:08 -05:00
Victor Skvortsov	60c58d72af	drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data. Add Host RAS Telemetry enable capabilities bitfields. Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated in the RAS Telemetry region. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-11 11:55:01 -05:00
Srinivasan Shanmugam	dfb214ec91	drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUs Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX11.0.0/11.0.2 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:29 -05:00
Christian König	1b4ca8546f	drm/amdgpu: fix check in gmc_v9_0_get_vm_pte() The coherency flags can only be determined when the BO is locked and that in turn is only guaranteed when the mapping is validated. Fix the check, move the resource check into the function and add an assert that the BO is locked. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: `d1a372af1c` ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:29 -05:00
Ramesh Errabolu	8e29057eec	drm/amdgpu: Inform if PCIe based P2P links are not available Raise an info message in the kernel log if the PCIe root complex determines that an AMD GPU device D<i> cannot have P2P communication with another AMD GPU device D<j>. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:29 -05:00
David Rosca	69e9a9e65b	drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size H264 supports 4096x4096 starting from Polaris. HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352 is supported. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:29 -05:00
Jesse.zhang@amd.com	96f0b56c34	drm/amdgpu: Add sysfs interface for jpeg reset mask Add the sysfs interface for jpeg: jpeg_reset_mask The interface is read-only and shows the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:29 -05:00
Jesse.zhang@amd.com	ea02ea9437	drm/amdgpu: Add sysfs interface for vpe reset mask Add the sysfs interface for vpe: vpe_reset_mask The interface is read-only and shows the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:29 -05:00
Jesse.zhang@amd.com	59fd50b866	drm/amdgpu: Add sysfs interface for sdma reset mask Add the sysfs interface for sdma: sdma_reset_mask The interface is read-only and shows the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:45:18 -05:00
Sathishkumar S	edd345f7ef	drm/amdgpu: Normalize reg offsets on VCN v4.0.3 Remote access to external AIDs isn't possible with VCN RRMT disabled and it is disabled on SoCs with GC 9.4.4, so use only local offsets. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:11:49 -05:00
Lijo Lazar	7b1ebbe856	drm/amdgpu: Avoid kcq disable during reset Reset sequence indicates that hardware already ran into a bad state. Avoid sending unmap queue request to reset KCQ. This will also cover RAS error scenarios which need a reset to recover, hence remove the check. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:11:42 -05:00
Lijo Lazar	fa31798582	drm/amdgpu: Fix map/unmap queue logic In current logic, it calls ring_alloc followed by a ring_test. ring_test in turn will call another ring_alloc. This is illegal usage as a ring_alloc is expected to be closed properly with a ring_commit. Change to commit the map/unmap queue packet first followed by a ring_test. Add a comment about the usage of ring_test. Also, reorder the current pre-condition checks of job hang or kiq ring scheduler not ready. Without them being met, it is not useful to attempt ring or memory allocations. Fixes tag refers to the original patch which introduced this issue which then got carried over into newer code. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: `6c10b5cc4e` ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c")	2024-11-08 11:10:00 -05:00
Yang Wang	2bb7dced1c	drm/amdgpu: fix ACA bank count boundary check error fix ACA bank count boundary check error. Fixes: `f5e4cc8461` ("drm/amdgpu: implement RAS ACA driver framework") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:08:38 -05:00
Jesse.zhang@amd.com	6c8d1f4b04	drm/amdgpu: Add sysfs interface for gc reset mask Add two sysfs interfaces for gfx and compute: gfx_reset_mask compute_reset_mask These interfaces are read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) v4: Fixing uninitialized variables (Tim) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:08:01 -05:00
chongli2	f4a3246a2c	drm/amdgpu: fix return random value when multiple threads read registers via mes. The current code uses the address "adev->mes.read_val_ptr" to store the value read from a register via MES. When multiple threads read registers, they all share that one address and overwrite each other's values. Assign an address with "amdgpu_device_wb_get" to store the register value, so that each thread has its own address. Signed-off-by: chongli2 <chongli2@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:07:50 -05:00
Asad Kamal	04e9101766	drm/amdgpu: Add supported NPS modes node Add sysfs node to show supported NPS mode for the partition configuration selected using xcp_config v2: Hide node if dynamic nps switch not supported v3: Fix removal of files in case of error Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-08 11:07:34 -05:00
Dave Airlie	1f8bdc31c7	amd-drm-next-6.13-2024-11-06: amdgpu: - Misc cleanups - OLED fixes - DCN 4.x fixes - DCN 3.5 fixes - 8K fixes - IPS fixes - DSC fixes - S3 fix - KASAN fix - SMU13 fixes - fdinfo fixes - USB-C fixes - ACPI fix - Fix dummy page overlapping mappings - Fix workload profile handling - Add user control for zero RPM on SMU13 - Cleaner shader updates - Stop syncing PRT map operations - Debugfs permissions fixes - Debugfs bounds check fix - RAS cleanups - Enforce isolation updates amdkfd: - Add topology cap flag for per queue reset - Add an interface to query whether KFD queues are present - Use dynamic allocation for get_cu_occupancy -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZyua0wAKCRC93/aFa7yZ 2DjiAP9aBOidQQX+qgq9brFBcm6QlSOFKnOf8ZNKJEZ3yYOYBwEAv7EY0S2xnox1 UrmLDd8APpVJZhDbQgJWaQUe09fkIgg= =G1Jb -----END PGP SIGNATURE----- Merge tag 'amd-drm-next-6.13-2024-11-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.13-2024-11-06: amdgpu: - Misc cleanups - OLED fixes - DCN 4.x fixes - DCN 3.5 fixes - 8K fixes - IPS fixes - DSC fixes - S3 fix - KASAN fix - SMU13 fixes - fdinfo fixes - USB-C fixes - ACPI fix - Fix dummy page overlapping mappings - Fix workload profile handling - Add user control for zero RPM on SMU13 - Cleaner shader updates - Stop syncing PRT map operations - Debugfs permissions fixes - Debugfs bounds check fix - RAS cleanups - Enforce isolation updates amdkfd: - Add topology cap flag for per queue reset - Add an interface to query whether KFD queues are present - Use dynamic allocation for get_cu_occupancy From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241106163904.189108-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-11-08 12:04:24 +10:00
Alex Deucher	4d75b94680	drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read() Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f5d873f582`) Cc: stable@vger.kernel.org	2024-11-05 10:54:11 -05:00
Alex Deucher	f790a2c494	drm/amdgpu: Adjust debugfs eviction and IB access permissions Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7ba9395430`) Cc: stable@vger.kernel.org	2024-11-05 10:53:48 -05:00
Alex Deucher	b46dadf7e3	drm/amdgpu: Adjust debugfs register access permissions Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c0cfd2e652`) Cc: stable@vger.kernel.org	2024-11-05 10:53:21 -05:00
Lijo Lazar	3ce3f85787	drm/amdgpu: Fix DPX valid mode check on GC 9.4.3 For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: `1589c82a10` ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `990c4f5807`) Cc: stable@vger.kernel.org	2024-11-05 10:52:40 -05:00
Alex Deucher	f5d873f582	drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read() Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:59 -05:00
Alex Deucher	7ba9395430	drm/amdgpu: Adjust debugfs eviction and IB access permissions Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:56 -05:00
Alex Deucher	c0cfd2e652	drm/amdgpu: Adjust debugfs register access permissions Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:49 -05:00
Christian König	bc56678184	drm/amdgpu: stop syncing PRT map operations Requested by both Bas and Friedrich. Mapping PTEs as PRT doesn't need to sync for anything. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:39 -05:00
Prike Liang	e2e9743578	drm/amdgpu: set the right AMDGPU sg segment limitation The driver needs to set the correct max_segment_size; otherwise debug_dma_map_sg() will complain about the over-mapping of the AMDGPU sg length as following: WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370 [ 364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd [ 364.049532] ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii [ 364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G OE 6.10.0-custom #492 [ 364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370 [ 364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 
4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05 [ 364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286 [ 364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027 [ 364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680 [ 364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930 [ 364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000 [ 364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800 [ 364.049599] FS: 00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000 [ 364.049601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0 [ 364.049605] Call Trace: [ 364.049607] <TASK> [ 364.049609] ? show_regs+0x6d/0x80 [ 364.049614] ? __warn+0x8c/0x140 [ 364.049618] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049621] ? report_bug+0x193/0x1a0 [ 364.049627] ? handle_bug+0x46/0x80 [ 364.049631] ? exc_invalid_op+0x1d/0x80 [ 364.049635] ? asm_exc_invalid_op+0x1f/0x30 [ 364.049642] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049647] __dma_map_sg_attrs+0x90/0xe0 [ 364.049651] dma_map_sgtable+0x25/0x40 [ 364.049654] amdgpu_bo_move+0x59a/0x850 [amdgpu] [ 364.049935] ? srso_return_thunk+0x5/0x5f [ 364.049939] ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu] [ 364.050095] ttm_bo_handle_move_mem+0xc3/0x180 [ttm] [ 364.050103] ttm_bo_validate+0xc1/0x160 [ttm] [ 364.050108] ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu] [ 364.050263] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu] [ 364.050473] kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu] [ 364.050680] kfd_ioctl+0x3c2/0x530 [amdgpu] [ 364.050866] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu] [ 364.051054] ? srso_return_thunk+0x5/0x5f [ 364.051057] ? 
tomoyo_file_ioctl+0x20/0x30 [ 364.051063] __x64_sys_ioctl+0x9c/0xd0 [ 364.051068] x64_sys_call+0x1219/0x20d0 [ 364.051073] do_syscall_64+0x51/0x120 [ 364.051077] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 364.051081] RIP: 0033:0x7fb2d2f1a94f Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:35:26 -05:00
Lijo Lazar	990c4f5807	drm/amdgpu: Fix DPX valid mode check on GC 9.4.3 For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: `1589c82a10` ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:34:06 -05:00
Srinivasan Shanmugam	949d817c78	drm/amdgpu/gfx11: Add cleaner shader for GFX11.0.3 This commit adds the cleaner shader microcode for GFX11.0.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_11_0_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:33:56 -05:00
Alex Deucher	e89bd3615b	drm/amdgpu/mes: fetch fw version from firmware header We need this prior to the firmware being loaded so fetch from the header. v2: fetch directly from the firmware v3: store both fw versions Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-05 10:33:39 -05:00
Thomas Weißschuh	b626816fdd	sysfs: treewide: constify attribute callback of bin_is_visible() The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-05 14:00:28 +01:00
Antonio Quartulli	a6dd15981c	drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: `c9b7c809b8` ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `91c9e221fe`) Cc: stable@vger.kernel.org	2024-11-04 12:48:21 -05:00
Lijo Lazar	e5ad71779d	drm/amdgpu: Add compatible NPS mode info Populate the compatible NPS modes also for providing partition configuration details through sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Lijo Lazar	81db4eab28	drm/amdgpu: Skip IP coredump for RAS errors For RAS errors, source of error is known. Skip the core dump of IP states. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Lijo Lazar	047767ddc9	drm/amdgpu: Group gfx sysfs functions Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all gfx related sysfs nodes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Candice Li	12e5df81bb	drm/amdgpu: Add nps_mode in RAS init_flag Add nps_mode in RAS init_flag. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:23 -05:00
Jesse Zhang	d2e3961ae3	drm/amdgpu: add amdgpu_sdma_sched_mask debugfs Userspace wants to run jobs on a specific sdma ring for verification purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are two or more cores in the sdma ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:06:15 -05:00
Jesse Zhang	c5c63d9cb5	drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs compute/gfx may have multiple rings on some hardware. In some cases, userspace wants to run jobs on a specific ring for validation purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are two or more cores in the gfx/compute ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:53 -05:00
Prike Liang	b78612939d	drm/amdgpu: Fix dummy_read_page overlapping mappings Use the dma_map_page_attrs() with DMA_ATTR_SKIP_CPU_SYNC attribute setting to handle the dummy page overlapping mappings. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:30 -05:00
Victor Zhao	afe260df55	drm/amdgpu: skip amdgpu_device_cache_pci_state under sriov Under sriov, host driver will save and restore vf pci cfg space during reset. And during device init, under sriov, pci_restore_state happens after fullaccess released, and it can have race condition with mmio protection enable from host side leading to missing interrupts. So skip amdgpu_device_cache_pci_state for sriov. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:30 -05:00
Antonio Quartulli	91c9e221fe	drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: `c9b7c809b8` ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 12:05:16 -05:00
R Sundar	b95264cf75	drm/amdgpu: use string choice helpers Use string choice helpers for better readability. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202410161814.I6p2Nnux-lkp@intel.com/ Signed-off-by: R Sundar <prosunofficial@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:33:24 -05:00
jeffbai@aosc.io	0174c0791c	drm/amdgpu: fix comment about amdgpu.abmlevel defaults Since `040fdcde28` ("drm/amdgpu: respect the abmlevel module parameter value if it is set"), the default value for amdgpu.abmlevel was set to -1, or auto. However, the comment explaining the default value was not updated to reflect the change (-1, or auto; not -1, or disabled). Clarify that the default value (-1) means auto. Fixes: `040fdcde28` ("drm/amdgpu: respect the abmlevel module parameter value if it is set") Reported-by: Ruikai Liu <rickliu2000@outlook.com> Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:57 -05:00
Tvrtko Ursulin	aa2ac51c8e	drm/amdgpu: Expose special on chip memory pools in fdinfo In the past these specialized on chip memory pools were reported as system memory (aka 'cpu') which was not correct and misleading. That has since been removed so let's make them visible as their own respective memory regions. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:52 -05:00
Tvrtko Ursulin	cd3037f3fc	drm/amdgpu: Stop reporting special chip memory pools as CPU memory in fdinfo So far these specialized on chip memory pools were reported as system memory (aka 'cpu') which is not correct and misleading. Let's remove that and consider later making them visible as their own thing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:47 -05:00
Yunxiang Li	fdee0872a2	drm/amdgpu: stop tracking visible memory stats Since on modern systems all of vram can be made visible anyways, to simplify the new implementation, drop tracking how much memory is visible for now. If this is really needed we can add it back on top of the new implementation, or just report all the BOs as visible. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:32:40 -05:00
Yunxiang Li	f286365038	drm/amdgpu: make drm-memory-* report resident memory The old behavior reports the resident memory usage for this key and the documentation says so as well. However this was accidentally changed to include buffers that were evicted. Fixes: `04bdba4654` ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo") Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:30:58 -05:00
Li Huafei	a1144da794	drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info() Fix two issues with memory allocation in amdgpu_discovery_get_nps_info() for mem_ranges: - Add a check for allocation failure to avoid dereferencing a null pointer. - As suggested by Christophe, use kvcalloc() for memory allocation, which checks for multiplication overflow. Additionally, assign the output parameters nps_type and range_cnt after the kvcalloc() call to prevent modifying the output parameters in case of an error return. Fixes: `b194d21b9b` ("drm/amdgpu: Use NPS ranges from discovery table") Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:30:28 -05:00
Alex Deucher	35984fd4a0	drm/amdgpu: add ring reset messages Add messages to make it clear when a per ring reset happens. This is helpful for debugging and aligns with other reset methods. v2: add ring name in success/fail messages (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:29:49 -05:00
Alex Deucher	efe6a87743	drm/amdgpu: fix fairness in enforce isolation handling Make sure KFD gets a turn when serializing access to the GC IP. Currently non-KFD jobs can starve KFD if they submit often enough. This patch prevents that by stalling non-KFD if its time period has elapsed. v2: fix units v3: check enablement properly Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:27:04 -05:00
Alex Deucher	8fe7cf58ff	drm/amdkfd: add an interface to query whether KFD is active Add an interface to query whether KFD has any active queues. v2: fix build issues Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-11-04 11:25:42 -05:00
Al Viro	6348be02ee	fdget(), trivial conversions fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-11-03 01:28:06 -05:00
Dave Airlie	8a07b2623e	Merge tag 'drm-misc-next-2024-10-31' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: All of the previous pull request, with MORE! Core Changes: - Update documentation for scheduler start/stop and job init. - Add dedede and sm8350-hdk hardware to ci runs. Driver Changes: - Small fixes and cleanups to panfrost, omap, nouveau, ivpu, zynqmp, v3d, panthor docs, and leadtek-ltk050h3146w. - Crashdump support for qaic. - Support DP compliance in zynqmp. - Add Samsung S6E88A0-AMS427AP24 panel. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/deeef745-f3fb-4e85-a9d0-e8d38d43c1cf@linux.intel.com	2024-11-01 13:46:03 +10:00
Dave Airlie	e7103f8785	Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.13-2024-10-25: amdgpu: - SDMA queue reset support - SMU 13.0.6 updates - Add debugfs interface to help limit jpeg queue scheduling for testing - JPEG 4.0.3 updates - Initial runtime repartitioning support - GFX9 fixes - Misc code cleanups - Rework IP structures to better handle multiple instances of an IP - DML updates - DSC fixes - HDR fixes - Brightness control updates - Runtime pm cleanup - DMCUB fixes - DCN 3.5 updates - Struct drm_edid cleanup - Fetch EDID from _DDC if available - Ring noop optimizations - MES logging fixes - 3DLUT fixes - DCN 4.x fixes - SMU 13.x fixes - Fixes for set_soft_freq_range() - ACPI fixes - SMU 14.x updates - PSR-SU fixes - fdinfo cleanup - DCN documentation updates amdkfd: - Misc code cleanups - Increase event FIFO size - Copy wave state fixes for SDMA radeon: - Fix possible overflow in packet3 check - Late init connector fix - Always set GEM function pointer Documentation: - Update drm-memory documentation From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2024-10-29 18:25:24 +10:00
Yang Wang	7daa0f6b28	drm/amdgpu: optimize ACA log print - skip to print CE ACA log. - optimize ACA log print for MCA. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:41:26 -04:00
Le Ma	ea9d8863da	drm/amdgpu: add generic func to check if ta fw is applicable A separate xgmi ta is required for specific APUs, and the driver needs to parse the ta binary properly with the aux xgmi ta packed. v2: make the check function more generic (Lijo) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:41:13 -04:00
Prike Liang	d5e3d8a2a6	drm/amdgpu: clean up the suspend_complete To check the status of S3 suspend completion, use the PM core pm_suspend_global_flags bit(1) to detect S3 abort events. Therefore, clean up the AMDGPU driver's private flag suspend_complete. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:40:58 -04:00
Prike Liang	58a8c756fc	drm/amdgpu: correct the S3 abort check condition During normal S3 entry, the TOS cycle counter is not reset while the BIOS executes the _S3 method, so it cannot tell us whether the _S3 method was actually executed. However, when the PM core performs the S3 suspend, it sets the PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend successfully. Therefore, drivers can check the pm_suspend_global_flags bit(1) to detect the S3 suspend abort event. Fixes: `6704dbf719` ("drm/amdgpu: update suspend status for aborting from deeper suspend") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:39:23 -04:00
Christian König	57e92d991e	drm/amdgpu: drop volatile from ring buffer Volatile only prevents the compiler from re-ordering reads and writes. Since we always only modify the ring buffer from one CPU thread and have an explicit barrier before signaling the HW this should have no effect at all and just prevents compiler optimisations. While at it drop the local variables as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-28 16:32:03 -04:00
Dan Carpenter	dac64cb3e0	drm/amdgpu: Fix amdgpu_ip_block_hw_fini() This NULL check is reversed so the function doesn't work. Fixes: `dad01f93f4` ("drm/amdgpu: validate hw_fini before function call") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/f4fc849e-4e76-4448-8657-caa4c69910b0@stanley.mountain Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:07:10 -04:00
Kent Russell	3c0be69bad	amdgpu: Don't print L2 status if there's nothing to print If a 2nd fault comes in before the 1st is handled, the 1st fault will clear out the FAULT STATUS registers before the 2nd fault is handled. Thus we get a lot of zeroes. If status=0, just skip the L2 fault status information, to avoid confusion of why some VM fault status prints in dmesg are all zeroes. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:06:51 -04:00
Jonathan Kim	e46738a58f	drm/amdkfd: sever xgmi io link if host driver has disable sharing Host drivers can create partial hives per guest by disabling xgmi sharing between certain peers in the main hive. Typically, these partial hives are fully connected per guest session. In the event that the host makes a mistake by adding a non-shared node to a guest session, have the KFD reflect sharing disabled by severing the IO link. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: James Yao <yiqing.yao@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:06:34 -04:00
Lang Yu	46186667f9	drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr Free sg table when dma_map_sgtable() failed to avoid memory leak. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:06:24 -04:00
Lijo Lazar	d37bc6a4ed	drm/amdgpu: Fix the logic for NPS request failure On a hive, NPS request is placed by the first one for all devices in the hive. If the request fails, mark the mode as UNKNOWN so that subsequent devices on unload don't request it. Also, fix the mutex double lock issue in error condition, should have been mutex_unlock. Fixes: `ee52489d12` ("drm/amdgpu: Place NPS mode request on unload") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:04:59 -04:00
YiPeng Chai	3d0ffc6418	drm/amdgpu: Reduce redundant gpu resets on nbio v7.4 On nbio v7.4, ras controller interrupt and athub interrupt are generated after injecting UE to PCIE, but gpu reset only needs to be triggered once. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-24 18:04:34 -04:00
Frank Min	108bc59fe8	drm/amdgpu: fix random data corruption for sdma 7 There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `75400f8d6e`) Cc: stable@vger.kernel.org # 6.11.x	2024-10-22 18:11:43 -04:00
Mario Limonciello	bf58f03931	drm/amd: Guard against bad data for ATIF ACPI method If a BIOS provides bad data in response to an ATIF method call this causes a NULL pointer dereference in the caller. ``` ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) ? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2)) ? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1)) ? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642) ? exc_page_fault (arch/x86/mm/fault.c:1542) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu ``` It has been encountered on at least one system, so guard for it. Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c9b7c809b8`) Cc: stable@vger.kernel.org	2024-10-22 18:08:12 -04:00
Prike Liang	32e7ee293f	drm/amdgpu: Dereference the ATCS ACPI buffer Need to dereference the atcs acpi buffer after the method is executed, otherwise it will result in a memory leak. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:51:19 -04:00
Lijo Lazar	591aec150a	drm/amdgpu: Save VCN shared memory with init reset VCN shared memory is in framebuffer and there are some flags initialized during sw_init. Ideally, such programming should be during hw_init. Make sure the flags are saved during reset on initialization since that reset will affect frame buffer region. For clarity, separate it out to another function. Fixes: `1e4acf4d93` ("drm/amdgpu: Add reset on init handler for XGMI") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Hao Zhou <hao.zhou@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:51:19 -04:00
Sunil Khatri	971d8e1c3f	drm/amdgpu: clean unused functions of uvd/vcn/vce Some of the function pointers of amdgpu_ip_funcs are not used and are left commented out. Hence clean up those that aren't used. Cc: Leo Liu <leo.liu@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:51:19 -04:00
Victor Lu	8b22f04833	drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih Port this change to vega20_ih.c: commit `afbf7955ff` ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts") Original commit message: "Why: Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit if RB_ENABLE is not set. How to fix: Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set. The RB_ENABLE bit is required to be set, together with WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit would clear the RB_OVERFLOW." Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:40 -04:00
Sunil Khatri	0016e87054	drm/amdgpu: Clean the functions pointer set as NULL We don't need to set unused function pointers to NULL, as global structure members are by default set to zero, or NULL for pointers. Cc: Leo Liu <leo.liu@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
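The entry above relies on a C language guarantee: members left out of the initializer of an object with static storage duration are zero-initialized, so omitted function pointers are NULL. A minimal sketch of that guarantee follows; `zi_ip_funcs` and every other `zi_` name are hypothetical stand-ins for illustration, not the driver's real `amdgpu_ip_funcs`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down callback table (not the real amdgpu_ip_funcs). */
struct zi_ip_funcs {
	const char *name;
	int (*sw_init)(void *handle);
	int (*suspend)(void *handle);
	int (*resume)(void *handle);
};

static int zi_sw_init(void *handle)
{
	(void)handle;
	return 0;
}

/* Only the callbacks that exist are named; the rest are implicitly NULL. */
static const struct zi_ip_funcs zi_funcs = {
	.name = "zi_demo",
	.sw_init = zi_sw_init,
};

/* Returns 1 when every omitted callback reads back as NULL, 0 otherwise. */
int zi_omitted_callbacks_are_null(void)
{
	assert(zi_funcs.sw_init != NULL);	/* the one we did set */
	return zi_funcs.suspend == NULL && zi_funcs.resume == NULL;
}
```

Writing `.suspend = NULL` in such a table is therefore redundant, which is the cleanup the entry describes.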
Sunil Khatri	8231e3af96	drm/amdgpu: clean the dummy soft_reset functions Remove the dummy soft_reset functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	f13c7da118	drm/amdgpu: clean the dummy wait_for_idle functions Remove the dummy wait_for_idle functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	aa980de3b5	drm/amdgpu: clean the dummy suspend functions Remove the dummy suspend functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	fbcd0ad5d1	drm/amdgpu: clean the dummy resume functions Remove the dummy resume functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	780002b654	drm/amdgpu: validate wait_for_idle before function call Before making a function call to wait_for_idle, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	502d76308d	drm/amdgpu: validate resume before function call Before making a function call to resume, validate the function pointer like we do in sw_init. Use the helper function amdgpu_ip_block_resume where same checks and calls are repeated. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	e095026f00	drm/amdgpu: validate suspend before function call Before making a function call to suspend, validate the function pointer like we do in sw_init. Use the helper function amdgpu_ip_block_suspend where same checks and calls are repeated. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Sunil Khatri	dad01f93f4	drm/amdgpu: validate hw_fini before function call Before making a function call to hw_fini, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
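The validation entries above all follow one pattern: once dummy callbacks are removed, a callback slot may be NULL, so the caller checks the pointer before calling through it. A minimal sketch of that pattern is shown below. The `opt_` names are hypothetical, and the helper only illustrates the check; it is not the body of the driver's `amdgpu_ip_block_suspend`.

```c
#include <assert.h>
#include <stddef.h>

struct opt_ip_block;

/* Hypothetical callback table with one optional callback. */
struct opt_ip_funcs {
	int (*suspend)(struct opt_ip_block *block);
};

struct opt_ip_block {
	const struct opt_ip_funcs *funcs;
	int suspend_calls;	/* counts how often the callback ran */
};

static int opt_count_suspend(struct opt_ip_block *block)
{
	block->suspend_calls++;
	return 0;
}

const struct opt_ip_funcs opt_funcs_with_suspend = { .suspend = opt_count_suspend };
const struct opt_ip_funcs opt_funcs_without_suspend = { .suspend = NULL };

/* Call the optional callback only if the block provides one. */
int opt_ip_block_suspend(struct opt_ip_block *block)
{
	assert(block != NULL && block->funcs != NULL);

	if (!block->funcs->suspend)
		return 0;	/* nothing to do is not an error */

	return block->funcs->suspend(block);
}
```

Centralizing the check in one helper keeps call sites short, which matches the entries' note about reusing a helper where the same checks and calls were repeated.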
Srinivasan Shanmugam	9343b904e7	drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2 This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_2_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. Also, this patch updates the `gfx_v9_0_sw_init` function to initialize the cleaner shader if the MEC firmware version is 88 or higher. It sets the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate values and attempts to initialize the cleaner shader. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This change ensures that the GPU memory is properly cleared between different processes, preventing data leakage and enhancing security. It also aligns with the serialization mechanism between KGD and KFD, ensuring that the GPU state is consistent across different workloads. 
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:39 -04:00
Frank Min	c379dcf797	drm/amdgpu: fix typo for sdma6 constant fill packet Fix typo for sdma6 constant fill packet Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:38 -04:00
Frank Min	75400f8d6e	drm/amdgpu: fix random data corruption for sdma 7 There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:38 -04:00
Sunil Khatri	5ebdb6fd60	drm/amdgpu: clean the dummy sw_fini functions Remove the dummy sw_fini functions for all ip blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Lijo Lazar	785504dd7f	drm/amdgpu: Use SPX as default in partition config In certain cases - ex: when a reset is required on initialization - XCP manager won't have a valid partition mode. In such cases, use SPX as the default selected mode for which partition configuration details are populated. Fixes: `4ae86dc878` ("drm/amdgpu: Add sysfs nodes to get xcp details") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: Hao Zhou <hao.zhou@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Sunil Khatri	278b8fbf06	drm/amdgpu: validate sw_fini before function call Before making a function call to sw_fini, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Sunil Khatri	7fd12379bd	drm/amdgpu: clean the dummy sw_init functions Remove the dummy sw_init functions for all IP blocks. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Sunil Khatri	df6e463d8f	drm/amdgpu: validate sw_init before function call Before making a function call to sw_init, validate the function pointer like we do in late_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:37 -04:00
Xiaogang Chen	10112bf828	drm/amdkfd: Not restore userptr buffer if kfd process has been removed When the kfd process has been terminated, do not restore the userptr buffer after the mmu notifier invalidates a range. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:12 -04:00
Lijo Lazar	8e3a3e847e	drm/amdgpu: Zero-initialize mqd backup memory Zero-initialize mqd backup memory, otherwise the check for 'already-backed-up' could go wrong. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:50:12 -04:00
Alex Deucher	32f0028969	Revert "drm/amdgpu/gfx9: put queue resets behind a debug option" This reverts commit `7c1a2d8aba`. Extended validation has completed successfully, so enable these features by default. Acked-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com>	2024-10-22 17:50:11 -04:00
Zhu Lingshan	9ee8ab245c	drm/amdgpu: init saw registers for mmhub v1.0 This commit initializes registers in the Stand Alone Walker (SAW) for mmhub v1.0, to support ISP use cases. Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Reported-and-tested-by: Du Bin <bin.du@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:49:38 -04:00
Alex Deucher	d2f57b6d89	drm/amdgpu/discovery: add ISP discovery entries for old APUs Raven1/2 and Picasso have ISP 2.0.0, however their ISP blocks are not in the IP discovery table yet. This commit fixes this issue by adding new ISP entries for Raven and Picasso in the IP discovery table. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:49:30 -04:00
Mario Limonciello	c9b7c809b8	drm/amd: Guard against bad data for ATIF ACPI method If a BIOS provides bad data in response to an ATIF method call this causes a NULL pointer dereference in the caller. ``` ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) ? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2)) ? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1)) ? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642) ? exc_page_fault (arch/x86/mm/fault.c:1542) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu ``` It has been encountered on at least one system, so guard for it. Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-22 17:22:44 -04:00
Thomas Zimmermann	1f828b4dd4	drm/client: Make client support optional Only build client code if DRM_CLIENT has been selected. Automatically do so if one of the default clients has been enabled. If client support has been disabled, the helpers for client-related events are empty and the regular client functions are not present. Amdgpu has an internal DRM client, so it has to select DRM_CLIENT by itself unconditionally. v3: - provide empty drm_client_debugfs_init() if DRM_CLIENT=n (kernel test robot) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-12-tzimmermann@suse.de	2024-10-18 09:23:03 +02:00
Thomas Zimmermann	4cf50bae05	drm/amdgpu: Suspend and resume internal clients with client helpers Replace calls to drm_fb_helper_set_suspend_unlocked() with calls to the client functions drm_client_dev_suspend() and drm_client_dev_resume(). Any registered in-kernel client will now receive suspend and resume events. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241014085740.582287-9-tzimmermann@suse.de	2024-10-18 09:23:03 +02:00
Srinivasan Shanmugam	e7457532cb	drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data *ctx_data, 1146 struct amdgpu_ring **out) 1147 { 1148 struct amdgpu_ring *ring; 1149 struct amdgpu_mes_gang *gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 /* 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies.
1156 */ 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222
if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 *out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bfaf188360`)	2024-10-15 11:49:08 -04:00
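The shape of the fix described in the entry above can be sketched in plain C. This is an illustration under stated assumptions, not the driver code: the `du_` names are hypothetical, the lock is a simple counter so that a double unlock trips an assertion, and the two `fail_*` flags stand in for errors from the step done under the lock and the step done after it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical lock that tracks whether it is held, to expose misuse. */
struct du_lock {
	int depth;
};

static void du_lock_acquire(struct du_lock *l)
{
	l->depth++;
}

static void du_lock_release(struct du_lock *l)
{
	assert(l->depth > 0);	/* fires on a double unlock */
	l->depth--;
}

/*
 * Returns 0 on success or a negative value on failure. On every path the
 * lock has been released exactly once by the time the function returns.
 */
int du_add_ring(struct du_lock *lock, int fail_init, int fail_queue)
{
	int *ring;
	int r;

	du_lock_acquire(lock);

	ring = calloc(1, sizeof(*ring));
	if (!ring) {
		du_lock_release(lock);
		return -1;
	}

	r = fail_init ? -2 : 0;		/* step performed under the lock */
	if (r) {
		du_lock_release(lock);	/* release before jumping to cleanup */
		goto clean_up_memory;
	}

	du_lock_release(lock);		/* normal path drops the lock here */

	r = fail_queue ? -3 : 0;	/* step performed without the lock */
	if (r)
		goto clean_up_ring;

	free(ring);			/* the sketch does not hand the ring out */
	return 0;

clean_up_ring:
	/* ring-specific teardown would go here */
clean_up_memory:
	free(ring);			/* no unlock: both paths already released */
	return r;
}
```

Because the cleanup labels are reached both with and without the lock having been dropped earlier, keeping them free of unlock calls is what makes each path release exactly once.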
Michael Chen	7760d7f93c	drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `144df260f3`) Cc: stable@vger.kernel.org # 6.11.x	2024-10-15 11:48:36 -04:00
Mohammed Anees	c0ec082f10	drm/amdgpu: prevent BO_HANDLES error from being overwritten Before this patch, if multiple BO_HANDLES chunks were submitted, the error -EINVAL would be correctly set but could be overwritten by the return value from amdgpu_cs_p1_bo_handles(). This patch ensures that if there are multiple BO_HANDLES, we stop. Fixes: `fec5f8e8c6` ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit") Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `40f2cd9882`) Cc: stable@vger.kernel.org	2024-10-15 11:48:05 -04:00
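The overwritten-error problem in the entry above is easy to reproduce in miniature. The sketch below is an assumption-laden illustration: the `err_` names are hypothetical, and the chunk handler always succeeds, which is exactly the case in which a previously assigned `-EINVAL` gets lost.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum err_chunk_id { ERR_CHUNK_IB, ERR_CHUNK_BO_HANDLES };

/* Hypothetical handler for one BO_HANDLES chunk; always succeeds here. */
static int err_handle_bo_handles(void)
{
	return 0;
}

/* Buggy shape: -EINVAL is assigned, then overwritten by the handler. */
int err_parse_overwrites(const enum err_chunk_id *chunks, size_t n)
{
	int ret = 0, have_list = 0;
	size_t i;

	assert(chunks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (chunks[i] != ERR_CHUNK_BO_HANDLES)
			continue;
		if (have_list)
			ret = -EINVAL;		/* lost on the next line */
		ret = err_handle_bo_handles();
		if (ret)
			return ret;
		have_list = 1;
	}
	return ret;
}

/* Fixed shape: stop as soon as a second BO_HANDLES chunk is seen. */
int err_parse_stops(const enum err_chunk_id *chunks, size_t n)
{
	int ret = 0, have_list = 0;
	size_t i;

	assert(chunks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (chunks[i] != ERR_CHUNK_BO_HANDLES)
			continue;
		if (have_list)
			return -EINVAL;		/* nothing can overwrite this */
		ret = err_handle_bo_handles();
		if (ret)
			return ret;
		have_list = 1;
	}
	return ret;
}
```

With two BO_HANDLES chunks in one submission, the first function reports success and the second reports `-EINVAL`.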
Alex Deucher	d2c72d96df	drm/amdgpu: enable enforce_isolation sysfs node on VFs It should be enabled on both bare metal and VFs. Fixes: `e189be9b2e` ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit `dc8847b054`)	2024-10-15 11:47:41 -04:00
Dan Carpenter	9f7e94af35	drm/amdgpu: Fix off by one in current_memory_partition_show() The > ARRAY_SIZE() should be >= ARRAY_SIZE() to prevent an out of bounds read. Fixes: `012be6f22c` ("drm/amdgpu: Add sysfs interfaces for NPS mode") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:26:35 -04:00
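An index equal to the element count is already one past the end of an array, so a bounds check has to reject `index >= ARRAY_SIZE(array)`; testing only `>` lets the last invalid index through. A minimal sketch, assuming a hypothetical description table (`obo_mode_desc` and its contents are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical lookup table; valid indices are 0..3. */
static const char *const obo_mode_desc[] = { "UNKNOWN", "NPS1", "NPS2", "NPS4" };

/* Returns the description for mode, or NULL if mode is out of range. */
const char *obo_mode_name(size_t mode)
{
	assert(ARRAY_SIZE(obo_mode_desc) == 4);

	if (mode >= ARRAY_SIZE(obo_mode_desc))	/* '>' would admit mode == 4 */
		return NULL;

	return obo_mode_desc[mode];
}
```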
Lijo Lazar	d25d26b8a8	drm/amdgpu: Wait for reset on init completion When reset on initialization is requested, wait for the reset to finish. In cases where module is loaded after boot, this makes sure all initialization work is done after a successful return of modprobe. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:22:26 -04:00
Srinivasan Shanmugam	bfaf188360	drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data *ctx_data, 1146 struct amdgpu_ring **out) 1147 { 1148 struct amdgpu_ring *ring; 1149 struct amdgpu_mes_gang *gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 /* 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies.
1156 */ 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222
if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 *out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: `d0c423b647` ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:21:31 -04:00
Michael Chen	144df260f3	drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:21:08 -04:00
Lijo Lazar	f8588f051d	drm/amdgpu: Show current compute partition on VF Enable sysfs node for current compute partition mode on VFs also. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Tested-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:20:32 -04:00
Lijo Lazar	b3c6871692	drm/amdgpu: Fetch NPS mode for GCv9.4.3 VFs Use the memory ranges published in discovery table to deduce NPS mode of GC v9.4.3 VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com> Tested-by: Vignesh Chander <Vignesh.Chander@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:19:46 -04:00
Mohammed Anees	40f2cd9882	drm/amdgpu: prevent BO_HANDLES error from being overwritten Before this patch, if multiple BO_HANDLES chunks were submitted, the error -EINVAL would be correctly set but could be overwritten by the return value from amdgpu_cs_p1_bo_handles(). This patch ensures that if there are multiple BO_HANDLES, we stop. Fixes: `fec5f8e8c6` ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit") Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:18:49 -04:00
Alex Deucher	dc8847b054	drm/amdgpu: enable enforce_isolation sysfs node on VFs It should be enabled on both bare metal and VFs. Fixes: `e189be9b2e` ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-10-15 11:17:32 -04:00
Lijo Lazar	c29aeadf0b	drm/amdgpu: Add NPS switch support for GC 9.4.3 Add dynamic NPS switch support for GC 9.4.3 variants. Only GC v9.4.3 and GC v9.4.4 currently support this. NPS switch is only supported if an SOC supports multiple NPS modes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:25 -04:00
Srinivasan Shanmugam	d594ddc686	drm/amdgpu/gfx12: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v12_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:16 -04:00
Sunil Khatri	f83fc3abd5	drm/amdgpu: optimize fn gfx_v12_ring_insert_nop Optimize gfx_v12_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:09 -04:00
Sunil Khatri	950dcb0158	drm/amdgpu: optimize fn gfx_v11_ring_insert_nop Optimize gfx_v11_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:17:03 -04:00
Sunil Khatri	6aa902938b	drm/amdgpu: optimize fn gfx_v10_ring_insert_nop Optimize gfx_v10_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:57 -04:00
Sunil Khatri	1537638ae3	drm/amdgpu: optimize fn gfx_v9_ring_insert_nop Optimize gfx_v9_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:52 -04:00
Sunil Khatri	a23575bb3c	drm/amdgpu: optimize fn gfx_v9_4_3_ring_insert_nop Optimize gfx_v9_4_3_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:46 -04:00
Sunil Khatri	ea4e4754c9	drm/amdgpu: optimize insert_nop using multi dwords Optimize the ring_insert_nop fn to write n dwords in one step rather than calling amdgpu_ring_write for each nop packet. This avoids a function call for each nop packet, and wptr is updated only once. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:40 -04:00
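The idea in the entry above, filling a run of NOP dwords in bulk and advancing the write pointer once, can be sketched on a toy ring buffer. Everything here is an assumption made for illustration: the `nop_` names, the 8-dword ring size, and the placeholder NOP value are hypothetical, and plain loops stand in for whatever bulk-fill primitive the driver uses.

```c
#include <assert.h>
#include <stdint.h>

#define NOP_RING_DWORDS 8u		/* power of two, so masking wraps */
#define NOP_PACKET 0xffff1000u		/* placeholder NOP encoding */

struct nop_ring {
	uint32_t buf[NOP_RING_DWORDS];
	uint32_t wptr;			/* next dword to write, always < size */
};

/* Write count NOP dwords at wptr, wrapping at most once, then bump wptr. */
void nop_ring_insert(struct nop_ring *ring, uint32_t count)
{
	uint32_t first, i;

	assert(ring->wptr < NOP_RING_DWORDS);
	assert(count <= NOP_RING_DWORDS);

	/* Dwords that fit between wptr and the end of the buffer. */
	first = NOP_RING_DWORDS - ring->wptr;
	if (first > count)
		first = count;

	for (i = 0; i < first; i++)
		ring->buf[ring->wptr + i] = NOP_PACKET;

	/* Remainder, if any, continues from the start of the buffer. */
	for (i = 0; i < count - first; i++)
		ring->buf[i] = NOP_PACKET;

	/* Single write-pointer update for the whole run. */
	ring->wptr = (ring->wptr + count) & (NOP_RING_DWORDS - 1);
}
```

Splitting the run into at most two contiguous fills is what removes the per-dword function call and the per-dword pointer arithmetic.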
Lijo Lazar	ed3dac4bf9	drm/amdgpu: Check gmc requirement for reset on init Add a callback to check if there is any condition detected by GMC block for reset on init. One case is if a pending NPS change request is detected. If reset is done because of NPS switch, refresh NPS info from discovery table. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:32 -04:00
Lijo Lazar	ee52489d12	drm/amdgpu: Place NPS mode request on unload If a user has requested NPS mode switch, place the request through PSP during unload of the driver. For devices which are part of a hive, all requests are placed together. If one of them fails, revert back to the current NPS mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-15 11:16:20 -04:00
Thomas Zimmermann	ea1d2a38fb	drm/amdgpu: Use video aperture helpers DRM's aperture functions have long been implemented as helpers under drivers/video/ for use with fbdev. Avoid the DRM wrappers by calling the video functions directly. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Acked-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930130921.689876-2-tzimmermann@suse.de	2024-10-14 15:28:47 +02:00
Dave Airlie	fc4d262721	Merge tag 'amd-drm-fixes-6.12-2024-10-08' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-10-08: amdgpu: - Fix invalid UBSAN warnings - Fix artifacts in MPO transitions - Hibernation fix amdkfd: - Fix an eviction fence leak radeon: - Add late register for connectors - Always set GEM function pointers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008142831.3739244-1-alexander.deucher@amd.com	2024-10-09 16:31:16 +10:00
Dave Airlie	54bc1d3255	Merge tag 'drm-misc-next-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: UAPI Changes: - panthor: Add realtime group priority and priority query. Cross-subsystem Changes: - Add Vivek Kasireddy as udmabuf maintainer. - Assorted udmabuf changes. - Device tree binding updates. - dmabuf documentation fixes. - Move drm_rect to drm core module from kms helper. Core Changes: - Update scheduler documentation and concurrency fixes. - drm/ci updates. - Add memory-agnostic fbdev client and client-agnostic setup helper. - Huge driver conversion for using the above. Driver Changes: - Assorted fixes to imx, panel/nt35510, sti, accel/ivpu, v3d, vkms, host1x. - Add panel quirks for AYA NEO panels. - Make module autoloading work for bridge/it6505 and mcde. - Add huge page support to v3d using a custom shmfs. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a9b95e6f-9f35-464e-83f6-bda75b35ee0b@linux.intel.com	2024-10-09 11:58:39 +10:00
Dave Airlie	7fefa1edc2	Merge tag 'drm-misc-next-2024-09-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: - Add panthor/DEV_QUERY_TIMESTAMP_INFO query. Cross-subsystem Changes: - Updated dt bindings. - Add documentation explaining default errnos for fences. - Mark dma-buf heaps creation functions as __init. Core Changes: - Split DSC helpers from DP helpers. - Clang build fixes for drm/mm test. - Remove simple pipeline support for gem-vram, no longer any users left after converting bochs. - Add erno to drm_sched_start to distinguish between GPU and queue reset. - Add drm_framebuffer testcases. - Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n. - Use read_trylock instead of read_lock in dma_fence_begin_signalling to quiesce lockdep. Driver Changes: - Assorted small fixes and updates for tegra, host1x, imagination, nouveau, panfrost, panthor, panel/ili9341, mali, exynos, panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a, bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050, panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d. - Add bridge/TI TDP158. - Assorted documentation updates. - Convert bochs from simple drm to gem shmem, and check modes against available memory. - Many VC4 fixes, most related to scaling and YUV support. - Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS. - Rockchip 4k@60 support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com	2024-10-09 09:03:46 +10:00
Sunil Khatri	555cd714bd	drm/amdgpu: no need to log error in multi ring write No need to log error in multi ring write as its taken care during ring commit. This is inline with change done in amdgpu_ring_write. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:46:23 -04:00
Sunil Khatri	ccc0a18748	drm/amdgpu: move error log from ring write to commit Move the error message from ring write as an optimization to avoid printing that message on every write instead print once during commit if it exceeds write the allocated size i.e ring->count_dw. Also we do not want to log the error message in between a ring write and complete the write as its mostly not harmful as it will overwrite stale data only as GPU read from ring is faster than CPU write to ring. This reduces the size of amdgpu.ko module by around 600 Kb as write is very often used function and hence the print. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:46:15 -04:00
Andrew Kreimer	16445e408c	drm/amdgpu: fix typos Fix typos in comments: "wether -> whether". Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:48 -04:00
Tvrtko Ursulin	89cfa73b61	drm/amdgpu: Remove the while loop from amdgpu_job_prepare_job While loop makes it sound like amdgpu_vmid_grab() potentially needs to be called multiple times to produce a fence, while in reality all code paths either return an error, assign a valid job->vmid or assign a vmid which will be valid once the returned fence signals. Therefore we can remove the loop to make it clear the call does not need to be repeated. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:43 -04:00
Tvrtko Ursulin	871f44b4ba	drm/amdgpu: Drop impossible condition from amdgpu_job_prepare_job Fence has been initialised to NULL so no need to test it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:39 -04:00
Tvrtko Ursulin	04bdba4654	drm/amdgpu: Use drm_print_memory_stats helper from fdinfo Convert fdinfo memory stats to use the common drm_print_memory_stats helper. This achieves alignment with the common keys as documented in drm-usage-stats.rst, adding specifically drm-total- key the driver was missing until now. Additionally I made the code stop skipping total size for objects which currently do not have a backing store, and I added resident, active and purgeable reporting. Legacy keys have been preserved, with the outlook of only potentially removing only the drm-memory- when the time gets right. The example output now looks like this: pos: 0 flags: 02100002 mnt_id: 24 ino: 1239 drm-driver: amdgpu drm-client-id: 4 drm-pdev: 0000:04:00.0 pasid: 32771 drm-total-cpu: 0 drm-shared-cpu: 0 drm-active-cpu: 0 drm-resident-cpu: 0 drm-purgeable-cpu: 0 drm-total-gtt: 2392 KiB drm-shared-gtt: 0 drm-active-gtt: 0 drm-resident-gtt: 2392 KiB drm-purgeable-gtt: 0 drm-total-vram: 44564 KiB drm-shared-vram: 31952 KiB drm-active-vram: 0 drm-resident-vram: 44564 KiB drm-purgeable-vram: 0 drm-memory-vram: 44564 KiB drm-memory-gtt: 2392 KiB drm-memory-cpu: 0 KiB amd-memory-visible-vram: 44564 KiB amd-evicted-vram: 0 KiB amd-evicted-visible-vram: 0 KiB amd-requested-vram: 44564 KiB amd-requested-visible-vram: 11952 KiB amd-requested-gtt: 2392 KiB drm-engine-compute: 46464671 ns v2: * Track purgeable via AMDGPU_GEM_CREATE_DISCARDABLE. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Rob Clark <robdclark@chromium.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:25 -04:00
Tvrtko Ursulin	fc282e9e86	drm/amdgpu: Drop unused fence argument from amdgpu_vmid_grab_used Fence argument is unused so lets drop it. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-08 09:43:17 -04:00
Lang Yu	d7d7b947a4	drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: `9a1c1339ab` ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5fa4362894`) Cc: stable@vger.kernel.org	2024-10-07 14:53:23 -04:00
Lijo Lazar	012be6f22c	drm/amdgpu: Add sysfs interfaces for NPS mode Add a sysfs interface to see available NPS modes to switch to - cat /sys/bus/pci/devices/../available_memory_partition Make the current_memory_partition sysfs node read/write for requesting a new NPS mode. The request is only cached and at a later point a driver unload/reload is required to switch to the new NPS mode. Ex: echo NPS1 > /sys/bus/pci/devices/../current_memory_partition echo NPS4 > /sys/bus/pci/devices/../current_memory_partition The above interfaces will be available only if the SOC supports more than one NPS mode. Also modify the current memory partition sysfs logic to be more generic. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:23 -04:00
Lijo Lazar	bbc160084e	drm/amdgpu: Add gmc interface to request NPS mode Add a common interface in GMC to request NPS mode through PSP. Also add a variable in hive and gmc control to track the last requested mode. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:23 -04:00
Srinivasan Shanmugam	b1cf3ddcc3	drm/amdgpu/gfx10: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v10_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:23 -04:00
Rajneesh Bhardwaj	212cc24119	drm/amdgpu: Add PSP interface for NPS switch Implement PSP ring command interface for memory partitioning on the fly on the supported asics. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:32:00 -04:00
Srinivasan Shanmugam	fbca196953	drm/amdgpu/gfx11: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v11_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:09:34 -04:00
Srinivasan Shanmugam	e7cee54595	drm/amdgpu/gfx12: Implement cleaner shader support for GFX12 hardware This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v12_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v12_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:09:28 -04:00
Colin Ian King	1845752b2f	drm/amdgpu: Fix spelling mistake "initializtion" -> "initialization" There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:09:10 -04:00
Srinivasan Shanmugam	8fc279e5e3	drm/amdgpu/gfx11: Implement cleaner shader support for GFX11 hardware The patch modifies the gfx_v11_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v11_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v11_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v11_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:08:56 -04:00
Sunil Khatri	7e6487ab21	drm/amdgpu: change the comment from handle to ip_block htmldoc generation depend upon the input arguments etc to generate the document. After update of handle to ip_block then update needs in comments too to fix the warnings. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410021904.YyGjlpk9-lkp@intel.com Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:08:35 -04:00
Srinivasan Shanmugam	2d5f74a867	drm/amdgpu/gfx10: Implement cleaner shader support for GFX10 hardware The patch modifies the gfx_v10_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v10_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v10_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v10_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:08:29 -04:00
Lang Yu	5fa4362894	drm/amdkfd: Fix an eviction fence leak Only creating a new reference for each process instead of each VM. Fixes: `9a1c1339ab` ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:03:52 -04:00
Sunil Khatri	692d2cd180	drm/amdgpu: update the handle ptr in hw_fini Update the *handle to amdgpu_ip_block ptr for all functions pointers of hw_fini. Also update the ip_block ptr where ever needed as there were cyclic dependency of hw_fini on suspend and some followed clean up. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:03:25 -04:00
Sunil Khatri	58608034ed	drm/amdgpu: update the handle ptr in hw_init Update the *handle to amdgpu_ip_block ptr for all functions pointers of hw_init. Also update the ip_block ptr where ever needed as there were cyclic dependency of hw_init on resume. v2: squash in isp fix Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:03:25 -04:00
Sunil Khatri	7feb4f3ad8	drm/amdgpu: update the handle ptr in resume Update the *handle to amdgpu_ip_block ptr for all functions pointers of resume. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:02:50 -04:00
Sunil Khatri	982d7f9bfe	drm/amdgpu: update the handle ptr in suspend Update the *handle to amdgpu_ip_block ptr for all functions pointers of suspend. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:02:45 -04:00
Sunil Khatri	82ae6619a4	drm/amdgpu: update the handle ptr in wait_for_idle Update the *handle to amdgpu_ip_block ptr for all functions pointers of wait_for_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-07 14:02:36 -04:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Sunil Khatri	e15ec812b5	drm/amdgpu: update the handle ptr in post_soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of post_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:45:51 -04:00
Sunil Khatri	0ef2a1e7af	drm/amdgpu: update the handle ptr in soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:45:44 -04:00
Srinivasan Shanmugam	e47cb9d253	drm/amdgpu/gfx9: Add Cleaner Shader Deinitialization in gfx_v9_0 Module This commit addresses an omission in the previous patch related to the cleaner shader support for GFX9 hardware. Specifically, it adds the necessary deinitialization code for the cleaner shader in the gfx_v9_0_sw_fini function. The added line amdgpu_gfx_cleaner_shader_sw_fini(adev); ensures that any allocated resources for the cleaner shader are freed correctly, avoiding potential memory leaks and ensuring that the GPU state is clean for the next initialization sequence. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: `c2e70d307f` ("drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware") Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:44:47 -04:00
Sunil Khatri	9d5ee7ce88	drm/amdgpu: update the handle ptr in pre_soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of pre_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:44:41 -04:00
Lijo Lazar	f0b919960d	drm/amdgpu: Fix logic to determine TOS reload Avoid comparing TOS version on APUs. On APUs driver doesn't take care of TOS load. Fixes: `0ff3822613` ("drm/amdgpu: Add interface for TOS reload cases") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Rajneesh Bhardwaj <Rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:50 -04:00
Sunil Khatri	6a9456e0e3	drm/amdgpu: update the handle ptr in check_soft_reset Update the *handle to amdgpu_ip_block ptr for all functions pointers of check_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:45 -04:00
Sunil Khatri	94b2e07ad4	drm/amdgpu: update the handle ptr in prepare_suspend Update the *handle to amdgpu_ip_block ptr for all functions pointers of prepare_suspend. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:38 -04:00
Sunil Khatri	47d827f9c7	drm/amdgpu: update the handle ptr in late_fini Update the *handle to amdgpu_ip_block ptr for all functions pointers of late_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:43:33 -04:00
Sunil Khatri	904c402e97	drm/amdgpu: remove the dummy fn acp_early_init acp_early_init is a dummy function and is not being used and hence removed. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:42:53 -04:00
Mario Limonciello	b472b8d829	drm/amd: Taint the kernel when enabling overdrive Some distributions have been patching amdgpu to enable overdrive by default which may compromise stability. Furthermore when bug reports are brought upstream it's not obvious that the system has been tampered with. When overdrive is enabled taint the kernel and leave a critical message in the logs for users so that it's obvious in a bug report it's been tampered with. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:41:21 -04:00
Srinivasan Shanmugam	a443852f85	drm/amdkfd: Fix kdoc entry for 'get_wave_count()' function parameters Update kdoc entries to reflect the function's parameters. The descriptor for the 'queue_cnt' parameter has been added, and the incorrect mentions of 'wave_cnt' and 'vmid', which are not parameters but local variables, have been removed. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Function parameter or struct member 'queue_cnt' not described in 'get_wave_count' drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Excess function parameter 'wave_cnt' description in 'get_wave_count' drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:954: warning: Excess function parameter 'vmid' description in 'get_wave_count' Cc: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:56 -04:00
Sunil Khatri	90410d3996	drm/amdgpu: update the handle ptr in early_fini Update the *handle to amdgpu_ip_block ptr for all functions pointers of early_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:49 -04:00
Sunil Khatri	36aa9ab9c0	drm/amdgpu: update the handle ptr in sw_fini update the *handle to amdgpu_ip_block ptr for all functions pointers of sw_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:43 -04:00
Sunil Khatri	d5347e8d27	drm/amdgpu: update the handle ptr in sw_init update the *handle to amdgpu_ip_block ptr for all functions pointers of sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:37 -04:00
Sunil Khatri	3138ab2c5b	drm/amdgpu: update the handle ptr in late_init Update the ptr handle to amdgpu_ip_block ptr in all the functions of late_init function ptr. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:31 -04:00
Sunil Khatri	146b085ead	drm/amdgpu: update the handle ptr in early_init update the handle ptr to amdgpu_ip_block ptr for all functions pointers on early_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:22 -04:00
Asad Kamal	1007264254	drm/amdgpu: Add supported partition mode node Add sysfs node to show supported partition modes across all NPS modes Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:15 -04:00
Lijo Lazar	fcd91a95df	drm/amdgpu: Add option to refresh NPS data In certain use cases, NPS data needs to be refreshed again from discovery table. Add API parameter to refresh NPS data from discovery table. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:40:07 -04:00
Jiadong Zhu	5682cd86d6	drm/amdgpu/sdma5.2: implement ring reset callback for sdma5.2 Implement sdma queue reset callback via MMIO. v2: enter/exit safemode for mmio queue reset. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:39:57 -04:00
YuanShang	1fd7c37e3f	drm/amdgpu: Flush tlb by VM_INVALIDATION packet in sdma_v5_2 In order for SDMA not to be switched between VM_INVALIDATION request and ack, use an single VM_INVALIDATION packet in function sdma_v5_2_ring_emit_vm_flush. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-By: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:39:38 -04:00
Jiadong Zhu	64acf8f69e	drm/amdgpu/sdma5.2: split out per instance resume function Extract the resume sequence from sdma_v5_2_gfx_resume for starting/restarting an individual instance. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:38:44 -04:00
Jiadong Zhu	5fbba6bb98	drm/amdgpu/sdma5: implement ring reset callback for sdma5 Implement sdma queue reset callback via MMIO. v2: enter/exit safemode when sdma queue reset. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:38:38 -04:00
Sunil Khatri	d60e78bdef	drm/amdgpu: update the handle ptr in print_ip_state Update the ptr handle to amdgpu_ip_block ptr in all the functions affected. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:36:17 -04:00
Lijo Lazar	4ae86dc878	drm/amdgpu: Add sysfs nodes to get xcp details Add partition config nodes in sysfs to get resource instance details for a particular partition mode. A resource could be anything like an xcc, vcn decoder, system dma units etc. Details of various resource instances are available under /sys/bus/pci/devices/.../compute_partition_config/ Select a partition configuration: /sys/bus/pci/devices/.../compute_partition_config/xcp_config Number of instances of a resource: /sys/bus/pci/devices/.../compute_partition_config/<rsrc_name>/num_inst Total partitions sharing the resource: /sys/bus/pci/devices/.../compute_partition_config/<rsrc_name>/num_shared v2: Update node name as per spec Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:28:56 -04:00
Sunil Khatri	fa73462dc0	drm/amdgpu: update the handle ptr in dump_ip_state Update the ptr handle to amdgpu_ip_block ptr in all the functions. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:28:51 -04:00
Jiadong Zhu	94daae9744	drm/amdgpu/sdma5: split out per instance resume function Extract the resume sequence from sdma_v5_0_gfx_resume for starting/restarting an individual instance. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-10-01 17:28:43 -04:00
Linus Torvalds	994aeacbb3	drm fixes for 6.12-rc1 i915: - Fix BMG support to UHBR13.5 - Two PSR fixes - Fix colorimetry detection for DP xe: - Fix macro for checking minimum GuC version - Fix CCS offset calculation for some BMG SKUs - Fix locking on memory usage reporting via fdinfo and BO destroy - Fix GPU page fault handler on a closed VM - Fix overflow in oa batch buffer amdgpu: - MES 12 fix - KFD fence sync fix - SR-IOV fixes - VCN 4.0.6 fix - SDMA 7.x fix - Bump driver version to note cleared VRAM support - SWSMU fix amdgpu: - CU occupancy logic fix - SDMA queue fix Merge tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Regular fixes for the week to end the merge window, i915 and xe have a few each, amdgpu makes up most of it with a bunch of SR-IOV related fixes amongst others. 
i915: - Fix BMG support to UHBR13.5 - Two PSR fixes - Fix colorimetry detection for DP xe: - Fix macro for checking minimum GuC version - Fix CCS offset calculation for some BMG SKUs - Fix locking on memory usage reporting via fdinfo and BO destroy - Fix GPU page fault handler on a closed VM - Fix overflow in oa batch buffer amdgpu: - MES 12 fix - KFD fence sync fix - SR-IOV fixes - VCN 4.0.6 fix - SDMA 7.x fix - Bump driver version to note cleared VRAM support - SWSMU fix - CU occupancy logic fix - SDMA queue fix" * tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel: (79 commits) drm/amd/pm: update workload mask after the setting drm/amdgpu: bump driver version for cleared VRAM drm/amdgpu: fix vbios fetching for SR-IOV drm/amdgpu: fix PTE copy corruption for sdma 7 drm/amdkfd: Add SDMA queue quantum support for GFX12 drm/amdgpu/vcn: enable AV1 on both instances drm/amdkfd: Fix CU occupancy for GFX 9.4.3 drm/amdkfd: Update logic for CU occupancy calculations drm/amdgpu: skip coredump after job timeout in SRIOV drm/amdgpu: sync to KFD fences before clearing PTEs drm/amdgpu/mes12: set enable_level_process_quantum_check drm/i915/dp: Fix colorimetry detection drm/amdgpu/mes12: reduce timeout drm/amdgpu/mes11: reduce timeout drm/amdgpu: use GEM references instead of TTMs v2 drm/amd/display: Allow backlight to go below `AMDGPU_DM_DEFAULT_MIN_BACKLIGHT` drm/amd/display: Fix kdoc entry for 'tps' in 'dc_process_dmub_dpia_set_tps_notification' drm/amdgpu: update golden regs for gfx12 drm/amdgpu: clean up vbios fetching code drm/amd/display: handle nulled pipe context in DCE110's set_drr() ...	2024-09-28 08:47:46 -07:00
Lijo Lazar	c75c5285e5	drm/amdgpu: Add PSP reload case to reset-on-init A reset on initialization will be needed if a PSP TOS different from the one currently active on the system needs to be loaded. This is possible only on SOCs which support a full device reset which results in unload of active PSP TOS. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:07:16 -04:00
Lijo Lazar	0ff3822613	drm/amdgpu: Add interface for TOS reload cases Add interface to check if a different TOS needs to be loaded than the one which is already active on the SOC. Presently the interface is restricted to specific variants of PSPv13.0. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:07:07 -04:00
Lijo Lazar	c4f00312c1	drm/amdgpu: Support reset-on-init on select SOCs Add XGMI reset on init support to aldebaran and SOCs with GC v9.4.3. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:58 -04:00
Lijo Lazar	2accf9d683	drm/amdgpu: Drop delayed reset work handler Drop delayed reset work handler as it is no longer used. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:53 -04:00
Lijo Lazar	631af731ee	drm/amdgpu: Refactor XGMI reset on init handling Use XGMI hive information to rely on resetting XGMI devices on initialization rather than using mgpu structure. mgpu structure may have other devices as well. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <feifxu@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:46 -04:00
Lijo Lazar	b17f87329d	drm/amdgpu: Add helper to initialize badpage info Add a separate function to read badpage data during initialization. Reading bad pages will need hardware access and cannot be done during reset. Hence in cases where device needs a full reset during init itself, attempting to read will cause a deadlock. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:38 -04:00
Dr. David Alan Gilbert	0ee2399116	drm/amdgpu: Remove unused amdgpu_i2c functions amdgpu_i2c_add and amdgpu_i2c_init were added in 2015's commit `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") but never used. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:36 -04:00
Dr. David Alan Gilbert	9d7a8bdb90	drm/amdgpu: Remove unused amdgpu_gfx_bit_to_me_queue amdgpu_gfx_bit_to_me_queue has been unused since it was added in commit `7470bfcf20` ("drm/amdgpu: add helper function for gfx queue/bitmap transition") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:33 -04:00
Dr. David Alan Gilbert	1e10c12263	drm/amdgpu: Remove unused amdgpu_gmc_vram_cpu_pa amdgpu_gmc_vram_cpu_pa has been unused since commit `087451f372` ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:29 -04:00
Dr. David Alan Gilbert	6e261ecbb2	drm/amdgpu: Remove unused amdgpu_atpx functions amdgpu_atpx_dgpu_req_power_for_displays has been unused since commit `bdb1ccb080` ("drm/amdgpu: remove ATPX_DGPU_REQ_POWER_FOR_DISPLAYS check when hotplug-in") amdgpu_atpx_get_dhandle has been unused since commit `f9b7f3703f` ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:24 -04:00
Dr. David Alan Gilbert	632aac6299	drm/amdgpu: Remove unused amdgpu_device_ip_is_idle amdgpu_device_ip_is_idle is unused. It was renamed from 'amdgpu_is_idle' which was originally added in commit `5dbbb60ba6` ("drm/amdgpu: add IP helpers for wait_for_idle and is_idle") but hasn't been used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:19 -04:00
Lijo Lazar	1e4acf4d93	drm/amdgpu: Add reset on init handler for XGMI In some cases, device needs to be reset before first use. Add handlers for doing device reset during driver init sequence. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <feifxu@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:19 -04:00
Lijo Lazar	f501057aff	drm/amdgpu: Add callback get xcp resource info Add a callback interface to get the resource information of a partition mode. Presently the information has number of resources and number of entities sharing the resource. Add the implementation for aquavanjaram SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	1bc0b33915	drm/amd: Add helper to get partition config modes Add helper to get supported/available partition config modes Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
WangYuli	307b4ab7ba	drm/amdgpu: Fix typo "acccess" and improve the comment style here Some comments misspell 'access' as 'acccess'. And the comment style should be like this: /* * Text * Text */ Suggested-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/all/f75fbe30-528e-404f-97e4-854d27d7a401@amd.com/ Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/all/0c768bf6-bc19-43de-a30b-ff5e3ddfd0b3@suse.de/ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Alex Deucher	b1281b6d55	drm/amdgpu/gfx9: Explicitly halt CP before init Need to make sure it's halted as we don't know what state the GPU may have been left in previously. Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Alex Deucher	993fcc40ae	drm/amdgpu/gfx9: set additional bits on CP halt Need to set the pipe reset and cache invalidation bits on halt otherwise we can get stale state if the CP firmware changes (e.g., on module unload and reload). Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Sunil Khatri	37b993225d	drm/amdgpu: add amdgpu_device reference in ip block To handle amdgpu_device reference for different GPUs we add its reference in each ip block which can be used to differentiate between different GPU devices. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	6e37ae8b08	drm/amdgpu: Separate reinitialization after reset Move the reinitialization part after a reset to another function. No functional changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Tim Huang	381ec8161d	drm/amdgpu: check return for setting engine dram timings This resolves the unchecked return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	5839d27d5b	drm/amdgpu: Use init level for pending_reset flag Drop pending_reset flag in gmc block. Instead use init level to determine which type of init is preferred - in this case MINIMAL. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
YiPeng Chai	9e0feb7946	amd/amdgpu: Reduce unnecessary repetitive GPU resets In multiple GPUs case, after a GPU has started resetting all GPUs on hive, other GPUs do not need to trigger GPU reset again. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:18 -04:00
Lijo Lazar	14f2fe34f5	drm/amdgpu: Add init levels Add init levels to define the level to which device needs to be initialized. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Jane Jian	8a84d2a472	drm/amdgpu: Remove unneeded write in JPEG v4.0.3 HDP_DEBUG1(offset = 0x3fbc) is no longer functional, remove the redundant write. Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Lijo Lazar	8c50bf9beb	drm/amdgpu: Fix JPEG v4.0.3 register write EXTERNAL_REG_INTERNAL_OFFSET/EXTERNAL_REG_WRITE_ADDR should be used in pairs. If an external register shouldn't be written, both packets shouldn't be sent. Fixes: `a78b481469` ("drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Feifei Xu	3eebfd5e9c	drm/amdkfd:Add kfd function to config sq perfmon Expose the interface for kfd to config sq perfmon. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Sathishkumar S	f0b19b84d3	drm/amdgpu: add amdgpu_jpeg_sched_mask debugfs JPEG_4_0_3 has up to 32 jpeg cores and a single mjpeg video decode will use all available cores on the hardware. This debugfs entry helps to disable or enable job submission to a cluster of cores or one specific core in the ip for debugging. The entry is populated only if there are two or more cores in the jpeg ip. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:17 -04:00
Feifei Xu	400a7591d9	drm/amdgpu: Add psp command CONFIG_SQ_PERFMON Add support for enable/disable perfmon profiling. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Prike Liang	6704dbf719	drm/amdgpu: update suspend status for aborting from deeper suspend Besides executing the _S3 method, there are other suspend abort cases which can call the noirq suspend. Those cases need to be processed as an incomplete suspension. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Asad Kamal	dc443aa4ab	drm/amd/amdgpu: Add helper to get ip block valid Add helper function to check if ip block is enabled Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Jiadong Zhu	92c9b3e8e4	drm/amdgpu/sdma6: implement ring reset callback for sdma6 Implement sdma queue reset callback using mes_reset_queue_mmio. v2: check instance id before reset queue. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Jiadong Zhu	df190e6753	drm/amdgpu/sdma6: split out per instance resume function Extract the resume sequence for individual sdma instance from sdma_v6_0_gfx_resume. The function could be used for start/restart scenario on a certain instance. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Jiadong Zhu	ced65debf4	drm/amdgpu/mes11: update mes_reset_queue function to support sdma queue Reset sdma queue through mmio based on me_id and queue_id. v2: simplify callflows and register calculation. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:06:16 -04:00
Alex Deucher	34ad56a467	drm/amdgpu: bump driver version for cleared VRAM Driver now clears VRAM on allocation. Bump the driver version so mesa knows when it will get cleared vram by default. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-26 17:04:47 -04:00
Alex Deucher	a8387ddc0d	drm/amdgpu: fix vbios fetching for SR-IOV SR-IOV fetches the vbios from VRAM in some cases. Re-enable the VRAM path for dGPUs and rename the function to make it clear that it is not IGP specific. Fixes: `042658d17a` ("drm/amdgpu: clean up vbios fetching code") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Tested-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-26 17:04:10 -04:00
Frank Min	3cb576bc6d	drm/amdgpu: fix PTE copy corruption for sdma 7 Without setting the dcc bit, there is random PTE copy corruption on sdma 7, so add this bit and update the packet format accordingly. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-26 17:03:39 -04:00
Thomas Zimmermann	32acc286b2	drm/amdgpu: Run DRM default client setup Call drm_client_setup() to run the kernel's default client setup for DRM. Set fbdev_probe in struct drm_driver, so that the client setup can start the common fbdev client. The amdgpu driver specifies a preferred color mode depending on the available video memory, with a default of 32. Adapt this for the new client interface. v5: - select DRM_CLIENT_SELECTION v2: - style changes Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Tested-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924071734.98201-66-tzimmermann@suse.de	2024-09-26 09:31:28 +02:00
Saleemkhan Jamadar	8048e5ade8	drm/amdgpu/vcn: enable AV1 on both instances v1 - remove cs parse code (Christian) On VCN v4_0_6 AV1 is supported on both the instances. Remove cs IB parse code since explicit handling of AV1 schedule is not required. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-09-25 12:56:19 -04:00
Mukul Joshi	e45b011d2c	drm/amdkfd: Fix CU occupancy for GFX 9.4.3 Make CU occupancy calculations work on GFX 9.4.3 by updating the logic to handle multiple XCCs correctly. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:56:07 -04:00
Mukul Joshi	6ae9e1aba9	drm/amdkfd: Update logic for CU occupancy calculations Currently, the code uses the IH_VMID_X_LUT register to map a queue's vmid to the corresponding PASID. This logic is racy since CP can update the VMID-PASID mapping anytime especially when there are more processes than number of vmids. Update the logic to calculate CU occupancy by matching doorbell offset of the queue with valid wave counts against the process's queues. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:56:00 -04:00
ZhenGuo Yin	e1d27f7a9c	drm/amdgpu: skip coredump after job timeout in SRIOV VF FLR will be triggered by host driver before job timeout, hence the error status of GPU get cleared. Performing a coredump here is unnecessary. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:55:52 -04:00
Christian König	126be9b2be	drm/amdgpu: sync to KFD fences before clearing PTEs This patch tries to solve the basic problem that we also need to sync to the KFD fences of the BO, because otherwise we may clear PTEs while the KFD queues are still running. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-25 12:55:44 -04:00
Jack Xiao	4771d2ecb7	drm/amdgpu/mes12: set enable_level_process_quantum_check enable_level_process_quantum_check is required to enable process quantum based scheduling. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-25 12:55:14 -04:00
Linus Torvalds	f8ffbc365f	struct fd layout change (and conversion to accessor helpers) Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.	2024-09-23 09:35:36 -07:00
Linus Torvalds	de848da12f	drm next for 6.12-rc1 string: - add mem_is_zero() core: - support more device numbers - use XArray for minor ids - add backlight constants - Split dma fence array creation into alloc and arm fbdev: - remove usage of old fbdev hooks kms: - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support dma-buf: - docs cleanup buddy: - Add start address support for trim function printk: - pass description to kmsg_dump scheduler: - Remove full_recover from drm_sched_start ttm: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory panic: - add display QR code (in rust) displayport: - mst: GUID improvements bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean up - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR - anx7625: simplify OF array handling - dw-hdmi: simplify clock handling - lontium-lt8912b: fix mode validation - nwl-dsi: fix mode vsync/hsync polarity xe: - Enable LunarLake and Battlemage support - Introducing Xe2 ccs modifiers for integrated and discrete graphics - rename xe perf to xe observation - use wb caching on DGFX for system memory - add fence timeouts - Lunar Lake graphics/media/display workarounds - Battlemage workarounds - Battlemage GSC support - GSC and HuC fw updates for LL/BM - use dma_fence_chain_free - refactor hw engine lookup and mmio access - enable priority mem read for Xe2 - Add first GuC BMG fw - fix dma-resv lock - Fix DGFX display suspend/resume - Use xe_managed for kernel BOs - Use reserved copy engine for user binds on faulting devices - Allow mixing dma-fence jobs and long-running faulting jobs - fix media TLB invalidation - fix rpm in TTM swapout path - track resources and VF state by PF i915: - Type-C programming fix for MTL+ - FBC cleanup - Calc vblank delay more accurately 
- On DP MST, Enable LT fallback for UHBR<->non-UHBR rates - Fix DP LTTPR detection - limit relocations to INT_MAX - fix long hangs in buddy allocator on DG2/A380 amdgpu: - Per-queue reset support - SDMA devcoredump support - DCN 4.0.1 updates - GFX12/VCN4/JPEG4 updates - Convert vbios embedded EDID to drm_edid - GFX9.3/9.4 devcoredump support - process isolation framework for GFX 9.4.3/4 - take IOMMU mappings into account for P2P DMA amdkfd: - CRIU fixes - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Allow users to target recommended SDMA engines - KFD support for targeting queues on recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking msm: - DPU: - implement DP/PHY mapping on SC8180X - Enable writeback on SM8150, SC8180X, SM6125, SM6350 - DP: - Enable widebus on all relevant chipsets - MSM8998 HDMI support - GPU: - A642L speedbin support - A615/A306/A621 support - A7xx devcoredump support ast: - astdp: Support AST2600 with VGA - Clean up HPD - Fix timeout loop for DP link training - reorganize output code by type (VGA, DP, etc) - convert to struct drm_edid - fix BMC handling for all outputs exynos: - drop stale MAINTAINERS pattern - constify struct loongson: - use GEM refcount over TTM mgag200: - Improve BMC handling - Support VBLANK interrupts - transparently support BMC outputs nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's gm12u320: - convert to struct drm_edid gma500: - update i2c terms lcdif: - pixel clock fix host1x: - fix syncpoint IRQ during resume - use iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling - use common helper for_each_endpoint_of_node() panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - 
nv3051d: improve error handling - panel-edp: add support for BOE NE140WUM-N6G; revert support for SDC ATNA45AF01 - visionox-vtdr6130: improve error handling; use devm_regulator_bulk_get_const() - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing - panel-edp: fix name for HKC MB116AN01 - jd9365da: fix "exit sleep" commands - jdi-fhd-r63452: simplify error handling with DSI multi-style helpers - mantix-mlaf057we51: simplify error handling with DSI multi-style helpers - simple: support Innolux G070ACE-LH3 plus DT bindings support On Tat Industrial Company KD50G21-40NT-A1 plus DT bindings - st7701: decouple DSI and DRM code add SPI support support Anbernic RG28XX plus DT bindings mediatek: - support alpha blending - remove cl in struct cmdq_pkt - ovl adaptor fix - add power domain binding for mediatek DPI controller renesas: - rz-du: add support for RZ/G2UL plus DT bindings rockchip: - Improve DP sink-capability reporting - dw_hdmi: Support 4k@60Hz - vop: Support RGB display on Rockchip RK3066; Support 4096px width sti: - convert to struct drm_edid stm: - Avoid UAF with managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - gr3d: improve PM domain handling - convert to struct drm_edid - Call drm_atomic_helper_shutdown() vc4: - fix PM during detect - replace DRM_ERROR() with drm_error() - v3d: simplify clock retrieval v3d: - Clean up perfmon virtio: - add DRM capset 
Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "This adds a couple of patches outside the drm core, all should be acked appropriately, the string and pstore ones are the main ones that come to mind. 
Otherwise it's the usual drivers, xe is getting enabled by default on some new hardware, we've changed the device number handling to allow more devices, and we added some optional rust code to create QR codes in the panic handler, an idea first suggested I think 10 years ago :-) string: - add mem_is_zero() core: - support more device numbers - use XArray for minor ids - add backlight constants - Split dma fence array creation into alloc and arm fbdev: - remove usage of old fbdev hooks kms: - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support dma-buf: - docs cleanup buddy: - Add start address support for trim function printk: - pass description to kmsg_dump scheduler: - Remove full_recover from drm_sched_start ttm: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory panic: - add display QR code (in rust) displayport: - mst: GUID improvements bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean up - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR - anx7625: simplify OF array handling - dw-hdmi: simplify clock handling - lontium-lt8912b: fix mode validation - nwl-dsi: fix mode vsync/hsync polarity xe: - Enable LunarLake and Battlemage support - Introducing Xe2 ccs modifiers for integrated and discrete graphics - rename xe perf to xe observation - use wb caching on DGFX for system memory - add fence timeouts - Lunar Lake graphics/media/display workarounds - Battlemage workarounds - Battlemage GSC support - GSC and HuC fw updates for LL/BM - use dma_fence_chain_free - refactor hw engine lookup and mmio access - enable priority mem read for Xe2 - Add first GuC BMG fw - fix dma-resv lock - Fix DGFX display suspend/resume - Use xe_managed for kernel BOs - Use reserved copy engine for user binds on faulting devices - Allow mixing 
dma-fence jobs and long-running faulting jobs - fix media TLB invalidation - fix rpm in TTM swapout path - track resources and VF state by PF i915: - Type-C programming fix for MTL+ - FBC cleanup - Calc vblank delay more accurately - On DP MST, Enable LT fallback for UHBR<->non-UHBR rates - Fix DP LTTPR detection - limit relocations to INT_MAX - fix long hangs in buddy allocator on DG2/A380 amdgpu: - Per-queue reset support - SDMA devcoredump support - DCN 4.0.1 updates - GFX12/VCN4/JPEG4 updates - Convert vbios embedded EDID to drm_edid - GFX9.3/9.4 devcoredump support - process isolation framework for GFX 9.4.3/4 - take IOMMU mappings into account for P2P DMA amdkfd: - CRIU fixes - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Allow users to target recommended SDMA engines - KFD support for targeting queues on recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking msm: - DPU: - implement DP/PHY mapping on SC8180X - Enable writeback on SM8150, SC8180X, SM6125, SM6350 - DP: - Enable widebus on all relevant chipsets - MSM8998 HDMI support - GPU: - A642L speedbin support - A615/A306/A621 support - A7xx devcoredump support ast: - astdp: Support AST2600 with VGA - Clean up HPD - Fix timeout loop for DP link training - reorganize output code by type (VGA, DP, etc) - convert to struct drm_edid - fix BMC handling for all outputs exynos: - drop stale MAINTAINERS pattern - constify struct loongson: - use GEM refcount over TTM mgag200: - Improve BMC handling - Support VBLANK interrupts - transparently support BMC outputs nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's gm12u320: - convert to struct drm_edid gma500: - update i2c terms lcdif: - pixel clock fix host1x: - fix syncpoint IRQ during resume - use 
iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling - use common helper for_each_endpoint_of_node() panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - nv3051d: improve error handling - panel-edp: - add support for BOE NE140WUM-N6G - revert support for SDC ATNA45AF01 - visionox-vtdr6130: - improve error handling - use devm_regulator_bulk_get_const() - boe-th101mb31ig002: - Support for starry-er88577 MIPI-DSI panel plus DT - Fix porch parameter - edp: Support AUO B116XTN02.3, AUO B116XAN06.1, AUO B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: - Support Melfas lmfbx101117480 MIPI-DSI panel plus DT - Refactor for code sharing - panel-edp: fix name for HKC MB116AN01 - jd9365da: fix "exit sleep" commands - jdi-fhd-r63452: simplify error handling with DSI multi-style helpers - mantix-mlaf057we51: simplify error handling with DSI multi-style helpers - simple: - support Innolux G070ACE-LH3 plus DT bindings - support On Tat Industrial Company KD50G21-40NT-A1 plus DT bindings - st7701: - decouple DSI and DRM code - add SPI support - support Anbernic RG28XX plus DT bindings mediatek: - support alpha blending - remove cl in struct cmdq_pkt - ovl adaptor fix - add power domain binding for mediatek DPI controller renesas: - rz-du: add support for RZ/G2UL plus DT bindings rockchip: - Improve DP sink-capability reporting - dw_hdmi: Support 4k@60Hz - vop: - Support RGB display on Rockchip RK3066 - Support 4096px width sti: - convert to struct drm_edid stm: - Avoid UAF with managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: - Fix transparency after disabling plane - Remove unused interrupt tegra: - gr3d: 
improve PM domain handling - convert to struct drm_edid - Call drm_atomic_helper_shutdown() vc4: - fix PM during detect - replace DRM_ERROR() with drm_error() - v3d: simplify clock retrieval v3d: - Clean up perfmon virtio: - add DRM capset" * tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel: (1326 commits) drm/xe: Fix missing conversion to xe_display_pm_runtime_resume drm/xe/xe2hpg: Add Wa_15016589081 drm/xe: Don't keep stale pointer to bo->ggtt_node drm/xe: fix missing 'xe_vm_put' drm/xe: fix build warning with CONFIG_PM=n drm/xe: Suppress missing outer rpm protection warning drm/xe: prevent potential UAF in pf_provision_vf_ggtt() drm/amd/display: Add all planes on CRTC to state for overlay cursor drm/i915/bios: fix printk format width drm/i915/display: Fix BMG CCS modifiers drm/amdgpu: get rid of bogus includes of fdtable.h drm/amdkfd: CRIU fixes drm/amdgpu: fix a race in kfd_mem_export_dmabuf() drm: new helper: drm_gem_prime_handle_to_dmabuf() drm/amdgpu/atomfirmware: Silence UBSAN warning drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare' drm/amd/amdgpu: apply command submission parser for JPEG v1 drm/amd/amdgpu: apply command submission parser for JPEG v2+ drm/amd/pm: fix the pp_dpm_pcie issue on smu v14.0.2/3 drm/amd/pm: update the features set on smu v14.0.2/3 ...	2024-09-19 10:18:15 +02:00
Alex Deucher	84f76408ab	drm/amdgpu/mes12: reduce timeout The firmware timeout is 2s. Reduce the driver timeout to 2.1 seconds to avoid back pressure on queue submissions. Fixes: `94b51a3d01` ("drm/amdgpu/mes12: increase mes submission timeout") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:15:13 -04:00
Alex Deucher	856265caa9	drm/amdgpu/mes11: reduce timeout The firmware timeout is 2s. Reduce the driver timeout to 2.1 seconds to avoid back pressure on queue submissions. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3627 Fixes: `f7c161a4c2` ("drm/amdgpu: increase mes submission timeout") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2024-09-18 16:15:13 -04:00
Christian König	6dcba0975d	drm/amdgpu: use GEM references instead of TTMs v2 Instead of a TTM reference grab a GEM reference whenever necessary. v2: fix typo in amdgpu_bo_unref pointed out by Vitaly, initialize the GEM funcs for kernel allocations as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:13 -04:00
Frank Min	7b6df1d732	drm/amdgpu: update golden regs for gfx12 update golden regs for gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:15:09 -04:00
Alex Deucher	042658d17a	drm/amdgpu: clean up vbios fetching code After splitting the logic between APU and dGPU, clean up some of the APU and dGPU specific logic that no longer applied. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Alex Deucher	375b035f68	drm/amdgpu/bios: split vbios fetching between APU and dGPU We need some different logic for dGPUs and the APU path can be simplified because there are some methods which are never used on APUs. This also fixes a regression on some older APUs causing the driver to fetch the unpatched ROM image rather than the patched image. Fixes: `9c081c11c6` ("drm/amdgpu: Reorder to read EFI exported ROM first") Reviewed-by: George Zhang <George.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Christian König	f2be7b39e4	drm/amdgpu: remove amdgpu_pin_restricted() We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Christian König	54b86443fd	drm/amdgpu: explicitely set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag Instead of having that in the amdgpu_bo_pin() function applied for all pinned BOs. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Lijo Lazar	42ac749d5b	drm/amdgpu: Fix XCP instance mask calculation Fix instance mask calculation for VCN IP. There are cases where VCN instance could be shared across partitions. Fix here so that other blocks don't need to check for any shared instances based on partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:09 -04:00
Asad Kamal	ef126c06a9	drm/amdgpu: Fix get each xcp macro Fix get each xcp macro to loop over each partition correctly Fixes: `4bdca20579` ("drm/amdgpu: Add utility functions for xcp") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:08 -04:00
Le Ma	2778701b16	drm/amdgpu: load sos binary properly on the basis of pmfw version To be compatible with legacy IFWI, driver needs to carry legacy tOS and query pmfw version to load them accordingly. Add psp_firmware_header_v2_1 to handle the combined sos binary. Double the sos count limit for the case of aux sos fw packed. v2: pass the correct fw_bin_desc to parse_sos_bin_descriptor Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Le Ma	2ae6cd583c	drm/amdgpu: add psp funcs callback to check if aux fw is needed Query pmfw version to determine if aux sos fw needs to be loaded in psp v13.0. v2: refine callback to check if aux_fw loading is needed instead of getting pmfw version barely v3: return the comparison directly Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Christian König	7181faaa47	drm/amdgpu: nuke the VM PD/PT shadow handling This was only used as workaround for recovering the page tables after VRAM was lost and is no longer necessary after the function amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only problematic case left is SVM and that is most likely not recoverable in any way when VRAM is lost. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Alex Deucher	c1de938fb7	drm/amdgpu/gfx9.4.3: Explicitly halt MEC before init Need to make sure it's halted as we don't know what state the GPU may have been left in previously. Tested-by: Amber Lin <Amber.Lin@amd.com> Acked-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
Alex Deucher	797fb15333	drm/amdgpu/gfx9.4.3: set additional bits on MEC halt Need to set the pipe reset and cache invalidation bits on halt otherwise we can get stale state if the CP firmware changes (e.g., on module unload and reload). Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:15:06 -04:00
David Belanger	03b5038c0a	drm/amdgpu: Fix selfring initialization sequence on soc24 Move enable_doorbell_selfring_aperture from common_hw_init to common_late_init in soc24, otherwise selfring aperture is initialized with an incorrect doorbell aperture base. Port changes from this commit from soc21 to soc24: commit `1c312e816c` ("drm/amdgpu: Enable doorbell selfring after resize FB BAR") Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:14:47 -04:00
Jack Xiao	3c75518cf2	drm/amdgpu/mes12: switch SET_SHADER_DEBUGGER pkt to mes schq pipe The SET_SHADER_DEBUGGER packet must work with the added hardware queue, switch the packet submitting to mes schq pipe. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x	2024-09-18 16:14:27 -04:00
Andrew Kreimer	c400ec6990	drm/amdgpu: Fix a typo Fix a typo in comments. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:14:26 -04:00
Yan Zhen	0110ac1195	drm/amdgpu: fix typo in the comment Correctly spelled comments make it easier for the reader to understand the code. Replace 'udpate' with 'update' in the comment & replace 'recieved' with 'received' in the comment & replace 'dsiable' with 'disable' in the comment & replace 'Initiailize' with 'Initialize' in the comment & replace 'disble' with 'disable' in the comment & replace 'Disbale' with 'Disable' in the comment & replace 'enogh' with 'enough' in the comment & replace 'availabe' with 'available' in the comment. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yan Zhen <yanzhen@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:14:26 -04:00
Alex Deucher	bfc00a7754	drm/amdgpu/gfx9.4.3: drop extra wrapper Drop wrapper used in one place. gfx_v9_4_3_xcc_cp_enable() is used in one place. gfx_v9_4_3_xcc_cp_compute_enable() is used everywhere else. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-18 16:14:26 -04:00
Bob Zhou	28b0ef9227	drm/amdgpu: Fix missing check pcie_p2p module param The module param pcie_p2p should be checked for kfd p2p feature, so add it. Fixes: `75f0efbc4b` ("drm/amdgpu: Take IOMMU remapping into account for p2p checks") Signed-off-by: Bob Zhou <bob.zhou@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-17 10:04:57 -04:00
Tao Zhou	c389a0604c	drm/amdgpu: disable GPU RAS bad page feature for specific ASIC The feature is not applicable to specific app platform. v2: update the disablement condition and commit description v3: move the setting to amdgpu_ras_check_supported Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-17 10:04:57 -04:00
Tim Huang	0da531c82a	drm/amdgpu: ensure the connector is not null before using it This resolves the dereference null return value warning reported by Coverity. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-17 10:04:57 -04:00
Linus Torvalds	8f72c31f45	vfs-6.12.misc -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEGwAKCRCRxhvAZXjc ojIuAQC433+hBkvjvmQ7H0r5rgZSjUuCTG3bSmdU7RJmPHUHhwEA85v/NGq53f+W IhandK6t+Cf0JYpFZ3N0bT88hDYVhQQ= =9zGL -----END PGP SIGNATURE----- Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual pile of misc updates: Features: - Add F_CREATED_QUERY fcntl() that allows userspace to query whether a file was actually created. Often userspace wants to know whether an O_CREAT request did actually create a file without using O_EXCL. The current logic is to first attempt to open the file without O_CREAT \| O_EXCL and, if ENOENT is returned, to try again with both flags. If that succeeds all is well. If it now reports EEXIST it retries. That works fairly well but some corner cases make this more involved. If this operates on a dangling symlink the first openat() without O_CREAT \| O_EXCL will return ENOENT but the second openat() with O_CREAT \| O_EXCL will fail with EEXIST. The reason is that openat() without O_CREAT \| O_EXCL follows the symlink while O_CREAT \| O_EXCL doesn't for security reasons. So it's not something we can really change unless we add an explicit opt-in via O_FOLLOW which seems really ugly. All available workarounds are really nasty (fanotify, bpf lsm etc) so add a simple fcntl(). - Try an opportunistic lookup for O_CREAT. Today, when opening a file we'll typically do a fast lookup, but if O_CREAT is set, the kernel always takes the exclusive inode lock. This was likely done with the expectation that O_CREAT means that we always expect to do the create, but that's often not the case. Many programs set O_CREAT even in scenarios where the file already exists (see related F_CREATED_QUERY patch motivation above). 
The series contained in the pr rearranges the pathwalk-for-open code to also attempt a fast_lookup in certain O_CREAT cases. If a positive dentry is found, the inode_lock can be avoided altogether and it can stay in rcuwalk mode for the last step_into. - Expose the 64 bit mount id via name_to_handle_at() Now that we provide a unique 64-bit mount ID interface in statx(2), we can now provide a race-free way for name_to_handle_at(2) to provide a file handle and corresponding mount without needing to worry about racing with /proc/mountinfo parsing or having to open a file just to do statx(2). While this is not necessary if you are using AT_EMPTY_PATH and don't care about an extra statx(2) call, users that pass full paths into name_to_handle_at(2) need to know which mount the file handle comes from (to make sure they don't try to open_by_handle_at a file handle from a different filesystem) and switching to AT_EMPTY_PATH would require allocating a file for every name_to_handle_at(2) call - Add a per dentry expire timeout to autofs There are two fairly well known automounter map formats, the autofs format and the amd format (more or less System V and Berkeley). Some time ago Linux autofs added an amd map format parser that implemented a fair amount of the amd functionality. This was done within the autofs infrastructure and some functionality wasn't implemented because it either didn't make sense or required extra kernel changes. The idea was to restrict changes to be within the existing autofs functionality as much as possible and leave changes with a wider scope to be considered later. One of these changes is implementing the amd options: 1) "unmount", expire this mount according to a timeout (same as the current autofs default). 2) "nounmount", don't expire this mount (same as setting the autofs timeout to 0 except only for this specific mount). 
3) "utimeout=<seconds>", expire this mount using the specified timeout (again same as setting the autofs timeout but only for this mount) To implement these options per-dentry expire timeouts need to be implemented for autofs indirect mounts. This is because all map keys (mounts) for autofs indirect mounts use an expire timeout stored in the autofs mount super block info structure and all indirect mounts use the same expire timeout. Fixes: - Fix missing fput for FSCONFIG_SET_FD in autofs - Use param->file for FSCONFIG_SET_FD in coda - Delete the 'fs/netfs' proc subtree when netfs module exits - Make sure that struct uid_gid_map fits into a single cacheline - Don't flush in-flight wb switches for superblocks without cgroup writeback - Correcting the idmapping mount example in the idmapping documentation - Fix a race between evict_inodes() and find_inode() and iput() - Refine the show_inode_state() macro definition in writeback code - Prevent dump_mapping() from accessing invalid dentry.d_name.name - Show actual source for debugfs in /proc/mounts - Annotate data-race of busy_poll_usecs in eventpoll - Don't WARN for racy path_noexec check in exec code - Handle OOM on mnt_warn_timestamp_expiry() - Fix some spelling in the iomap design documentation - Fix typo in procfs comment - Fix typo in fs/namespace.c comment Cleanups: - Add the VFS git tree to the MAINTAINERS file - Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode bit in struct file bringing us to 5 free f_mode bits - Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify the wait mechanism - Remove the unused path_put_init() helper - Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi specific - Replace the unsigned long i_state member with a u32 i_state member in struct inode freeing up 4 bytes in struct inode. 
Instead of using the bit based wait apis we're now using the var event apis and using the individual bytes of the i_state member to wait on state changes - Explain how per-syscall AT_* flags should be allocated - Use in_group_or_capable() helper to simplify the posix acl mode update code - Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code - Removed comment about d_rcu_to_refcount() as that function doesn't exist anymore - Add kernel documentation for lookup_fast() - Don't re-zero eventpoll fields - Remove outdated comment after close_fd() - Fix imprecise wording in comment about the pipe filesystem - Drop GFP_NOFAIL mode from alloc_page_buffers - Missing blank line warnings and struct declaration improved in file_table - Annotate struct poll_list with __counted_by() - Remove the unused read parameter in percpu-rwsem - Remove linux/prefetch.h include from direct-io code - Use kmemdup_array instead of kmemdup for multiple allocation in mnt_idmapping code - Remove unused mnt_cursor_del() declaration Performance tweaks: - Dodge smp_mb in break_lease and break_deleg in the common case - Only read fops once in fops_{get,put}() - Use RCU in ilookup() - Elide smp_mb in iversion handling in the common case - Drop one lock trip in evict()" * tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits) uidgid: make sure we fit into one cacheline proc: Fix typo in the comment fs/pipe: Correct imprecise wording in comment fhandle: expose u64 mount id to name_to_handle_at(2) uapi: explain how per-syscall AT_* flags should be allocated fs: drop GFP_NOFAIL mode from alloc_page_buffers writeback: Refine the show_inode_state() macro definition fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation netfs: Delete subtree of 'fs/netfs' when netfs module exits fs: use LIST_HEAD() to simplify code inode: make i_state a u32 inode: port __I_LRU_ISOLATING 
to var event vfs: fix race between evice_inodes() and find_inode()&iput() inode: port __I_NEW to var event inode: port __I_SYNC to var event fs: reorder i_state bits fs: add i_state helpers MAINTAINERS: add the VFS git tree fs: s/__u32/u32/ for s_fsnotify_mask ...	2024-09-16 08:35:09 +02:00
Thomas Zimmermann	61b86391fb	Merge drm/drm-next into drm-misc-next Backmerging to get fixes from v6.11-rc7. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2024-09-11 09:48:49 +02:00
David (Ming Qiang) Wu	8409fb50ce	drm/amd/amdgpu: apply command submission parser for JPEG v1 Similar to jpeg_v2_dec_ring_parse_cs() but it has different register ranges and a few other registers access. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3d5adbdf1d`) Cc: stable@vger.kernel.org	2024-09-10 17:26:55 -04:00
David (Ming Qiang) Wu	3a23aa0b9c	drm/amd/amdgpu: apply command submission parser for JPEG v2+ This patch extends the same cs parser from JPEG v4.0.3 to other JPEG versions (v2 and above). Rename to more common name as jpeg_v2_dec_ring_parse_cs() from jpeg_v4_0_3_dec_ring_parse_cs(). Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `88dcad2d07`) Cc: stable@vger.kernel.org	2024-09-10 17:26:49 -04:00
Al Viro	4c3140fea6	drm/amdgpu: get rid of bogus includes of fdtable.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:30 -04:00
Al Viro	6c6ca71bc1	drm/amdgpu: fix a race in kfd_mem_export_dmabuf() Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor table while we'd been going through that song and dance. Switch kfd_mem_export_dmabuf() to using drm_gem_prime_handle_to_dmabuf() and leave the descriptor table alone... Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:30 -04:00
Srinivasan Shanmugam	b8faa981a7	drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare' This commit updates the kdoc comment, which described the non-existent parameters 'resv' and 'sync_mode' and failed to describe the existing 'sync' parameter. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c:50: warning: Function parameter or struct member 'sync' not described in 'amdgpu_vm_cpu_prepare' drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c:50: warning: Excess function parameter 'resv' description in 'amdgpu_vm_cpu_prepare' drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c:50: warning: Excess function parameter 'sync_mode' description in 'amdgpu_vm_cpu_prepare' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:29 -04:00
David (Ming Qiang) Wu	3d5adbdf1d	drm/amd/amdgpu: apply command submission parser for JPEG v1 Similar to jpeg_v2_dec_ring_parse_cs() but it has different register ranges and a few other registers access. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:29 -04:00
David (Ming Qiang) Wu	88dcad2d07	drm/amd/amdgpu: apply command submission parser for JPEG v2+ This patch extends the same cs parser from JPEG v4.0.3 to other JPEG versions (v2 and above). Rename to more common name as jpeg_v2_dec_ring_parse_cs() from jpeg_v4_0_3_dec_ring_parse_cs(). Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-10 13:44:29 -04:00
Jani Nikula	0df8ef6e1b	drm/amdgpu: drop redundant W=1 warnings from Makefile Since commit `a61ddb4393` ("drm: enable (most) W=1 warnings by default across the subsystem"), most of the extra warnings in the driver Makefile are redundant. Remove them. Note that -Wmissing-declarations and -Wmissing-prototypes are always enabled by default in scripts/Makefile.extrawarn. Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:17 -04:00
Christian König	7ccde2e6c0	drm/amdgpu: revert "use CPU for page table update if SDMA is unavailable" That is clearly not something we should do upstream. The SDMA is mandatory for the driver to work correctly. We could do this for emulation and bringup, but in those cases the engineer should probably enable CPU based updates manually. This reverts commit `62eefd10ac`. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:06 -04:00
Dan Carpenter	27f9dcb9cc	drm/amdgpu/mes11: Indent an if statment Indent the "break" statement one more tab. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Ramesh Errabolu	01be2b62c0	drm/amdgpu: Surface svm_default_granularity, a RW module parameter Enables users to update SVM's default granularity, used in buffer migration and handling of recoverable page faults. Param value is set in terms of log2(numPages(buffer)), e.g. 9 for a 2 MiB buffer (512 pages of 4 KiB). Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:55:05 -04:00
Jesse Zhang	e8397d327e	drm/amdgpu: fix queue reset issue by mmio Initialize the queue type before resetting the queue using mmio. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:54:54 -04:00
Srinivasan Shanmugam	559a285816	drm/amdgpu: Replace 'amdgpu_job_submit_direct' with 'drm_sched_entity' in cleaner shader This commit replaces the use of amdgpu_job_submit_direct, which submits the job to the ring directly, with drm_sched_entity in the cleaner shader job submission process. The job is now submitted to the GPU using the drm_sched_entity_push_job function, which allows the GPU scheduler to manage the cleaner shader job. This change improves the reliability of the cleaner shader job submission process by leveraging the capabilities of the GPU scheduler. Fixes: `d361ad5d2f` ("drm/amdgpu: Add sysfs interface for running cleaner shader") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:42:33 -04:00
Srinivasan Shanmugam	2578487ebe	drm/amdgpu/: Add missing kdoc entry in amdgpu_vm_handle_fault function This commit adds a description for the 'ts' parameter in the amdgpu_vm_handle_fault function's comment block. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:2781: warning: Function parameter or struct member 'ts' not described in 'amdgpu_vm_handle_fault' Cc: Xiaogang.Chen <Xiaogang.Chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408251419.vgZHg3GV-lkp@intel.com/ Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:42:05 -04:00
Lijo Lazar	4481df364d	drm/amdgpu: Normalize reg offsets on JPEG v4.0.3 On VFs and SOCs with GC 9.4.4, VCN RRMT is disabled. Only local register offsets should be used on JPEG v4.0.3 as they cannot handle remote access to other AIDs. Since only local offsets are used, the special write to MCM_ADDR register is no longer needed. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:40:47 -04:00
Li Zetao	760e3c8b32	drm/amdgpu: use clamp() in amdgpu_vm_adjust_size() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:57 -04:00
Li Zetao	6fbbb660b1	drm/amd: use clamp() in amdgpu_pll_get_fb_ref_div() When it needs to get a value within a certain interval, using clamp() makes the code easier to understand than min(max()). Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:53 -04:00
Peng Liu	2c7795e245	drm/amdgpu: enable gfxoff quirk on HP 705G4 Enabling gfxoff quirk results in perfectly usable graphical user interface on HP 705G4 DM with R5 2400G. Without the quirk, X server is completely unusable as every few seconds there is gpu reset due to ring gfx timeout. Signed-off-by: Peng Liu <liupeng01@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:45 -04:00
Peng Liu	0126c0ae11	drm/amdgpu: add raven1 gfxoff quirk Fix screen corruption with openkylin. Link: https://bbs.openkylin.top/t/topic/171497 Signed-off-by: Peng Liu <liupeng01@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:38:33 -04:00
Lang Yu	4453808d9e	drm/amdgpu: fix invalid fence handling in amdgpu_vm_tlb_flush CPU based update doesn't produce a fence, handle such cases properly. Fixes: `d8a3f0a034` ("drm/amdgpu: implement TLB flush fence") Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:36:50 -04:00
Christian König	4da5a95bf1	drm/amdgpu: re-work VM syncing Rework how VM operations synchronize to submissions. Provide an amdgpu_sync container to the backends instead of a reservation object and fill in the amdgpu_sync object in the higher layers of the code. No intended functional change, just prepares for upcoming changes. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-06 17:36:24 -04:00
Christian König	b2ef808786	drm/sched: add optional errno to drm_sched_start() The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error. (Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com	2024-09-06 18:05:52 +02:00
Dmitry Baryshkov	ca097d4d94	drm/display: split DSC helpers from DP helpers Currently the DRM DSC functions are selected by the DRM_DISPLAY_DP_HELPER Kconfig symbol. This is not optimal, since the DSI code (both panel and host drivers) end up selecting the seemingly irrelevant DP helpers. Split the DSC code to be guarded by the separate DRM_DISPLAY_DSC_HELPER Kconfig symbol. Reviewed-by: Jessica Zhang <quic_jesszhan@quicinc.com> Reviewed-by: Marijn Suijten <marijn.suijten@somainline.org> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #i915 Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240704-panel-sw43408-fix-v6-1-3ea1c94bbb9b@linaro.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	2024-09-03 00:13:44 +03:00
Alex Deucher	4de34b0478	drm/amdgpu: always allocate cleared VRAM for GEM allocations This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> (cherry picked from commit `6c0a7c3c69`) Cc: stable@vger.kernel.org # 6.10.x	2024-09-02 13:08:51 -04:00
Jack Xiao	34c36a77f4	drm/amdgpu/mes: add mes mapping legacy queue switch For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: `f9d8c5c785` ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `52491d97aa`)	2024-09-02 13:05:39 -04:00
Alex Deucher	ead60e9c4e	drm/amdgpu/gfx10: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:40 -04:00
Alex Deucher	3f2d35c325	drm/amdgpu/gfx11: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:38 -04:00
Alex Deucher	21818f39be	drm/amdgpu/gfx12: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:34 -04:00
Alex Deucher	f8eee864ba	drm/amdgpu/gfx12: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:31 -04:00
Alex Deucher	01d05521f7	drm/amdgpu/gfx11: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:28 -04:00
Alex Deucher	bcee4c3f89	drm/amdgpu/gfx10: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:24 -04:00
Alex Deucher	1a1995b1dc	drm/amdgpu/gfx12: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:22 -04:00
Alex Deucher	01163079e1	drm/amdgpu/gfx11: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:20 -04:00
Alex Deucher	4d5ddfa4b1	drm/amdgpu/gfx10: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:16 -04:00
Jiadong Zhu	178ad0e280	drm/amdgpu/mes11: implement mmio queue reset for gfx11 Implement queue reset for graphic and compute queue. v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode. v3: use gfx_v11_0_request_gfx_index_mutex() v4: fix mutex handling Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:13 -04:00
Jiadong Zhu	01b4ae38e5	drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmio The reset_queue api could be used from kfd or kgd. v2: add use_mmio parameter for mes_reset_legacy_queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:07 -04:00
Jiadong Zhu	8b2429a13f	drm/amdgpu/mes: modify mes api for mmio queue reset Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:03 -04:00
Alex Deucher	8fe4fde381	drm/amdgpu/gfx12: fallback to driver reset compute queue directly Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:41:01 -04:00
Alex Deucher	2480599890	drm/amdgpu/gfx12: add ring reset callbacks Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:58 -04:00
Alex Deucher	d1f2144321	drm/amdgpu/gfx10: rework reset sequence To match other GFX IPs. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:55 -04:00
Jiadong Zhu	097af47d3c	drm/amdgpu/gfx10: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder (Jessie) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:47 -04:00
Jiadong Zhu	2f3806f781	drm/amdgpu/gfx10: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. v2: fix up error handling (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:44 -04:00
Alex Deucher	1741281a15	drm/amdgpu/gfx10: add ring reset callbacks Add ring reset callbacks for gfx and compute. v2: fix gfx handling v3: wait for KIQ to complete Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:38 -04:00
Jiadong Zhu	a10c93931b	drm/amdgpu/gfx11: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:34 -04:00
Alex Deucher	7d8e9e65f2	drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue() Rename to gfx_v11_0_kgq_init_queue() to better align with the other naming in the file. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:31 -04:00
Prike Liang	072b441478	drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2) Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:40:21 -04:00
Alex Deucher	b3e9bfd866	drm/amdgpu/gfx11: add ring reset callbacks Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:35:12 -04:00
Prike Liang	ad17b124c3	drm/amdgpu/gfx9.4.3: Implement compute pipe reset Implement the compute pipe reset, and the driver will fallback to pipe reset when queue reset fails. The pipe reset only deactivates the queue which is scheduled in the pipe, and meanwhile the MEC pipe will be reset to the firmware _start pointer. So, it seems pipe reset will cost more cycles than the queue reset; therefore, the driver tries to recover by doing queue reset first. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-09-02 11:32:32 -04:00
Christian Brauner	641bb4394f	fs: move FMODE_UNSIGNED_OFFSET to fop_flags This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-30 08:22:36 +02:00
Alex Deucher	6c0a7c3c69	drm/amdgpu: always allocate cleared VRAM for GEM allocations This adds allocation latency, but aligns better with user expectations. The latency should improve with the drm buddy clearing patches that Arun has been working on. In addition this fixes the high CPU spikes seen when doing wipe on release. v2: always set AMDGPU_GEM_CREATE_VRAM_CLEARED (Christian) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3528 Fixes: `a68c7eaa7a` ("drm/amdgpu: Enable clear page functionality") Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com>	2024-08-29 14:56:37 -04:00
Jack Xiao	52491d97aa	drm/amdgpu/mes: add mes mapping legacy queue switch For mes11 old firmware has issue to map legacy queue, add a flag to switch mes to map legacy queue. Fixes: `f9d8c5c785` ("drm/amdgpu/gfx: enable mes to map legacy queue support") Reported-by: Andrew Worsley <amworsley@gmail.com> Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:41:02 -04:00
Alex Deucher	1125f95cd2	drm/amdgpu/gfx12: return early in preempt_ib() When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:57 -04:00
Alex Deucher	1e487c9173	drm/amdgpu/gfx11: return early in preempt_ib() When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:48 -04:00
Sunil Khatri	30e8f4c2bd	drm/amdgpu: Move the dumping log out of for loop log message "Dumping IP State Completed" needs to be logged only once when state dumping is complete. Hence moving it out of the for loop. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Trigger Huang <Trigger.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:19 -04:00
Victor Zhao	af76ca8e18	drm/amd/amdgpu: move drain_workqueue before shutdown is set [background] when unloading amdgpu driver right after running a workload, drain_workqueue is causing "Fence fallback timer expired on ring sdma0.0". Under sriov, this issue will cause sriov full access timeout and a reset happening. move drain_workqueue before shutdown is set to allow ih process and before enter full access under sriov to avoid full access time cost. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:07 -04:00
Trigger Huang	c67db6a6a6	drm/amdgpu: Do core dump immediately when job tmo Do the coredump immediately after a job timeout to get a closer representation of GPU's error status. V2: This will skip printing vram_lost as the GPU reset is not happened yet (Alex) V3: Unconditionally call the core dump as we care about all the reset functions(soft-recovery and queue reset and full adapter reset, Alex) V4: Do the dump after adev->job_hang = true (Sunil) Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:39:00 -04:00
Trigger Huang	6122f5c72e	drm/amdgpu: skip printing vram_lost if needed The vm lost status can only be obtained after a GPU reset occurs, but sometimes a dev core dump can be happened before GPU reset. So a new argument is added to tell the dev core dump implementation whether to skip printing the vram_lost status in the dump. And this patch is also trying to decouple the core dump function from the GPU reset function, by replacing the argument amdgpu_reset_context with amdgpu_job to specify the context for core dump. V2: Inform user if VRAM lost check is skipped so users don't assume VRAM wasn't lost (Alex) Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:53 -04:00
Alex Deucher	7c1a2d8aba	drm/amdgpu/gfx9: put queue resets behind a debug option Pending extended validation. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:48 -04:00
Alex Deucher	a9b67c036c	drm/amdgpu: add experimental resets debug flag Add this flag to enable experimental resets for testing before they are fully validated. Reviewed-and-tested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-29 13:38:36 -04:00
Likun Gao	6d5064c379	drm/amdgpu: support for gc_info table v1.3 Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `875ff9a7ee`)	2024-08-28 10:05:54 -04:00
Alex Deucher	959fc102ff	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `40318a2406`)	2024-08-28 10:04:53 -04:00
Daniel Vetter	e55ef65510	Merge tag 'amd-drm-next-6.12-2024-08-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.12-2024-08-26: amdgpu: - SDMA devcoredump support - DCN 4.0.1 updates - DC SUBVP fixes - Refactor OPP in DC - Refactor MMHUBBUB in DC - DC DML 2.1 updates - DC FAMS2 updates - RAS updates - GFX12 updates - VCN 4.0.3 updates - JPEG 4.0.3 updates - Enable wave kill (soft recovery) for compute queues - Clean up CP error interrupt handling - Enable CP bad opcode interrupts - VCN 4.x fixes - VCN 5.x fixes - GPU reset fixes - Fix vbios embedded EDID size handling - SMU 14.x updates - Misc code cleanups and spelling fixes - VCN devcoredump support - ISP MFD i2c support - DC vblank fixes - GFX 12 fixes - PSR fixes - Convert vbios embedded EDID to drm_edid - DCN 3.5 updates - DMCUB updates - Cursor fixes - Overdrive support for SMU 14.x - GFX CP padding optimizations - DCC fixes - DSC fixes - Preliminary per queue reset infrastructure - Initial per queue reset support for GFX 9 - Initial per queue reset support for GFX 7, 8 - DCN 3.2 fixes - DP MST fixes - SR-IOV fixes - GFX 9.4.3/4 devcoredump support - Add process isolation framework - Enable process isolation support for GFX 9.4.3/4 - Take IOMMU remapping into account for P2P DMA checks amdkfd: - CRIU fixes - Improved input validation for user queues - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Initial per queue reset support for GFX 9 - Allow users to target recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking UAPI: - KFD support for targeting queues on recommended SDMA engines Proposed userspace: `2f588a2406` `eb30a5bbc7` drm/buddy: - Add start address support for trim function From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826201528.55307-1-alexander.deucher@amd.com	2024-08-27 14:33:12 +02:00
Daniel Vetter	4461e9e5c3	Linux 6.11-rc5 Merge v6.11-rc5 into drm-next amdgpu pr conflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2024-08-27 14:09:45 +02:00
Xiaogang Chen	6ef29715ac	drm/amdkfd: Change kfd/svm page fault drain handling When app unmap vm ranges (munmap) kfd/svm starts drain pending page fault and not handle any incoming pages fault of this process until a deferred work item got executed by default system wq. The time period of "not handle page fault" can be long and is unpredictable. That is adverse to kfd performance on page faults recovery. This patch uses time stamp of incoming page fault to decide to drop or recover page fault. When app unmap vm ranges kfd records each gpu device's ih ring current time stamp. These time stamps are used at kfd page fault recovery routine. Any page fault happened on unmapped ranges after unmap events is application bug that accesses vm range after unmap. It is not driver work to cover that. By using time stamp of page fault do not need drain page faults at deferred work. So, the time period that kfd does not handle page faults is reduced and can be controlled. Signed-off-by: Xiaogang.Chen <Xiaogang.Chen@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:55:13 -04:00
Likun Gao	875ff9a7ee	drm/amdgpu: support for gc_info table v1.3 Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:54:57 -04:00
Yang Wang	4416377ae1	drm/amdgpu: add list empty check to avoid null pointer issue Add list empty check to avoid null pointer issues in some corner cases. - list_for_each_entry_safe() Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:45 -04:00
Alex Deucher	40318a2406	drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:25 -04:00
Hawking Zhang	b05d6476ae	drm/amdgpu: Retire query_utcl2_poison_status callback Driver switches to interrupt source id to identify utcl2 poison event. polling interface is not needed. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:16 -04:00
Rahul Jain	75f0efbc4b	drm/amdgpu: Take IOMMU remapping into account for p2p checks when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remapping DMA v5: (Felix, Alex) - fixing comment as per Alex feedback - refactor code as per Felix v4: (Alex) - fix the comment and description v3: - remove iommu_remap variable v2: (Alex) - Fix as per review comments - add new function amdgpu_device_check_iommu_remap to check if iommu remap Signed-off-by: Rahul Jain <Rahul.Jain@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-23 10:53:05 -04:00
Alex Deucher	9cead81eff	drm/amdgpu: fix eGPU hotplug regression The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com> (cherry picked from commit `c69b07f7bb`) Cc: stable@vger.kernel.org # 6.10.x	2024-08-20 23:07:11 -04:00
Candice Li	c99769bcea	drm/amdgpu: Validate TA binary size Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c0a04e3570`) Cc: stable@vger.kernel.org	2024-08-20 23:04:17 -04:00
Alex Deucher	e3e4bf58ba	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: `a03ebf1163` ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2dc3851ef7`) Cc: stable@vger.kernel.org	2024-08-20 22:51:37 -04:00
Yang Wang	0b43312902	drm/amdgpu: fixing rlc firmware loading failure issue Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit `849e133c97` ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: `3af2c80ae2` ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `89ec85d16e`)	2024-08-20 22:51:31 -04:00
Alex Deucher	88c511dea1	drm/amd/gfx11: move the gfx mutex into the caller Otherwise we can fail to drop the software mutex when we fail to take the hardware mutex. Fixes: `76acba7b7f` ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Victor Zhao	bf2bc61638	drm/amd/amdgpu: allow use kiq to do hdp flush under sriov when use cpu to do page table update under sriov runtime, since mmio access is blocked, kiq has to be used to flush hdp. change WREG32_NO_KIQ to WREG32 to allow kiq. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:14 -04:00
Alex Deucher	c69b07f7bb	drm/amdgpu: fix eGPU hotplug regression The driver needs to wait for the on board firmware to finish its initialization before probing the card. Commit `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") switched from using msleep() to using usleep_range() which seems to have caused init failures on some navi1x boards. Switch back to msleep(). Fixes: `959056982a` ("drm/amdgpu: Fix discovery initialization failure during pci rescan") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3559 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3500 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Ma Jun <Jun.Ma2@amd.com>	2024-08-20 22:14:14 -04:00
Candice Li	c0a04e3570	drm/amdgpu: Validate TA binary size Add TA binary size validation to avoid OOB write. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:13 -04:00
Mukul Joshi	ccf8ef6b75	drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11 Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:14:04 -04:00
Srinivasan Shanmugam	f846250b8a	drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_4_3 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. The commit also includes a check for the type of the ring. If the type of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the `enforce_isolation` structure in the `gfx` structure of the `amdgpu_device` is set to the `xcp_id` of the ring. This ensures that the correct `xcp_id` is used when enforcing isolation on compute rings. The `xcp_id` is an identifier for an XCP partition, and different rings can be associated with different XCP partitions. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-08-20 22:08:02 -04:00
Srinivasan Shanmugam	b710dbe55d	drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-20 22:07:58 -04:00
Srinivasan Shanmugam	afefd6f245	drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fences, it schedules the `enforce_isolation_work` to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences, it signals the Kernel Fusion Driver (KFD) to resume the runqueue. The function is synchronized using the `enforce_isolation_mutex`. This commit also introduces a reference count mechanism (kfd_sch_req_count) to keep track of the number of requests to enable the KFD scheduler. When a request to enable the KFD scheduler is made, the reference count is decremented. When the reference count reaches zero, a delayed work is scheduled to enforce isolation after a delay of GFX_SLICE_PERIOD. When a request to disable the KFD scheduler is made, the function first checks if the reference count is zero. If it is, it cancels the delayed work for enforcing isolation and checks if the KFD scheduler is active. If the KFD scheduler is active, it sends a request to stop the KFD scheduler and sets the KFD scheduler state to inactive. Then, it increments the reference count. The function is synchronized using the kfd_sch_mutex to ensure that the KFD scheduler state and reference count are updated atomically. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:35 -04:00
Amber Lin	234eebe161	drm/amdkfd: APIs to stop/start KFD scheduling Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added but those queues won't be mapped to runlist (so not scheduled) until amdgpu_amdkfd_start_sched is called. v2: fix build (Alex) Signed-off-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:28 -04:00
Srinivasan Shanmugam	b1f49ff9cb	drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware This commit extends the cleaner shader feature to support GFX9.4.4 hardware. The cleaner shader feature is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This operation needs to be performed in isolation, while no other tasks should be running on the GPU at the same time. Previously, the cleaner shader feature was implemented for GFX9.4.3 hardware. This commit adds support for GFX9.4.4 hardware by allowing the cleaner shader to be used with this hardware version. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:23 -04:00
Srinivasan Shanmugam	335288315a	drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3 This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_9_4_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps improve GPU performance and resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. v2: fix copyright date (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:16 -04:00
Srinivasan Shanmugam	d4c3815495	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware The patch modifies the gfx_v9_4_3_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_4_3 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_4_3_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:11 -04:00
Srinivasan Shanmugam	c2e70d307f	drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware The patch modifies the gfx_v9_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v9_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v9_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:07 -04:00
Srinivasan Shanmugam	22ff907d4f	drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet is a command packet used to instruct the GPU to execute the cleaner shader. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. The PACKET3_RUN_CLEANER_SHADER packet is used to trigger the execution of the cleaner shader on the GPU. The packet consists of a header followed by a RESERVED field, which is programmed to zero. When the GPU receives this packet, it fetches and executes the cleaner shader instructions from the location specified in the packet. The cleaner shader feature helps to enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:07:02 -04:00
Srinivasan Shanmugam	d361ad5d2f	drm/amdgpu: Add sysfs interface for running cleaner shader This patch adds a new sysfs interface for running the cleaner shader on AMD GPUs. The cleaner shader is used to clear GPU memory before it's reused, which can help prevent data leakage between different processes. The new sysfs file is write-only and is named `run_cleaner_shader`. Write the number of the partition to this file to trigger the cleaner shader on that partition. There is only one partition on GPUs which do not support partitioning. Changes made in this patch: - Added `amdgpu_set_run_cleaner_shader` function to handle writes to the `run_cleaner_shader` sysfs file. - Added `run_cleaner_shader` to the list of device attributes in `amdgpu_device_attrs`. - Updated `default_attr_update` to handle `run_cleaner_shader`. - Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device attributes. v2: fix error handling (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>	2024-08-20 22:06:56 -04:00
Srinivasan Shanmugam	e189be9b2e	drm/amdgpu: Add enforce_isolation sysfs attribute This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for each ring. When it's disabled, the reserved VMIDs are freed. The set function locks a mutex before changing the 'enforce_isolation' flag and the VMIDs, and unlocks it afterwards. This ensures that these operations are atomic and prevents race conditions and other concurrency issues. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-20 22:06:52 -04:00
Srinivasan Shanmugam	dba1a6cfc3	drm/amdgpu: Enforce isolation as part of the job This patch adds a new parameter 'enforce_isolation' to the amdgpu_job structure. This parameter is used to determine whether shader isolation should be enforced for a job. The enforce_isolation parameter is then stored in the amdgpu_job structure and used when flushing the VM. The enforce_isolation field of the amdgpu_job structure is set directly after the job is allocated This change allows more fine-grained control over shader isolation, making it possible to enforce isolation on a per-job basis rather than globally. This can be useful in scenarios where only certain jobs require isolation. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-20 22:06:43 -04:00
Victor Skvortsov	19cff16559	drm/amdgpu: abort KIQ waits when there is a pending reset Stop waiting for the KIQ to return back when there is a reset pending. It's quite likely that the KIQ will never response. Signed-off-by: Koenig Christian <Christian.Koenig@amd.com> Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com> Tested-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:50 -04:00
Srinivasan Shanmugam	9659520419	drm/amdgpu: Make enforce_isolation setting per GPU This commit makes enforce_isolation setting to be per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU is used to determine whether to enforce isolation between graphics and compute processes on that GPU. In amdgpu_ids.c, the adev->enforce_isolation value for the current GPU and partition is used to determine whether to enforce isolation between graphics and compute processes on that GPU and partition. This allows the enforce_isolation setting to be controlled individually for each GPU and each partition, which is useful in a system with multiple GPUs and partitions where different isolation settings might be desired for different GPUs and partitions. v2: fix loop in amdgpu_vmid_mgr_init() (Alex) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:45 -04:00
Alex Deucher	ee7a846ea2	drm/amdgpu: Emit cleaner shader at end of IB submission This commit introduces the emission of a cleaner shader at the end of the IB submission process. This is achieved by adding a new function pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If the `emit_cleaner_shader` function is set in the ring functions, it is called during the VM flush process. The cleaner shader is only emitted if the `enable_cleaner_shader` flag is set in the `amdgpu_device` structure. This allows the cleaner shader emission to be controlled on a per-device basis. By emitting a cleaner shader at the end of the IB submission, we can ensure that the VM state is properly cleaned up after each submission. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:42 -04:00
Srinivasan Shanmugam	aec773a1fb	drm/amdgpu: Add infrastructure for Cleaner Shader feature The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Christian König <christian.koenig@amd.com>	2024-08-16 14:27:34 -04:00
Alex Deucher	f49280ffd2	drm/amdgpu: handle enforce isolation on non-0 gfxhub Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:28 -04:00
Alex Deucher	2dc3851ef7	drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1 The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: `a03ebf1163` ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:27:04 -04:00
Sunil Khatri	1a2103d685	drm/amdgpu: add vcn ip dump support for vcn_v2_6 Add support for logging the registers in devcoredump buffer for vcn_v2_6. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:57 -04:00
Sunil Khatri	bc62abe1b9	drm/amdgpu: add print support for vcn_v2_5 ip dump Add support for logging the registers in devcoredump buffer for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:52 -04:00
Sunil Khatri	0eea81ee2e	drm/amdgpu: add vcn_v2_5 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v2_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:47 -04:00
Sunil Khatri	b910cacb4e	drm/amdgpu: add print support for vcn_v2_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:41 -04:00
Sunil Khatri	2239aaa204	drm/amdgpu: add vcn_v2_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v2_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:36 -04:00
Sunil Khatri	ef9f3b5fd9	drm/amdgpu: add print support for vcn_v1_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:31 -04:00
Sunil Khatri	837cc7f1bf	drm/amdgpu: add vcn_v1_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v1_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:25 -04:00
Sunil Khatri	439c3b124e	drm/amdgpu: add print support for vcn_v4_0_5 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:19 -04:00
Sunil Khatri	3a50a51d04	drm/amdgpu: add print support for vcn_v4_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:12 -04:00
Sunil Khatri	dc57edda81	drm/amdgpu: add print support for vcn_v4_0_3 ip dump Add support for logging the registers in devcoredump buffer for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:26:06 -04:00
Sunil Khatri	46553db49c	drm/amdgpu: add vcn_v4_0_5 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0_5. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:59 -04:00
Sunil Khatri	9d87dac3f9	drm/amdgpu: add vcn_v4_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:53 -04:00
Sunil Khatri	8962915044	drm/amdgpu: add vcn_v4_0_3 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v4_0_3. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:30 -04:00
Sunil Khatri	f3c958ab85	drm/amdgpu: add print support for vcn_v5_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:12 -04:00
Alex Deucher	32aada4d0a	drm/amdgpu/mes12: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:09 -04:00
Alex Deucher	d4f1fde734	drm/amdgpu/mes11: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:06 -04:00
Alex Deucher	5b7a59de48	drm/amdgpu/mes: add API for user queue reset Add API for resetting user queues. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:25:02 -04:00
Alex Deucher	478efcb90b	drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex() It will be used by the queue reset code. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:56 -04:00
Alex Deucher	76acba7b7f	drm/amdgpu/gfx11: add a mutex for the gfx semaphore This will be used in more places in the future so add a mutex. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:33 -04:00
Alex Deucher	b5be054c58	drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTL Need to enter safe mode before touching GC MMIO. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:23 -04:00
Alex Deucher	d479158f65	drm/amdgpu/gfx7: add ring reset callback for gfx Add ring reset callback for gfx. v2: fix operator precedence (kernel test robot) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:18 -04:00
Alex Deucher	4af8071b65	drm/amdgpu/gfx8: add ring reset callback for gfx Add ring reset callback for gfx. v2: fix operator precedence (kernel test robot) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:09 -04:00
Sunil Khatri	f685b38455	drm/amdgpu: add vcn_v5_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v5_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:24:04 -04:00
Sunil Khatri	6d88c0f94a	drm/amdgpu: add print support for vcn_v3_0 ip dump Add support for logging the registers in devcoredump buffer for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:59 -04:00
Sunil Khatri	ab10f77487	drm/amdgpu: add vcn_v3_0 ip dump support Add support of vcn ip dump in the devcoredump for vcn_v3_0. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:53 -04:00
Sunil Khatri	27a74c125d	drm/amdgpu: add vcn ip dump ptr in vcn global struct Add pointer to the vcn ip dump in the vcn global structure to be accessible for all vcn version via global adev. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:45 -04:00
Zhang Zekun	20588d5afc	drm/amd: Remove unused declarations amdgpu_gart_table_vram_pin() and amdgpu_gart_table_vram_unpin() has been removed since commit `575e55ee4f` ("drm/amdgpu: recover gart table at resume") remain the declarations untouched in the header files. Besides, amdgpu_dm_display_resume() has also beed removed since commit `a80aa93de1` ("drm/amd/display: Unify dm resume sequence into a single call"). So, let's remove this unused declarations. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:23:16 -04:00
Yang Wang	89ec85d16e	drm/amdgpu: fixing rlc firmware loading failure issue Skip rlc firmware validation to ignore firmware header size mismatch issues. This restores the workaround added in commit `849e133c97` ("drm/amdgpu: Fix the null pointer when load rlc firmware") Fixes: `3af2c80ae2` ("drm/amdgpu: refine gfx10 firmware loading") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3551 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:19:16 -04:00
Sunil Khatri	0f2c243dbf	drm/amdgpu: remove ME0 registers from mi300 dump Remove ME0 registers from MI300 gfx_9_4_3 ipdump MI300 does not have gfx ME and hence those register are just empty one and could be dropped. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:54 -04:00
Alex Deucher	3ec2ad7c34	drm/amdgpu/gfx9: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:51 -04:00
Alex Deucher	d082e5cde4	drm/amdgpu/gfx9.4.3: use rlc safe mode for soft recovery Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:46 -04:00
Alex Deucher	a48f31fb78	drm/amdgpu/gfx9.4.3: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:44 -04:00
Alex Deucher	27ef61f961	drm/amdgpu/gfx9: use proper rlc safe mode helpers Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:41 -04:00
Alex Deucher	c4f503551f	drm/amdgpu/gfx9: add ring reset callback for gfx Add ring reset callback for gfx. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:38 -04:00
Alex Deucher	31ef969301	drm/amdgpu/gfx9: per queue reset only on bare metal It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:35 -04:00
Jiadong Zhu	4dc4422f11	drm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3 Using mmio to do queue reset. Enter safe mode before writing mmio registers. v2: set register instance offset according to xcc id. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:30 -04:00
Jiadong Zhu	2e9bbdd7b7	drm/amdgpu/gfx9: implement reset_hw_queue for gfx9 Using mmio to do queue reset. Enter safe mode when writing registers. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:27 -04:00
Jiadong Zhu	186020c166	drm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queue Add reset_hw_queue in kiq_pm4_funcs callbacks. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:25 -04:00
Jiadong Zhu	4c953e53cc	drm/amdgpu/gfx_9.4.3: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:22 -04:00
Jiadong Zhu	6f38589e17	drm/amdgpu/gfx9.4.3: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:19 -04:00
Alex Deucher	5d0112f777	drm/amdgpu/gfx9.4.3: add ring reset callback Add ring reset callback for compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:14 -04:00
Jiadong Zhu	fdbd69486b	drm/amdgpu/gfx9: wait for reset done before remap There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. v2: fix KIQ locking (Alex) v3: fix KIQ locking harder Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:06 -04:00
Jiadong Zhu	b5e1a3874f	drm/amdgpu/gfx9: remap queue after reset successfully Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:03 -04:00
Alex Deucher	5fb4d2a771	drm/amdgpu/gfx9: add ring reset callback Add ring reset callback for compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:18:00 -04:00
Prike Liang	fb0a5834a3	drm/amdgpu: increase the reset counter for the queue reset Update the reset counter for the amdgpu_cs_query_reset_state() Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:56 -04:00
Alex Deucher	15789fa0f0	drm/amdgpu: add per ring reset support (v5) If a specific job is hung, try and reset just the ring associated with the job. v2: move to amdgpu_job.c v3: fix drm_sched_stop() handling when ring reset fails v4: drop unnecessary amdgpu_fence_driver_clear_job_fences() and drm_sched_increase_karma() v5: rework sched_stop handling Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:52 -04:00
Alex Deucher	57a372f676	drm/amdgpu: add new ring reset callback Use this to reset just a single ring. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:40 -04:00
Soham Dandapat	406792dc2a	drm/amdgpu: Return earlier in amdgpu_sw_ring_ib_end if mcbp is off As we don't trigger preemption in sw ring muxer when mcbp is disabled, return earlier in amdgpu_sw_ring_ib_end function if mcbp is disabled; it is not required to call amdgpu_ring_mux_end_ib. Signed-off-by: Soham Dandapat <sdandapa@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:31 -04:00
Sunil Khatri	37ee145623	drm/amdgpu: add cp queue registers print for gfx9_4_3 Add gfx9_4_3 print support of CP queue registers for all queues to be used by devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:26 -04:00
Sunil Khatri	f9e491c863	drm/amdgpu: add cp queue registers for gfx9_4_3 ipdump Add gfx9 support of CP queue registers for all queues to be used by devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-16 14:17:14 -04:00
Thomas Zimmermann	ddda6542c8	drm/amdgpu: Use backlight power constants Replace FB_BLANK_ constants with their counterparts from the backlight subsystem. The values are identical, so there's no change in functionality or semantics. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240731122311.1143153-2-tzimmermann@suse.de	2024-08-16 09:27:15 +02:00
Kenneth Feng	23acd1f344	drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1 add HDP_SD support on gc 12.0.0/1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `61cffacb3a`)	2024-08-13 13:20:43 -04:00
Yinjie Yao	507a2286c0	drm/amdgpu: Update kmd_fw_shared for VCN5 kmd_fw_shared changed in VCN5 Signed-off-by: Yinjie Yao <yinjie.yao@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `aa02486fb1`)	2024-08-13 13:20:36 -04:00
David (Ming Qiang) Wu	470516c292	drm/amd/amdgpu: command submission parser for JPEG Add JPEG IB command parser to ensure registers in the command are within the JPEG IP block. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a7f670d5d8`) Cc: stable@vger.kernel.org	2024-08-13 13:17:36 -04:00
Jack Xiao	4246b1077f	drm/amdgpu/mes12: fix suspend issue Use mes pipe to unmap kcq and kgq. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f7fb9d677f`)	2024-08-13 13:17:36 -04:00
Jack Xiao	af401543df	drm/amdgpu/mes12: sw/hw fini for unified mes Free memory for two pipes and unmap pipe0 via pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `98cae695a8`)	2024-08-13 13:17:36 -04:00
Jack Xiao	7254027e1e	drm/amdgpu/mes12: configure two pipes hardware resources Configure two pipes with different hardware resources. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ea5d6db17a`)	2024-08-13 13:17:36 -04:00
Jack Xiao	1097727d6d	drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes Adjust mes12 sw/hw initialization for both pipe0 and pipe1 enablement. The two pipes are almost identical. Pipe0 behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `aa539da8af`)	2024-08-13 13:17:36 -04:00
Jack Xiao	3738a7f0dd	drm/amdgpu/mes12: add mes pipe switch support Add mes pipe switch to let caller choose pipe to submit packet. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b2dee0837a`)	2024-08-13 13:17:36 -04:00
Jack Xiao	a13d91bf3c	drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1 Enable unified mes firmware to load on pipe0 and pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e69c2dd753`)	2024-08-13 13:17:36 -04:00
Jack Xiao	2029b3d7e1	drm/amdgpu/mes: add multiple mes ring instances support Add multiple mes ring instances in mes structure to support multiple mes pipes. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c7d4355648`)	2024-08-13 13:04:48 -04:00
Bas Nieuwenhuizen	0573a1e2ea	drm/amdgpu: Actually check flags for all context ops. Missing validation ... Checked libdrm and it clears all the structs, so we should be safe to just check everything. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c6b86421f1`) Cc: stable@vger.kernel.org	2024-08-13 13:03:57 -04:00
Alex Deucher	e6c6bd6253	drm/amdgpu/jpeg4: properly set atomics vmid field This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c6c2e8b6a4`) Cc: stable@vger.kernel.org	2024-08-13 13:03:11 -04:00
Alex Deucher	e414a304f2	drm/amdgpu/jpeg2: properly set atomics vmid field This needs to be set as well if the IB uses atomics. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `35c628774e`) Cc: stable@vger.kernel.org	2024-08-13 13:02:48 -04:00
Jack Xiao	11752c013f	drm/amdgpu/mes: fix mes ring buffer overflow wait memory room until enough before writing mes packets to avoid ring buffer overflow. v2: squash in sched_hw_submission fix Fixes: `de32462541` ("drm/amdgpu: cleanup MES11 command submission") Fixes: `fffe347e14` ("drm/amdgpu: cleanup MES12 command submission") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `34e087e892`) Cc: stable@vger.kernel.org	2024-08-13 12:50:01 -04:00
Sunil Khatri	b232c4a63a	drm/amdgpu: add print support for gfx9_4_3 ipdump Add support of gfx9_4_3 ipdump print so devcoredump could trigger it to dump the captured registers in devcoredump. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Sunil Khatri	1091796fb1	drm/amdgpu: add gfx9_4_3 register support in ipdump Add general registers of gfx9_4_3 in ipdump for devcoredump support. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
David (Ming Qiang) Wu	6a28a072d9	drm/amd/amdgpu: cleanup parse_cs callbacks Because gpu_addr is updated in the calling routine (amdgpu_cs_patch_ibs()),it is removed in the callback. Use .patch_cs_in_place instead of .parse_cs for amdgpu_vce_ring_parse_cs_vm() as there is no need for keeping a temporary IB, therefore ib->sa_bo is NULL and amdgpu_ib_free() is removed. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
David (Ming Qiang) Wu	a7f670d5d8	drm/amd/amdgpu: command submission parser for JPEG Add JPEG IB command parser to ensure registers in the command are within the JPEG IP block. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Jack Xiao	f7fb9d677f	drm/amdgpu/mes12: fix suspend issue Use mes pipe to unmap kcq and kgq. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Jack Xiao	98cae695a8	drm/amdgpu/mes12: sw/hw fini for unified mes Free memory for two pipes and unmap pipe0 via pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:13:03 -04:00
Jack Xiao	ea5d6db17a	drm/amdgpu/mes12: configure two pipes hardware resources Configure two pipes with different hardware resources. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jack Xiao	aa539da8af	drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes Adjust mes12 sw/hw initialization for both pipe0 and pipe1 enablement. The two pipes are almost identical. Pipe0 behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jack Xiao	b2dee0837a	drm/amdgpu/mes12: add mes pipe switch support Add mes pipe switch to let caller choose pipe to submit packet. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Victor Skvortsov	9e823f3070	drm/amdgpu: Block MMR_READ IOCTL in reset Register access from userspace should be blocked until reset is complete. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jonathan Kim	a85c3db6b3	drm/amdkfd: fallback to pipe reset on queue reset fail for gfx9 If queue reset fails, tell the CP to reset the pipe. Since queues multiplex context per pipe and we've issued a device wide preemption prior to the hang, we can assume the hung pipe only has one queue to reset on pipe reset. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Lijo Lazar	9c081c11c6	drm/amdgpu: Reorder to read EFI exported ROM first On EFI BIOSes, PCI ROM may be exported through EFI_PCI_IO_PROTOCOL and expansion ROM BARs may not be enabled. Choose to read from EFI exported ROM data before reading PCI Expansion ROM BAR. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Jack Xiao	e69c2dd753	drm/amdgpu/mes12: load unified mes fw on pipe0 and pipe1 Enable unified mes firmware to load on pipe0 and pipe1. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Victor Skvortsov	f83cec3b3a	drm/amdgpu: Disable dpm_enabled flag while VF is in reset VFs do not perform HW fini/suspend in FLR, so the dpm_enabled is incorrectly kept enabled. Add interface to disable it in virt_pre_reset call. v2: Made implementation generic for all asics v3: Re-order conditionals so PP_MP1_STATE_FLR is only evaluated on VF Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Victor Skvortsov	35c7152202	Revert "drm/amdgpu: Extend KIQ reg polling wait for VF" KIQ timeouts no longer seen. This reverts commit `3a19a8af64`. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Yinjie Yao	aa02486fb1	drm/amdgpu: Update kmd_fw_shared for VCN5 kmd_fw_shared changed in VCN5 Signed-off-by: Yinjie Yao <yinjie.yao@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:52 -04:00
Kenneth Feng	61cffacb3a	drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1 add HDP_SD support on gc 12.0.0/1 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:51 -04:00
Victor Zhao	ef6c2cb349	drm/amd/sriov: extend NV_MAILBOX_POLL_MSG_TIMEDOUT on MI300/MI308 UBB products, when doing mode1 reset, since 1 gpu need to wait all 8 gpus finish mode1 reset and then do re-init. As observed, sometimes the gpu which triggered the reset need to wait 15s for all gpus to finish. If poll msg timeout, guest driver will send the reset message again, and may mess up the following reinit sequence on other gpus. So extend the time to cover the maximum time needed to recover. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 12:12:51 -04:00
Sunil Khatri	9b7e697839	drm/amdgpu: fix ptr check warning in gfx12 ip_dump Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:34:51 -04:00
Sunil Khatri	bd15f805cd	drm/amdgpu: fix ptr check warning in gfx11 ip_dump Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:34:31 -04:00
Sunil Khatri	98df5a7732	drm/amdgpu: fix ptr check warning in gfx10 ip_dump Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2024-08-13 10:34:23 -04:00

15900 Commits